In my small home office, I have hard drives, flash drives, and solid-state drives, which use FAT32, NTFS, exFAT, Btrfs, Ext3, and Ext4 file systems, and are connected to the computers with CIFS, NFS, HTTPS, SSH, and FTP over the Internet and Gigabit Ethernet with a variety of authentication systems based on Lightweight Directory Access Protocol (LDAP) and Active Directory (AD). And this, mind you, is a simple small-business network.
Is it any wonder then that companies far, far larger than my little operation want to abstract their storage concerns away with software-defined storage (SDS)? I think not!
Just as server virtualization revolutionized the data center and made the cloud possible, SDS hopes to transform how we administer storage.
What is SDS?
As the example of my own business showed, most corporate storage today is made up of multiple sites using a patchwork of storage platforms. Making matters even more difficult to manage, these storage platforms use a wide variety of file formats and communicate with a mix of networking protocols. Adding insult to injury, there are no common application programming interfaces (APIs), never mind management and monitoring tools, for storage across vendors.
To manage storage silos efficiently, you can either look forward to paying an ever-growing number of storage administrators to execute repetitive, manual jobs or you can look to storage virtualization.
With software-defined storage, the name of the game is both to virtualize storage and to separate the control of storage from the messy details of how data is actually stored. These two things do not go together in all SDS solutions, so when considering an SDS approach be certain that what you're getting will do the job you want.
When it works right, software-defined storage will make it possible to place your files or data objects where it makes the most sense. It will also enable you to match the capacity, performance, latency, reliability, and cost of your storage devices to your workloads.
So, for example, when low-cost storage elements will do the job, you can move your data to them based on your workload and service-level requirements without having to worry about the hardware specifics. Conversely, if what you need is speed, speed, and more speed, never mind the expense, you can easily set your applications to use the fastest possible storage resources. With the proper SDS, users can move files and data to meet their needs via 'auto-tiering' without needing to call on an administrator.
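To make the placement idea concrete, here is a minimal Python sketch of a policy that picks the cheapest storage tier that still meets a workload's latency requirement. Everything in it is an assumption made for illustration: the `Tier` and `Workload` types, the `choose_tier` function, and the latency and cost figures are hypothetical and do not come from any particular SDS product.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Tier:
    name: str
    latency_ms: float   # typical access latency (illustrative figure)
    cost_per_gb: float  # monthly cost in arbitrary units (illustrative figure)


@dataclass(frozen=True)
class Workload:
    name: str
    max_latency_ms: float  # service-level requirement for this workload


# Hypothetical tiers; the numbers are made up for the example.
TIERS: List[Tier] = [
    Tier("ssd", latency_ms=0.2, cost_per_gb=0.50),
    Tier("hdd", latency_ms=10.0, cost_per_gb=0.05),
    Tier("archive", latency_ms=3_600_000.0, cost_per_gb=0.01),
]


def choose_tier(workload: Workload, tiers: List[Tier] = TIERS) -> Optional[Tier]:
    """Return the cheapest tier that meets the latency requirement, or None."""
    eligible = [t for t in tiers if t.latency_ms <= workload.max_latency_ms]
    return min(eligible, key=lambda t: t.cost_per_gb, default=None)
```

Under these made-up numbers, a database that needs sub-millisecond access lands on the SSD tier, while a nightly backup that tolerates 50 ms lands on the cheaper hard-drive tier. A real auto-tiering engine would also weigh capacity, reliability, and observed access patterns.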
SDS can also be used to pool multiple storage devices so that they work like one monstrous storage device. For instance, Amazon Simple Storage Service (S3) stores data in multiple facilities and on multiple devices within each facility. What storage technologies are being used exactly? You don't know, and Amazon doesn't think you need to know. All you need to know is that Amazon S3's service-level agreement (SLA) guarantees a 99.9 percent monthly uptime for your data.
If you don't need quite so much data insurance, Amazon also offers Reduced Redundancy Storage (RRS) and your data won't be replicated as many times. And, if you don't need real-time access to your data, there's always Amazon Glacier.
None of Amazon's S3 options, once set up, require an operator or administrator for users to store, edit, and access data. SDS is meant to enable you to do the same kind of tasks on your data centers and private cloud storage equipment.
SDS can also be used to minimize physical storage waste. For example, if your storage looks anything like mine, you probably have files duplicated hither and yon. SDS has the promise of being able to rationalize your data stores so you're not wasting terabytes on pointless copies and hopelessly archaic file versions.
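One common way to find pointless copies is to group files by a hash of their contents. The following Python sketch does this for a single directory tree. It is only an illustration of the idea, not how any specific SDS product deduplicates data: production systems typically work on blocks or chunks rather than whole files, and the function names here are hypothetical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path
from typing import Dict, List


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root: Path) -> List[List[Path]]:
    """Group regular files under root that have identical contents."""
    by_digest: Dict[str, List[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only the groups that contain more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates(Path("/some/share"))` (the path is a placeholder) returns lists of files whose contents match, which a storage layer could then replace with references to a single stored copy.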
You can also use SDS to transparently share storage resources between client operating systems without any fuss. With SDS, storage can look like whatever a PC or smart device operating system expects without users needing to worry about the fine details.
In short, SDS attempts to make storage allocation, provisioning, and use as easy as it is to spin up new servers whenever you want them today on Amazon Web Services (AWS), Microsoft Azure, or Google Compute Engine. Just as the cloud cut down your total cost of ownership (TCO) for servers, so SDS promises to slice down your storage bills.
The intersection of open source and SDS
It's still early days for SDS so, as you might suppose, everyone is talking about their own take on the technology. There are SDS appliances, hardware/software crossover efforts such as those from EMC, programs that run on top of open-source software like IBM's Elastic Storage, and VMware's approach, which cross-links virtual machines and SDS. And, as is always the case, there are companies rebranding their same old vertical, proprietary programs as SDS.
What open-source software brings to the table are the virtues it has used to take over so many other server and data-center markets. First, as Sage Weil, the founder and chief architect of Ceph, a major open-source SDS, said at the OpenStack Paris conference in November 2014, "Software Defined Storage means different things to different people, largely depending on what they're building or selling. The common element is providing storage services that are independent of the hardware. Open-source SDS gives them hardware vendor independence. Deploying proprietary software locks you into a single vendor." In short, open source gives companies freedom of choice. This, in turn, leads to lower costs.
In addition, because open source enables anyone to improve the code, the overall quality of the programs has improved with the growth of users and developers. The end result is more powerful and flexible programs.
Specifically, there are three main open-source SDS contenders. On the cloud side, there's OpenStack with Cinder and Swift. Apart from OpenStack's approaches, there are Ceph and GlusterFS. Both of these put a software abstraction layer over commercial off-the-shelf (COTS) storage devices. None of these programs are mutually exclusive.
Starting with OpenStack, Cinder is a block storage API that you can use with Linux's logical volume manager (LVM) to present storage resources to end users, or to provide block storage access to the OpenStack Nova Compute Project. It's meant to virtualize pools of block storage devices without requiring any knowledge of where or how the storage is deployed.
Swift is Cinder's equivalent on the object/blob storage side. In Swift, data is stored as binary objects rather than as Cinder's block volumes.
Ceph is a distributed storage system that provides object, block, and file storage. Thus, as you might guess, Ceph can be used to manage both Cinder and Swift. And, indeed, that's just what Red Hat, which bought Inktank, Ceph's parent company, is doing. By using Ceph via the Ceph block device for Cinder and the Ceph Object Gateway for Swift, Red Hat presents a single open-source approach to OpenStack's storage options.
GlusterFS, which is also owned by Red Hat, is most often thought of these days as a partner with big data's Hadoop. It's more than that, though: it's an open-source, distributed file system. It can be used to aggregate storage resources into a single global namespace.
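As a rough picture of what "a single global namespace" means, the toy Python class below spreads files across several in-memory back ends while presenting one flat view to the caller. This is a teaching sketch only, not GlusterFS's actual placement algorithm or API: the `GlobalNamespace` class, its method names, and the hash-modulo placement rule are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List


class GlobalNamespace:
    """Toy aggregation of several storage back ends behind one path space."""

    def __init__(self, backend_names: List[str]) -> None:
        if not backend_names:
            raise ValueError("at least one back end is required")
        # Each back end is modeled as a dict mapping path -> bytes.
        self.backends: Dict[str, Dict[str, bytes]] = {n: {} for n in backend_names}
        self._names = sorted(self.backends)

    def _backend_for(self, path: str) -> str:
        """Pick a back end deterministically from a hash of the path."""
        digest = hashlib.sha256(path.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._names)
        return self._names[index]

    def write(self, path: str, data: bytes) -> None:
        self.backends[self._backend_for(path)][path] = data

    def read(self, path: str) -> bytes:
        return self.backends[self._backend_for(path)][path]

    def listdir(self) -> List[str]:
        """Return every stored path, regardless of which back end holds it."""
        return sorted(p for store in self.backends.values() for p in store)
```

The caller reads and writes by path alone and never needs to know which back end holds a given file. One known weakness of this naive placement rule is that adding or removing a back end remaps most paths, which is why real systems use more careful schemes.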
By this point, you may have realized that all of these SDS programs are at a pretty low level. You'd be right in thinking that. These are building blocks better suited to developers than to users or system administrators.
In addition, packaged open-source SDS systems only started being offered in 2014. Like their proprietary cousins, they are still relatively wet behind the ears.
For example, there are programs from companies such as SwiftStack, Nexenta, and Mirantis, which offer products that bridge the gap between developers and operations. Some, like SwiftStack, focus on one element of an SDS solution, while Nexenta offers SDS-specific programs. Still other vendors, including Mirantis with Mirantis OpenStack, are offering soup-to-nuts cloud and storage stacks.
At this point, the most mature of the everything-and-the-kitchen-sink programs, in my opinion, is Red Hat Storage Server 3. It uses GlusterFS to control everything from your local COTS storage servers to cloud storage on OpenStack and Amazon Web Services (AWS).
Red Hat, with its major OpenStack role and ownership of Ceph and GlusterFS, is clearly in the leader's spot, but it's not the only open-source company that's a contender. SUSE has its Ceph-based SUSE Storage and Canonical is integrating Ceph into its Ubuntu Linux.
At this point, it's really too early to tell which technology or vendor will end up winning in the SDS wars. Red Hat, however, clearly has a strong lead.
I expect 2015 to be a year with strong competition from both the open-source and proprietary vendors for the enterprise's SDS dollar. IDC expects SDS to be the fastest growing storage segment in the coming year and I see no reason to doubt this prediction. This is being driven by the incredible growth of data. By IDC's count, cloud, analytics, mobile, social, and security data collection is growing at a rate of approximately 2.5 billion GB of data per day.
Storage, in short, is in the middle of a revolution and its name is software-defined storage. While many vendors, such as IBM, DataCore, and FalconStor, offer SDS stacks for their own hardware, companies are spending less on high-end storage. That means companies must be turning more to cheaper COTS storage, which will benefit the open-source approaches.
Back in 2011, the creator of the first popular web browser, Marc Andreessen, said software was eating the IT world. He was right, and in 2015 the bite is coming to storage.