Special report: Storage hardware can't keep indefinitely storing more bits in the same amount of space. When will we run out of disk space, and what will we do when it happens?
In a feature a couple of months ago, we discussed how ever-increasing volumes of data are making storage harder to manage, and the various components of a long-term vision that will eventually see storage managed as a service. However, the growth of storage needs has a much more practical and immediate effect on the plumbing layer than it does on the management layer.
The short-term solution to increasing storage capacity is to add more storage boxes. However, with storage volumes doubling every year or two, this solution is hardly going to be popular in space-starved server rooms or with customers of hosting providers who charge by the rack unit. Nor will the accounts department be happy if the cost of buying storage doubles every year. Consequently, vendors of disk, tape, and optical storage devices have roughly kept pace with the growth of storage by increasing storage density -- the amount of stuff that can be stored in the same amount of space -- at approximately the same rate. The per-megabyte price of storage has also been keeping its end of the bargain, halving in about the same time as storage needs and storage capacities double. But there are questions about whether this exponential increase in density can continue for much longer before it reaches some fundamental limits. So what are these limits, and what technologies will replace tape, disk, and optical when they run out of steam?
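To make the arithmetic behind that bargain concrete, here is a minimal sketch of what happens to rack space and spending when density keeps pace with demand, and when it stalls. Every number in it (the 18-month doubling period, the starting 10TB, the price per terabyte) is a hypothetical value chosen for illustration, and the assumption that price per terabyte falls in lockstep with density is a simplification of the trend described above, not a vendor figure.

```python
def project_storage(years, need_doubling=1.5, density_doubling=1.5,
                    start_tb=10.0, tb_per_rack_unit=1.0, price_per_tb=1000.0):
    """Return one (year, need_tb, rack_units, cost) row per year.

    All defaults are hypothetical. Pass density_doubling=None to model
    density (and price per terabyte) hitting a wall and staying flat.
    """
    rows = []
    for year in range(years + 1):
        # Storage need doubles every `need_doubling` years.
        need_tb = start_tb * 2 ** (year / need_doubling)
        # Density growth factor; assumed to drive price per terabyte down too.
        if density_doubling is None:
            growth = 1.0
        else:
            growth = 2 ** (year / density_doubling)
        rack_units = need_tb / (tb_per_rack_unit * growth)
        cost = need_tb * (price_per_tb / growth)
        rows.append((year, need_tb, rack_units, cost))
    return rows


if __name__ == "__main__":
    for label, kwargs in (("density keeps pace", {}),
                          ("density stalls", {"density_doubling": None})):
        print(label)
        for year, need_tb, rack_units, cost in project_storage(6, **kwargs):
            print(f"  year {year}: {need_tb:7.1f} TB, "
                  f"{rack_units:6.1f} rack units, ${cost:,.0f}")
```

While density keeps pace, the rack space and the cost of holding the whole data set stay flat even as the data grows. Once density stalls, both double along with the data.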
Running out of space
"There are some interesting technologies out into the future, but really they're just iterations of technologies we've already got today," says Ian Selway, product manager for network storage solutions at HP. "Until someone makes either atomic or holographic storage a solution, it's just doing more in smaller form factors than we're doing today."
"I've been consistently stunned how much they can get into a standard magnetic disk. There's been talk about moving to other forms for years, but there doesn't seem to be a breakthrough on the horizon, and there doesn't seem to be a problem with increasing the densities [of magnetic disk] for the next five or six years," says Kevin McIsaac, research director at industry analyst the META Group. "And who knows how they'll figure out how to leverage magnetic storage going further. Even tape still has a pretty important useful life because it's very, very low cost."
Even though the storage industry has so far always found ways of increasing the density of data stored on magnetic disks and tapes, it will eventually hit a wall. "One of these days we're going to reach a limit called the superparamagnetic effect, and that's a problem caused by having the bits so close together that I can no longer differentiate between one bit and the adjacent ones," explains Rob Nieboer, storage strategist at StorageTek.
This doesn't mean there's an ominous superparamagnetic effect of Damocles hanging over the storage industry. The semiconductor industry, for example, has overcome some of its "fundamental" limits in recent years and Moore's Law -- the doubling of density every 18 months or so -- has continued unabated. Nieboer remains confident that the storage industry will find ways to sidestep the superparamagnetic effect when it becomes a problem.
"Something on the not-too-distant horizon is called perpendicular recording, where I actually have the bits stand on end perpendicular to the surface of the disk, and I can get a higher areal density that way and pretty good raw data rates. That's probably two to four years out at least," says Nieboer. "That may also lend itself to a thing called 3D recording where you go multiple layers of vertical bits, and you may or may not even have to spin the disk."
Nieboer is more sceptical about the practical value of holographic recording, where bits are recorded in a three-dimensional space. "Holographic storage has been 18 months away for at least the last 10 years that I've been aware of. In fact, it's been in development for more than 20 years and it's never yet become viable," he says.
A more contentious idea is the use of solid-state memory as a substitute for disk. With no moving parts, solid state is much faster and much more reliable than disk storage, but it remains significantly more expensive than disk.
"Everything I've read and understood about solid state memories, they will never have the recording capacities at a sensible cost for the amount of information people are talking about," says Mark Heers, marketing manager at storage vendor EMC. "I think it will always be a niche solution."
Moore's Law can be relied upon to reduce the cost of memory over time, but at the same time, the amount of data people need to store is growing at approximately the same rate. This means solid state storage will always remain as expensive -- compared with the amount of it needed -- as it is today. "It's going to scale down in dollars per gigabyte, but people will always be storing more and more," explains Heers.
On the other hand, Dilip Kandlur, head of IBM's storage systems research division, sees one niche application of solid state being quite popular: as an intermediary between disk systems and main memory. "As processing speeds increase, we are coming to a point where the access times to memory are increasing relative to the processor cycles in the CPU. This is a phenomenon called the memory wall, that memory appears to be further and further away. While you have very large main memories in computing systems, you would probably benefit from having non-volatile solid-state memory placing an additional level in the hierarchy, to be able to give you better access and performance," Kandlur says. The lack of moving parts allows truly random access to data, which could provide significant performance advantages for transactional systems, he adds.
So long, tape?
For about as long as pundits have been predicting that holographic storage is just around the corner, they have also been predicting the demise of tape in the day-to-day running of a datacentre. Greg Bowden, national business manager at systems integrator Dimension Data, says there is definitely a convergence between disk and tape that may eventually lead to less emphasis on tape use. "There's a lot of solutions out there that look almost identical to the ones 10 years ago. Software's changed, the tape drives have increased performance and capacity, but the premise of it hasn't," he says. But new technologies coming into play now will "reinvigorate how customers protect their data, from snapshot technology to the mixing of tape and disk to bringing archiving solutions into place," he explains.
On the other hand, Nieboer believes tape faces a much smaller challenge than disk in coping with burgeoning storage requirements. "In disk I've got a relatively small surface area that I've got to put an increasing number of bits on. In tape I have a piece of half-inch media that's hundreds or thousands of feet long," he says. By 2005, Nieboer predicts tapes will have half a terabyte of uncompressed capacity, meaning at least a terabyte of storage per tape with compression. "A couple of years later you'll get a terabyte uncompressed that goes to two terabytes with compression. Tape has a long life ahead of it," he assures us.
Nieboer is more sceptical about the future of optical storage in the datacentre, despite technologies such as Blu-Ray and ultra dense optical increasing the storage capacity of optical storage. "Optical disk is really not participating in the datacentre anymore. The old optical [WORM disks and the like] is dead as a doornail in the datacentre, and the new optical [CDs and DVDs] isn't there yet in terms of datacentre readiness," he explains. "To me it's more a distribution model of data rather than a storage device."
See ya, SCSI?
A big move on the horizon is away from expensive SCSI disks towards less expensive alternatives. "I think the big move over the next few years will be from people using SANs based on SCSI or Fibre Channel attached disks to ATA, serial ATA, and serial SCSI," says McIsaac. This is good news for the bean counters, since Serial ATA can give around 75 percent of the performance of SCSI disks at around half the price, McIsaac says. "There will be a more rapid decline in the cost of storage driven by the move towards ATA and serial ATA over the next 12 to 18 months," he predicts.
One company that is very pleased with this trend is 3ware, which makes RAID controllers for ATA and serial ATA drives, and was recently acquired by semiconductor company AMCC. "We created a switched architecture for storage and each ATA drive gets its own dedicated port on our switched fabric," says Peter Herz, 3ware's CEO. "It solves the reliability problem because any drive can die, including in a way that destroys the interface, but it will only take the one port down, not the entire array. From a performance perspective it dedicates full bandwidth to each drive."
3ware initially aimed its products at the server market, expecting the industry would jump at the chance to ditch the overpriced SCSI disks. "We were dead wrong. The profits made on SCSI are so obscene that vendors are not really interested in fixing that problem," says Herz.
Conspiracy theories aside, SCSI disks fulfill one very important function in the datacentre environment: fast data access speeds. "The most important thing in the transaction processing environment is you want to do lots and lots of small transactions, so you want the fastest possible rotation rate disk drives [to get to the data as quickly as possible]," says Herz.
3ware's ATA-based systems have been a lot more successful in the emerging area of streaming data, such as video surveillance systems, video-on-demand systems for hotels, and scientific research applications with extremely large datasets. "In streaming applications, the rotational rate of the drive actually doesn't matter, you want a very high data rate, and you want to be able to aggregate that into as much bandwidth as possible," explains Herz. Because of the scale of storage required by streaming applications, "the systems -- if they were based on SCSI or fibre channel -- would be unaffordable."
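The distinction Herz draws can be put into rough numbers. The sketch below is a back-of-envelope model, not a benchmark. It uses the standard approximation that average rotational latency is half a revolution, ignores transfer time and caching, and takes sustained transfer rate as a given. The seek times, drive counts, and data rates in the example are hypothetical.

```python
def avg_rotational_latency_ms(rpm):
    """Average wait for the target sector: half of one revolution."""
    return 0.5 * 60_000.0 / rpm


def random_iops(rpm, avg_seek_ms):
    """Rough small-transaction rate for one drive.

    Counts only seek time plus rotational latency per request;
    transfer time and caching are deliberately ignored.
    """
    return 1000.0 / (avg_seek_ms + avg_rotational_latency_ms(rpm))


def streaming_bandwidth_mb_s(drives, sustained_mb_s):
    """Aggregate sequential bandwidth when every drive streams in parallel."""
    return drives * sustained_mb_s


if __name__ == "__main__":
    # Hypothetical drives: a fast-spinning SCSI disk and a slower ATA disk.
    for name, rpm, seek_ms in (("15,000rpm", 15000, 4.0),
                               ("7,200rpm", 7200, 8.5)):
        print(f"{name}: {avg_rotational_latency_ms(rpm):.2f} ms latency, "
              f"about {random_iops(rpm, seek_ms):.0f} random I/Os per second")
    # Hypothetical array: 12 drives at 50MB/s each.
    print(f"12 drives streaming: {streaming_bandwidth_mb_s(12, 50.0):.0f} MB/s")
```

In this model the faster spindle matters for small random transactions because every request pays the rotational wait, while streaming throughput scales with the number of drives and their sustained rate.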
Just as important as the drives themselves are the ways the drives are connected to each other. In the previous feature, we discussed a concept from IBM's research called Ice Cube, where storage "bricks" containing disk drives, controllers, and associated software can be stacked in three dimensions like Lego blocks. This is an extreme example of a trend towards greater redundancy and modularity in storage systems, aiming for a "scale out" approach of combining many small units, as opposed to a "scale up" approach of buying bigger and bigger boxes.
"We've been working on a variety of technologies trying to address the idea of scale out for storage: how do you build large storage systems as a collection of smaller components, focusing on having a level of manageability and reliability that meets or exceeds that of larger storage systems," says Kandlur. "This gives you the option of growing your systems more gracefully and matching the requirements of different applications in a much more scalable manner." But as with the difference between SCSI and ATA disks, Kandlur thinks this approach may not be best suited to high-performance transactional systems. "We see this as being appropriate for several classes of applications; perhaps not for transaction processing which will probably still go with the scale-up systems," he explains.
Currently storage can be connected to the systems it serves either directly using SCSI or ATA, over the network using Ethernet (network attached storage or NAS), or in a storage area network (SAN) using fibre channel. However, these distinctions are beginning to blur, with direct attached being the first to go. "Over the next few years you will effectively see the complete migration to storage networks away from the direct attached model in most datacentres," says Kandlur. The difference between NAS and SAN will also become less of an issue. "The argument over NAS vs SAN is moving away from a major decision to a provisioning issue," says McIsaac.
The emerging IP-based storage protocol iSCSI -- which allows storage data to be transported over IP networks, even across the Internet -- is considered by analysts unlikely to become a serious competitor to Fibre Channel, but it will find niche applications such as transporting data between campuses linked in a WAN. "Meanwhile Fibre Channel becomes the dominant backbone," says McIsaac.
To cope with increasing data volumes, connectivity will have to expand greatly over the next few years. "You will also see the introduction of 10Gbps links, probably going to 40Gbps within 10 years. This will match the requirements coming from the applications," says Kandlur.
Blocks or objects?
Traditionally hard drives have stored information -- and storage interconnects have transmitted it -- in same-sized blocks of data. Another likely trend is a move away from "block storage, which has been dominated by the SCSI interface, to something that is a higher level of abstraction, what we would call object-based access," says Kandlur. An object is a variable-length piece of data, such as a database record or an e-mail message, stored in a block device, he explains. "The storage system would have better ability to provide higher-level operations on these objects."
The problem with block-based storage is that objects consist of a large number of blocks, and without any level of abstraction, keeping track of where all those blocks are can take up significant resources on the host system trying to access that data. "This can be a limitation to scalability and providing a finer grain of access control to these objects," says Kandlur. "If you go to an object-based model, the storage system can provide you with a finer level of security and access control -- it can be at the level of an object or collection of objects." This object-based model is a precursor to the virtualisation and information lifecycle management concepts we discussed previously.
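The difference between the two models can be illustrated with a toy sketch. The classes below, BlockDevice and ObjectStore, are hypothetical names invented for this example and do not correspond to any vendor's interface or to the object-based access Kandlur's team is working on. The point is only to show where the block bookkeeping and the access check live.

```python
BLOCK_SIZE = 512  # bytes per fixed-size block (illustrative value)


class BlockDevice:
    """Block model: the device only knows numbered, same-sized blocks."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE)] * num_blocks

    def write_block(self, number, data):
        if len(data) > BLOCK_SIZE:
            raise ValueError("data does not fit in one block")
        # Pad short data with zero bytes up to the fixed block size.
        self.blocks[number] = data.ljust(BLOCK_SIZE, b"\0")

    def read_block(self, number):
        return self.blocks[number]


class ObjectStore:
    """Object model: the storage side tracks blocks and checks access."""

    def __init__(self, device):
        self.device = device
        self.free = list(range(len(device.blocks)))
        self.index = {}  # object id -> (block numbers, length, allowed users)

    def put(self, object_id, data, allowed):
        # Release the blocks of any object being overwritten.
        if object_id in self.index:
            self.free.extend(self.index.pop(object_id)[0])
        chunks = [data[i:i + BLOCK_SIZE]
                  for i in range(0, len(data), BLOCK_SIZE)]
        if len(chunks) > len(self.free):
            raise OSError("not enough free blocks")
        numbers = [self.free.pop(0) for _ in chunks]
        for number, chunk in zip(numbers, chunks):
            self.device.write_block(number, chunk)
        self.index[object_id] = (numbers, len(data), set(allowed))

    def get(self, object_id, user):
        numbers, length, allowed = self.index[object_id]
        # Access control is enforced per object, not per device.
        if user not in allowed:
            raise PermissionError(f"{user} may not read {object_id}")
        raw = b"".join(self.device.read_block(n) for n in numbers)
        return raw[:length]  # strip the padding from the last block


if __name__ == "__main__":
    store = ObjectStore(BlockDevice(8))
    store.put("mail-1", b"x" * 1300, allowed={"alice"})
    print(len(store.get("mail-1", "alice")), "bytes read back")
```

With the raw block device, a host would have to remember which blocks make up each e-mail message and police access itself. With the object layer, the caller asks for "mail-1" by name and the store handles both.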
Keeping the disks spinning
In the previous feature, we discussed how storage management will eventually evolve from a separate discipline, as it is today, into just another part of the enterprise's IT management system. "Why have storage as a separate discipline, why not just have it managed in the same way you manage your network infrastructure and your server infrastructure?" says Selway. McIsaac believes part of this evolution will be the management of storage plumbing -- fibre channel, iSCSI, etc -- becoming part of the networking manager's role, rather than the storage manager's. "By about 2008 or 2009, the networking guys will probably manage the storage networks as well," he says. "It may even be that you still have separate networks but they may be all in one device. Switches will handle both IP traffic and fibre channel at the same time, but they'd be one big set of switches."
The switches themselves are likely to have increasing levels of intelligence built in. "The major switch players -- Brocade, Cisco, McData -- have either developed or acquired intelligent switches," says McIsaac. "Rather than being the old-fashioned low-level, low-layer switches that just pass packets through and route them, these new ones are more like the intelligent routers that you have in IP. They can break open the Fibre Channel or iSCSI packets and do some interesting inspection of the data, and then might do routing in the switch based on the data itself." Currently even the lowest-level operations such as replication and backup must go through a convoluted path from one storage device, through the switch to the server running the storage management software, then back through the switch to another storage device. The advantage of putting more intelligence into the switch is "some of the things that are done in the disk array or in the virtualisation software can eventually be moved into the network. Some of the simpler stuff like replication actually gets moved into the switch without having to bother the host," explains McIsaac.
More of the same
The future of storage hardware at first glance seems to be going in all sorts of directions, but the underlying trend is the usual one of bigger, better, faster, cheaper. Although there are fundamental difficulties to overcome, improvements to existing technologies are likely to fill the gaps for the foreseeable future. Eventually, either holographic or atomic storage, or some new technology will take over. For the time being, though, disk, tape, and optical are where it's at.
This article was first published in Technology & Business magazine.