special report Can storage management in the future be as easy as setting a few policies and flicking a switch? We look at the steps needed to get there.
Did you know...
Between five and 20 percent of backup jobs fail every night.
Between 60 and 70 percent of storage management effort is devoted to backup and restore.
Recovering a single file from tape takes at least one hour and up to 24 hours.
The majority of restore requests are for individual files or table spaces.
More than 80 percent of restore requests are made within 48 hours of the data loss.
Less than three percent of data loss is due to a disaster event.
Backup and restore costs just under US$6000 per TB of storage per year.
Source: META Group research
Your storage is a mess. You don't know how much of it there is or where it is, but you do know everything's running out of space and it's costing too much. You spend hours a day running mirrors,
snapshots, and backups, and you're never quite sure what would happen if you needed to restore something. You've started putting storage on the network, but each network attached storage (NAS) device or storage area network (SAN) comes with a new management interface someone in the IT department has to learn. And now the CEO is breathing down your neck because he or she just read in an airline magazine that your company needs to keep every single piece of spam, personal conversation, pornography, and other junk e-mail received in the last seven years.
Vendors tell you that within six or seven years your storage will run so smoothly it will be like plugging into a power point. It will manage itself automatically, apportion storage to applications, make room when space is running out, decide on its own what needs to be backed up and to where, and if space starts to run low, you'll just plug in a few more disks. And you'd love to believe it, but...

The problems with storage today
Despite the cost per GB or TB coming down as technology improves, the amount of data organisations need to store is going up faster than the cost is coming down. Even worse, our ability to manage data is hardly growing at all. If TCO studies are to be believed, and hardware is a small fraction of the overall cost of any IT system, then the increasing amount of data and the complexity of the systems they're stored in add up to a whopping great bill for staff and other resources managing storage.
Another problem is that although individual hard drives are getting bigger, they're not getting any less likely to kick the bucket at the least opportune moment. "As you store more and more data on a single device, the disk drive's mean time to failure has not been increasing at the same rate as its capacity," says Dilip Kandlur, head of IBM's storage systems research division.
As a result, a simple RAID scheme is no longer sufficient protection for your most critical data. "There is an increasing need to provide additional levels of protection... to go towards replication and other methods of protection to make sure your data is available in the context of multiple failures," says Kandlur. Of course, someone has to manage all these additional layers of protection, and someone has to decide which servers or applications are worth spending this additional money and management effort to keep running. And that person is currently very, very busy.
Where we want to go: storage as a service
In an ideal world, everything would be the opposite. We'd like storage to be cheaper to buy and to
run. In fact, we'd rather the storage ran itself once we gave it some guidelines about what to store
and how much money and effort to spend storing it.
"Where storage really ends up is a service that gets provisioned and managed by policy and people
just start plugging into it like you do the network," says Kevin McIsaac, research director at
industry analyst the META Group. "It's probably going to take until about 2010 to be at that
level: just a utility in the datacentre."
Vendors may argue the point endlessly, but when it comes down to the basics, storage has become a
commodity. But commoditisation is only a step in the direction of becoming a service. "The difference
between a commodity and a service is you no longer care about how a service is provisioned," says
McIsaac. "You're more interested in the service quality than the infrastructure mechanisms."
"People don't understand how much storage they've got, where it is, how well it's being utilised, who's using it, is it being backed up, did the backups work, and can I recover?" Rob Nieboer, Storage Strategist, StorageTek
But how do you get from 2004 to 2010 and from storage as an enormous pain in the neck to storage as a service?
Taking an inventory
The first stage in the journey is understanding what storage you have now and what you're likely to need in the future.
"[People] don't understand how much storage they've got, where it is, how well it's being utilised, who's using it, is it being backed up, did the backups work, and can I recover?" says Rob Nieboer, storage strategist at reseller StorageTek. "Let's start with understanding where you are today, and what the value of different data are to the business."
"Customers who are doing the most work on forecasting their capacity are getting the best deals out of the vendors," adds Greg Bowden, national business manager at systems integrator Dimension Data. If the only capacity planning you do is on a spreadsheet provided by a salesperson from a storage vendor, the vendor will be glad to help you overestimate what you need. But finding out what you have can be a complicated process. "Often our storage assets are in different locations or connected to different hosts and when we're running our businesses day to day, it's not always the thing that is top of mind to know what we've got on our assets," says Bowden.
Virtualisation and process
However, once you know what you've got, the aim is to forget about it. You may have terabytes in a SAN, hundreds of gigabytes in NAS boxes, and gigabytes lying fallow in various servers, but keeping track of where the free space is and manually shuffling data from one location to another does not fit most admins' definition of fun.
"One of the keys to building a service is you need to hide the details of the implementation through virtualisation... so you can actually swap around the underlying pieces of storage to suit your needs," says McIsaac. Adding new storage to the mix is another hassle that can be reduced by virtualisation; adding more disks or a new storage device simply enlarges the pool from which storage is allocated. "And you need to drive some process into the storage administrators so they do things in a repeatable way, because the only way to provide a service is a bunch of standard repeatable processes that sit over the technology."
The lessons learned from enterprise resource planning and systems management software apply equally to storage: there are tools available to automate these processes, thereby improving quality and reducing costs, but the key is understanding and defining those processes beforehand. "If you haven't figured out all the processes, there's no point in the automation," says McIsaac.
Virtualisation on its own does not greatly reduce the workload of managing all the storage. "The amount of investment that goes into managing systems has been growing as a component of the total expenses on computing," says IBM's Kandlur. "What we're trying to do is build towards a more autonomic storage system... There's a lot of work to be done on interoperability and integrated management to get to the point where you could easily add storage from an application's perspective."
Standards: reducing complexity
If storage is to be virtualised and managed by policy, storage devices from many different vendors will need to talk to each other and to management software. "That's a big step from where we were: every time you bought a different brand, you had to learn a different management product," says Mark Heers, chairman of the Storage Networking Industry Association (SNIA) Australia and New Zealand. [Heers also works as marketing manager at storage vendor EMC, and is quoted later in this article as a representative of EMC.]
To ease the pain of managing storage, the SNIA has developed the Storage Management Initiative Specification (SMI-S), a standard aimed at allowing storage devices and management software to interoperate. "Standards are a by-product of things becoming common or popular to do. SAN standards have come out now because SANs are almost de rigueur for most medium and large customers now," says Heers.
Vendors are already beginning to release products compliant with the current SMI-S, but the capabilities are limited. "You can allocate storage to a specific server, report on that storage to find out how much is being used. I wouldn't describe it as having the depth that more sophisticated customers would look for," admits Heers. However, the SNIA will release new specifications every six to nine months, and these will offer increasing levels of sophistication.
"At the moment, the policies are fairly simplistic... I think they'll get a lot more sophisticated and more in line with business requirement." Mark Heers, Chairman of the Storage Networking Industry Association (SNIA) Australia and New Zealand
The problem with standards, of course, is that they're never quite as standard as the name implies. Vendors invariably add their own proprietary extensions meaning that while a "standards compliant" device can do all the basic things the standard requires, to get to the really cool functionality, you'll need to buy products from that vendor.
So in the end, does the existence of a standard actually prevent vendor lock-in? "For the majority of organisations who just want to do things in a normal way... the standard will suit them just fine," Heers says.
Those organisations looking to do something extra may need to look to the vendor-specific capabilities. However, "it's going to be a trade-off because... going beyond the standard limits their choice or adds to their training costs. They know they're going beyond the normal standards and may be locking themselves into a reduced field of vendors, but that may give them a business benefit because it allows them to do something the standards don't encompass today," Heers explains.
Tiering

Not all storage is created equal. Different types of storage have different transfer rates and performance capabilities, different levels of availability, different backup and restore times, and of course different costs. And different kinds of data will require varying degrees of availability, performance, and restore time depending on how critical the data are to the business and how frequently they're accessed. It makes sense to store data from your critical applications on high-performance storage that's mirrored locally and offsite, but much less sense to go to that much trouble and expense storing someone's three-year-old e-mails that may never be seen again unless they get used as evidence in a court case.
"Customers are looking at tiering their storage, saying 'I need this level of retention for this sort of information, therefore we're going to use this level of performance and cost infrastructure'," says Bowden.
Part of a tiering strategy involves an intermediary layer of storage between your disk systems and tape backups called near-line storage. These devices typically consist of ATA disks (rather than the more expensive SCSI disks found in high-performance storage units) and are most frequently used to store backups or snapshot images for several days before they are transferred to tape and stored offsite. Since most restore requests are made within 48 hours of data loss, storing backups on a near-line device for 72 hours or more could markedly reduce restore times compared with tape, according to a recent META Group paper. It would also greatly reduce the amount of time staff spend hunting for backup tapes in order to do restores. In the future, near-line storage may also be used as the primary location for less critical data that still needs to be accessed relatively frequently.
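That retention rule boils down to a simple routing decision. In this sketch, the 72-hour window comes from the META Group figure above, while the function name and tier labels are invented for illustration:

```python
from datetime import datetime, timedelta

# Hypothetical retention rule based on the 72-hour near-line window
# described above; tier names are illustrative only.

NEARLINE_WINDOW = timedelta(hours=72)

def backup_tier(backup_time, now):
    """Route a backup copy: near-line disk first, tape afterwards."""
    if now - backup_time <= NEARLINE_WINDOW:
        return "near-line"   # fast restores for the common case
    return "tape"            # cheap long-term, offsite storage

now = datetime(2004, 6, 1, 12, 0)
print(backup_tier(now - timedelta(hours=24), now))  # near-line
print(backup_tier(now - timedelta(hours=96), now))  # tape
```

Because more than 80 percent of restore requests arrive within 48 hours of the loss, the common case never touches tape at all under this rule.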
Tape still has an important part to play in a tiered architecture, because of its incredibly low cost. "As good as the cost improvements are in disk, we need other cost levels to keep stuff that we've got to keep for regulatory or compliance reasons but that we may not access very frequently," says Nieboer. "People have to start looking at storage holistically, and that includes recognising that there's a place for tape."
Compliance and policy
Adding to admins' woes is the fact that companies, and especially government agencies, are now being forced by law to store just about everything. "Customers now have a whole lot of legislative and compliance issues to start addressing," says Bowden. "We're starting to see more customers trying to understand what their obligations are from a compliance perspective, what the legal issues are around records retention, and putting in place a policy."
Of course, keeping records is not always a burden; sometimes it can be a very useful insurance policy. "We're seeing in legal cases that the people with the best records have more chance of succeeding than somebody who has been demonstrated to have insufficient or incomplete records," says Bowden.
In light of these requirements, the decisions about which data to store and what resources to allocate to storing them need to be driven by policy rather than technical issues. Today, the decision to archive data most often comes from IT saying "I've got a capacity problem, therefore I'm going to archive everything older than six months," says Bowden. But customers should be asking "What do these records mean for our organisation, and therefore how should we treat them?" The answer to that question can be presented as policy to the IT department: "Here's the policy, this is how I should treat e-mail, this is how I should treat my SAP records, this is how I should treat paper documents," says Bowden.
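A policy of the kind Bowden describes can be handed to IT as data rather than prose. The following is a hypothetical sketch; the record types, retention periods, and tier names are all invented for illustration:

```python
# Hypothetical example of a business retention policy expressed as data
# that IT can act on; record types and durations are invented.

RETENTION_POLICY = {
    # record type: (retention in years, storage tier)
    "email":        (7, "near-line"),
    "sap_records":  (10, "high-performance"),
    "scanned_docs": (5, "tape"),
}

def treatment(record_type):
    """Answer 'how should I treat this record type?' from policy."""
    years, tier = RETENTION_POLICY[record_type]
    return f"keep {years} years on {tier} storage"

print(treatment("email"))  # keep 7 years on near-line storage
```

The point is the direction of flow: the business decides what the records mean and writes the table; IT merely executes it.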
Information lifecycle management
Combining these concepts of automation, tiering, and policy-driven storage leads to what vendors are now pitching under the title of information lifecycle management (ILM). Roughly, this means matching data's importance to an organisation with the performance level and cost of the storage it's kept on. While this can easily be done at a broad level with whole databases or applications, the ILM vision vendors are currently pitching works at a much finer granularity, even down to individual infrequently-used database records or e-mail messages.
"We're really going to have to look at how we exploit different layers of the storage hierarchy to drive down cost," says Nieboer. "I'm not sure how we'll get there unless we automate storage management, because we can't rely on human intervention to be timely or effective enough in managing storage."
"Information lifecycle management is in danger of being over-hyped" Ian Selway, Product Manager for Network Storage Solutions at HP
But automating the process of identifying which bits of data are important and which less so requires a level of intelligence beyond that of current backup or storage management software. "The management tools are just starting to fall into place that allow us to seamlessly move information through a number of different tiers," says Mark Heers, wearing his EMC hat. [Heers wants to make it clear his comments in this part of the article are not necessarily the views of the SNIA.]
If this process were to be done manually, the management costs would be far greater than the savings of moving records from expensive storage to less expensive. Once the tools exist to identify automatically the less important database records or e-mails, for example, they could be moved to another location and replaced with a "stub" or pointer saying "this record has been moved-have a look over there instead".
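The stub mechanism can be sketched in a few lines. The record stores and keys below are hypothetical; a real implementation would live inside the database or mail system itself:

```python
# A sketch of the "stub" idea: a moved record is replaced by a pointer
# to its new location. Store names and keys are hypothetical.

primary = {"rec1": "big old record", "rec2": "current record"}
archive = {}  # the cheaper tier

def move_with_stub(key):
    """Move a record to the archive, leaving a stub behind."""
    archive[key] = primary[key]
    primary[key] = ("STUB", key)  # "have a look over there instead"

def read(key):
    """Readers follow the pointer transparently."""
    value = primary[key]
    if isinstance(value, tuple) and value[0] == "STUB":
        return archive[value[1]]
    return value

move_with_stub("rec1")
print(read("rec1"))  # big old record
print(read("rec2"))  # current record
```

Applications keep reading from the primary store as before; only the read path knows that some records now live on cheaper storage.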
"Management is of course very important for that and the ability to code policies and that's really the area that will develop in the next year or two," says Heers. "At the moment, the policies are fairly simplistic, it tends to be 'it's this big and this old, therefore it needs to move'. I think they'll get a lot more sophisticated and more in line with business requirement."
Nieboer agrees the concept needs to be refined well beyond merely moving older records around to cheaper storage; that's just archiving by another name. "I think it's a whole strategy of matching my storage costs and deployment to the value of specific information to the business. I will spend more money storing, managing, and replicating data that has high value to the business, and far less storing and protecting data that has less value." The most important factor isn't the age of the data, but "the implicit value of the data to the business".
Key to this development is the addition of metadata, information about each record or e-mail that can help identify its importance, which is something content management software has excelled at. In this light it's not surprising that EMC recently acquired content management company Documentum, and that IBM, which already has strong content management experience, is another leading proponent of ILM. Metadata will allow the ILM software to make judgments like "that's an important document so it should stay here, but that's less important so it can be shunted off somewhere else almost immediately", says Heers.
Like all new technologies, "ILM is in danger of being over-hyped," says Ian Selway, product manager for network storage solutions at HP. "But it's about taking your storage environment and tying it to the business processes." ILM will require a big shift in the way companies think about their IT infrastructure, Selway says. Companies will need to ask themselves, "How does this infrastructure relate to my business processes, and as my processes change, how can I reconfigure that infrastructure to meet the needs of the business?" And any company that mentions ILM and product in the same sentence is probably yanking your chain, Selway warns. "ILM is 90 percent process driven and 10 percent technology."
ILM may be a pretty important milestone on the journey to storage as a service (and there's no question effective ILM is still at least a few years away), but there's still further to go.
"I don't know if lifecycle management is the final story," says META Group's McIsaac. "It's certainly a more mature one than talking about the raw storage, but I don't know if it goes far enough to talk about storage as a pure service, managed by policy and expanded through virtualisation."
"Where I think we're actually heading, we're talking about the adaptive enterprise, on-demand, grid computing, that's where technology is taking us towards," says Selway. "It's the ability of organisations to rapidly change their infrastructure to meet the changing business requirements."
The long-term vision currently being pitched by the major IT players, particularly those with services arms, is for IT as a whole to become a service that can be virtualised and provisioned dynamically. Storage is a major part of this vision, but not as something to be viewed as separate from the rest of IT. "Why have storage as a separate discipline, why not just have it managed in the same way you manage your network infrastructure and your server infrastructure?" says Selway.
To offer storage as a service, when we finally get there, IT needs to present it in terms that make sense to the business, says McIsaac, "Hours of operation, mean time to failure, mean time to recovery, and cost... that's the hand-off between IT and the business: how much storage do you want, what quality do you want, and here's the price."
Sit back and relax
Storage today is an enormous hassle. If vendor visions are to be believed, storage in six or seven years' time will be a matter of putting your feet up and watching the blinking lights. The roadmap is in place, and the steps required to get there have been identified. Will it all work out? We'll just have to wait and see.
Executive summary: serving up storage
Storage management is currently a big pain point for IT departments, but a long-term goal of the industry is for storage to do most of the management itself and be presented as a service. What are the steps required to get there?
Taking stock. It's important to know how much storage your organisation has now and to make realistic assumptions in order to plan for future requirements. This can also help you get the best deal from vendors.
Virtualisation. Management software is working towards hiding the complexity of your physical storage architecture and presenting storage as a pool that can be expanded or reallocated easily. This also enables a set of repeatable processes to be put in place to improve the quality and decrease the cost of management.
Standards. To make virtualisation effective, storage needs to play nicely with products and management software from other vendors. Although embryonic, the storage networking industry association's standards are a step in the right direction.
Tiering. Not all data needs to be available 24x7 with 100 percent availability and instantaneous restores. Storage can be tiered to provide different levels of availability, restore time, performance, and cost.
Policies. With numerous legislative and governance requirements on storage, storage decisions need to be driven by policies that ensure data is protected adequately to meet all the requirements.
Information lifecycle management. The combination of all the above steps and content management technologies allows data, down to individual database records or e-mails, to be kept in an appropriate tier of storage based on the data's value to the organisation.
Storage as a service. With all these steps in place, storage can eventually become a part of the "on-demand" or "adaptive enterprise" visions currently being pitched by vendors, where it is allocated dynamically to suit changing business requirements.
Can the storage industry keep fitting more and more data on the same-sized disks? Stay tuned in the coming months as we discuss the future of storage hardware.