ILM: Getting intimate with data

ILM is the future of storage (or so we're told). But what is it? How do you get it? The details may still be a bit sketchy but that doesn't necessarily mean you can afford to put considerations to the side.




Contents
Introduction
The long road to ILM
E-mail is the word
More than just messages
Learning to let go
ScreenSound builds an ILM
10 things to know about ILM

More than three decades of experience in a field would normally qualify a firm as an expert, but when it comes to ILM (information lifecycle management), even StorageTek is a relative babe in the woods. Although the general ideas behind ILM do indeed fall along the lines StorageTek CIO Phillip Belcher talks about, turning them into actionable IT strategy is proving far more complicated than many customers want to believe.

The problem extends across the entire storage industry, which had, in recent years, enjoyed a strong resurgence on the back of the ongoing explosion in enterprise data and the healthy consulting and technology fees that accompanied once-mystical storage area networks (SANs).

Commoditisation, however, has taken its toll: storage prices continue to decline, and even SANs -- whose early reliance on expensive Fibre Channel gear once meant cozy margins for resellers -- have become cheaper after ratification of standards like iSCSI and iFCP, which let Fibre Channel devices communicate using dirt-cheap Gigabit Ethernet equipment over standard network cabling.

With SANs now a relatively unremarkable, and therefore less profitable, way of consolidating enterprise data, the storage industry has been on the prowl for the Next Big Thing. By all accounts, the vendors have found it in ILM -- it's impossible to have a conversation with a storage hardware provider, or one of the many companies providing storage management solutions, without it being mentioned.

Most of ILM's components are still evolving, with storage companies assuming stewardship of the sector and buying fiercely in recent years to bulk out their offerings. Many of those acquisitions are only now bearing fruit in the form of loosely integrated products that often bear little relation to competing offerings. Conflicting vendor messages, a lack of standards or even of consistent product sets, and competing market positioning have muddied the ILM picture so much that it's still not clear who can describe it adequately and who can't.

"ILM has been identified, especially by the storage industry, as being where they're going to get their income from," says Andrew Manners, director of network storage with HP. "There's dramatic marketing going on which isn't necessarily matched to capabilities. It shouldn't be a storage play, but should be a consultancy led play. Unless [storage vendors are] working with some of the market consultancies, they don't have a play."

The long road to ILM


Ever since it appeared on the industry's radar screen, ILM has been sold for its business benefits. With data volumes growing and legislative pressure for good record keeping increasing, companies face increasingly onerous requirements for storing, managing and retrieving data. By tracking data from its point of creation until it is disposed of -- the generally accepted mission statement of ILM -- the technology has been envisioned as a way of meeting these requirements and improving overall operating efficiency.

Executives' eyes may glaze over when IT talks about gigabytes and terabytes, but they instantly appreciate the risks of poor data management when contemplating an inability to locate the data necessary to comply with a court order. In one widely cited product liability case, Linnen v AH Robins Co, the US-based defendant spent more than AU$1.4 million to search 823 backup tapes for e-mails related to 15 employees in question.

Few companies could do much better, so these sorts of costs loom large in the minds of strategy setters. Data management has become fundamental for good corporate governance, which is why managers invariably pull out the cheque books once the risks of not having ILM are pointed out to them.

The process of moving towards ILM is an excellent opportunity for executives to revisit and delineate their business requirements and then to communicate them to the IT organisation, which faces the task of building an information management infrastructure that can meet those requirements.

Many IT executives may have hoped that the data consolidation made possible by SANs would have been enough, but that was just the beginning. If SANs are the highways for enterprise data, ILM is the network of road signs.

"A lot of people said the SAN is going to help lower costs and give greater efficiencies," says EMC marketing director Clive Gold. "Unfortunately, they've learned that the SAN is just the way you plumb things together. It's the software that gives you benefits."

Before you have signs, you will need the places for them to point to. That's where StorageTek and its peers -- IBM, HP, Hitachi Data Systems, EMC, and Network Appliance -- have taken the initiative, seizing the opportunity to define the ILM market in their terms. Yet while there have been some moves in the right direction -- for example, EMC's Centera storage arrays, which will be discussed shortly -- on the whole, most activity in this sector is still chest-thumping.

Depending on who you're talking to, ILM is a way of interconnecting storage components; a method for better managing e-mails; an opportunity for data introspection through a major consulting engagement; a new architecture for physical hard drives; a way of modelling business processes in the way storage systems handle data; and a method for aging and retiring information over time. And while storage vendors may have taken the lead in seizing the ILM space, they can't make it happen on their own.

There is no such thing as ILM in a box. It is not a technology, and it is not a product. It is a process, a philosophy for shaping data handling processes according to business requirements, and its broad scope extends well beyond the confines of the IT organisation.

"It's the sort of thing that starts a conversation," says Belcher, noting a recent StorageTek survey in which just 17.1 percent of Australian customers claimed to have implemented an ILM-like solution. "We will talk with customers as to why an organisation would consider and decide against an ILM strategy. It's one thing to have a whole lot of technology sitting there, but it's another thing to architect it and have it in sync with what the business requires. We look at every solution on a situation by situation [basis], and we're finding our professional services business is growing significantly."

E-mail is the word


In other words, customers are paying handsomely to figure out what ILM is. As in any new market segment, would-be ILM contenders are throwing their opinions into the pot and hoping that the final definition comes up favourable to their own product lines.

The situation is analogous to that of the security industry several years ago, when a broad range of individual products, including firewalls, user authentication, intrusion detection systems, virus scanning, and the like, were sold individually and poorly integrated with each other. It was far from obvious how companies could build an effective security perimeter, and in many ways that discussion is still going on.

Patterns of acquisitions provide insight into the direction of any new market segment. In the ILM world, the first volley in the product consolidation war has been the scramble for e-mail archiving solutions. Veritas snapped up e-mail management vendor KVS and now markets its Enterprise Vault; StorageTek offers its own Email Accelerator; EMC's purchase of Legato netted it EmailXtender and the similar DiskXtender; and HP has repackaged the fruits of its acquisition of Persist Technologies into RISS (Reference Information Storage System).

E-mail archiving systems not only reduce redundancies in the e-mail database that could blow out the size of that database, but also separate the content of e-mails from users' inboxes. Messages are normally stored within an individual user's mailbox space, but in a managed e-mail environment the messages are moved into a central repository and the users' copies replaced with a small pointer to the message within the database.

This approach is a great example of ILM's possibilities: by centralising, organising, and indexing the e-mail message store, an e-mail management system can both prevent users from deleting critical information and facilitate quick retrieval of messages by keyword.
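The repository-and-pointer mechanism can be sketched in a few lines of Python. This is a minimal illustration only, not a description of how any of the products named above work internally: the `MailArchive` class, its method names, and the use of a SHA-256 digest as the pointer are all assumptions made for the example.

```python
import hashlib


class MailArchive:
    """Toy central repository: one stored copy per distinct message."""

    def __init__(self):
        self._store = {}   # pointer -> message text
        self._index = {}   # lowercase keyword -> set of pointers

    def archive(self, message: str) -> str:
        """Store a message and return the small pointer that replaces it."""
        pointer = hashlib.sha256(message.encode("utf-8")).hexdigest()
        if pointer not in self._store:          # duplicates are stored once
            self._store[pointer] = message
            for word in set(message.lower().split()):
                self._index.setdefault(word, set()).add(pointer)
        return pointer

    def fetch(self, pointer: str) -> str:
        """Resolve a mailbox pointer back to the full message."""
        return self._store[pointer]

    def search(self, keyword: str) -> list:
        """Return every archived message containing the keyword."""
        pointers = self._index.get(keyword.lower(), set())
        return sorted(self._store[p] for p in pointers)

    def stored_copies(self) -> int:
        """Number of distinct messages actually held in the repository."""
        return len(self._store)


archive = MailArchive()
memo = "Quarterly audit report attached"
# Two users received the same memo; each mailbox keeps only a pointer.
alice_inbox = [archive.archive(memo)]
bob_inbox = [archive.archive(memo), archive.archive("Lunch on Friday?")]
```

Because the mailboxes hold only pointers, a user clearing out an inbox removes a reference rather than the message itself, and a keyword search runs against the single central index instead of every mailbox.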

It isn't a surprise that ILM spending sprees generally begin with e-mail: surveys repeatedly show that managing e-mail has become critical to effective governance. Last year, a Vanson Bourne survey of 100 retail; transport and distribution; finance; manufacturing; and other companies revealed that more than 70 percent of finance, retail, and manufacturing companies had no way to stop users deleting their important e-mails.

At the same time, the problem was expanding in scope: 64 percent of those surveyed reported e-mail storage requirements had increased by up to 40 percent in the previous year, while 20 percent said the volume of e-mail managed had increased by more than 65 percent. Forty-nine percent of the companies reported having had to search back-up media for e-mails at some time in the last three years, but only 19 percent said they would even be able to restore an e-mail that was more than 12 months old. These are worrying statistics given that adequate governance now demands the ability to locate e-mails well into the future.

E-mail management makes use of a critical and fundamental aspect of ILM: storage doesn't have to be top-shelf stuff. Given ongoing increases in data volumes, most companies have spent considerably to keep buying more and more hard disks -- typically, the expensive Fibre Channel drives associated with Fibre Channel-based SANs.

Recognising that this becomes problematic in the long term, storage vendors now offer arrays built out of less expensive Serial ATA drives, which are made in much larger volumes and offer better price/performance than the high-end Fibre Channel drives. Such "nearline" disks may not operate as quickly as Fibre Channel, but that may be fine for storing e-mails more than 60 days old -- which can be automatically moved from high-speed SAN drives to a slower storage array.
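A rule such as the 60-day cut-off amounts to a simple age test. The sketch below is illustrative only: the tier names, the `Message` record and the `choose_tier` function are hypothetical, and a real product would also have to move the data itself and update any pointers to it.

```python
from dataclasses import dataclass
from datetime import date, timedelta

NEARLINE_AFTER = timedelta(days=60)  # threshold taken from the example in the text


@dataclass
class Message:
    subject: str
    received: date


def choose_tier(message: Message, today: date) -> str:
    """Return the (hypothetical) tier a message belongs on, by age alone."""
    age = today - message.received
    return "nearline-sata" if age > NEARLINE_AFTER else "primary-fc"


# Arbitrary example dates, chosen only to fall either side of the threshold.
today = date(2004, 11, 1)
recent = Message("Budget draft", date(2004, 10, 15))
old = Message("Last quarter's minutes", date(2004, 7, 1))
placement = {m.subject: choose_tier(m, today) for m in (recent, old)}
```

Here the recent message stays on the fast tier and the older one is marked for the nearline array.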

More than just messages


Methods for transporting data between storage tiers have long existed on mainframes using HSM (Hierarchical Storage Management), where storage was limited and expensive. Moving HSM into the less structured open systems world has proven complicated: applications take many different approaches to file access, while a multitude of content types and conflicting business objectives have threatened policy consistency.

Moving data from one tier of storage to another -- and, eventually, to tape for long-term archiving -- requires highly detailed, careful housekeeping. Conventional applications expect their data to be in a certain place, and if an ILM system is busily shuffling old data off to slower disks the applications will quickly become irate. Since the system won't work if the user has to hunt for the file, e-mail management systems must seamlessly change relevant links to reflect the content's new location. That requires ILM capabilities to be built deep within the enterprise application platform.

As ILM is taken to its logical conclusion, every document and piece of information produced in a business -- not just e-mail -- will need to be analysed, indexed, and moved according to policies that reflect its relative importance to the business (not just its age, as in HSM). It's a massively complex endeavour because the operating environment needs to know exactly where every piece of information is at any given time.
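The difference between an age-only rule and a policy that reflects business importance can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the document classes, the retention periods and the tier names are invented, and in practice such a table would be set by the business rather than by IT.

```python
from dataclasses import dataclass

# Hypothetical policy table: business class -> (fast-tier days, retention days).
# A retention of None means "keep indefinitely".
POLICIES = {
    "contract": (365, None),
    "invoice": (90, 7 * 365),
    "newsletter": (7, 180),
}


@dataclass
class Document:
    name: str
    business_class: str
    age_days: int


def placement(doc: Document) -> str:
    """Decide where a document lives based on its class as well as its age."""
    fast_days, retention_days = POLICIES[doc.business_class]
    if retention_days is not None and doc.age_days > retention_days:
        return "delete"
    if doc.age_days <= fast_days:
        return "fast-disk"
    return "archive"


# Two documents of the same age are treated differently.
contract = Document("supply-agreement.doc", "contract", age_days=200)
newsletter = Document("staff-news.doc", "newsletter", age_days=200)
decisions = (placement(contract), placement(newsletter))
```

An age-only scheme would treat both 200-day-old files identically. Here the contract stays on fast disk while the newsletter has passed its retention period.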

This means ILM will require that consistent data indexing and access techniques be integrated at every level of the enterprise architecture. In a fully implemented ILM environment, indexing will become a core function of data storage -- not something that's done once a day or week.

"It's not just about the spinning media that we're going to store things on," says Veritas strategic technical architect Simon Elisha. "It's really rewrapping the HSM discipline and marrying it together with the compliance discipline. Integration with e-mail systems has always been a core focus, and the next step is to take a level of integration and focus on data placement aspects, research, and recovery of data."

Microsoft's decision to back away from plans to include its WinFS file system with Longhorn in 2006 confirms the complexity of this task -- WinFS's biggest selling point is its ability to continually index content, and to track that content between storage tiers.

Since implementing this capability at the operating system level remains complex, storage vendors will likely control the ILM agenda in the short term. So far, the most progressive vendor has been EMC, whose Centera content-addressed storage array tracks every data element from the moment it hits the drives. Each piece of information is marked with a unique hash code based on the content of the information. This hash sticks with the data as it's moved between fast disk, slow disk, and tape. Applications no longer worry about which specific disk or network volume contains a file; they simply ask for the file from the Centera cluster, which checks its hash index and pulls the information from wherever it's been tucked away. This is similar to how e-mail management tools virtualise the messages, indexing them in their own way and managing user requests for specific content.
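The content-addressing idea can be illustrated with a toy store. This is a sketch of the general technique only, not of Centera's actual design or interfaces: the `ContentStore` class, the tier names and the choice of SHA-256 as the hash are assumptions.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store spanning several tiers."""

    def __init__(self, tiers=("fast-disk", "slow-disk", "tape")):
        self._tiers = {name: {} for name in tiers}   # tier -> {address: data}
        self._location = {}                          # address -> tier name

    def put(self, data: bytes, tier: str = "fast-disk") -> str:
        """Store data and return its address, derived from the content."""
        address = hashlib.sha256(data).hexdigest()
        if address not in self._location:
            self._tiers[tier][address] = data
            self._location[address] = tier
        return address

    def migrate(self, address: str, new_tier: str) -> None:
        """Move data to another tier; its address does not change."""
        old_tier = self._location[address]
        self._tiers[new_tier][address] = self._tiers[old_tier].pop(address)
        self._location[address] = new_tier

    def get(self, address: str) -> bytes:
        """Fetch by address; the caller never names a tier or volume."""
        return self._tiers[self._location[address]][address]

    def where(self, address: str) -> str:
        """Report which tier currently holds the data."""
        return self._location[address]


store = ContentStore()
address = store.put(b"scanned contract, page 1")
store.migrate(address, "tape")
```

After the migration, `store.get(address)` still returns the same bytes. The application holds on to the address alone, so the store is free to shuffle the data between tiers behind it.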

The approach may be sound, but Centera is only one solution, and is tied to a specific vendor's products. In the longer term, the industry needs to standardise methods for archiving, indexing, and policy creation so that enterprises can implement ILM solutions that just work. This requires standardised ways of describing and prioritising both content and the ways in which it is stored -- something that may in the long term be driven by peak body the Storage Networking Industry Association (SNIA).

"We're trying to work with vendors to explain where the drivers are, and explain to users what vendors are thinking about," says Ray Dunn, chairman of the SNIA's Storage Management Forum. "Information has different value for end users depending on how it represents the business. End users need to be able to think higher in the stack and think about their applications."

In other words, think not only about your storage infrastructure but also your content and how it's structured. Content management developers have joined storage vendors and infrastructure suppliers to contribute their own perspectives on the evolving field: knowledge and content management provider OpenText, for one, recently snapped up e-mail management vendor IXOS to enter the market.

Learning to let go


"In the future, all storage will be bought under the ILM banner," says HP's Manners. "Today, it's about getting the alignment of the storage with the business: a lot of organisations we're talking to just don't have that sort of information about how to go about this. The boardroom has a plan that IT is trying to live up to today, but it's not building an architecture for the future."

A major part of that architecture involves a large mindshift: implicit in the idea that information has a lifecycle is the idea that it will die at some point, usually after it's no longer statutorily required. That's a fundamental shift in the way data is viewed, and in the way that IT works. Having spent decades building systems big and fast enough to process and store all the information that businesses produce, IT planners must now contemplate the systematic deletion of that information.

Introspection and assessment of internal information priorities will soon highlight exactly what information can go and when. It's a hard process, but companies will be better off for it in the end. It has become clear that ILM is a combination of all of these activities and others that haven't become totally clear yet. Vendors' approaches to ILM may be far from illuminating, and even dangerously limited in scope for customers seeking to capitalise upon it.

There are promising signs -- last month StorageTek unveiled OpenSMS (Open System Managed Storage), a proposed standard it believes will point the direction for future ILM efforts. OpenSMS includes OpenHSM for interacting with managed online file systems, and OpenTMS (Open Tape Management System), a removable media management solution that includes source code from StorageTek's ReelLibrarian software.

Whether StorageTek's standard gains traction, or simply paves the way for competing standards from other companies, remains to be seen. Either way, it will be some time before such standards become practical.

One thing is clear -- given the expectations around ILM and the promise of its underlying philosophy, every business should at least know what ILM is about and how its use might benefit them. E-mail management systems are the only actionable component of ILM available so far, so they should be an initial focus.

"I don't think you can talk about ROI from an ILM perspective, but [you can] from a bits and pieces perspective," says Phil Sergeant, research director for servers and storage with Gartner Asia-Pacific. "A lot of organisations really don't know who's creating and using information within their organisation, so the starting-off process requires a more intimate understanding of their information. They need to understand it from a performance and security perspective."

ScreenSound builds an ILM


When you're building a massive archive of content that needs to be accessed for generations, you tend to think a lot about how that content will be accessed over time. For Canberra-based audio-video archivist ScreenSound, a part of the Australian Film Commission, this issue came to the fore when the implications of a comprehensive digital archiving strategy became clear.

ScreenSound long ago began managing its collection using MAVIS (Merged A/V Information System), an in-house system that tracks massive volumes of physical assets as they're cycled through various storage facilities and other locations.

With MAVIS recently enhanced to manage digital content as well -- ScreenSound has archived over 3TB of digital audio content in the last three years alone -- the situation has changed substantially.

The volume of data stored is increasing, with the coming addition of film archiving expected to push ScreenSound's storage demands through the roof. Supporting this demand is a StorageTek L700 automated tape library, which will eventually provide up to 200TB of online storage.

ScreenSound has worked carefully to use metadata standards that will help it track its growing library of content. Once content is described appropriately it can be moved between storage locations and tiers according to priority thanks to the capabilities of VERITAS Storage Migrator (VSM).

This setup has effectively split the responsibilities for various parts of ILM: MAVIS manages individual asset information, tracking tapes, disks and files from the moment they're acquired. For its part, VSM handles the mechanics of moving data throughout its lifetime.

10 things to know about ILM


  • ILM is not a product. It is made up of many products, most of which don't even exist yet, so don't be fooled by product suites masquerading as ILM.
  • Storage is only one part of ILM. Avoid storage-only ILM plays unless they offer future expandability into ILM frameworks.
  • Business, not IT, has the wheel. ILM absolutely will not work without the involvement of business leaders.
  • Deletion is necessary. It may seem counterintuitive, but ILM is fundamentally about deleting data when it's the right time.
  • ILM isn't set in stone. But you can start planning.
  • Know your e-mail. Sort this first.
  • Know your apps. Once e-mail has been taken care of, think long and hard about which applications are producing what data.
  • ILM must be deep, very deep. ILM requires constant indexing and reindexing so data can be tracked effectively and retrieved.
  • ILM will require consistent standards for describing service levels, application priorities, data retention priorities and the like.
  • You probably need ILM.


This article was first published in Technology & Business magazine.