In search of the perfect backup solution

Finding the perfect back-up system is harder than it looks. CDP could offer the answer, if only it didn't suffer from the 4Ds - desirability-driven definition dilution

The perfect back-up product is easy to define. It should cover all data on a system, automatically adjusting to protect new services and storage as they're added. It should have no impact on normal operations, neither slowing down the systems it supports nor requiring extra management effort. No matter what the mishap, from a mistyped command to a missile attack, the perfect backup will be able to recover to the precise point before things went wrong — and, needless to say, that recovery process will be both swift and painless. Easy to define, perhaps, but impossible to provide — you might as well wish for the perfect storage system in the first place.

However, one up-and-coming back-up technology holds the tantalising promise of coming close on at least some of these fronts. If you listen to the vendors, Continuous Data Protection (CDP) is a low-impact, high-performance idea with near-miraculous properties. It doesn't matter what you do to your data, you can roll back to exactly where you want and carry on as if nothing had happened, and you don't have to spend all your time monitoring, planning and interrupting normal operations to create images.

Continuous Data Protection is a digital video recorder for storage. Starting with a known good image of the data to be safeguarded, it sits in the background monitoring and copying all changes to that data as they happen — not copying the working store, but maintaining its own independent database of deltas. If something goes wonky at 4:05pm, you can take your original backup and the CDP records and tell the system to reconstitute all changes up as far as, say, 4:04:59 pm. The result is a working system recovered from a point in time just before the problem hit: it doesn't matter if every last crumb of data on the live system had crumbled to binary dust, you've turned back the clock. It's like watching the big match on Sky Plus or TiVo — with the live action recorded as it happens, you can hit rewind at any time and replay the items of interest without having to decide beforehand when or what to record.
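The rewind idea can be sketched in a few lines of Python. This is a toy illustration rather than any vendor's design: the ChangeJournal class, its block-number keys and the use of seconds since midnight as timestamps are all assumptions made for the example.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class ChangeJournal:
    """Toy CDP store: a known good baseline plus a time-ordered log of writes."""
    baseline: dict                               # block number -> contents at the start
    times: list = field(default_factory=list)    # write timestamps, ascending
    writes: list = field(default_factory=list)   # (block, data), parallel to times

    def record(self, timestamp, block, data):
        # Assumption: writes arrive in time order, as they would from a live tap.
        if self.times and timestamp < self.times[-1]:
            raise ValueError("writes must be recorded in time order")
        self.times.append(timestamp)
        self.writes.append((block, data))

    def restore(self, point_in_time):
        """Rebuild the image as it stood at point_in_time (inclusive)."""
        image = dict(self.baseline)              # never touch the baseline itself
        upto = bisect_right(self.times, point_in_time)
        for block, data in self.writes[:upto]:   # replay deltas up to the chosen moment
            image[block] = data
        return image


# Hypothetical afternoon: a good write at 4:02 pm, a bad one at 4:05 pm.
journal = ChangeJournal(baseline={0: "ledger v1", 1: "config v1"})
journal.record(16 * 3600 + 2 * 60, 0, "ledger v2")
journal.record(16 * 3600 + 5 * 60, 0, "corrupted")

# Rewind to 4:04:59 pm, one second before things went wonky.
good = journal.restore(16 * 3600 + 4 * 60 + 59)
```

Because the journal is kept apart from the working store, the restore depends only on the baseline and the recorded deltas, not on whatever state the live system is left in.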

This might seem miraculous, and any systems administrator with some experience beneath their belt will think of times when CDP would indeed have been as welcome as a Tardis. However, even Time Lords are subject to the no-free-lunch rule. CDP requires lots of extra storage, the amount depending not only on the size of the data sets being backed up but also on the rapidity with which they change. It also needs to monitor every movement of data continually, which affects the performance of the host system. And while the idea of asynchronously copying all transactions to backup takes up just one line on a PowerPoint slide, it can in reality produce a lot of extra network traffic, which is especially significant if your best practice involves keeping backup storage in a different location to the live system.
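A back-of-envelope calculation shows why the rate of change matters as much as the size of the data. The Python sketch below assumes a simple model (one baseline image plus every change retained for a fixed window); the function name, the 24-hour retention period and all of the figures are illustrative assumptions, not vendor numbers.

```python
GIB = 1024 ** 3
KIB = 1024


def journal_storage_bytes(baseline_bytes, change_bytes_per_second, retention_seconds):
    """Rough capacity needed: one baseline image plus all changes kept in the window."""
    return baseline_bytes + change_bytes_per_second * retention_seconds


DAY = 24 * 60 * 60

# Same 500 GiB data set, two hypothetical workloads (assumed change rates).
quiet = journal_storage_bytes(500 * GIB, 512 * KIB, DAY)    # 0.5 MiB/s of changes
busy = journal_storage_bytes(500 * GIB, 5120 * KIB, DAY)    # 5 MiB/s of changes

quiet_extra_gib = (quiet - 500 * GIB) / GIB   # journal size on top of the baseline
busy_extra_gib = (busy - 500 * GIB) / GIB
```

Under these assumptions the busier system needs ten times the journal space for an identical data set, and the same bytes also have to cross the network to wherever the backup lives.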

In particular, while CDP vendors are fond of quoting an average performance overhead of around 2 to 4 percent on a system running a CDP agent, this relies heavily on data reads incurring almost no penalty and thus keeping the average down. Writes trigger CDP action and take more resources, and a system which does much more writing than reading will incur a higher penalty than the headline figure. There are also issues with maintaining proper synchronisation across multiple servers being protected by a single CDP system, especially on heavily loaded installations with IO bottlenecks. Some vendors introduce local caching to reduce this problem, and have monitoring and transmission processes running asynchronously.
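The arithmetic behind the headline figure is a weighted average, which a short Python sketch makes plain. The 10 percent per-write penalty and the workload mixes below are assumptions chosen purely for illustration; real figures will vary by product and installation.

```python
def blended_overhead(write_fraction, write_penalty, read_penalty=0.0):
    """Average overhead across a workload, weighted by its read/write mix."""
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    return write_fraction * write_penalty + (1.0 - write_fraction) * read_penalty


ASSUMED_WRITE_PENALTY = 0.10   # hypothetical cost of a journalled write

read_heavy = blended_overhead(0.30, ASSUMED_WRITE_PENALTY)    # 30% writes
write_heavy = blended_overhead(0.80, ASSUMED_WRITE_PENALTY)   # 80% writes
```

With these assumed numbers the read-heavy system lands at about 3 percent, comfortably inside the quoted range, while the write-heavy one comes out at about 8 percent, double the top of it.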

Actual implementations of CDP vary widely. It can be introduced anywhere in the data flow, from the application down to block level, the latter having the advantage that it is OS-agnostic and does not care what it is protecting. However, systems with greater knowledge of the overlying system can offer greater flexibility in recovery: pulling an entire system back in time to fix a single file corruption may well be more disruptive than the problem it's fixing. So some finesse in recovery management is valuable.

CDP is also what can be described as a 4D term: one subject to desirability-driven definition dilution. This is the rule that the more exciting a new idea is, the more people will claim it for their products regardless of strict applicability. Recent examples include grid computing, which is in danger of becoming a catch-all term for plugging computers together, and Ultrawideband, which started off as a radical redefinition of wireless technology and now seems doomed to refer to plugging lots of low-powered old-fashioned channels together.

With CDP, the definition dilution stems from its affinity to previous technologies such as snapshotting. If you don't record every change, but take a snapshot of changes every ten minutes or so, that's nearly the same, right? Microsoft's new Data Protection Manager thinks so: before the public beta it was trailed as being CDP; now it's "near-CDP" because it takes a lot of snapshots. Snapshots aren't the same thing at all. They depend on the database being in a stable, or quiesced, state when they happen, and this requires a degree of awareness from applications: they must be able to be told to put themselves into a known good state and wait for the snapshot to complete.
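The practical gap between the two approaches is how much work is lost on rollback. The Python sketch below assumes snapshots taken on a fixed schedule counted from midnight and a CDP journal with one-second granularity; both assumptions, and the function name, are illustrative only.

```python
def snapshot_exposure(failure_time, interval_seconds):
    """Seconds of changes lost when rolling back to the last snapshot
    taken strictly before the failure (times in whole seconds)."""
    last_snapshot = ((failure_time - 1) // interval_seconds) * interval_seconds
    return failure_time - last_snapshot


FAILURE = 16 * 3600 + 5 * 60          # trouble strikes at 4:05:00 pm
TEN_MINUTES = 10 * 60

lost_with_snapshots = snapshot_exposure(FAILURE, TEN_MINUTES)
lost_with_cdp = snapshot_exposure(FAILURE, 1)   # assumed one-second journal granularity
```

Here the snapshot schedule gives up five minutes of changes against a single second for the journal, and that is before counting the cost of quiescing applications for each snapshot.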

CDP is also quite a young technology, and is currently at the exciting stage in any idea's life when it's mostly available from start-ups. This means that terminology, interface standards and performance expectations are still thrillingly flexible. However, in February 2005 the Storage Networking Industry Association formed a CDP special interest group to bring together the major players in the field with the intention of creating agreed terms and interfaces. Companies involved include Alacritus, EMC, Hitachi Data Systems, InMage, Mendocino, Mimosa Systems, NetApp, TimeSpring, Revivio, Scentric, Storactive, Sun, Veritas and XOsoft.

There is no doubt that CDP is a powerful concept with the potential to move us much closer to the ideal backup system. It is even possible to foresee such functionality sinking into the hardware of storage devices themselves, with integrated disk controllers outputting network-ready channels of change information — perhaps building towards the ultimate concept of self-healing distributed storage networks. For now, those charged with looking after expensive data should be asking their vendors about CDP — while normal scepticism should not be suspended, this time they really do have a good idea to sell.