Tragicomic fixations: refusing to know what you don't know

Among other things, ZFS renders most commercial backup and recovery solutions for Unix obsolete - something a lot of people know, but aren't yet ready to act on.

One of the saddest, and funniest, real life stories I know involves a big-dollar project, undertaken by a major international programming house under contract to a Canadian telco, to develop a line printer facility for NCR's System VR4. It's hard to imagine, but after this project was reviewed, funded, contracted, and started, it actually took a turn for the worse - and indeed by the time we got asked to adjudicate contractor cost overruns, the usual finger pointing over the project's failure was in full flame.

I was reminded of this a couple of months ago when some people I know got all excited about a great new freeware product they were going to launch for Solaris - a script that would let a sysadmin reliably turn off all unnecessary external network connectivity with one command.

And then yesterday I got an email pitching a data backup and recovery solution for both Solaris and Linux - a product freshly ported from Windows by "the leader in enterprise data storage solutions." Apparently it backs up files, restores files, rejoices in a (Windows Vista required) GUI client, and "replaces the ugliness of tar" with an easy to use integrated "data vaulting solution."

My first reaction was to think that if I were citing ugly I'd go with cpio - and that tar isn't compatible between Linux and Solaris anyway (gtar is) - but the more interesting thing is that products like this exploit the customer's ignorance in much the same way Linux desktop virus checkers do.

What's most interesting about it, however, is that the backup strategies this product appeals to no longer apply to Solaris, soon won't apply to the BSDs, and will, sooner or later, lose relevance to Linux as well.

What's going on is that the standard process for enterprise Unix backups is really two rather unrelated processes.

One is the data processing derived disaster recovery backup expected by (data processing trained) auditors: you make daily tapes, send them off site for safe-keeping, update the colorful PowerPoints defining your recovery plan every time you upgrade PowerPoint, and hope you never, ever, have to use any of it. You hope that because the plans never survive the first senior IT manager to arrive on site, and because some critical chunk of code or data will prove unrecoverable anyway: it either didn't get written to tape, or depends on something that wasn't written to tape, or sits on a tape that is simply unreadable.

The other one is, of course, the real one. Typically it provides for some form of processing or data continuity but is more oriented toward the kind of recovery request sysadmins face every day: some user deleted something - or ran his test program against the live data - and could you roll back the clock, please?

Until recently the best way to do this was to copy everything on the production systems to a RAID set maintained solely as a backup and then back that up to tape or another remote machine during regular working hours. This takes more effort than a commercial solution to get working because you need to customize the copy process for every application or environment you're responsible for, but it makes handling recovery requests so trivial that you can practically script them by user name, gives you a quick recovery capability if your data center does blow up, and makes it easy to find out what happened when the boss's pet idiot gets root access to your most critical database server just long enough to emit a "# chmod 0000 /etc/init.d".
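To make the "script them by user name" point concrete, here is a minimal sketch of a restore from such a backup copy. It assumes, purely for illustration, that the backup RAID set mirrors the production directory layout; the function name and the paths are hypothetical, not taken from any particular site or product.

```python
import shutil
from pathlib import Path


def restore_from_mirror(backup_root: Path, live_root: Path, relative: str) -> Path:
    """Copy one file from the backup mirror back into the live tree.

    Assumes (hypothetically) that the mirror uses the same relative
    layout as production, e.g. home/alice/report.txt on both sides.
    """
    if Path(relative).is_absolute():
        raise ValueError("expected a path relative to the mirror root")
    source = backup_root / relative
    target = live_root / relative
    if not source.is_file():
        raise FileNotFoundError(f"no backup copy of {relative}")
    # Recreate the directory in case the user removed that too.
    target.parent.mkdir(parents=True, exist_ok=True)
    # copy2 copies the contents and also tries to preserve timestamps.
    shutil.copy2(source, target)
    return target
```

A call like restore_from_mirror(Path("/backup"), Path("/"), "home/alice/report.txt") would then put the file back; wrapping that in a loop over one user's directory is about all a "recovery by user name" script needs under these assumptions.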

But technology marches on, and prices continue to fall. For about the same money you'd pay for a good commercial backup and recovery package you can implement a ZFS "thumper" as your general backup server - and not taking a close look at the ZFS time machine? That's just about as smart as expressing your disapproval of an executive decision to buy Unix by paying people who don't know what a model script is a couple of hundred thousand bucks to add a $9,000 Sperry line printer to a machine running AT&T System VR4.
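For readers who haven't looked at it yet, the sketch below shows roughly what the snapshot side of this involves. It only builds the command lines and paths rather than running anything; the dataset name, the "daily-" naming scheme, and the helper names are assumptions made for illustration. The zfs snapshot and zfs rollback subcommands, and the .zfs/snapshot directory under a file system's mount point, are standard ZFS.

```python
from datetime import date
from pathlib import PurePosixPath


def snapshot_name(day: date) -> str:
    # Hypothetical naming scheme: one snapshot per day.
    return f"daily-{day.isoformat()}"


def snapshot_command(dataset: str, day: date) -> list[str]:
    """Command line that takes a snapshot of the dataset."""
    return ["zfs", "snapshot", f"{dataset}@{snapshot_name(day)}"]


def rollback_command(dataset: str, day: date) -> list[str]:
    """Command line that rolls the whole dataset back to a snapshot.

    Without extra flags, zfs rollback only accepts the most recent
    snapshot, so for a single lost file the path below is usually the
    gentler option.
    """
    return ["zfs", "rollback", f"{dataset}@{snapshot_name(day)}"]


def path_in_snapshot(mountpoint: str, day: date, relative: str) -> PurePosixPath:
    """Read-only location of a file as it was when the snapshot was taken."""
    return PurePosixPath(mountpoint) / ".zfs" / "snapshot" / snapshot_name(day) / relative
```

Under these assumptions, the everyday "roll back the clock, please" request becomes a plain copy out of the path returned by path_in_snapshot, with no tape and no restore client involved.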