Why backup isn't enough

You can back up your data daily - even hourly - and still lose data. Here's how it happened to me.

My backup regimen

I run 3 backups:

  • Time Machine. Part of Mac OS X, Time Machine backs up changed files in my user account every hour. Like too much of Apple's tech these days, the UI is great, the underlying tech not so much.
  • Carbon Copy Cloner. My 300 GB VelociRaptor system disk and a 2-drive, 2 TB RAID 0 array are cloned nightly, so I have 2 copies of everything.
  • Online backup. My critical documents are backed up to an online provider.

Given that most people don't back up at all, I should be golden, right? But I'm not.

The limits of backup

As explained in "How Microsoft puts your data at risk," our Windows, Mac and (most) Linux file systems are, in a word, junk. Architected decades ago in a world of costly storage and puny CPUs, they will slowly hose your data.

Flaky hardware, inconsistent error handling, bit rot, phantom writes and more give consumer file systems problems they can't handle. And backup can't handle them either.

In fact, backup just spreads the corruption.

Let's say you have a large PDF. You read it, make a couple of annotations, save and then close it. Unbeknownst to you and your file system, the save corrupts the file. A bad write, perhaps.

The corrupted file now gets saved by Time Machine in the next hour. Then cloned to the backup drive that night. Now I've got 3 corrupted files.

But wait! Time Machine keeps old versions for months, as do some of the online backup providers. I still have a good copy.

Until several months pass. Then all I've got are corrupted copies. That's what happened to me.
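One rough way to notice silent damage before the last good version ages out is to keep a manifest of checksums and compare it on a schedule. The sketch below is only an illustration, not something Time Machine or the other tools above do: the names `scan` and `suspect_files` are made up for this example, and it assumes that a file whose contents changed while its size and modification time did not is a bit rot suspect. It would not catch the bad-write case described above, because there the save legitimately updates the modification time.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(root: Path) -> dict:
    """Map each file's path (relative to root) to its size, mtime and checksum."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            st = path.stat()
            manifest[str(path.relative_to(root))] = {
                "size": st.st_size,
                "mtime_ns": st.st_mtime_ns,
                "sha256": file_digest(path),
            }
    return manifest


def suspect_files(old: dict, new: dict) -> list:
    """List files whose checksum changed although size and mtime did not."""
    suspects = []
    for name, before in old.items():
        after = new.get(name)
        if after is None:
            continue  # deleted files are not treated as corruption here
        if (after["sha256"] != before["sha256"]
                and after["size"] == before["size"]
                and after["mtime_ns"] == before["mtime_ns"]):
            suspects.append(name)
    return suspects
```

Run `scan` on a folder, store the result, and compare it against a later `scan` with `suspect_files`. Anything it reports should be restored from an older backup version while one still exists.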

The textbook answer

If you back up a corrupted file, you still have a corrupted file. Big company IT shops have dealt with this problem for decades and they have the answer: archiving.

They take copies of files and place them in a read-only archive. If the file is later corrupted on the active storage, they go to the archive and pull out the - hopefully - uncorrupted copy.
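The idea can be sketched in a few lines. This is a minimal illustration under my own assumptions, not how any particular enterprise archive product works: the function names `archive_file` and `verify_archived` are hypothetical, the checksum is stored in the archived file's name, and "read-only" here just means clearing the write permission bits, which is far weaker than true write-once storage.

```python
import hashlib
import shutil
import stat
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_file(src: Path, archive_dir: Path) -> Path:
    """Copy src into archive_dir, named by its checksum, and mark it read-only."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    checksum = sha256_of(src)
    dest = archive_dir / f"{checksum}_{src.name}"
    if not dest.exists():  # never overwrite an existing archived copy
        shutil.copy2(src, dest)
        dest.chmod(stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return dest


def verify_archived(dest: Path) -> bool:
    """Check that an archived copy still matches the checksum in its name."""
    recorded = dest.name.split("_", 1)[0]
    return sha256_of(dest) == recorded
```

Because each copy is named by its contents, archiving a later, corrupted version of the same file adds a second entry instead of replacing the first. The archive only helps, of course, if a good copy went in before the corruption happened.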

But if few people back up, even fewer make archives: PC archiving software is geeky; the storage requirements are large; and the perceived benefit is low.

The Storage Bits take

Those negatives around archiving aren't changing. Home users will rarely archive even 2 decades from now, despite backing up. It is just too boring and expensive.

Which gets back to yesterday's post, "Apple's weak tech-fu." We need file systems that make data corruption rare.

As we store more files for longer times and look at them less often, data corruption will become more visible. Let's get in front of this problem today.

Comments welcome, of course.