Practical strategies for protecting your data

Written by George Ou, Contributor

In my series on RAID technology and why it makes sense for home users, fellow blogger Robin Harris took me to task, saying that RAID won't ever fly in the home.  Robin's main gripe is that RAID doesn't address backup, and he also complains about the cost, reliability, and complexity of RAID technology.  Robin further argues that RAID gives people the false impression that they don't need to back up.  I have to admit that when I first read Robin's counterarguments I was a bit upset with him, but I've since given it some thought.  While I still think he got many things wrong, Robin does bring up a lot of good points that need to be debated, so I'm going to address his arguments point by point.

What is a backup?  If we go with the classic dictionary definition of backup, RAID redundancy is a form of backup because it provides insurance against a physical disk failure.  From an IT (Information Technology) department's standpoint, RAID redundancy doesn't meet the criteria for "backup" because it isn't off-line, which means the data is easier to delete or corrupt by accident, and it lacks geographic diversity because it sits in the same physical building.  For this reason, RAID redundancy is classed as "availability": a drive failure doesn't force you to go down for hours while you recover your data.  Only a copy of the data that is physically moved off-site, by someone picking up the tape or portable hard drive, or that is replicated to a remote site can be considered a true backup.  The problem is that consumers won't ever go to this expense or trouble to back up their data, so they're at the mercy of:

  • Data corruption or deletion
  • Disk failure
  • Physical disaster like flood or fire

The first two issues are probably going to be the most common reasons people lose data.  Most of the time, users will lose data to a disk failure or to data corruption or deletion, whether accidental or caused by malware.  Almost every survey that has ever been taken shows that the vast majority of consumers don't back up their data, and the few who do don't do it regularly.  If I had to make an educated guess, I would say more than 90% of consumers don't back up regularly, and nothing that I or anyone else can say is going to change that.  Of the remaining few that do back up their data, most probably aren't doing off-site backup.  The question becomes: what do we do about this?

Robin Harris thinks that people should just forget about RAID redundancy, back up their data, and keep it off-site.  I can't argue with Robin that off-site backup is the IDEAL solution; the problem is that I don't think it's a realistic option for most people, at least not for all of their data.  It isn't just the cost associated with backups, it's the fact that backup requires action and most people are naturally lazy.  If it isn't seamless, cheap, and easy for them, they simply won't do it.  What's needed is a hybrid approach that's tailored to people's needs, and I'm going to explain how to determine the best approach to protecting one's data.

Criticisms of RAID technology

Robin Harris criticized RAID technology for being expensive, complex, and sometimes even unreliable if the entire array fails because of a RAID glitch.  The criticism that RAID is expensive expired two years ago when cheap on-board RAID Level 5 capable controllers like Intel's ICH7R became an embedded motherboard technology.  RAID controllers used to be horribly expensive and slow, but now a RAID controller is essentially free and it provides insane performance levels.

The criticism that RAID is complex still stands for the most part, though it's not as complex as Robin Harris may lead you to believe.  Intel's ICH RAID controllers lack a drive-failure light, though the software will tell you which drive number has failed.  Of course this means you'll have to verify and label your drives properly so that you don't pull the wrong one when a drive fails.  Pulling the wrong drive can cause serious data corruption or possibly worse.  Furthermore, the labeling on the Intel motherboard didn't match the actual port numbers, so I had to figure out the port numbers manually.  It might be trivial for Intel to update their firmware to flip the drive I/O activity light on for a failed drive, so I would strongly suggest that Intel address this usability issue.

The expensive add-on controllers that cost a few hundred dollars will have the warning lights, but they're generally too expensive for consumers.  On the other hand, the Intel RAID controller does let you replace the motherboard and reconnect your RAID drives in any order and the array will still mount, which is definitely a lifesaver if a motherboard fails or you want to upgrade to a newer motherboard with a newer ICH controller.  Robin feels that RAID storage units should automatically lock all drives to prevent accidental extraction and only unlock the failed drive.  Ultimately I think Robin is right if this technology is going to be accepted by the mainstream as idiot-proof, but I think most power users should know how to pull a drive that's flashing red.

As for the worst-case scenario where a user might lose the entire RAID volume, that's extremely unusual and I've never seen it happen on a hardware RAID controller.  Even in the old days when we used the Operating System to perform RAID, users could back up their configuration.  As I already mentioned with the Intel ICH RAID technology, you can pretty much replace the entire motherboard and plug in the drives in any order and the RAID will automatically mount.  Ideally there should be some kind of industry RAID standard so that consumers can replace their RAID controller or motherboard with one from any manufacturer and automatically mount existing RAID configurations.  So while Robin has many good points and the storage industry needs to improve many areas of usability, I don't think he's right in declaring that "RAID doesn't work" or that it's completely unusable by consumers.

Backup versus RAID

To give a good analogy, we can think of RAID redundancy (especially RAID Level 5) as a spare tire for your car that can save you in most situations, whereas an off-site backup solution is a spare car parked a few blocks from your home that will save you in a lot more situations.  The spare car parked off-site will even save you if your entire home burns down.  The problem is that a spare car is awfully expensive compared to the spare tire strategy and you have to find off-site parking for it.  You'll either have to keep it at a friend's or family member's house or you'll have to rent storage space somewhere.  The point is that a full data backup set requires double the capacity to store your data, a 100% overhead, whereas RAID redundancy protects you from disk failures by giving up only 20% of the raw capacity (one drive out of 5 in a RAID Level 5 volume, which works out to 25% on top of the data you actually store).
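
The overhead comparison can be checked with a few lines of Python.  This is only a sketch: the function name is made up for illustration, and it assumes equal-sized drives with exactly one drive's worth of parity, which is how RAID Level 5 works.  It reports the parity cost both ways, as a share of the raw capacity you buy and as extra capacity on top of the data you store.

```python
def raid5_overhead(drives: int) -> tuple[float, float]:
    """Return (parity share of raw capacity, extra capacity relative to stored data).

    Hypothetical helper: assumes equal-sized drives and one drive's worth of parity.
    """
    if drives < 3:
        raise ValueError("RAID Level 5 needs at least 3 drives")
    share_of_raw = 1 / drives            # 1 of N drives holds parity
    extra_over_data = 1 / (drives - 1)   # parity relative to the N-1 usable drives
    return share_of_raw, extra_over_data


FULL_BACKUP_EXTRA = 1.0  # a second full copy doubles the capacity you need

share, extra = raid5_overhead(5)
print(f"5-drive RAID 5: {share:.0%} of raw capacity, {extra:.0%} on top of the data")
print(f"Full backup copy: {FULL_BACKUP_EXTRA:.0%} on top of the data")
```

Either way you count it, the spare tire costs a fraction of the spare car.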

The "spare tire" RAID Level 5 strategy means you buy 5 hard drives but you only get to use 4 of them for storing your data.  The instant one of the drives fails, the "spare" saves you from losing any data and your storage is still accessible to you.  That solves the failed drive problem, but now we have to deal with accidental deletions and data corruption from faulty software or malware.  Well, Windows Vista has file journaling technology and so do the new Windows Home Server and Linux NAS (Network Attached Storage) devices.  That means you get to go back in time and find a previous version of the file that wasn't deleted or corrupted.  This strategy may not be 100% foolproof, but it is extremely simple and transparent to the user and it will protect them from accidental deletion or corruption most of the time.  The bottom line is that RAID and file journaling technology give us a fair amount of protection against the two most common reasons people lose data, and they're 100% seamless to the user.  Even more importantly, they offer CONTINUOUS protection whereas backups and archives have gaps between when you do them.
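
For the curious, here is roughly why the "spare" in RAID Level 5 can stand in for any single failed drive.  RAID 5 keeps XOR parity across the drives; the Python below is a deliberately simplified sketch of that idea, with four tiny made-up data blocks and one parity block.  Real controllers stripe the data and rotate parity across all the drives, which this toy example ignores, and the helper name is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Four "data drives" (illustrative contents) and the parity computed from them.
data = [b"FOTO", b"MOVI", b"DOCS", b"MAIL"]
parity = xor_blocks(data)

# Simulate losing drive 2: XOR the three survivors with the parity block.
lost = 2
survivors = [blk for i, blk in enumerate(data) if i != lost] + [parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[lost])  # the lost block is recovered from the rest
```

Note that this only covers a dead drive.  If a file is deleted or corrupted, the parity is faithfully updated to match, which is why the previous-versions feature is needed as well.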

In reality, we can't be certain that we won't experience a fire or some other disaster, and we can't be 100% certain that RAID with file journaling will always prevent file corruption or loss.  If all we're talking about is backing up some email files and some photographs, then keeping a full backup copy of the data off-site is relatively trivial.  The problem is that people have hundreds or even thousands of gigabytes of data in video files (downloaded, ripped, or home movies) and it isn't trivial to make a full backup of everything.  So realistically we're probably going to have to go with a hybrid approach: everything that is precious and irreplaceable, like home photos, home movies, documents, and emails, should get archived off-site, while your other bulk data gets the cheaper "spare tire" insurance.  Worst case, those DVD rips and video downloads can be ripped and downloaded again, but you can't just rip or download those family photos or family movies again.

How to backup and archive data

So what do we do with our precious memories and precious data that can't be replaced by re-downloading and re-ripping?  If it's less than 8 or even 16 GB then it's trivial to archive.  Just take two or four blank DVDs at $0.10 apiece and burn all that data to them (double-layer DVD blanks are $1.60 each, so they're a lot more expensive).  DVD burners (if you don't already own one) are less than $30 for internal models.  If it's family photos and home movies, give a copy to family members living elsewhere so that they can enjoy it while acting as an off-site backup for you.  If you've got sensitive personal data then you're going to have to use some kind of encryption software, which makes matters much more complex.
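
The "two or four blank DVDs" figure is simple to reproduce.  The sketch below assumes the nominal 4.7 GB capacity of a single-layer DVD, a number that isn't stated above, and uses the $0.10 blank price quoted; the function name is made up for illustration.

```python
import math

DVD_CAPACITY_GB = 4.7  # assumption: nominal single-layer DVD capacity
DVD_PRICE = 0.10       # price per blank quoted in the article


def dvds_needed(data_gb: float) -> int:
    """Number of single-layer blanks needed to hold data_gb gigabytes."""
    return math.ceil(data_gb / DVD_CAPACITY_GB)


for size in (8, 16):
    n = dvds_needed(size)
    print(f"{size} GB -> {n} discs, about ${n * DVD_PRICE:.2f} in media")
```

In practice files don't pack perfectly onto discs, so treat the count as a lower bound.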

If we're talking about a few hundred gigabytes that need to be archived, then blank DVDs will be a nightmare to manage even though they're the cheapest media per GB (roughly 2 cents per GB at $0.10 for a 4.7 GB blank).  Blank Blu-ray and HD-DVD media are priced at more than $1 per GB, so they're too expensive for most people, not to mention the fact that you need a $600 optical drive.  Tape devices are slow, very expensive, and extremely unfriendly, so they're completely out of the question.  Your cheapest and simplest option is a hard drive, and you have a few choices.  An external USB model is limited to 30 MB/sec at best.  An internal SATA hard drive in a hot-swap tray or an external eSATA hard drive runs as fast as an internal hard drive (70 MB/sec), but for eSATA you'll have to install an eSATA adapter in the back of your PC.  Hard drives can now be purchased for less than $0.25 per gigabyte and they're the easiest and fastest medium to work with.  The only word of caution for using hard drives is that you should put the drive into a padded bag and keep it away from strong magnets.  In any case, hard drives are sealed and far more resilient than expensive and slow tape.
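
To put the 30 MB/sec and 70 MB/sec figures in perspective, here is a rough Python estimate of how long it takes to fill an archive drive.  It assumes the drive sustains its best-case rate the whole time and that 1 GB is 1000 MB; real copies of many small files will be slower.  The function name is just for illustration.

```python
def hours_to_copy(data_gb: float, mb_per_sec: float) -> float:
    """Best-case hours to copy data_gb at a sustained mb_per_sec (1 GB = 1000 MB)."""
    return data_gb * 1000 / mb_per_sec / 3600


for label, rate in (("USB", 30), ("eSATA / internal SATA", 70)):
    print(f"{label}: about {hours_to_copy(500, rate):.1f} hours for 500 GB")
```

So a full 500 GB archive is an overnight job over USB and closer to a two-hour job over eSATA.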

To help you figure out how much money and backup capacity you need to allocate, here's a table showing how much of your "stuff" you can back up on a single 500 GB hard drive, based on the current discount price of $120 per drive.  Of course this just shows what a single 500 GB hard drive can archive, and you can easily add archival drives as your data grows.  Eventually 1000 GB (1 TB) hard drives will become the cost-per-GB leader and you'll be able to use those for future archiving.

Note: RAW is the most space-efficient and highest-quality format for storing images, so this table should be useful for professional photographers and consumers alike.  TIFF images take three times the capacity (possibly more, depending on the color depth used) while being inferior to RAW images.

Item                          Size per item   Items per 500 GB drive   Cost per item
16 megapixel RAW image        16 MB           31250                    $0.0038
8 megapixel RAW image         8 MB            62500                    $0.0019
One hour of DV or HDV video   12.6 GB         39.68                    $3.02
Typical DVD movie             6 GB            83.33                    $1.44
Typical DIVX movie            1.3 GB          384.62                   $0.31

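
If your own files are a different size, the table's figures are easy to recompute.  The Python sketch below uses the 500 GB capacity and $120 price from above and treats 1 GB as 1000 MB, which is what the table's numbers imply; the function and variable names are only illustrative.

```python
DRIVE_MB = 500 * 1000   # 500 GB drive, counting 1 GB as 1000 MB
DRIVE_PRICE = 120.00    # discount price per drive quoted in the article

ITEM_SIZES_MB = {
    "16 megapixel RAW image": 16,
    "8 megapixel RAW image": 8,
    "One hour of DV or HDV video": 12_600,
    "Typical DVD movie": 6_000,
    "Typical DIVX movie": 1_300,
}


def per_drive(item_mb: float) -> tuple[float, float]:
    """Return (items that fit on one drive, storage cost per item in dollars)."""
    count = DRIVE_MB / item_mb
    return count, DRIVE_PRICE / count


for name, size_mb in ITEM_SIZES_MB.items():
    count, cost = per_drive(size_mb)
    print(f"{name}: {count:,.2f} per drive, ${cost:.4f} each")
```

Swap in your own item sizes or a different drive price to see what your archive will cost.
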
So with the portable hard drive at our disposal, we will have to "drag and drop" all of our critical, irreplaceable data onto the portable medium and physically carry it to our off-site storage location, which could be some storage space we rented, a friend's house, or Mom's house.  Of course we'll return the favor and keep a copy of their data, so we can essentially be off-site backups for each other.  Theoretically you can use the Internet to back up data for each other, but uploading tens or hundreds of gigabytes of data at 384 kbps is excruciatingly slow.
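
Just how slow is 384 kbps?  A quick Python estimate, assuming the link runs flat out around the clock with no protocol overhead (both generous assumptions), and using a made-up function name:

```python
def upload_days(data_gb: float, kbps: float = 384) -> float:
    """Best-case days to upload data_gb gigabytes over a kbps uplink."""
    bits = data_gb * 1e9 * 8          # gigabytes to bits
    seconds = bits / (kbps * 1000)    # kilobits per second to bits per second
    return seconds / 86_400           # seconds to days


for size in (10, 100):
    print(f"{size} GB: about {upload_days(size):.1f} days")
```

Ten gigabytes takes a couple of days; a hundred takes more than three weeks, which is why carrying the drive over to Mom's house wins.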

But this brings up some privacy concerns.  Encryption might not be something that everyone thinks of or cares about, but it's a great solution if you don't even want the possibility of other people peeping inside your archived data.  The nice thing about hard drives (internal or portable) is that you can just turn on EFS (Encrypting File System) in Windows XP or Vista, enable it on a folder called "private", and everything inside it will be encrypted.  All your private stuff goes in there and your public stuff doesn't.  You should back up your EFS key to a USB key, because that key is the only way you'll ever get access to your backup data if your primary computer is destroyed.

Now none of these solutions is pain-free and they all involve some work, but I'm giving you the easiest and cheapest solution possible.  Hopefully this will get you to think about your own precious data and evaluate what's important to you.  I'm going to start archiving all of my home movies and photos onto a single 500 GB drive because memories are precious.
