NAS Wars 2017: Behind-the-scenes and final RAID review results
How valuable is your data? If your storage drive crashed, would it ruin your day? Your week? Your entire career? Only you can answer those questions for yourself and your organization. But I'll tell you, personally, I need my files -- not only to get my day-to-day job done, but to reference older information and even look at personal keepsakes (like all my digital photos).
Over the years, I've discussed storage strategies a lot. I've talked about the traditional 3-2-1 backup approach, and what I call 3-2-1 off and away backups. I've talked about cloud backup, and how having most of my critical data in the cloud helped me keep going during last year's hurricane evacuation. I've even taken you through an in-depth torture testing of seven different RAID storage devices to find the winners (and losers).
For years, I never had a critical drive failure that would showcase the benefit of RAID. Recently, I've had two. My RAID arrays saved my bacon.
My storage architecture in Florida was in flux when Hurricane Irma hit in September. I had just completed that full RAID comparison series I mentioned earlier, and I'd decided I wanted to consolidate from six RAID arrays (all different generations of technology) to two or three boxes. But I hadn't yet done so.
At the time, I had four RAIDs storing mostly old data, things that I had numerous backups of, including full cloud backups. They were, essentially, my local cold storage. They were fine. Even if I had to re-download all of that data from the cloud, it wouldn't matter. I didn't need anything on there in any kind of rush.
But I did have two RAID arrays for my hot data. I had moved most of my day-to-day work and management files to a four-drive NAS RAID from Synology, the 916+. I had reviewed it back in January 2017. I liked it so much, I had decided to make it my main network NAS. I was using Synology's spectacular Cloud Sync system, so I had all the data on this box cleanly backed up.
The second mission-critical RAID array was (and still is) a direct-attached Drobo, connected via USB 3 to my iMac. This has all my video editing files and the very, very extensive media asset library I use for my presentations. We're talking upwards of half a million indexed images on this thing.
It. Was. Not. Backed. Up.
That was my fault. My excuse was that I had stopped using CrashPlan when they dropped their consumer backup service, but that's just an excuse. My other excuse was that there was so much data on that RAID that uploading it to another cloud service provider would take forever, and I hadn't yet chosen a CrashPlan replacement.
Think about the stupidity of that. I had so much data on my RAID that it was too hard to back up. It boggles the mind. If I had that much data on there, then I sure couldn't afford to lose it.
My justification was that while, in aggregate, my media asset library was mission critical, no individual file on it was that important. I could, theoretically, re-download, re-install, and re-index all those files. But. Still. The fact was, it was something on my to-do list that I didn't do.
The fact that I had a stack of review RAIDs just sitting in the product review lab (aka, my garage) didn't seem to cross my mind. I never even backed up that RAID to a local spare RAID box. Like I said, it boggles.
So, then, keep in mind these two RAIDs: the Synology NAS containing most of my business data, and the direct-attached Drobo containing most of my media production resources.
About a week before Irma was due to hit Florida, my family and I evacuated. I've told this story before, but we wound up moving to Oregon, which is where I am now. I stayed in Oregon while my wife flew back to Florida and worked with movers to pack up the house and office. That packing included packing up the RAIDs.
I don't know if the RAIDs weren't packed right, or they were bounced around by the movers when they were lifted and transported, or whether the moving truck hit a pothole in transit. It could have been anything.
What I do know is that when I unpacked these two RAIDs and powered them up, each had a failed drive. This was problematic on the Synology, but I did have good backups. If the Drobo hadn't recovered, it would have been quite bad. That, if you recall, had all my not-backed-up media resources.
The Drobo is a five-drive array, and the Synology is a four-drive array. While it's possible to configure both arrays to survive two simultaneous drive failures, that uses up an extra drive's worth of capacity for redundancy. I've never had two drives in the same array fail at once. Since I generally implement a 3-2-1 off and away strategy, I figured one drive of redundancy was good enough.
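The capacity tradeoff described above can be sketched with some quick arithmetic. This is a simplification that assumes equal-size drives and whole-drive parity reservation; real layouts like Synology SHR and Drobo's BeyondRAID handle mixed drive sizes differently.

```python
# Rough usable-capacity math for single- vs dual-redundancy arrays.
# Assumes equal-size drives; mixed-drive layouts (SHR, BeyondRAID) differ.

def usable_tb(drives: int, drive_tb: float, parity_drives: int) -> float:
    """Capacity left after reserving whole drives' worth of redundancy."""
    return (drives - parity_drives) * drive_tb

# A four-drive array with 4TB drives:
single = usable_tb(4, 4.0, 1)   # tolerates one failed drive
dual   = usable_tb(4, 4.0, 2)   # tolerates two simultaneous failures
print(single, dual)             # 12.0 vs 8.0 -- dual redundancy costs a full drive
```

On a four-drive box, stepping up to dual redundancy drops usable space from 12TB to 8TB, which is why many of us settle for one drive of protection plus good backups.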
I was very fortunate. I did have two drives fail, but they were in two different arrays. In each array, I removed the failed drive, slid in a good replacement drive, and waited. And waited. And waited. It takes quite a while to rebuild a failed 4TB drive.
I was very, very lucky.
Both RAIDs performed exactly how they were supposed to. I'd given them both top marks in my testing for RAID performance, so I wasn't that worried, but still. It was scary until both arrays showed successful rebuilds. And yes, I did, in fact, make a backup of my media asset library. Should have done that six months ago, but at least it's now been done.
This is why you use RAID. Yes, there are backups. But RAID is your first line of defense, and in my case, both RAID boxes did exactly what they were supposed to do. They insured against data loss in the event of drive failure. When a drive did fail in each box, they rebuilt the data and kept on going.
When it comes to disaster recovery and business continuation, we often think in terms of theoretical failures, somewhere off in a foggy future. But these examples -- relying on cloud sync after an evacuation, and on RAID to recover my production files -- show that DR and BC are practices we follow because real life does intrude. When it does, we need to be prepared.
My personal practice was flawed. But because I did have multiple levels of data protection in place, I was still able to recover everything and keep on going. That's why we build multiple levels of protection into our plans. It's not only to protect against cascading failures. It's also to protect against the time when a to-do item hasn't been done, and you still need to get your data back.