The RAID5 delusion

Summary: RAID5 isn't for protecting your data – it is for keeping your applications running when there is a failure. And it is no substitute for backup.


Case in point
I spoke to the head of a small company – about 25 employees – that had suffered a RAID5 drive failure. The 4TB RAID was used for file sharing.

A drive failed, reconstruction failed and vendor phone support was disastrous. All data was lost.

But the worst of it was that there was no backup. They believed that RAID5 would protect their data. They were wrong.

What RAID5 is for
RAID5 does offer some data protection, assuming it works. But its main purpose is to protect access to your data. This is why it is popular in enterprise applications where maintaining data access during a failure is of vital concern.
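
For the curious, that single-failure tolerance comes from parity. Here is a minimal Python sketch, assuming a toy stripe of three data blocks plus one XOR parity block. The helper name xor_blocks and the sample blocks are made up for illustration; real controllers rotate parity across the drives and work on much larger stripes.

```python
from functools import reduce

def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, one per drive (hypothetical contents).
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block lives on a fourth drive.
parity = xor_blocks(data)

# Drive 1 dies: XOR everything that survived to get its block back.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

The catch is that the reconstruction needs every surviving block to be readable. Lose a second block in the same stripe and there is nothing left to XOR against.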

But these arrays are always backed up so that if there is a catastrophic array failure – a not uncommon occurrence – the data is still recoverable despite the interruption in service.

That's how it played out with the small company. After the drive failed they still had access to their data. But when they replaced the drive, the rebuild did not go as expected. They were stuck.

If they had stopped after the drive failure and made a backup before attempting the rebuild, they probably could've saved all their data. But they thought the RAID was there to protect their data. Oops.

The Storage Bits take
Most enterprise arrays today do not use RAID5 because the likelihood of a second failure during the rebuild – increasingly lengthy because of growing drive sizes – is too high for comfort. Instead they use RAID6, which survives two drive failures, and hyperscale Internet services use even fancier erasure codes that can survive 4 failures.

Note that this does not mean a second drive has to fail: it can be as simple as an unrecoverable read error on a remaining drive that totally pooches the rebuild. Then you have to go to your backup - assuming you have a backup.
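
To get a feel for the numbers, here is a back-of-the-envelope Python sketch. It assumes the commonly quoted consumer-drive spec of one unrecoverable read error (URE) per 10^14 bits read, and it treats errors as independent. Both are assumptions, not measurements, and the function name and the sample sizes are hypothetical.

```python
import math

def rebuild_ure_probability(tb_to_read: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one URE while reading tb_to_read terabytes.

    ure_per_bit is an assumed spec-sheet figure, not a measured rate.
    """
    bits = tb_to_read * 1e12 * 8  # decimal terabytes to bits
    # 1 - (1 - p)^bits, computed via log1p/expm1 to avoid rounding loss
    return -math.expm1(bits * math.log1p(-ure_per_bit))

# A RAID5 rebuild has to read every surviving drive in full.
for tb in (4, 12, 40):
    chance = rebuild_ure_probability(tb)
    print(f"{tb:>3} TB read during rebuild: {chance:.0%} chance of at least one URE")
```

Under those assumptions, reading 4TB from the surviving drives carries a bit more than a one-in-four chance of hitting an error, and the odds only get worse as drives grow. Real drives may do better or worse than the spec sheet, so treat this as an illustration rather than a prediction.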

RAID was a wonderful advance 25 years ago. But the catchy name is no substitute for a backup and archive strategy.

Comments welcome, as always. That said, how do you archive your personal and small business data?

 


Talkback

25 comments
  • Even RAID6 Is On Borrowed Time

    I think former ZDNet blogger George Ou spelled it out several years ago: as hard drives hit 10 terabytes or so, the probability of hitting an I/O error during a RAID (re)build gets close to 100%.

    This is why we have next-generation filesystems like ZFS, BTRFS and HAMMER: they take the redundancy and failure-tolerance of RAID to the next level, making actual RAID unnecessary.
    ldo17
  • Unemployed IT Person

    I'm assuming their IT guy is now looking for work? Obviously whoever it was doesn't have a clue about IT!!!
    nuttyp
    • Businesses don't listen to IT people

      What a laugh - it doesn't matter what IT consultants recommend - it's what the business will do. Many of them ignore IT advice at their own peril. Backup is an unnecessary afterthought to most of them (in their minds all they need is a single USB drive attached to their server to "backup" to - the notion of media rotation completely lost on them).

      They get what they deserve!
      rocker2000
    • You have to wonder about some IT staff

      At the last place I worked, the previous IT person's solution for adding dual monitor setups was to hook a Matrox DualHead2Go Analog Edition to each system, despite the fact the older systems all had slots, and the newer ones natively supported dual monitors.

      Worse, that unit only supported 4:3 display resolutions (and only up to 1280x1024), and the new monitors were 1920x1080 16:9 models.

      Then there was the case of their network. The office had a gigabit network, and all the computers had gigabit network adapters, but when their new IP-based phone systems were installed, each computer was looped through the 10/100 switch in the base of the phone.
      lonniemcclure
  • Raid 5 and Raid 6

    I don't know what hardware and storage vendors you are using, but in 20+ years in IT, I have deployed thousands of storage / server solutions. I have had TWO, count them TWO failures of full Raid Arrays. That's it. I've used IBM, HP, and Dell servers at various points in my career. Very Early on in my career I was building servers with whitebox parts and still didn't have issues. I've used HP and EMC SANs and still never had this issue with Raid 5 or Raid 6. All of that, and two failures.
    So, I'm wondering what you are doing that would cause SO many Raid 5 failures, and so many rebuild failures. You have to be doing something wrong, or using the wrong parts (consumer grade SATA or, god forbid, old IDE). I did a project deploying 700 Netware 4 servers (if that dates me) and in three years had only a handful of bad disks, and NEVER a raid failure.
    So Readers Take Heart. RAID 5, 6 etc is not as unreliable as this writer makes it out to be.

    Of course I agree with the message. BACKUP, BACKUP, and BACKUP. I always suggest at least one, if not two, onsite backups, and an offsite backup with a geographically redundant backup / storage provider.
    gregaaa4
    • Re: so many rebuild failures

      You have to understand that hard drives cannot have a zero error rate.

      Also, in practice, "consumer grade" versus "enterprise grade" makes no difference except in cost. Their failure rate is the same.
      ldo17
      • Really

        Tell me more about SAS drives being the same junk as SATA drives.
        dcdavy
        • why not.

          As you can plug a SAS drive into a SATA controller, and a SATA drive into a SAS controller...

          There is no manufacturing difference. The most there is happens on the formatter card attached to the physical drive.

          Now you can pay more for a longer warranty, it does make replacing drives a bit cheaper when they fail... but last time I looked the MTBF between the two was the same.
          jessepollard
          • almost

            SATA drives can be placed in a SAS enclosure/adaptor, but SAS drives cannot be placed in a SATA enclosure/adaptor. The data/power portions of the interface are linked on SAS. It might work if you took a Dremel to your adaptor, but... Why? SATA drives have come a long way in reliability, but in my experience they are not even close to the reliability of good SAS drives.

            I am not an expert, but I understand that SCSI system calls are different than SATA as well.
            kevin.t.kerrigan
          • curious

            Considering that SATA and SAS use the same physical layer, what exactly do you think is different?

            The command set differs, SAS is bidirectional, dual-porting is an option, etc., but ... the physical interface is identical.

            None of this has any relation to reliability, however.
            danbi
          • SAS and SATA

            They are very different technologies. SAS Enterprise grade hard drives are made with "Sterner" stuff. The components are not the same. The MTBF is NOT the same. So you've been misled. Yes, there are a number of Enterprise level servers which support a combination of SAS and SATA. Do not delude yourself though. The performance and reliability of these aren't even close. Even "Enterprise" SATA drives are nowhere near the level of performance and reliability a SAS drive offers.
            I've seen too many small WhiteBox Vendors put Consumer Grade SATA drives, not even Enterprise SATA (Like Western Digital Raptors for example) into servers. Those drives have a higher incidence of hardware failure when placed under the typical load. Even then, I've still not seen the "Catastrophic" failure this author bemoans. I can't believe that in 20+ years I have been nothing but lucky. I just don't think it's possible.
            Spec it right, configure it right, maintain it right, back it up right, and you just will not see the Armageddon scenarios I'm reading about here. I am currently responsible for about 500+ servers in my current position and about 100 SANs (HP, EMC, and Scale). That number is growing monthly. I STILL do not see these kinds of failures. Some of the hardware I'm dealing with in some environments is as much as 10 years old.
            I just must be the luckiest guy in the world for data failures.
            gregaaa4
    • Re: RAID 5, 6 etc is not as unreliable

      Unfortunately, it is.

      I have been a believer like you --- trusting "expensive" "enterprise" class stuff to do the right thing. Until the day I had to spend three days and nights recovering a RAID array -- it would have required about as much to restore from backup and the backup was not complete -- but that was irrelevant, as the more serious problem was the system was unavailable.

      Since that day, no system I am responsible for goes live without using (only) ZFS for storage and being fully redundant.

      To those who buy the "enterprise" crap story, it cannot be emphasized enough how absurdly insecure all these systems are, because they don't do the most obvious single thing at all: end-to-end data integrity check.
      danbi
  • Raid 1

    I always used RAID1. It might not have been as cost effective, but at least you had a full copy, and even if it had special drivers, there was usually the possibility to be able to use hard disk tools to recover it in another machine.

    Unfortunately I inherited customers with RAID5 and experienced the loss of everything when a second drive failed before the rebuild. The interruption to business, because it was an old machine, running old software, was such that by the time we rebuilt everything onto a newly purchased modern server, a company on the edge had gone past the point of no return. And it cost me a lot of time that was not paid for before the company went under.

    So if you have a small business with a RAID5 consider your prospects of being paid if it fails.
    tony85
    • Moral of the story

      Convince all your customers that moving off RAID5 *now* is a very good idea. Chances are today's single drive capacity will be way larger than their entire array, much cheaper and also much faster.

      RAID5 was one of the worst things to happen to data storage...
      danbi
  • Rotate Backup Media

    Relying on any RAID configuration to protect data is foolish. Design all servers with hot-swappable SSDs and rotate the backup SSDs daily. Test the data set often to verify data integrity.
    Shawn Richeson
    • Re: Rotate Backup Media

      True. RAID 1 is good because if your primary disk fails you can boot up off the second drive (automatically) and keep on trucking. Of course, the first thing you should do after such an event is backup all your data immediately, then replace the failed disk and see if the rebuild works. It always has for me (but only 2 such events in the past 10 years). As to enterprise drives, they are not only faster (10K rpm) but they have a longer MTBF than consumer class drives. But they still fail occasionally. I use 2 rotated USB 3.0 disks and a cloud backup daily.
      flboffin
      • Re: after such an event is backup all your data immediately

        You mean you weren't doing it BEFORE!?
        ldo17
  • ....how do you archive your personal and small business data?

    By using the cloud of course..

    Isn't that supposed to be the way of the future...
    ahanse
  • No Sympathy here

    We've been telling people for 2+ decades to back up their data. If they don't, I have no sympathy for them at all, none, nada, zilch. This seems like a pretty wasted article though. Is anyone here not aware of what RAID is and isn't? If not, you don't belong in the IT field, that's for darn sure.
    ccs9623