Has RAID5 stopped working?

Summary: Years ago I wrote a piece called "RAID5 stops working in 2009." It's now 2013. Was I right?

TOPICS: Storage, Hardware

To recap the earlier post, the issue is the unrecoverable read error (URE) rate of SATA drives used in consumer storage arrays. With a URE rate of one error per 10^14 bits read, you can expect a failed block read roughly once every 12.5 TB.

If you had an 8-drive array of 2 TB drives and one drive failed, your chance of hitting an unrecoverable read error during the rebuild would be close to 100%. That second unreadable block during a RAID5 recovery is enough to destroy the RAID group and wipe out all the data on it. Not good!

Even with a four-drive RAID5 and 2 TB drives, you would have around a 40% chance of a rebuild failure. Better, but not good enough.

It was the combination of increasing SATA drive capacity, the constant unrecoverable read error rate, and the number of drives in the RAID stripe that led to the prediction that RAID5 would no longer be viable in 2009.


A couple of years ago I started seeing consumer drives spec'd at 10^-15, a rational response to the RAID5 problem. With a tenth the URE rate, consumer RAID5 arrays would be fine.
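
The odds quoted above can be sanity-checked with a simple model. This sketch assumes every bit read during a rebuild fails independently at the quoted URE rate, which is a simplification of real drive behavior:

```python
# Chance that a RAID5 rebuild hits at least one URE, assuming
# independent bit errors at the quoted rate (a simplification).
def rebuild_failure_prob(drives, tb_per_drive, ure_per_bit):
    # A rebuild must read every bit on the surviving drives.
    bits_read = (drives - 1) * tb_per_drive * 1e12 * 8
    return 1 - (1 - ure_per_bit) ** bits_read

print(rebuild_failure_prob(4, 2, 1e-14))   # ~0.38 -- the ~40% figure above
print(rebuild_failure_prob(8, 2, 1e-14))   # ~0.67 -- uncomfortably high
print(rebuild_failure_prob(8, 2, 1e-15))   # ~0.11 -- why 10^-15 drives help
```

Under this model the 10^-15 spec cuts an 8-drive rebuild's failure odds by roughly a factor of six.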

But review the current 3.5" SATA drive specs from HGST, Seagate and WD and guess what? They are all back to 10^-14.

Which means that consumer RAID5 arrays can't be trusted to store your data reliably. Quickly, yes. In large chunks, yes. More simply than individual USB drives, yes.

But not more reliably than a single drive.

And yet

Yet RAID is not only about availability. Its other advantages are important and, for most users, possibly more important.

  • Performance. Striping data across multiple drives can dramatically increase bandwidth for large file apps like video editing. 
  • Capacity. Putting 4-12 drives in a RAID gives a virtual disk much larger than any single drive.
  • Management. After the often painful setup process - and until something breaks - RAID arrays are simpler to manage than individual disks.

Storage Bits take

It seems that people use small RAID5 arrays more for convenience than for data availability. Either that, or they really don't understand how vulnerable their data is - as one business found out recently - and prefer the bliss of the RAID5 delusion.

Many are still using small RAID5 arrays with 10^-14 error rates - me too! - and RAID5 seems to work fine. But adjustments should be made to account for the unchanged error rates.

  • Always maintain a minimum of 2 copies of any data stored on a RAID - 1 on the RAID and 1 elsewhere.
  • After a drive failure, pull any unbacked-up data - recent documents that haven't been backed up yet - off the RAID before replacing the failed drive.
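
The second rule can be sketched in a few lines. This is a hypothetical example - the mount points and the one-week backup cutoff are assumptions - that copies anything newer than the last backup off the degraded array:

```python
# Hypothetical sketch: before swapping a failed drive, copy anything
# newer than the last backup off the degraded array to a separate disk.
import shutil
import time
from pathlib import Path

def rescue_recent(src, dst, cutoff):
    """Copy files under src modified after cutoff into dst, keeping paths."""
    for f in Path(src).rglob("*"):
        if f.is_file() and f.stat().st_mtime > cutoff:
            dest = Path(dst) / f.relative_to(src)
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, dest)  # copy2 preserves timestamps

# Assumed mount points for the degraded array and a rescue drive.
if Path("/mnt/raid").exists():
    rescue_recent("/mnt/raid", "/mnt/usb/rescue", time.time() - 7 * 86400)
```

Only after the rescue copy finishes should the failed drive be replaced and the rebuild started.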

Since RAID arrays are more complex than individual drives, they are more likely to fail. But until they do they are more convenient, faster and larger than any single drive.

Comments welcome, of course. Few consumers use RAID arrays. If you do, do you think more people should?

Comments
  • Use A Next-Generation Filesystem

    Something like BTRFS, ZFS or HAMMER will give you all those advantages of performance, capacity and management, with greater scalability than RAID can manage.
    • RAID can manage infinite scalability

      RAID 10, that is.
      • Re: RAID can manage infinite scalability

        If it could, this article wouldn't have a point.
  • There is only one level of RAID that should be used

    I think that RAID 10 (aka 1+0) should be the only one used.
    • I think RAID 10 makes more sense

      for consumers than other levels of RAID. I run 4 SSD drives in RAID 10 and the speed is very nice.
  • SSDs killed RAID5/0

    The main reason for RAID5/0 is performance. With SSDs continually decreasing in price and able to max out SATA 6Gb/s links, the performance argument is dead, as even RAID5/0 can't match them.
  • RAID-5 is terrible

    Yeah, I used to use RAID-5. It bit me a couple of times. Never again. Everything I do now is RAID-1 (mirrored). It's simpler, faster and safer (RAID-5 might be faster for reads but it can be horrendously slow for writes, having to calculate parity for everything).

    I expect that by the time I need storage bigger than 4TB, there will be single drives available in bigger sizes that I can mirror, so I don't need that aspect of RAID-5.

    One more advantage of RAID-1 - In a worst-case scenario, I can simply take one of those drives and plug it to any machine with a SATA port and get data off of it. No need trying to rebuild RAID-5 metadata and recover the array first.
  • Of course it has

    Enterprise-style storage like ZFS or Windows Storage Spaces is now easily within reach. Windows 8 Pro even includes Storage Spaces. Pop in as many drives as you like. When one fails, throw in a spare and RMA the other. The days of relying on mass manufacturing techniques to make 'never fail' storage are long gone.
    • unless

      You require performance. Storage Spaces is fine; for performance, however, RAID 10 is king.
  • . . . . my RAID

    My RAID never-ever-ever-ever failed me.
    It's ALWAYS killed ants and flies, and wasps, and . . . .
  • Mr.Harris' Apocalypsis

    Mr. Harris's Recurring RAID5 Doomsday Nightmare hinges on RAID brains being as dumb as they were 10 years ago, and on users always being dumb and never doing backups. Isn't it time to move on and look into RAID5 engines that handle UREs more intelligently? (Hint: MSS, md, ZFS, probably more)
    Alex Gerulaitis
    • md won't help

      md is a software implementation of plain old dumb hardware RAID... one horrible piece of software - at least from a prosumer viewpoint and compared to ZFS on Linux.
  • RAID5 is dead. Long live RAID5.

    Look, RAID5 will live on for years. Yes, yes, I know that RAID10 is better. It is better if there is an unlimited budget and physical drive slots are unlimited as well.

    RAID1/0 is more expensive. Also, sometimes the need for a large logical drive can't be accommodated with RAID10. (For example, I need 20TB but I only have 6 slots.) With a limited number of drive slots, some folks may be forced into RAID5 to get the amount of storage that they need. RAID6 may be a better alternative, or a RAID5 array with a hot spare, but that still doesn't alleviate the issue. Individual drives won't replace RAID5, since the failure of that one drive is catastrophic. (Except for a Hadoop infrastructure, where software handles the loss of a drive.)

    We can't talk about SSDs yet as even the cheapest MLC enterprise class drives are still far more expensive than spindled hard drives. So here we are.

    So right now, RAID5 still fills a niche that can't be completely addressed by other alternatives. (Inexpensive, slot limited applications) I don't like it, but it is a necessary evil.
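
    The slot arithmetic in this comment is easy to check. A quick sketch (the 4 TB drive size is an assumption, chosen to make the 20 TB / 6 slot example concrete):

```python
# Usable capacity of common RAID levels for n drives of a given size.
# Rough figures; real arrays lose a bit more to metadata.
def usable_tb(level, n, tb_per_drive):
    data_drives = {"raid5": n - 1, "raid6": n - 2, "raid10": n // 2}
    return data_drives[level] * tb_per_drive

# 6 slots of (assumed) 4 TB drives: RAID5 reaches 20 TB, RAID10 does not.
print(usable_tb("raid5", 6, 4))    # 20
print(usable_tb("raid10", 6, 4))   # 12
print(usable_tb("raid6", 6, 4))    # 16
```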
  • Ignorance is bliss

    Rebuild times are another reason to avoid this configuration: on some of these 3TB-based RAID5 arrays, even with a decent controller, full scans of each disk - even in parallel - take forever, too long when it comes to availability.

    Raid 10 has been a darling of mine for years now.

    Also @Salonikios, I'm not sure RAID5 really fills that niche, since the risk of rebuild failure makes the design about as safe as JBOD, which gives you the space of all n disks instead of n-1 and is more cost effective. Though you may as well run RAID5 and gamble - it's likely to win most of the time.
  • Yes, RAID5 stopped working long ago

    I spend time on Linux RAID-related IRC channels and on the linux-raid mailing list. We very frequently see people with a RAID5 that has a failed drive and then one or several UREs on the remaining drives. You were 100% right that RAID5 stopped working 5 years ago. I always recommend RAID6 for this reason.
  • RAID-5 does not need to die when encountering UREs

    There are more than enough RAID controllers these days which can ignore UREs on RAID-5 rebuilds. The one I'm using right now (running RAID-6 though), an older 3ware 9650SE-8LPML can do it already, option's called "Overwrite ECC". Sure it ain't perfect as it means sacrificing at least one HDD sector, but corrupting one or a few files is surely better than losing the whole array.

    RAID-5/6 controllers aren't that stupid anymore, at least not if you configure them properly. So it didn't stop working entirely. It's just not as safe regarding data integrity - and fault tolerance of course.

    If your data is SO critical that you can't afford to ignore a few UREs at rebuild time, then you better run RAID-6 with one or multiple backups of that data anyway!
    • use zfs...

      with zfs there is even a software raid solution with this capability... i could never understand why md raid still misses that capability... zfs as an added bonus, can tell you exactly which file (or part of that file) is unrecoverable, so you can restore from a backup. oh and it can effortlessly & safely recover from temporarily dropped disks...

      In a nutshell: with RAID6 (raidz2), zfs is immune to UREs - unless 2 of the remaining disks have an error in the same block, which is beyond unlikely for the next century: a probability of 10^-28, or once for every ~144 yottabytes read (1 YB = 1000 billion TB :)