Why RAID 6 stops working in 2019

Three years ago I warned that RAID 5 would stop working in 2009. Sure enough, no enterprise storage vendor now recommends RAID 5.

They now recommend RAID 6, which protects against two drive failures. But in 2019 even RAID 6 won't protect your data. Here's why.

The power of power functions

I said that even RAID 6 would have a limited lifetime:

. . . RAID 6 in a few years will give you no more protection than RAID 5 does today. This isn’t RAID 6’s fault. Instead it is due to the increasing capacity of disks and their steady URE rate.

Late last year Adam Leventhal, a Sun engineer, DTrace co-inventor, flash architect and ZFS developer, did the heavy lifting to analyze the expected life of RAID 6 as a viable data protection strategy. He lays it out in the Association for Computing Machinery's Queue magazine, in the article Triple-Parity RAID and Beyond, which I draw from for much of this post.

The good news: Mr. Leventhal found that RAID 6 protection levels will be as good as RAID 5 was until 2019.

The bad news: Mr. Leventhal assumed that drives are more reliable than they really are. The lead time may be shorter unless drive vendors get their game on. More good news: one of them already has - and I'll tell you who that is.

The crux of the problem

RAID arrays are groups of disks with special logic in the controller that stores the data with extra bits so the loss of 1 or 2 disks won't destroy the information (I'm speaking of RAID levels 5 and 6, not 0, 1 or 10). The extra bits - parity - enable the lost data to be reconstructed by reading all the data off the remaining disks and writing to a replacement disk.

The problem with RAID 5 is that disk drives have read errors. SATA drives are commonly specified with an unrecoverable read error (URE) rate of 1 in 10^14 bits read. That means that roughly once every 24 billion 512-byte sectors, the disk will not be able to read a sector.

Twenty-four billion sectors is about 12 terabytes. When a drive fails in a 7 drive, 2 TB SATA disk RAID 5, you'll have 6 remaining 2 TB drives. As the RAID controller reconstructs the data, it is very likely to hit a URE. At that point the RAID reconstruction stops.

Here's the math for the chance of reading every remaining sector cleanly: (1 - 1/(2.4 x 10^10)) ^ (2.3 x 10^10) = 0.3835

You have a 62% chance of data loss due to an uncorrectable read error on a 7 drive RAID with one failed disk, assuming a 10^14 read error rate and ~23 billion sectors in 12 TB. Feeling lucky?
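For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same calculation. It assumes 512-byte sectors and treats each sector read as an independent trial; the function and variable names are mine, purely for illustration.

```python
import math

SECTOR_BYTES = 512   # assumption: classic 512-byte sectors
URE_BITS = 1e14      # spec: one unrecoverable read error per 10^14 bits read


def p_rebuild_hits_ure(bytes_to_read, ure_bits=URE_BITS, sector_bytes=SECTOR_BYTES):
    """Probability that at least one sector is unreadable during a full read."""
    sectors_to_read = bytes_to_read / sector_bytes
    sectors_per_error = ure_bits / (sector_bytes * 8)
    # (1 - 1/N)^M computed via log1p so the tiny per-sector probability
    # does not get lost to floating-point rounding.
    p_clean = math.exp(sectors_to_read * math.log1p(-1.0 / sectors_per_error))
    return 1.0 - p_clean


# Six surviving 2 TB drives must be read in full to rebuild the seventh.
surviving_bytes = 6 * 2e12
p_loss = p_rebuild_hits_ure(surviving_bytes)
print(f"Chance of hitting a URE during rebuild: {p_loss:.1%}")
```

Run without rounding the sector counts first, this lands within a fraction of a percentage point of the 62% figure above.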

RAID 6

RAID 6 tackles this problem by creating enough parity data to handle 2 failures. You can lose a disk and have a URE and still reconstruct your data.

Some complain about the increased overhead of 2 parity disks. But doubling the size of a RAID 5 stripe gives you dual disk protection with the same share of capacity spent on parity. Instead of a 7 drive RAID 5 stripe with 1 parity disk, build a 14 drive stripe with 2 parity disks: no greater fraction of capacity for parity, and protection against 2 failures.
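The overhead claim is easy to verify. A minimal sketch, with a hypothetical helper name, comparing the fraction of raw capacity given to parity in the two layouts:

```python
def parity_fraction(total_drives, parity_drives):
    """Share of raw stripe capacity spent on parity."""
    return parity_drives / total_drives


raid5_overhead = parity_fraction(total_drives=7, parity_drives=1)
raid6_overhead = parity_fraction(total_drives=14, parity_drives=2)

print(f"7 drive RAID 5:  {raid5_overhead:.1%} parity, survives 1 failure")
print(f"14 drive RAID 6: {raid6_overhead:.1%} parity, survives 2 failures")
```

Both layouts give up one seventh of their raw capacity. What the sketch does not show is that the wider stripe has more surviving drives to read during a rebuild.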

Digital nirvana, eh? Not so fast, my friend.

Grit in the gears

Mr. Leventhal points out that a confluence of factors is leading to a time when even dual parity will not suffice to protect enterprise data.

Consider:

  • Long rebuild times. As disk capacity grows, so do rebuild times. 7200 RPM full drive writes average about 115 MB/sec - they slow down as they fill up - which means about 5 hours minimum to rebuild a failed drive. But most arrays can't afford the overhead of a top speed rebuild, so rebuild times are usually 2-5x that.
  • More latent errors. Enterprise arrays employ background disk-scrubbing to find and correct disk errors before they bite. But as disk capacities increase, scrubbing takes longer. In a large array a disk might go for months between scrubs, meaning more errors on rebuild.
  • Disk failure correlation. RAID proponents assumed that disk failures are independent events, but long experience has shown this is not the case: 1 drive failure means another is much more likely.

Simplifying: bigger drives = longer rebuilds + more latent errors -> greater chance of RAID 6 failure.
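To put numbers on the rebuild times, here is a rough Python sketch. It assumes a 2 TB drive, as in the earlier example, written at a steady 115 MB/sec, and applies the 2-5x slowdown as a simple multiplier; the function name is hypothetical.

```python
def rebuild_hours(capacity_bytes, write_mb_per_s, slowdown=1.0):
    """Hours to rewrite a whole drive at a sustained rate, scaled by a slowdown factor."""
    seconds = capacity_bytes / (write_mb_per_s * 1e6)
    return slowdown * seconds / 3600


# Assumption: 2 TB drive, 115 MB/sec average full-drive write speed.
best_case = rebuild_hours(2e12, 115)
for factor in (1, 2, 5):
    hours = rebuild_hours(2e12, 115, slowdown=factor)
    print(f"{factor}x slowdown: about {hours:.0f} hours")
```

The best case comes out just under 5 hours, and the 5x case is roughly a full day. Double the drive capacity at the same write speed and every figure doubles with it.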

Mr. Leventhal graphs the outcome:

Courtesy of the ACM

By 2019 RAID 6 will be no more reliable than RAID 5 is today.

The Storage Bits take

For enterprise users this conclusion is a Big Deal. While triple parity will solve the protection problem, there are significant trade-offs.

21 drive stripes? Week long rebuilds that mean arrays are always operating in a degraded rebuild mode? Wholesale move to 2.5" drives? Functional obsolescence of billions of dollars worth of current arrays?

Home users can relax though. Home RAID is a bad idea: you are much better off with frequent disk-to-disk backups and an online backup like CrashPlan or Backblaze.

What is scarier is that Mr. Leventhal assumes disk drive error rates of 1 in 10^16. That is true of the small, fast and costly enterprise drives, but most SATA drives are 2 orders of magnitude worse: 1 in 10^14.

With one exception: Western Digital's Caviar Green, model WD20EADS, is spec'd at 1 in 10^15, unlike Seagate's 2 TB ST32000542AS or Hitachi's Deskstar 7K2000.
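How much do those spec differences matter? A small sketch, assuming the same 12 TB rebuild read as in the RAID 5 example and using a Poisson approximation (expected errors = bits read / bits per error) instead of the exact power formula. The names are illustrative only.

```python
import math


def p_ure(bytes_read, bits_per_error):
    """Approximate chance of at least one URE, via 1 - exp(-expected_errors)."""
    expected_errors = bytes_read * 8 / bits_per_error
    return -math.expm1(-expected_errors)


REBUILD_BYTES = 6 * 2e12  # assumption: six surviving 2 TB drives

# Keyed by the exponent in the drive's "1 in 10^N bits" spec.
risk = {exp: p_ure(REBUILD_BYTES, 10.0 ** exp) for exp in (14, 15, 16)}
for exp, p in risk.items():
    print(f"1 in 10^{exp}: {p:.1%} chance of a URE during rebuild")
```

Each order of magnitude in the spec cuts the risk by roughly a factor of ten once the expected error count is well below one, which is why the 10^14 versus 10^16 assumption changes the picture so much.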

Comments welcome, of course. Oddly enough I haven't done any work for WD, Seagate or Hitachi, although WD's indefatigable Heather Skinner is a pleasure to work with. I did work at Sun years ago and admire what they've been doing with ZFS, flash, DTrace and more.

About

Robin Harris has been a computer buff for over 35 years and selling and marketing data storage for over 30 years in companies large and small.

Talkback

61 comments
  • DIY

    I've been happy with basic INTEL motherboard RAID 1 for home use ... given that my precious files are also backed up to the cloud.

    Work in progress is on RAID 1 and I have an AKASA Duodock into which I can plug naked SATA disks. When the RAID array reports 'degraded' I simply copy the remaining operational drive to a cold spare. 'Tested' this out for real over the two hard disk failures I've had in the last 6 months.

    However, many OEM PC's only support two drives so I'm thinking of repurposing two old machines in larger cases (6/8 drives) for virtualised storage. Interested to hear your comments on:

    http://talkback.zdnet.com/5208-12695-0.html?forumID=1&threadID=75503&messageID=1468605&tag=content;col1

    from your colleague Kusnetzky's thread.

    That way bulk storage can be secured whilst (some) WIP can be on faster SSD's.
    jacksonjohn
  • RAID with regular backups is the most secure option ...

    I am using HARDWARE RAID with SCHEDULED backups on my home system. The regular backups actually make the RAID itself more secure. They do that by forcing all critical areas of the drives to be read continually, thus eliminating sleeper problems where a little-used area of disk deteriorates and you only find it on a rebuild. When you do have a disk drive problem, you want to find it as quickly as possible. If you are relying on RAID alone, you run the risk of sleeper problems UNLESS you are using a scheduled RAID utility to regularly check the drives for bad blocks. Backups, of course, offer an additional value over that solution since they provide ... a backup of your data as well.

    But having said that, ZFS type filesystems offer the ultimate solution in that they can detect and correct problems quickly without needing long rebuild times. However, there again, if your drives aren't being read on a regular basis as in regular complete backups, even ZFS type systems will risk sleeper problems where one or more clusters deteriorate and then the other drive fails leaving you sunk. I see a high level of risk with "lazy" disk drives. As with the human body, a little exercise pays huge dividends in the long run.
    George Mitchell
  • RE: Why RAID 6 stops working in 2019

    Running Linux at home, I just do a backup of our 'home' directories once a month. I've been through 2 HDD failures since I started this and after a new drive and Linux install, I just copy the 'home' directory back in and the machine is back up and running.

    Never did feel that RAID was necessary for home use.
    1djk1
  • RE: Why RAID 6 stops working in 2019

    Another reason to look at Xiotech.
    sdhill@...
  • RE: Why RAID 6 stops working in 2019

    In my experience file storage makes more sense for most home users. Why? Because they understand files and file interfaces are simpler. Block devices don't compute for many people and iSCSI drivers + setup are usually non-trivial.

    Whether you go block or file though the key to personal data preservation is to maintain multiple copies of your data. Any single device can and will fail. You can't let it take your only copy when it does.

    Robin
    R Harris
  • Drobo?

    Any thoughts on Drobo's "BeyondRAID" technology and how it might compare to traditional RAID5 or RAID6?
    bmgoodman
    • I bought a Drobo S and am trying it out.

      Better than other home RAID systems I've looked at. But I'm still
      considering how well it works for civilians.

      Expect a report in the next couple of months.

      Robin
      R Harris
    • My comments on using Drobo

      One thing up-front: You can't use Drobo for your boot drive so it's not a solution for a primary drive. On the other hand, a primary drive can use RAID. (Not making a comment as to the viability of using RAID on a primary -- just the facts here.)

      So the question is, how will you use the Drobo?

      There are a few ways:

      1. Direct-connect to PC via USB or eSATA.

      2. Network-connected via ethernet with DroboShare device.

      3. iSCSI over network. (I don't think all Drobo units have this.)

      So for the home user it comes down to direct-connect (one PC/Mac) or ethernet (all PCs/Macs on network).

      The Drobo itself is wonderfully simple to use, and is very well-constructed. It has a nice magnetic cover for accessing the drives and big bright lights to clearly point out good and bad status.

      It is simple to add or delete drives. Just plug 'em in.

      The Drobo software for managing the device is OK, but not as good as the device it is managing. My Drobo has the DroboShare device, and it requires me to keep tabs on three different types of updates: the Drobo firmware, the DroboShare firmware, and the Drobo management software. When I first plugged everything in, it required me to check for updates a number of times before everything was up to date.

      You have a bit of a learning curve with regard to keeping an eye on disk usage. When you set up the Drobo, you give it a target size for your array, not the actual amount of storage you have today. So you might only have 2TB in actual disk storage, but you can create an 8TB array. It's mostly a good thing, once you get the hang of it.

      The biggest drawback I have seen, as someone who attaches the Drobo via ethernet (DroboShare) is that it is not the speediest storage in the world. Sometimes it can be very slow. (NOT due to running out of storage space. It has a feature that automatically slows down writes as it gets close to filling up, but I am only about one-quarter full in terms of actual on-hand disc space.)

      Many times when I try accessing the Drobo there is an initial pause, and then it reacts with typical performance. It might have something to do with the way it monitors the network, I'm not sure. But the pause is a little too long for my likes.

      My "rough estimate" of data transfer speed is that it takes maybe twice as long to copy a bunch of files to the Drobo than to copy the same files to a Windows Server file share. (I'm sure Robin can do much more accurate testing than my "gut feel" described here.)

      On the plus side, the plug-and-play aspect to network storage was great with the Drobo. It worked the first time, with absolutely no tweaking on my part. (Aside from learning how to use the sometimes-confusing Drobo management software.)

      The DroboShare unit has a bit of a bonus for techies: You can use it to run small applications. There are a handful of apps on the Drobo site, and some are mildly useful. I played around with the feature for about a day, and came to the conclusion that it is a cool idea, but because it would require me to learn a new technology ecosystem, and the scope of such is limited to just Drobo users, it makes more sense to build the apps to sit on some other platform.

      That said, it does provide an opportunity for industrious people to build apps that can turn the Drobo into a functional stand-alone server that can handle more than mere file sharing.

      I like the Drobo for its ease of use, peace of mind, and at-a-glance view of its status. It's great for storing shared files, bulk storage, backups, and other things that are not ultra performance-sensitive.

      However, for those looking for a "data drive" on their PC that will be accessed frequently, I'm not so sure. Since my Drobo is set up as network storage I have not tested extensively in a direct-connect scenario. My guess is that there are faster direct-storage units out there, but Robin could test that in his upcoming review.
      Speednet
      • Bozoshare?

        What in the world?! A $500 empty shell?! You actually have to buy the drives separately? There's a whole litter of suckers born every second! I mean I could build a couple of fairly snappy domain servers and simply use them as backup servers for that kind of $$! Throw in the added benefit of clustering my entire network twice over and these overpriced empty boxes look even more silly. I suppose it's an OK solution for someone who doesn't know much about computers and networks, but I wouldn't want to insult myself by implementing such an overpriced gadget; it just doesn't seem to be cost effective to me.
        Johnny4.0
  • And DataCenter storage designers go big

    What is really bad is that all the people who design the SAN systems always want to consolidate down to a single large 20 or 30 TB system in order to save costs - and never plan for the week long rebuilds!
    RAID's original meaning is still true. It would be better to have 15 SANs of 2 TB each than a single 30 TB one even with the cost of extra hardware - you now have no single point of failure and alternate plans and methods of recovery if one goes down. 5000 people unable to work for one week is $5 Million - vs cost of extra hardware - which really will cost more?
    TAPhilo
    • You're forgetting something about SANs

      The disk groups tend to be 20 - 30 TB but they are typically sub grouped in sets of 6 - 11 disks. These sub groups are RAID 5, and sets of these sets are then organized in a RAID 5 so you get a mesh of disks. You can tolerate several disk failures as long as you don't get two in the same sub group of disks. If properly laid out you can actually tolerate the failure of an entire shelf as long as no two disks in the shelf are in the same sub grouping. Add in the protection of RAID 6 and you are quite well protected.

      Additionally SAN designers are NOT USING SATA except for temporary storage or dedupe backups. They are typically using 300/600 GB Fiber Channel disks in the big SANS. The spindle speed is typically 10,000 or 15,000 RPM for fast rebuilds, fast scrubbing, and fast access.

      I assume, Robin, that you are targeting consumer RAID and very small business RAID. Even small business is typically purchasing 10K and 15K enterprise class SAS disks that are on the order of 146 - 300 GB with UREs every 10^15.

      The situation is very different in the enterprise where people are typically not storing huge iTunes and movie libraries.
      Freddy McGriff
      • Learned something new -- again!

        Thanks for the additional info - our SAN people don't explain anything on how / why they do what they do.
        TAPhilo
  • ZFS --> Mainstream in future

    I often wonder why the industry is waiting to move forward with ZFS? Licensing issues? I was hoping Apple would license and move forward with ZFS. Now it's not on the radar.

    Perhaps things will change and ZFS will again gain some support for future use in mainstream OS file systems.
    lundp@...
  • What about MTBF

    Great article. But I believe you are missing one point. In the SATA drive real world arena I question if the Mean Time Between Failures (MTBF) is as high as the manufacturers are saying. Search any site that allows customer feedback (Bestbuy, Egghead, Tiger Direct, etc.) and you will see far too many complaints of drives arriving DOA or failing within the first year.
    Keeping Current
    • Yes, the MTBF but a different point about it

      Robin complains that the MTBF is too reliable e.g. all disks will start failing around the same time.

      I guess this could be circumvented by not using one single batch of one single kind of disk of one specific manufacturer.

      I've seen a batch of disks fail around the same time.

      But to come back to the point you are making. 2019 is a bit too optimistic, I concur ;)
      TedKraan
      • You are ignoring sympathetic failures then

        A sympathetic failure in a disk array happens when a drive fails, say because the spindle bearing is wearing out and overheats (or the actuator is being overworked and overheats), and causes the drive next to it to fail as a result. Using drives from different vendors may mitigate this a little if the spindles are offset, but on the other hand it may also make them more vulnerable if some sensitive component on the second drive is near the heat source on the first.
        914four
        • Good point

          In short, compact isn't a good thing in the server room :)
          TedKraan
    • I use MTBF to compare...

      drives. There is no way the average drive will last as long as the MTBF rating. But drives with higher MTBF are probably going to last longer - relatively speaking. Most SATA drives come in around 1.2M while a high quality, enterprise SAS drive may come in as high as 2M. I wouldn't expect either to last that long but I would expect the 2M to be considerably more reliable.
      bjbrock@...
      • MTBFs mean very little...

        ...unless you compare duty cycles as well. Your Fibre Channel or Enterprise SAS drive may have an MTBF of 1M hours @ 85% duty cycle, whereas the SATA drive may have an MTBF of 1.2M hours @ 20% duty cycle. Which is more reliable? Well, it really depends on what you are doing to your drives. Push your SATA drive to 85% utilization and you'll see MTBF drop significantly. The trick to getting longer life from your drives is to use them for their intended purpose; if you are using SATA in an enterprise environment make sure they aren't being overworked.
        914four