Why RAID 5 stops working in 2009

By Robin Harris | July 18, 2007, 6:18am PDT

The storage version of Y2k? No, it’s a function of capacity growth and RAID 5’s limitations. If you are thinking about SATA RAID for home or business use, or using RAID today, you need to know why.

RAID 5 protects against a single disk failure. You can recover all your data if a single disk breaks. The problem: once a disk breaks, there is another increasingly common failure lurking. And in 2009 it is almost certain to find you.

Disks fail
While disks are incredibly reliable devices, they do fail. Our best data - from CMU and Google - shows that over 3% of drives fail each year in the first three years of drive life, and then failure rates start rising fast.

With 7 brand new disks, you have ~20% chance of seeing a disk failure each year. Factor in the rising failure rate with age and over 4 years you are almost certain to see a disk failure during the life of those disks.
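For readers who want to check the ~20% figure, here is a minimal sketch. It assumes the 3.1% annual failure rate (AFR) from the CMU paper quoted in the update at the end of this post, and it assumes drives fail independently of one another, which real arrays only approximate. The function name is mine, for illustration only.

```python
def array_failure_probability(afr: float, disks: int) -> float:
    """Chance that at least one of `disks` drives fails within a year.

    Assumes each drive fails independently with probability `afr`.
    """
    # P(no drive fails) = (1 - afr) ** disks, so take the complement.
    return 1.0 - (1.0 - afr) ** disks


# 7 new drives at an assumed 3.1% AFR
p = array_failure_probability(0.031, 7)
print(f"{p:.1%}")
```

With those inputs the result rounds to 19.8%, which is where the ~20% above comes from.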

But you’re protected by RAID 5, right? Not in 2009.

Reads fail
SATA drives are commonly specified with an unrecoverable read error (URE) rate of one error per 10^14 bits read. Which means that once every 100,000,000,000,000 bits, the disk will very politely tell you that, so sorry, but I really, truly can’t read that sector back to you.

One hundred trillion bits is about 12 terabytes. Sound like a lot? Not in 2009.
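The conversion is simple enough to check by hand, but here it is as a small sketch. It assumes decimal terabytes (10^12 bytes), the way drive vendors count capacity; the constant and function names are illustrative.

```python
BITS_PER_BYTE = 8
BYTES_PER_TB = 10**12  # decimal terabyte, as used on drive spec sheets


def bits_to_terabytes(bits: int) -> float:
    """Convert a bit count to decimal terabytes."""
    return bits / BITS_PER_BYTE / BYTES_PER_TB


# One URE per 10^14 bits read works out to one per this many terabytes.
ure_interval_tb = bits_to_terabytes(10**14)
print(ure_interval_tb)  # 12.5
```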

Disk capacities double
Disk drive capacities double every 18-24 months. We have 1 TB drives now, and in 2009 we’ll have 2 TB drives.

With a 7 drive RAID 5 disk failure, you’ll have 6 remaining 2 TB drives. As the RAID controller is busily reading through those 6 disks to reconstruct the data from the failed drive, it is almost certain it will see an URE.
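Here is a rough sketch of the rebuild arithmetic. It assumes every bit read fails independently with probability 1 in 10^14, which is a simplification: the vendor spec is a stated bound, and real errors tend to cluster. The function and parameter names are mine, not anything a RAID controller exposes.

```python
import math


def rebuild_ure_probability(drives_read: int, drive_tb: float,
                            ure_rate: float = 1e-14) -> float:
    """Chance of at least one URE while reading every surviving drive in full.

    Assumes independent per-bit errors at `ure_rate` and decimal terabytes.
    """
    bits_read = drives_read * drive_tb * 1e12 * 8
    # 1 - (1 - rate) ** bits, written with log1p/expm1 so the tiny
    # per-bit rate is not lost to floating-point rounding.
    return -math.expm1(bits_read * math.log1p(-ure_rate))


# 7-drive RAID 5 with one dead drive: 6 surviving 2 TB drives to read
p = rebuild_ure_probability(drives_read=6, drive_tb=2.0)
print(f"{p:.1%}")
```

Under this simple model the chance comes out at a bit over 60% for six 2 TB drives, and it keeps climbing as drives grow. How close real drives run to their spec sheet is a separate question.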

So the read fails. And when that happens, you are one unhappy camper. The message “we can’t read this RAID volume” travels up the chain of command until an error message is presented on the screen. 12 TB of your carefully protected - you thought! - data is gone. Oh, you didn’t back it up to tape? Bummer!

So now what?
The obvious answer, and the one that storage marketers have begun trumpeting, is RAID 6, which protects your data against 2 failures. Which is all well and good, until you consider this: as drives increase in size, any drive failure will always be accompanied by a read error. So RAID 6 will give you no more protection than RAID 5 does now, but you’ll pay more anyway for extra disk capacity and slower write performance.

Gee, paying more for less! I can hardly wait!

The Storage Bits take
Users of enterprise storage arrays have less to worry about: your tiny costly disks have less capacity and thus a smaller chance of encountering a URE. And your spec’d URE rate of one error per 10^15 bits also helps.
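To put rough numbers on that difference, here is a sketch comparing the two specs. The 300 GB enterprise drive size is my assumption for illustration, not a figure from any vendor, and the model again treats bit errors as independent.

```python
import math


def ure_probability(terabytes_read: float, bits_per_error: float) -> float:
    """Chance of at least one URE while reading `terabytes_read` TB.

    `bits_per_error` is the spec-sheet figure, e.g. 1e14 or 1e15.
    Assumes independent bit errors and decimal terabytes.
    """
    bits_read = terabytes_read * 1e12 * 8
    return -math.expm1(bits_read * math.log1p(-1.0 / bits_per_error))


# Six surviving drives in each case; drive sizes are illustrative.
sata = ure_probability(6 * 2.0, 1e14)        # 2 TB SATA, 1 in 10^14
enterprise = ure_probability(6 * 0.3, 1e15)  # assumed 300 GB, 1 in 10^15

print(f"SATA: {sata:.1%}  enterprise: {enterprise:.1%}")
```

Smaller drives and a tenfold better URE spec multiply together, which is why the enterprise figure lands in the low single digits under these assumptions.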

There are some other fixes out there as well, some fairly obvious and some, I’m certain, waiting for someone much brighter than me to invent. But even today a 7 drive RAID 5 with 1 TB disks has a 50% chance of a rebuild failure. RAID 5 is reaching the end of its useful life.

Update: I’ve clearly tapped into a rich vein of RAID folklore. Just to be clear I’m talking about a failed drive (i.e. all sectors are gone) plus an URE on another sector during a rebuild. With 12 TB of capacity in the remaining RAID 5 stripe and an URE rate of 10^14, you are highly likely to encounter a URE. Almost certain, if the drive vendors are right.

As well-informed commenter Liam Newcombe notes:

The key point that seems to be missed in many of the comments is that when a disk fails in a RAID 5 array and it has to rebuild there is a significant chance of a non-recoverable read error during the rebuild (BER / UER). As there is no longer any redundancy the RAID array cannot rebuild, this is not dependent on whether you are running Windows or Linux, hardware or software RAID 5, it is simple mathematics. An honest RAID controller will log this and generally abort, allowing you to restore undamaged data from backup onto a fresh array.

Thus my comment about hoping you have a backup.

Mr. Newcombe, just as I was beginning to like him, then took me to task for stating that “RAID 6 will give you no more protection than RAID 5 does now”. What I had hoped to communicate is this: in a few years - if not 2009 then not long after - all SATA RAID failures will consist of a disk failure + URE.

RAID 6 will protect you against this quite nicely, just as RAID 5 protects against a single disk failure today. In the future, though, you will require RAID 6 to protect against single disk failures + the inevitable URE and so, effectively, RAID 6 in a few years will give you no more protection than RAID 5 does today. This isn’t RAID 6’s fault. Instead it is due to the increasing capacity of disks and their steady URE rate. RAID 5 won’t work at all, and, instead, RAID 6 will replace RAID 5.

Originally the developers of RAID suggested RAID 6 as a means of protecting against 2 disk failures. As we now know, a single disk failure means a second disk failure is much more likely - see the CMU pdf Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You? for details - or check out my synopsis in Everything You Know About Disks Is Wrong. RAID 5 protection is a little dodgy today due to this effect and RAID 6 - in a few years - won’t be able to help.

Finally, I recalculated the AFR for 7 drives using the 3.1% AFR from the CMU paper, using the formula suggested by a couple of readers - 1 - 0.969^(# of disks) - and got 19.8%. So I changed the ~23% number to ~20%.

Comments welcome, of course. And I got home despite a blowout on Scottsdale’s 101N in 110 degree heat. I thought of it as a Bikram Tire Changing Asana.



Robin Harris has been messing with computers for over 30 years and selling and marketing data storage for over 20 in companies large and small.

Disclosure

Robin Harris

Robin Harris is a president of TechnoQWAN, a consulting and analyst firm in northern Arizona. He also writes StorageMojo.com, a blog which accepts advertising from companies in the storage industry, and has a 25 year history with IT vendors. He has many industry contacts, many of whom are friends and all of whom he has opinions about. Robin has relationships with many companies in the technology industry. Every company he writes about may have sought to influence his opinion through carefully-crafted marketing messages and self-serving white papers, gifts ranging from desk calendars, t-shirts, lunches and trips as well as analyst or consulting assignments. He also invests in some technology companies. He may accept payment for services in stock as well. Robin discloses financial investments in or client relationships with companies named in Storage Bits.

To help readers sort out the gold from the dross in his writings, Robin tries to communicate his reasons as clearly as he can. If you agree, you are intelligent and discerning. If you disagree, well, you disagree.

In all cases, Robin encourages readers to subject everything they read, see or hear on the internet or from politicians to some simple questions:

* What assumptions are implicit in the world view and judgments of the author?
* What, if any, is the factual basis for the opinions the author expresses?
* Is it reasonable, logical and clear?

Your critical faculties: use ‘em or lose ‘em!

Biography

Robin Harris

Harris has been messing with computers for over 30 years and selling and marketing data storage for over 20 in companies large and small. He introduced a couple of multi-billion dollar storage products (DLT, the first Fibre Channel array) to market, as well as many smaller ones. Earlier he spent 10 years marketing servers and networks. After leaving corporate life he founded TechnoQWAN, a consulting and analyst firm. He also developed StorageMojo into one of the top storage industry blogs.

Robin writes, consults, coaches and lives among the mountains of northern Arizona.

Talkback Most Recent of 175 Talkback(s)

  • Been doing this for 15 years
    and never have I seen this unless the drive is about ready to take a dump.

    I've been in the server arena for quite some time and our file servers (which as of late have approached 1-2TB) have never encountered any errors like this rebuilding the arrays UNLESS the drive is also dying.

    Again, much ado about nothing.
    ITGuy04
    18th Jul 2007
  • ZDNet Blogger

    You've never had a RAID 5 rebuild fail?
    I'm just curious.

    Robin
    R Harris
    19th Jul 2007
  • Umm... Not at this time and Hopefully Never
    Count my blessings, I have never had a RAID 5 setup fail on rebuild. I have had a mirror drop both disks because of quota on AIX, but RAID 5 has been the best of friends to me.
    nucrash
    19th Jul 2007
  • You've never had a RAID 5 rebuild fail
    Yes, it happens ALL the time. The thing is, just because your rebuild fails does not mean you lose your data. You can start another rebuild that may or may not work and you can copy your data off the array. No data loss at all.

    During the rebuild procedure the drive being rebuilt is populated with the stripes that it needs to complete the raid set. Not much really changes in the other drives until the operation is almost complete.

    Even if another drive pops out of the array during the rebuild operation (which can happen and is common) you can force that drive back online with the raid controller software and be exactly back where you were previously.

    One thing you DON'T want to do is "guess" on the drive with the stalest metadata and leave it plugged into a backplane while forcing another drive online. This can cause the controller to start a rebuild on the other failed drive, overwriting perfectly valid data and thus destroying your striping.

    If you're really paranoid, pull a raid log when you first create your array or if you make any changes to it. In a worst case scenario, even if metadata is corrupted on the drives to the point that the controller can't read it, you can manually put the stripe order into the controller and it will rewrite the configuration.

    The MOST IMPORTANT thing to do is to keep your controller and drive firmware (yes hard drives have firmware) at the latest level and to periodically review your raid log for errors. If you catch issues early they are MUCH easier to deal with than multiple amber lights at 3am with a call into the support center of your hardware vendor.
    RARE_AT_BEST
    21st Oct 2008
  • raid 5 failed rebuild
    i too have experienced failed raid 5 rebuilds back in the day when the hardware controller was a CMD5000 that was controlling 23gb full height seagate drives in a 14 drive scsi diff configuration... yeah, failed rebuilds would occur, and sometimes the failure would knock out the entire raid, eventually forcing you to reset the whole system and restart the rebuild process over again. but these drives would fail when too many bad blocks would be accumulated in the bad block sector database, which could be cleared with a low level format.

    i think the author is sensationalizing the drive failure rate, and not being clear enough that enterprise level drives do not have the same types of failures as consumer grade equipment.

    take into consideration that most enterprise level raids and sans are utilizing some sort of scsi derived standard, i.e. sas, FC, ultra, iscsi, etc. scsi by its very nature (or actually by its design) is very conservative. if a drive starts to exhibit errors/problems, the drive will often "prefail" before actual failure. this provides a chance to recover data (i.e. rebuild raid). an ide/ata drive on the other hand simply fails, and typically failures are not as easily recoverable. there's a reason why scsi drives are more expensive than sata drives, this is one of the reasons.

    additionally, the consumer versus professional markets have completely different goals: consumer markets are about biggest bang for the buck, i.e. 1.5 TB drives for under $200. professional markets are about reliability above all other matters, and it's not unusual for a 320GB SAS drive to cost $800+...

    2TB drives? sure they're around the corner. but don't let the size intimidate you... regardless of interface technology (sata or scsi) the proper procedure for raid maintenance is to 1) always have a cold spare and 2) always replace failed drives with a new drive, and return the old drive for warranty maintenance.
    capsteve
    21st Oct 2008
  • RE: Why RAID 5 stops working in 2009
    @R Harris I am curious too. RAID 5 has been a headache for me. I gave a lot of my time trying to find a resolution for this, but it has always been a failure.
    lorisinclair
    4th Sep
  • Your storage system is not large enough
    The reason you do not see this problem is that you do not have enough disk drives to be statistically significant. 1-2TB is a couple of disk drives.

    In a typical enterprise data center, there are hundreds to thousands of disk drives. All of the data centers I have worked with that have a decent number of disk drives do, in fact, see this problem. Hence, this is much ado about something.

    I do think that the problem is exacerbated by the behavior of the RAID controller when it encounters a failure on a disk drive. When a RAID controller is running along and gets an uncorrectable read error on a single disk drive in a RAID set, many times it will simply shut that drive down and begin a rebuild operation on the hot spare. Now, enter the problem of the probability of a second read failure on one of the remaining drives in the RAID set. That second failure will cause the RAID controller to quite possibly give up.

    IMHO this is far too aggressive. Some of the newer, more intelligent RAID controllers will take the first offending drive offline but not disable it entirely. Instead, the drive is examined for the root cause of the problem and either repaired and put back into service, or it is used in conjunction with the other remaining drives to perform a more robust rebuild operation. This assumes, of course, that the drive is accessible. If the drive is dead then you are back to the problem of a data error on a second drive causing problems in the rebuild. Even so, I think that the rebuild should complete as it would normally and report enough information back to the host through sense data that the data management people can determine the extent of the problem in terms of which files and/or metadata are affected and so on.

    This stuff is not easy and I agree that the higher capacity drives are increasing the exposure to data errors. I think that we need to be engineering data storage systems that assume data errors are a normal event rather than an anomaly and deal with them more appropriately than we have been.
    storagelunatic
    22nd Oct 2008
  • RE: Why RAID 5 stops working in 2009
    @storagelunatic This can be the reason too.
    lorisinclair
    4th Sep
  • Not only large drives!
    For smaller drives, there is a lower probability of a URE during rebuild, but the probability is not zero. It's very embarrassing to explain to a client that his "infallible" RAID system has indeed collapsed.

    It's absurd to have massive amounts of stored data dependent on a 100% recovery. The only way to avoid increasing failures of this type as storage needs expand is to design storage architectures so that a few lost bits do not translate into global failures.

    BTW: from the ratings of the original article, it appears to me that IT-ers have quite a case of denial. Thanks for rocking the boat!
    w_c_mead
    13th Apr 2009
  • Your whole analysis is based on a faulty assumption.
    Just because you have an unrecoverable read error does not mean your RAID array is corrupted. This will most probably result in a corrupted file. In most cases the offending sector will be added to the bad sector list maintained by the drive and taken out of use. Also, just because drives double in size every year doesn't mean your data does. The only relevant size is that of the data. An error on an unused portion of the drive isn't a problem.
    ShadeTree
    18th Jul 2007
  • Well you know the saying
    "Data will always expand to the capacity of your drive space."

    I know it happens to me all the time. I buy a drive and it gets full so I buy one 3 times bigger and it seems to get full almost instantly. How's that? I think the reason for me is when I have lots of space I delete less so it quickly fills up.

    In the corporate world the same is true but it's an even bigger problem. When drive space is short systems get purged. If you know you have tons of room people decide not to purge. I know people who want to keep their trash permanently and get really upset when the system deletes the contents of the trash. I have no idea why people keep important documents in the trash.
    voska
    18th Jul 2007
  • An unrecoverable read error on rebuild...
    can destroy the whole RAID in RAID5.
    bjbrock
    18th Jul 2007
  • And if there is a fire in your PC all your data can be destroyed.
    The odds as the author stated are about one in a trillion.
    ShadeTree
    19th Jul 2007
  • There is an old saying
    Which in the last 60 years I have found to be so true: "If you can imagine it happening, sooner or later it will."
    Mike Hereid Sr
    Michael L Hereid Sr
    19th Jul 2007
  • data storage does increase at the rate of Moore's law
    Drives double in size, and so does our usage of them. Why is that? I guess Parkinson's law can be extrapolated to say the data expands to fit the space allotted.
    noglider@...
    19th Jul 2007
