Sorry about your broken RAID 5


You didn't know? I'm sorry I was the one to tell you that RAID 5 is broken today and will be well and truly broken in 2009 (see Why RAID 5 stops working in 2009), but somebody had to do it. The good news is that the industry is ahead of you developing solutions.

I found the negative response to my last post on the unrecoverable read error (URE) issue fascinating. A number of informed people commented, correcting my math - I took 2 statistics courses in grad school, but that was a long time ago - and taking issue with some of my arguments. All good.

What was interesting to me was that my post didn't say anything that people in the industry haven't known for years. For example, this Intel white paper published last year:

Intelligent RAID 6 Theory Overview And Implementation

RAID 5 systems are commonly deployed for data protection in most business environments. However, RAID 5 systems only tolerate a single drive failure, and the probability of encountering latent defects [i.e. UREs, among other problems] of drives approaches 100 percent as disk capacity and array width increase.

Every engineer in the RAID business knows this. So a) why don't technically oriented ZDNet readers know it, and b) why the emotional response to a statistical argument grounded in the drive vendors' own specs?

Misplaced faith in RAID

Beyond the issues with my communication skills, I saw several themes:

  • My RAID works great (and therefore always will?)
  • Sensationalism, hype and I don't believe you. La-la-la-la-la!
  • Power factors always surprise people.

It reminded me of a comment from a SOHO/SMB RAID designer a few months back:

I was a big proponent of RAID until I found that our customers were placing so much faith in RAID that they were putting all their data on the NAS and then _deleting_ it from ALL other locations. In many cases, they had no off-site storage strategy for their data.

Array vendors take this seriously

Regular readers know I'm not a fan of the array vendors. I'm critical of an architecture where the raw disk capacity comprises only 10% of the cost of a "solution." I believe there are better ways to protect data economically.

Yet industry engineers do take data availability and integrity very seriously. They see most problems well before customers because they are working with the largest population of equipment.

That's why almost every vendor offers some version of RAID 6 to protect against double errors. Even with enterprise disks, whose smaller capacity and 10^15 bit URE rate make data loss from a disk failure + URE much less likely (one URE per 10^15 bits works out to 1 URE every 125 TB read), RAID 6 is often recommended, because in mission-critical environments even a 1% chance of an array read error after a disk failure is too great.
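A rough way to see why the industry pushes RAID 6 is to put the vendors' own URE specs into the failure-plus-URE calculation. Here is a minimal Python sketch, assuming independent bit errors at the quoted rates (a simplification of real drive behavior) and a 7-drive array of 2 TB disks:

```python
import math

def p_ure_during_rebuild(bytes_read: float, bits_per_ure: float) -> float:
    """Chance of at least one unrecoverable read error while reading
    `bytes_read` from drives specced at one URE per `bits_per_ure` bits."""
    bits = bytes_read * 8
    # 1 - (1 - p)^n, computed stably for a tiny per-bit error probability p
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_ure))

TB = 1e12  # decimal terabyte, as drive vendors count

# Rebuilding a 7-drive RAID 5 of 2 TB disks reads the 6 surviving drives.
bytes_read = 6 * 2 * TB

consumer = p_ure_during_rebuild(bytes_read, 1e14)    # desktop SATA spec
enterprise = p_ure_during_rebuild(bytes_read, 1e15)  # enterprise spec

print(f"consumer (10^14):   {consumer:.0%}")    # roughly 62%
print(f"enterprise (10^15): {enterprise:.0%}")  # roughly 9%
```

At the enterprise 10^15 spec the risk drops roughly tenfold, but it is still nowhere near negligible for a mission-critical array, which is why dual parity gets recommended even there.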

The industry isn't stopping there

Some other initiatives include:

  • 4K sectors - Drive vendors have been lobbying OS vendors for years to raise the sector size from 512 bytes to 4KB, which enables more robust ECC without a big capacity hit. Word is that Microsoft might actually, maybe, do it. Next time you see Ballmer, ask him about it. Why wait for Apple to do it first?
  • Many arrays do background sector scrubbing, looking for sectors with currently recoverable read errors and rewriting or remapping them before they become unrecoverable.
  • NAS boxes that virtualize disks as a pool of blocks can use their file system knowledge to enable data redundancy on a per-file basis for greater availability. A URE on an unused block isn't a problem, since the NAS file system knows which blocks are in use and which aren't.
  • Advanced file systems like ZFS, which combine file system and volume management functionality, can combine their parity data with parent-block checksums to perform ". . . combinational reconstruction of a RAID set." (Thanks, Joerg!)

That list just scratches the surface of all the work the industry is doing to ensure data availability and integrity as disk drives continue their capacity growth. RAID 5 is reaching its end of life, but your data can still be safe despite that.
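The ZFS approach mentioned above can be sketched in a few lines. This is a toy Python illustration (invented helper names, not ZFS source): XOR parity alone can tell you a stripe is inconsistent but not which block is silently corrupt, while parent-block checksums let the file system try each candidate reconstruction and keep the one that verifies.

```python
import hashlib
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def checksum(block: bytes) -> bytes:
    return hashlib.sha256(block).digest()

def recover(blocks, parity, parent_sums):
    """Assume each data block in turn is the bad one, rebuild it from
    parity plus the others, and accept the combination whose blocks all
    match the parent-block checksums."""
    for bad in range(len(blocks)):
        others = [b for i, b in enumerate(blocks) if i != bad]
        rebuilt = reduce(xor, others, parity)
        candidate = blocks[:bad] + [rebuilt] + blocks[bad + 1:]
        if all(checksum(b) == s for b, s in zip(candidate, parent_sums)):
            return candidate
    return None  # more than one damaged block: single parity can't locate them

# A 3-data-block stripe with XOR parity; block 1 is then silently corrupted.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = reduce(xor, data)
sums = [checksum(b) for b in data]
damaged = [data[0], b"BXBB", data[2]]

assert recover(damaged, parity, sums) == data
print("silently corrupted block located and rebuilt")
```

A plain RAID controller, which sees only blocks and parity, has no equivalent of `parent_sums` and so cannot perform this trick; that is the point of combining file system and volume management.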

Comments welcome, as always. Industry folks, what else is happening to manage this issue?

Topic: Hardware


Talkback

24 comments
  • Still trying to understand your argument

    Granting that a URE rate of 10^-15 [1] means that the incidence of two UREs per array approaches unity once the storage system exceeds 10^30 sectors (10^39 bytes, which is a rather large array), what isn't clear is why this is a problem.

    A RAID5 system doesn't fail when there are two UREs unless they both occur in the same data set (effectively the same sector on two drives.) I'm probably misunderstanding your argument, since you seem to be assuming that two bad sectors anywhere in the array will take the whole array down without possibility of recovery.

    Now, [b]drive[/b] failures are something else again. Lose two drives per array and you are in deep, deep weeds. That happens a lot more than most think; it happened to me Sunday morning when two drives quit within a couple of hours. Not amusing. Fortunately the backups and panic snapshot from the surviving array (minus redundancy) were sufficient to prevent data loss.

    Anyway, I'm afraid you're not getting your thoughts across very clearly.

    [1] Not 10^15, as you wrote -- you might want to fix the typo.
    Yagotta B. Kidding
    • Problem = Disk failure + URE

      Mr. Kidding,

      Two UREs are NOT the problem. As you note, that is terribly unlikely.

      The problem is a disk failure + URE. A disk failure is reasonably likely over 4 years of a 7-drive array's life. The question is, during the rebuild, is there another error - such as a URE?

      RAID 5 cannot protect against this disk failure + URE situation. As disk drives grow in capacity, your chance of encountering the second error grows linearly, assuming a constant URE rate.

      That is what the industry is trying to eliminate.

      Robin
      R Harris
  • Maybe you should title your blogs better!!!!...

    "Sorry about your broken RAID 5" And maybe you would not get such an
    "emotional response to a statistical argument grounded on drive vendor's own specs"

    It is not broken and will not be broken; it does have limitations.
    That being, it can't tolerate more than a single drive failure.

    By your thinking, all computer systems with >=1 hard drives are broken, due to the fact that a hard drive will eventually fail.
    mrlinux
    • And the limitation is . . .

      Mr. Mrlinux,

      You are correct that your RAID 5 continues to protect you against a single disk failure or URE. In that sense it still "works".

      You could own a working Model T Ford, too. It starts, turns and stops just as it did when new. Yet in the context of today's world, a Model T Ford - for decades the world's most popular car - no longer "works": it is too slow, the handling is terrible, the reliability poor, the safety features non-existent. As a usable transportation solution, even a working Model T is "broken".

      My goal here is to educate folks to the fact that the useful life of RAID 5 is ending, just as the Model T's useful life did, due to the growth of disk capacity and the constant URE rate.

      Protecting against a single failure isn't enough any more.

      Robin
      R Harris
      • Maybe using this as a basis for your title...

        "useful life of RAID 5 is ending" would be more appropriate and would generate more useful dialogs.
        mrlinux
        • Title coloring?

          So if a title "inflames" a person, they are not required to read the article in a dispassionate manner first? Analysis only after venting?

          I should hope not. Where would "To Kill a Mockingbird" be? No birds were killed - you lying #$$%#... but the story was insightful?
          Jim888
          • You help make my point...

            The title is not reflective of the story and appears to be used to get people to post. Such as your post, which doesn't offer anything to the article.
            mrlinux
          • Crucial difference!

            Worth noting is that you offered a work of fiction in justification for the creative titling of a work.
            FICTION. Where liberties are freely taken and encouraged.
            As contrasted to a technical blog, where creative liberties are STRONGLY DISCOURAGED as FUD pandering.

            Then again, my favorite journalist is/was Hunter S Thompson. But I suspect his style would not be appreciated in most work places.
            shraven
      • Please stop confusing RAID5 and offsite backup as mutually exclusive

        The problem is that your alternative sucks for 90% of the population who absolutely will NOT run backups. RAID5 is like the spare tire approach whereas full off-site replication is the spare car approach. Of course the spare off-site car approach is superior; the problem is that it's just too much trouble and it's too expensive and you're sounding like a broken record on the issue of RAID with your absurd generalizations. RAID5 really isn't for your traditional computing application anyways and for most things, off-site replication for small things like email, documents, photographs make infinite sense and burning DVDs is probably the most effective and cheapest solution for that or even online solutions.
        The problem is that you're not acknowledging the fact that this approach simply doesn't scale and isn't appropriate for video and DVD archives and that's really what RAID5 is for. Having 5 separate hard drives is just too confusing and it's even too difficult for me to keep track of. RAID5 is the most effective way to create a consolidated storage volume with single-drive-failure fault tolerance for LARGE file applications. You don't really need offsite backup for your DVDs because you have the original media which you can always re-rip in to the hard drive if it ever comes to that. Having RAID5 just means you have better AVAILABILITY to your video content. Please stop confusing RAID5 and offsite backup as mutually exclusive because they're each solutions to entirely different problems.
        georgeou
        • Can you be more specific?

          George,

          I think I hear you saying that SATA RAID 5 is good for people who won't do backups because they have large file data that isn't economical to copy. RAID 5 is a management tool, and the chance that a disk failure will corrupt data is less of a problem than losing all data to a disk failure.

          Did I get that right?

          So why not go the extra step and recommend either file-based appliances, like Drobo, whose built-in file system allows them to avoid many of the RAID 5 issues, or RAID 6?

          Home customers don't understand RAID. What they can understand, and should get, are clear recommendations that allow them to protect their data with a minimum of fuss.

          Encouraging people to trust SATA RAID 5 seems like a customer satisfaction fiasco in the making. As drives get larger and URE rates remain constant, more and more people will discover that they have corrupted data.

          How can this be a good thing? Or even an OK thing? Why continue to recommend a storage technology that every engineer in the storage business realizes has a real and growing problem?

          This is where you lose me. As near as I can tell, RAID 5 is reaching end-of-life both for mission-critical business data and for terabyte-plus bulk data storage. Why recommend it? If you can't be bothered to use one of the dead simple off-site backup programs like Carbonite or Mozy, why not at the very least recommend RAID 6?

          Robin
          R Harris
          • I said RAID5 is appropriate for files that don't need backup

            I said RAID5 is appropriate for files that don't need backup but availability with some level of hardware fault tolerance. The other big reason for RAID5 is that it gives you a nice clean and simple consolidated drive when managing independent partitions is a nightmare. DVD backups for example don't need to be replicated because you already (or should) have the original media. That means looking for movies is made infinitely easier and you don't have to worry about the kids and wife tossing the original media around like Frisbees. For this specific application, off-site replication isn't needed and would be extremely cost prohibitive. So for massive file storage of video content (TV recordings and DVD rips), RAID5 is the only sensible solution and file replication is a non-starter there because it isn't cost effective and it's overkill for the problem at hand.

            Now as for personal files, music, and photos, you'd be crazy to rely solely on RAID5 as a "backup". The file sizes aren't big enough to justify the need for multi terabyte volumes anyways so it's totally unnecessary. You can't afford to lose your personal files and photos and those are very easy to burn (multiple copies) to DVD. Now if you need to backup more than 8 GBs, it starts making sense to use external hard drives to replicate things and move off-site. So for these specific examples, I'm with you 100%. However, you shouldn't use this to generalize against RAID5 because RAID5 is a solution to an entirely unrelated problem. But you're confusing RAID5 as a solution for your specific requirements when it was never a solution for your requirement in the first place so you're wasting your time attacking RAID5. And the fact that you think I'm encouraging people to use RAID5 for this specific application tells me you're confusing the issue and my position. I'll say this one last time; my position on backup is identical to yours. The problem is that you're not acknowledging that there are other people with other requirements than yourself.

            As for Drobo, you saw my review of it. Drobo is a multi-type RAID system that's wrapped in a user friendly interface and design and there are a lot of things I like about it. The big problem with Drobo is that it's too expensive and it's just too darn slow (even slower than what they told me based on independent benchmarks). The Drobo writes data at around 13 MB/sec which is way too slow for me when my internal RAID solution writes data at 200 MB/sec and data can come in over Gigabit Ethernet at around 70 MB/sec. Now based on my review of Drobo and despite the fact that I personally think it's too expensive and too slow, I get a lot of people thanking me for my review because they bought a Drobo. So in that sense, I am effectively endorsing Drobo despite the fact that I personally feel it isn't for me because people who don't care that much about performance and just want something simple bought the Drobo after reading my review of it.

            As for RAID6, I'd love to use it if it were more affordable. Right now you're looking at around $50 per port for a RAID6 capable SAS/SATA controller. RAID6 also doesn't become economical until you get to 6-drive volumes (you lose 1/3 capacity in this case). RAID5 controllers are now essentially about $10 added to the cost of a motherboard and the Intel ICH9R is great. I've even tested scrambling the drive orders and upgrading to a different motherboard and it will automatically recognize and mount the RAID. Sure it isn't as simple as the Drobo, but it meets my requirements (technical people and hardware enthusiasts) better.
            georgeou
      • Say what you mean and stop whining.

        So why not say what you actually meant? RAID 5 is not broken.
        RAID 5 may very well have limitations, which have always existed and continue to exist. Intelligent businesses already plan against drive/array failures because the entire server could go up in smoke and no type of array will help you then. BACKUP. OFFSITE.
        George made many good points about the purpose of RAID 5: fast, vast storage, with some fault tolerance. It was never intended to be a failsafe method of storing your data.

        Your title was sensationalism; FUD nonsense. You did it intentionally with the sole intention of creating controversy. And you got it. So stop crying. The problem with creating controversy is that you may not get the controversy you intended, but something else entirely. Many a politician has been burned in this manner. You took a perfectly good point and spoiled its delivery with your poor selection of a title. At least you're not getting voted out of a job for it.

        And by the way, that Model T still works just fine. Your needs in an automobile may have evolved with time, but the Model T works as advertised and expected. RAID 5 does too.
        shraven
        • Why don't YOU drive a Model T?

          If by limitations you mean "a good chance of losing data in case of a disk failure" and you are OK with that - fine. If that is your expectation of SATA RAID 5 - or anyone else's - then you don't need me. But that isn't what people expect when they sign up for protected storage.

          Go read the RAIDGuy comment. If you can't hear it from me maybe you can hear it from him.

          Robin
          R Harris
  • One corroborating opinion...

    Folks commenting brought up a few good points. If a URE during a rebuild hits media with nothing stored on it, nothing valuable is lost. If the rebuild doesn't fail altogether, then only two sectors (the sector getting the URE and the sector being rebuilt) are affected. Scrubbing can help, and bad blocks can be spared out. Yes, the sector that drew the URE can be spared out (and it will in a reasonably good RAID implementation), but what to write to the alternate sector? There's still no good copy of the data available to write to the alternate - it's still bad.

    A quick look at the math: if a given disk is likely to fail in 5 years, and we have 5 disks, we shouldn't be surprised to see a disk fail every year (on average). If we've got 14 TB of capacity in a 7-disk array, and disks fail to read a sector once for every 12 TB read (a URE rate of one per 10^14 bits), and we have to read 12 TB of data to rebuild to a hot spare (the RAID algorithm rebuilds the entire spare, regardless of what is or isn't stored in the array, and so has to read the entire capacity of the remaining 6 disks), then we shouldn't be surprised to see a URE during the rebuild.
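The arithmetic in that paragraph can be sanity-checked in a few lines of Python, under the same assumptions (seven 2 TB drives, one URE per 10^14 bits, independent errors):

```python
import math

TB = 1e12                                      # decimal TB, as drive vendors count
read_bits = 6 * 2 * TB * 8                     # 6 surviving drives: 12 TB to read
expected_ures = read_bits / 1e14               # spec: one URE per 10^14 bits
p_at_least_one = -math.expm1(-expected_ures)   # Poisson approximation

print(f"expected UREs per rebuild: {expected_ures:.2f}")  # 0.96
print(f"chance of >= 1 URE:        {p_at_least_one:.0%}")
```

About one expected URE per rebuild, exactly the "shouldn't be surprised" outcome the comment describes.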

    Scrubbing is a good thing, and the organization that defines SCSI protocol (T10.org) has even defined a background capability so that a disk can scrub itself and report any bad sectors it finds - so the RAID algorithm can schedule a rebuild to an alternate. The thing is, this procedure is already taken into account when the disk vendors report their URE rate. Without scrubbing, it's actually worse. Don't take my word for it - ask your disk vendor.

    Trouble is, the RAID software doesn't know anything about the file system, so how does it know the difference between useful information that's corrupted and empty space? It could report an asynchronous error to the OS saying something like, "Blocks X and Y on volume Z are corrupted", but I don't think any OS today can take that sort of information delivered asynchronously and reconcile it with a specific file that needs to be restored from backup. And what if the error is in the file table itself? If the error goes unreported (or is reported but ignored by the OS), then nobody knows the data is corrupted until an application reads the data and something goes wrong. If it's a bad pixel on frame 847394 of the movie Shrek - who cares, nobody will ever notice. But if it's your bank account balance or a blotch on your MRI that your oncologist thinks is a malignant tumor, then you care! For this reason when a URE occurs during rebuild, some RAID algorithms will abort the rebuild and fail the logical volume rather than risk returning bad data with no error indication at some future time.

    Very smart RAID algorithms know how to "mark" a stripe as bad (and even mark which sectors in the stripe are bad) when an URE is encountered during rebuild. So, later, when an application reads the corrupted data (even though the bad sectors have been remapped to good sectors), it can be reported as an URE from the "logical RAID volume", and the application will at least know it fetched bad data. So, if you have a very smart RAID-5 controller, you can get a URE during rebuild and not fail the entire volume. Only the URE sector and the corresponding rebuilt sector on the replacement drive will be corrupted. If the application has access to the correct data (e.g., from a backup), then the good data can be written to the logical volume, and the RAID algorithm may be smart enough to rebuild the stripe using the good data and restore coherency across the stripe. But this only happens if the application writes good data to the bad stripe, and that's likely to involve an actual person understanding a cryptic message from the OS, finding the right backup, and restoring the right information from the backup.

    Some claim they've run large RAID-5 arrays for years and never seen these sorts of problems. Well - maybe you've got a RAID implementation that just ignores the URE during the rebuild, leaves the data corrupted on the affected sectors, and you never knew exactly why it was that a field in your personal budget spreadsheet you knew had something meaningful now contains (*$%&$(#E*&%. More likely, they're using RAID arrays with much smaller capacities. Even the legacy hundred TB systems break up the storage capacity into separate arrays that each have a capacity that's small compared to the 10TB or so threshold where URE during rebuild becomes a problem.

    But still - this is likely to happen whenever the array is rebuilt (say, a RAID-5 array of seven 2 TB disks) - and it's not a good thing. Over a very long period of time, these corrupted sectors accumulate (kind of like using a paper copier to make copies of copies - eventually the distortion adds up and your 500th copy of a copy is obviously distorted compared to the original).

    RAID-6 provides a very nice way to handle this. When a URE is encountered during the rebuild, there's still enough redundancy in the stripe to recover the missing data, the sector that drew the URE is assigned an alternate, and the correct data (from RAID-6 rebuild) is written back to the alternate, and the stripe is complete and whole after the rebuild to the replacement disk. No need to mark bad sectors, and no need to reconcile bad sectors with the file system, and no need to fetch data from backup to restore the stripe.
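That recovery path can be made concrete with a toy P+Q sketch. This is a textbook Reed-Solomon layout over GF(2^8), not any particular vendor's implementation; the point is that a failed disk plus a URE sector are just two erasures at known positions, which the two parities can solve together:

```python
# Toy RAID-6 P+Q recovery in GF(2^8) - an illustration, not production code.
# P is plain XOR parity; Q weights block i by g^i (generator g, poly 0x11D).

EXP, LOG = [0] * 512, [0] * 256
x = 1
for i in range(255):                       # build antilog/log tables
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):                  # duplicate so gmul needn't mod 255
    EXP[i] = EXP[i - 255]

def gmul(a, b):
    return 0 if 0 in (a, b) else EXP[LOG[a] + LOG[b]]

def gdiv(a, b):
    return 0 if a == 0 else EXP[(LOG[a] - LOG[b]) % 255]

def make_pq(stripe):
    p, q = [0] * len(stripe[0]), [0] * len(stripe[0])
    for i, block in enumerate(stripe):
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gmul(EXP[i], byte)
    return p, q

def recover_two(stripe, p, q, x_idx, y_idx):
    """Rebuild blocks x_idx and y_idx (e.g. failed disk + URE sector)."""
    n = len(p)
    px, qx = list(p), list(q)              # fold the surviving blocks into P, Q
    for i, block in enumerate(stripe):
        if i in (x_idx, y_idx):
            continue
        for j in range(n):
            px[j] ^= block[j]
            qx[j] ^= gmul(EXP[i], block[j])
    gx, gy = EXP[x_idx], EXP[y_idx]
    # px = Dx + Dy and qx = gx*Dx + gy*Dy  =>  Dx = (qx + gy*px) / (gx + gy)
    dx = [gdiv(qx[j] ^ gmul(gy, px[j]), gx ^ gy) for j in range(n)]
    dy = [px[j] ^ dx[j] for j in range(n)]
    return dx, dy

data = [[1, 2, 3], [40, 50, 60], [70, 80, 90]]
p, q = make_pq(data)
lost = [[0, 0, 0], data[1], [0, 0, 0]]     # drive 0 failed, drive 2 hit a URE
d0, d2 = recover_two(lost, p, q, 0, 2)
assert [d0, data[1], d2] == data
print("both erasures recovered")
```

With single parity the same situation leaves one linear equation and two unknowns; the second parity supplies the missing equation, which is why no file-system help or backup fetch is needed.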

    At the risk of starting a tangent thread - please keep in mind, RAID is not a viable substitute for regular backups. RAID does nothing to mitigate the proverbial fat finger error. If you've inadvertently deleted a file, RAID of any sort won't help.
    RAIDGuy
    • I couldn't have said it better myself- Thank You!

      Mr. RAIDGuy,

      You know your stuff. I really appreciate the insights.

      Robin
      R Harris
  • Good try, but no cigar

    First, if you're going to take a quick look at the math, you really should use the right figures. A given disk is *not* likely to fail in 5 years, or anything close to it: 15 - 40 years is what the recent large-scale studies point to for commodity (S)ATA drives, and of course the specs suggest 70 - 130 years (statistically, of course - but these values are valid as long as they're used within the specified service life of the individual drive, so apply in this analysis).

    Second, and not having a vendor at my fingertips this weekend, I don't believe your assertion that vendors take scrubbing into account when specifying error rates. One reason is that the error rates have remained similar over a long period of time, and I doubt that using 'scrubbed' rates would have been considered kosher over a decade ago. Another is that the evidence I've seen suggests that scrubbing (even just once every few weeks) reduces the likelihood of a read error during rebuild of a failed drive by multiple decimal orders of magnitude - so even with disk sizes typical of the past several years a RAID-5 rebuild on an array *without* scrubbing would have been very likely to encounter a read error (i.e., conventional RAID-5 arrays would already have been discovered to be *grossly* unreliable).

    Scrubbing instead is what has allowed RAID-5 reliability to keep pace with increases in disk sizes, such that while an unscrubbed RAID-5 array today is starting to look a bit dicey a scrubbed RAID-5 array will look pretty good for quite a while yet in most applications. Or to look at things from another viewpoint, there has *always* been some chance of a read error during a RAID-5 rebuild, and the differences are only matters of degree (and the particular tolerance level of the specific installation for such an error - that was part of George's point about the availability of original source material as backup in large read-mostly environments).

    Third, utilities do exist on common platforms that can track down the file (if any) to which a specified unrecoverable sector belongs: all the array has to do during rebuild is keep track of any such sectors (it's not as if one expected a *lot* of them) and report them to the system event log so that an operator can take things from there.

    Fourth, well-designed file systems (and even FAT variants, for that matter) maintain multiple copies of their 'file table' (and often of other critical information), so a bad sector there can just be rewritten from the remaining copy. IIRC ZFS maintains multiple copies of *all* its mapping metadata; in any event, the bottom line (even with *no* added metadata redundancy) is that data usually dominates file-system metadata by many orders of magnitude, hence a URE if encountered at all will almost always only damage a single file.

    Fifth, your analogy with paper copying is flawed: this is digital data we're talking about here, and (again, as long as we're working within the specified service lives of the disks involved) each time it's rebuilt it's as good as new (i.e., no progressive deterioration of the kind that you suggest actually occurs - unless your comment to that effect was harking back a couple of sentences to your suggestion that some arrays just ignored UREs during rebuilds, in which case the answer is that you really wouldn't want to be using such an array anyway).

    So while I heartily agree that people should be aware of the limitations of their redundancy strategy, whatever it (preferably, they) may be, I can also agree with those who suggest that Robin's treatment was superficial and sensational. His blunt assertion that "RAID 6 will give you no more protection than RAID 5 does now" was perhaps the most egregious example (I suspect you'll understand why immediately, and I'll leave it as an exercise for others).
    - bill
    • A couple of comments

      Bill,

      I appreciate the thoughtful response.

      I use the annual failure rates (AFR) found in the CMU and Google studies rather than manufacturers' MTBF or AFR figures. MTBFs give the false impression that disks will last decades when, in fact, they wear out after a few years. So the failure rates mentioned above are reasonable.

      The paper copying analogy is flawed, but the basic point - that there are multiple forms of bit rot that occur with storage arrays - is correct. That is to say, during a RAID 5 rebuild you may not encounter any UREs, but that doesn't mean that everything you read is correct, i.e. the same as the data you intended to write. These issues are a problem with all storage arrays to some extent, with the more expensive arrays attempting more thorough end-to-end data validation.

      ZFS uses parent-block checksums to validate child-block correctness and
      integrity. That level of validation isn't possible with an external array.

      Robin
      R Harris
      • Hmmm

        "I use the annual failure rates (AFR) found in the CMU and Google studies"

        So did I: that's where the 15 - 30 year MTBFs came from (though I included the manufacturers' specs as well for comparison).

        "MTBFs give the false impression that disks will last decades"

        Not if you understand what MTBF actually signifies.

        "The paper copying analogy is flawed, but the basic point that there are multiple forms of bit rot that occur with storage arrays is correct."

        It may be correct, but it had nothing to do with the rest of your article - because the kind of bit rot you are describing is not related to URE incidence but rather to the far lower incidence of *undetected* read (or write) errors (or to interconnect defects external to the disk which can potentially affect *any* storage implementation, regardless of its level of redundancy). ZFS can indeed help with this, but once again this is pretty much orthogonal to the main thesis in your article.
        - bill
  • Agree - conditionally...

    Bill,

    I agree with your statements, conditionally...

    A brand new SATA disk likely has an AFR in line with the numbers you suggested - i.e., somewhere in the range of 0.7%, depending on the usage, which says if I have a population of 140 new SATA drives, one might fail in the first year. But that doesn't mean a given drive is expected to last 140 years. Wear and tear do take their toll.

    10 years ago disk manufacturers didn't have to scrub to get the URE rates advertised. And before that they could use less robust ECCs to get the advertised URE rates. There has been a steady progression of technical advancements to meet the "stable" URE rates advertised over the years:
    - retries with offsets
    - temperature recal
    - zoned partitioning
    - more powerful ECC (today consuming up to 1/2 the capacity)
    - etc.
    - and in the last few years - auto-scrubbing.

    Disks have been auto-scrubbing for a while - to detect grown defects before they become uncorrectable. A correctable defect surpassing a certain threshold of "difficulty" (retries, bits in defect,...) is auto reallocated with the corrected data written to the alternate.

    The latest tactic is to cope with the uncorrectable sectors as well by adding the ability to report them (the new auto-scrubbing capability in the SCSI SBC standard) and give the system a chance to write back good data, even when the disk can't.

    The point is that the perceived "stability" in URE frequency over the years is not attributable to any fundamental property of disk drives, but rather it's a market requirement, and disk drive designers go to great effort in each new generation of disks to find ways to maintain the same URE frequency.

    Just as you recommend having a good understanding of the RAID capabilities, I recommend having a good understanding of what precisely the disk specifications published by the disk manufacturer mean, and what assumptions they're based on.

    IF:
    - all the disks in a RAID-5 set are new,
    - the RAID-5 algorithm and/or disk drive scrub, allocate alternates, and reconstruct data to alternates
    - the RAID-5 code works with the file system and backup archives to rewrite blocks incurring URE during a rebuild to a replacement drive
    THEN it's reasonable to expect not to lose data due to a combined disk failure with URE during rebuild.

    If you ask me, those are some pretty powerful IFs. On the other hand, if the RAID is RAID-6, that alone gives me a good warm fuzzy that I won't lose data due to a URE during rebuild.
    RAIDGuy
    • Still no cigar, I'm afraid

      "that doesn't mean a given drive is expected to last 140 years"

      And I never suggested that it did: what MTBF means is not the expected lifetime of a single drive, but the expected failure rate *within the drive's specified service lifetime* (which is typically 5 years). The most recent figures suggest that even under fairly heavy loads a commodity (S)ATA drive has only a relatively small likelihood (in the 15% - 35% range) of failing during its 5-year service life, in somewhat marked contrast to your "if a given disk is likely to fail in 5 years" hypothesis.

      And I'm afraid that I still simply don't believe your assertion that the manufacturers' URE rates assume the presence of scrubbing: as I already noted, the effects of scrubbing are far too profound for it to have been sneaked in under the radar without dramatically affecting the specs at that point (or for any arrays lacking it to have avoided very visible reliability scandals, if indeed the specs assumed it). So I'll have to ask you to provide a credible reference - for (S)ATA drives, of course, since they (rather than SCSI or FC drives) are the drives under discussion here.

      No disagreement whatsoever about the dramatically lower probability of disk-or-sector-failure-induced data loss with RAID-6, but similarly dramatic improvements can be attained in situations where off-site replication is required (i.e., even leaving aside its tolerance of whole-site disasters an off-site replica of a RAID-5 array provides even more security than RAID-6, and even a RAID-0 off-site copy of RAID-5 data can provide slightly better security as long as its contents can be used to repair URE-related RAID-5 rebuild problems).
      - bill