RAIDfail: Don't use RAID 5 on small arrays

Big storage companies stopped recommending RAID 5 a couple of years ago. But I still see small 4-drive arrays touting RAID 5 for home and small office use.

Big mistake. You want to save money, but you also want to keep your data. RAID 5 isn't worth it.

What's the problem? RAID 5 only protects against a single disk failure, but SATA drives are spec'd at one Unrecoverable Read Error (URE) per 10^14 bits read, or roughly every 12.5 TB.

Let's do the math.

In a small 4-drive array using 2 TB disks, if you lose a disk you have 6 TB - 3 drives - of remaining capacity. That includes the parity data used to reconstruct the data lost on the failed drive.

Reading through that 6 TB you have a better than 40% chance of encountering a URE - and at that point the disk rebuild will stop, since the RAID controller doesn't have the information it needs to reconstruct your data.
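For readers who want to check the arithmetic, here is a minimal Python sketch. It assumes the URE spec means one error per 10^14 bits and that bit errors are independent, which real drives only approximate; the function name `ure_probability` is made up for illustration. Under those assumptions the result comes out near 38%, while the plain ratio 6 / 12.5 gives 48%, so the figure quoted above sits between the two estimates.

```python
import math

# Assumed spec: one unrecoverable read error per 10^14 bits (~12.5 TB).
URE_RATE_PER_BIT = 1e-14

def ure_probability(terabytes_read, rate_per_bit=URE_RATE_PER_BIT):
    """Chance of at least one URE while reading `terabytes_read` decimal TB."""
    bits = terabytes_read * 1e12 * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 to avoid rounding trouble
    return -math.expm1(bits * math.log1p(-rate_per_bit))

# 3 surviving 2 TB drives must be read in full during a rebuild.
print(f"{ure_probability(6):.1%}")
```

Changing the argument shows how quickly the risk grows with drive size: the same 4-drive array built from larger disks has to read proportionally more data to rebuild.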

Then you pull out your backup copies. You have backups, right?

How to use a small RAID array

4-drive arrays have lots of advantages: cost; performance (with FireWire or eSATA) fast enough for HD video editing; and portability.

But if you care about your data, RAID 5 is too big a threat. And if you don't mind risking your data - as in performance driven apps like video editing where the data copies are on tape or another disk - RAID 0 (striping) is cheaper and faster.

Most small arrays come with a RAID 1 (mirroring) option that copies your data to 2 different disks. Lose 1 and the other should have it - subject to the occasional URE.

If you want availability and better performance use RAID 1+0 - often abbreviated RAID 10 - which combines mirroring and striping to provide 2 complete copies of your data with the performance of 2 striped drives.

The Storage Bits take

The attraction of RAID 5 is that it gives you 3 drives' worth of capacity on a 4-drive array - but at the cost of having to use backups if a URE is encountered. Better to use RAID 1 and get 2/3rds the capacity of RAID 5 with a much lower chance of data loss.
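The capacity trade-off can be laid out in a short Python sketch. It assumes four equal 2 TB drives and treats RAID 1 on a 4-drive box as two mirrored pairs, which is what the 2/3rds figure implies; the helper `usable_tb` is hypothetical and ignores formatting overhead.

```python
def usable_tb(level, drives=4, drive_tb=2):
    """Usable capacity in TB for equal-sized drives (illustrative helper)."""
    if level == "RAID 0":
        return drives * drive_tb           # striping only, no redundancy
    if level in ("RAID 1", "RAID 10"):
        return (drives // 2) * drive_tb    # every block is stored on 2 drives
    if level == "RAID 5":
        return (drives - 1) * drive_tb     # one drive's worth of parity
    raise ValueError(f"unsupported level: {level}")

for level in ("RAID 0", "RAID 1", "RAID 10", "RAID 5"):
    print(f"{level:8s} {usable_tb(level)} TB usable of {4 * 2} TB raw")
```

With these assumptions RAID 5 yields 6 TB and mirroring yields 4 TB, so the price of avoiding a parity rebuild is one drive's worth of space.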

The biggest storage mistake consumers make is to believe that any storage device is 100% safe. It isn't.

Maintain at least 2 copies of any data you value. If the data is vital, make that 3 copies. And if thinking about RAID levels makes your teeth ache, consider a Drobo or the new Drobo Pro.

Storage is cheap. Use lots.

Comments welcome, of course. Check out an earlier post, "Why RAID 5 stops working in 2009", for more details on the RAID 5 problem.

Topics: Hardware, Data Centers, Storage

Talkback

50 comments
  • Drobo

    So.. the article says not to use RAID 5 - then
    recommends that the consumer buys a box (Drobo)
    - which essentially just operates as a RAID 5
    array when it's full???? (put in 4 x 1 TB disks
    and get 3 TB of usable space).

    Seems incredibly stupid

    at least the drobopro has dual disk redundancy
    which should minimise the effect of any UREs
    mcfaul@...
    • Drobo isn't a striped RAID array

      That's why it can use disks of different sizes and give you more capacity
      than the smallest disk.

      That's why it's better than a RAID 5 array - it isn't one.

      HTH,

      Robin
      Robin Harris
      • yes it is

        if it's empty enough it will mirror, but as you
        fill it up it HAS to use parity

        give it :

        1TB
        1TB
        750GB
        500GB

        you get 2.25TB of usable space (check
        drobulator)


        the ONLY way to do this is
        mirror the "top" 250GB across the two 1TB
        raid 3/5 on the next 250GB of the two 1TB
        drives and the top 250GB of the 750GB drive
        then Raid 3/5 across the last 500GB of all four
        disks

        this ***IS*** exactly as dangerous as the raid
        5 array described in the article (mainly
        because it is doing it for the majority of the
        data)

        if you can see some way of getting 2.25TB of
        data onto an array of 1000/1000/750/500 (3.25TB
        total space) disks WITHOUT using parity (which
        gives the dangers described in the article)
        then i would be fascinated to hear it

        I've owned a drobo, i've owned a drobo v2, and
        now i own a droboPro, i'm very active on
        drobospace.com - when they are full - they operate essentially as proprietary raid 5
        arrays - with all the risks and problems that
        come with that
        mcfaul@...
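The layering described in the comment above can be checked with a few lines of Python. This is only a sketch of the commenter's reasoning, not Drobo's actual (proprietary) algorithm: the function name `layered_usable_gb` and the rule of one redundant drive per layer are assumptions made for illustration.

```python
def layered_usable_gb(drive_sizes_gb):
    """Usable GB when each horizontal 'layer' of the drives is protected
    separately with single redundancy (mirror on 2 drives, parity on 3+)."""
    sizes = sorted(drive_sizes_gb)
    usable = 0
    previous = 0
    for i, size in enumerate(sizes):
        height = size - previous       # thickness of this layer in GB
        members = len(sizes) - i       # drives tall enough to take part
        if members >= 2 and height > 0:
            # one member's share of the layer holds the copy or the parity
            usable += height * (members - 1)
        previous = size
    return usable

print(layered_usable_gb([1000, 1000, 750, 500]))
print(layered_usable_gb([1000, 1000, 1000, 1000]))
```

Under this model the mixed set gives 2250 GB, of which only 250 GB is mirrored and the other 2000 GB sits behind parity, and four 1 TB drives give 3000 GB, the same as a conventional 4-drive RAID 5.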
        • 2 copies.

          Which is why Robin suggested buying two Drobos to ensure you always have two copies. But then you run into the management/risk situation of managing that yourself..

          http://www.matrixstore.net/2009/05/21/diy-good-or-bad-short-video/
          thewelshboy
          • Two drobos are not cost practical

            Two drobos are not cost practical. It's far
            more practical and cost effective to get one
            RAID-6 array.
            georgeou
      • Drobo uses a mixture of RAID technologies including RAID-5

        Drobo uses a mixture of RAID technologies
        including RAID-5
        http://blogs.zdnet.com/Ou/?p=508
        georgeou
  • I don't get it

    at the end of the referenced drobo article it says "Note that the 4 slot Drobo won't handle 2 drive failures at once."

    What's the difference between drobo/raid5 then?

    Also it would have been worth mentioning (copied/pasted from the linked post): raid5 problem: Normal arrays don't know which blocks have data so a 2nd URE kills the array.
    So, raid5 does have to read the entire 6TB remaining (out of 8TB) instead of just the 2TB needed.

    finally I'd like to say thanks for these articles on raid5 as I was thinking of building my own ... I'm going for WHS instead :P
    tryonQc
    • Robin mistakenly believes the Drobo doesn't use RAID-5

      You're not the one with the confusion. Robin
      mistakenly believes the Drobo doesn't use
      RAID-5 when in fact it does.
      georgeou
  • RE: RAIDfail: Don't use RAID 5 on small arrays

    drobo can use raid 1/5 internally so it can be part
    mirrored and part parity - basically as long as your two
    largest drives are the same size - you can use the
    maximum amount of protected space available - you don't
    need identical sized drives as you do with traditional
    raid 5 arrays.
    mcfaul@...
  • RE: RAIDfail: Don't use RAID 5 on small arrays

    The only difference I see then is that it knows where the recovery bytes are, so it doesn't need to read all the data from the other disks, which reduces the risk of failures (we had 40% for 6TB; if we only have to read 2TB out of those 6 we cut the 40% by 3)
    tryonQc
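The "cut by 3" estimate in the comment above can be sanity-checked with a small Python sketch. It assumes one URE per 12.5 TB read on average and models errors as a Poisson process, a simplification; the helper name `rebuild_ure_risk` is hypothetical.

```python
import math

def rebuild_ure_risk(tb_read, tb_per_ure=12.5):
    """Chance of at least one URE, given the expected number of errors
    tb_read / tb_per_ure (Poisson approximation)."""
    return 1 - math.exp(-tb_read / tb_per_ure)

full = rebuild_ure_risk(6)      # rebuild reads all 3 surviving drives
partial = rebuild_ure_risk(2)   # rebuild reads only the blocks holding data
print(f"full: {full:.1%}  partial: {partial:.1%}  ratio: {full / partial:.2f}")
```

Under these assumptions, reading a third of the data lowers the risk from about 38% to about 15%. That is a reduction by a factor of roughly 2.6 rather than exactly 3, because the probability does not scale linearly with the amount read.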
  • So long as two disks don't fail at once, you're good to go

    So long as two disks don't fail at once, you're good
    to go with RAID-5. People will only have a problem if
    they neglect to swap out a bad drive.

    A 4-drive RAID-5 volume is beginning to make sense
    because it offers you 75% of total raw capacity
    whereas mirroring only offers you 50% capacity.
    RAID-5 makes even more sense when you have a 6-drive volume
    and you're getting 83.3% of total capacity.
    georgeou
    • That's exactly the problem

      Because the drives are put into the system at the same time, environmental characteristics
      that cause 1 drive to fail can easily cause
      > 1 drive to fail simultaneously, or in short
      order. Some organizations without daily
      on-site IT help aren't aware that 1 drive has
      already failed.

      Also, the probability of error or drive failure
      is not constant. It increases as the drives
      age. So the risk of a > 1 drive failure is
      much higher as the years roll by. In fact,
      it may be cheap insurance to simply replace
      all the drives in an array after, say, 3
      years, given the steep decline in drive prices
      every year.

      Additionally, if the 2nd drive were to fail during the rebuild process, the organization
      would also be toast.

      From the outset, the array has to be engineered
      to sustain a 2 drive failure.
      rosanlo
      • Time is not the only enemy

        I had a drive that was only six months old blow a chip (literally) on the controller board. An identical drive ran well past its prime, so I have to chalk that failure up to a power surge. I've also seen brand new drives with manufacturing defects; anyone remember the IBM 270MB drives that you had to take out and spin, to get the platters to rotate after they stalled? Based on that, I would say that it would be better to use different manufacturers and replace drives at different intervals, so not all drives are the same age or from the same manufacturer lots.
        I believe either Western Digital or Seagate had issues with their 1 and 1.5 TB drives around Christmas. This was a firmware issue, which put every drive at risk of failure to work within parameters.
        Realvdude
      • That's exactly the problem? (RAID 6)

        There is already a version of RAID 5 designed to allow for 2 drives to fail: it is called RAID 6.

        So far in the last 10 years I have never had a RAID 5 where 2 drives failed at the same time. His logic is also flawed, because you should always have a backup system that you can take off site. If there is a fire, your RAID is worthless. The only servers I have installed that don't have a backup system are Terminal Servers.

        What statistics are his calculations based on? Server hard drives are different from desktop hard drives. I have never just taken hard drives off the shelves at the local Best Buy and used them in a server.

        Plus, his idea that the business might not even know that a drive has failed means that it is a bad solution. You always want to use a server with the error / malfunction messaging / e-mail alerts that HP or DELL has.

        Is he talking about white box servers? I don't know anyone in their right mind who would recommend a white box server for a company, most of all if that is the company's only server. You have no true business support with a white box. HP and DELL have next-day parts shipping built into the server cost, versus paying $50 to have a hard drive shipped overnight from Newegg because Best Buy does not have the same one in stock.

        And the biggest thing is he didn't even state that the RAID level should be selected for what kind of job the server will be doing. Database, File Servers, and Application servers all have improved or reduced performance based on the RAID level.
        Yndoendo
        • I never said RAID equals backup

          I never said RAID equals backup. I never said
          you don't need backup on top of RAID. I know
          what RAID 6 is and I've been teaching this
          stuff a long time. What I said was that for
          most smaller jobs (where you already have
          backup), RAID-5 is sufficient.

          If two disks fail, you have to fall back on a
          backup restore and the only thing you've lost
          is some time. Now time is money but you have
          to figure in the additional cost of the drives
          and hardware. If you determine that you can't
          afford to go down period (in the case of an
          e-commerce server or critical company database),
          you choose RAID-6 for better hardware uptime in
          addition to backups. If the server is for some
          less time sensitive application where it's not
          so bad if you're down a few hours, RAID-5 plus
          backup is good enough.
          georgeou
      • The solution to this is obvious

        You just need to build your server over the space of several years, adding a drive every six months or so. That way you have better odds that the MTBF won't be reached on all drives at once. But if you're *really* impatient then I guess you'll just have to chance it, and build it all at once.
        theshowmecanuck@...
  • The REAL problem here ...

    Is that people are using cheapo software RAID that, while often as fast or even faster than hardware RAID, does not have the disk maintenance capabilities of hardware RAID. RAID is about more than just speed. It is about reliability, and when it comes to reliability software RAID comes up short. It doesn't provide a dedicated system to actively manage hard drives, and neither do the typical file systems that operate on top of it. What is emerging as the best bet for reliability is true hardware RAID with active bad block checking PLUS a next gen filesystem like ZFS or btrfs with active checksumming.

    Aside from that, if you want reliability USE RAID 1, because with either hardware or software RAID you can almost always pull out a drive if necessary, hook it directly to your computer and read it. You can't do that with other RAID levels because the RAID has to mess with your data format. With RAID 1 most RAID systems these days keep their superblock at the end of the drive so as not to prevent using the drive in an emergency WITHOUT the RAID controller.

    With Drobo, I tend to agree with George Ou. Robin seems to be accepting Drobo marketing hype at face value. I'm not sure that's possible.
    George Mitchell
  • Another perspective

    http://subnetmask255x4.wordpress.com/2008/10/28/sata-unrecoverable-errors-and-how-that-impacts-raid/

    Plus:
    RAID is not a substitute for a backup strategy.
    SATA RAID-1 has consistently performed well for me.

    Richard Flude
  • Doesn't make sense

    How can an Unrecoverable Read Error make the entire RAID set unusable?

    If my RAID volume is in a degraded state (one failed drive) and I hit a URE on a block of data, from an application perspective I can see that I may not be able to read that file. How does this trash the whole RAID volume?
    sanjaydhar
    • re:Doesn't make sense

      (A)bort, (R)etry, (I)gnore, (F)ail...

      How many people remember that from using floppies with DOS?

      Well, URE is not the same thing. If you are reading data from a drive and it runs into a corrupt/bad sector, it maps around it, and if it had data in it, then you end up with a corrupted file. However, you are not reading from the drive directly, you are reading from a RAID volume, and the RAID controller is then reading from the drive. It is then up to your RAID controller, be it hardware or software, to decide what to do next.

      Your RAID controller can do one of two things:
      1. try to act like this is a file corrupt error (less likely)
      2. assume the drive is going bad, and mark it as such (more likely)

      If you had a drive go bad, and your RAID controller was in the process of building data onto the hot spare, then if you run into a URE, you get 2 bad drives at once.
      subnetmask255x4