How to use RAID in 2014

Summary: Years ago I wrote a post "Why RAID 5 stops working in 2009". Five years later you can still buy many RAID5 arrays. Was I wrong?

No. In that post I laid out the simple problem with RAID 5 in 2009:

  • Disks fail.
  • When they fail the remaining disks are asked to read every available sector to rebuild the lost data on a new drive. 
  • The common unrecoverable read error (URE) spec is often still 1 in every 10^14 bits read. One error in about every 12.5TB.
  • With a 4 drive RAID 5 using 4TB desktop drives, that's 12TB of capacity, and a rebuild has to read all 12TB on the three surviving drives - so there is a high likelihood of a URE.
  • When the URE happens your rebuild fails.
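
The arithmetic behind those bullets can be sketched in a few lines of Python. This is a rough model, not a measurement: it assumes the URE spec is the drive's actual average error rate (vendors quote it as a worst-case bound), that bit errors are independent, and that TB means 10^12 bytes. The function name `p_ure_during_rebuild` is made up for this illustration.

```python
import math

def p_ure_during_rebuild(surviving_drives, drive_tb, bits_per_ure=1e14):
    """Chance of at least one URE while reading every surviving drive in full.

    Assumes independent bit errors at a rate of 1 per `bits_per_ure` bits.
    """
    bits_read = surviving_drives * drive_tb * 1e12 * 8
    # P(no error) = (1 - 1/bits_per_ure) ** bits_read.
    # log1p/expm1 keep precision with such a tiny per-bit probability.
    log_p_clean = bits_read * math.log1p(-1.0 / bits_per_ure)
    return -math.expm1(log_p_clean)

# 4 drive RAID 5 with 4TB drives: one drive failed, three left to read.
p = p_ure_during_rebuild(surviving_drives=3, drive_tb=4)
print(f"Chance of a URE during rebuild: {p:.0%}")
```

Under those assumptions the rebuild is more likely to hit a URE than not. Real drives often do better than their spec sheet, so treat the result as a pessimistic estimate.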

Now what?
First, don't panic. I own 3 different 4 drive arrays and I'm OK.

Why?

Because I understand the limitations of the technology and take steps to protect my data. What is the key step?

Don't keep your only copy of important data on a single array!

I keep data copied between arrays - a simple form of mirroring - so if an array fails I still have my data AND can recreate it if a drive rebuild fails.
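
A copy is only useful if it still matches the original, so it can be worth checking now and then. Below is a minimal sketch of one way to do that, not a description of my own setup: it hashes every file under two directory trees with SHA-256 and reports what is missing from the mirror or differs from it. The function names and any mount points you pass in are hypothetical.

```python
import hashlib
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in chunks so large files fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }

def compare_copies(primary, mirror):
    """Return (files missing from mirror, files whose contents differ)."""
    a, b = tree_digests(primary), tree_digests(mirror)
    missing = sorted(set(a) - set(b))
    differing = sorted(name for name in set(a) & set(b) if a[name] != b[name])
    return missing, differing
```

Calling `compare_copies("/mnt/array_a/photos", "/mnt/array_b/photos")` (made-up paths) returns two lists. Both should be empty if the second array holds a faithful copy.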

Limits of RAID 5
With today's large drives you need to look at their error rates. Desktop drives are typically cheaper and have a 10^14 spec.

But drives intended for RAID typically have a higher spec: 10^15, which is one URE every 125TB or so. Much safer.
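
To see what the two specs mean in terms of data read, here is a small conversion sketch. It simply turns "1 error per N bits" into terabytes, in both decimal TB (10^12 bytes) and binary TiB (2^40 bytes). The helper names are invented for this example, and the spec is treated as an average rate, which is an assumption.

```python
def tb_per_ure(bits_per_ure):
    """Decimal terabytes read, on average, per unrecoverable read error."""
    return bits_per_ure / 8 / 1e12

def tib_per_ure(bits_per_ure):
    """Same quantity in binary tebibytes (2**40 bytes)."""
    return bits_per_ure / 8 / 2**40

for exponent in (14, 15):
    spec = 10**exponent
    print(f"10^{exponent}: {tb_per_ure(spec):.1f} TB ({tib_per_ure(spec):.1f} TiB) per URE")
```

The 10^14 spec works out to 12.5 TB per error and the 10^15 spec to 125 TB, or a little under 114 TiB if you count in binary units.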

But regardless of the drive spec, you always need to back up! Whether to another array, as I do, or to the cloud or tape, you cannot rely on any storage device to safely keep your only copy of important data.

The Storage Bits take
RAID 5 still "works" in 2014. But only if you take precautions.

I speak to end-users who thought RAID meant their data was safe - and then discover it wasn't.

RAID offers speed and capacity, but safety comes from copies, not RAID. Remember that and you'll be fine - even in 2024.

Comments welcome, as always. What is your RAID backup strategy?

Talkback

14 comments
  • Interesting numbers, but...

    If the URE spec is 1 in every 12.5TB, that wouldn't take long in just normal use to experience a drive problem. Hum. Do you have the numbers for throughput on your drives? Sounds like in order to even use the drives at all, one MUST have RAID that can recover from read errors in order to use them for any length of time?

    Oh yeah, back up your data...
    Bruce622
    • URE in general use isn't the issue...

      ...it's a URE during a rebuild that is the problem. If you're rebuilding and you have a URE, your RAID is effectively gone. The only recourse after a URE during a rebuild is to recreate the RAID and then restore from backup.

      The problem with UREs is the reason RAID 6 came about: someone had the foresight to realize that, as disk sizes increased, a URE during a RAID 5 rebuild would become close to a given. We truly crossed that line back in 2009 with the advent of 3TB drives.
      cdigan
  • Or . . .

    How would you know? Missing DLL? File not found? Yes, back up your data!

    Robin
    R Harris
  • ever heard of ZFS?

    have been using it since 2007...
    ZFS pretty much ends this topic as it can verify the integrity of the blocks... it's that simple...
    http://www.open-zfs.org
    tangles@...
    • ZFS isn't a magic bullet

      Depends on what level of RAID your ZFS is using. If you're using RAIDZ, the RAID 5 equivalent, you still have the same issue. Likewise, ZFS requires more hardware resources (ZFS is handled by the CPU and your available system memory, whereas RAID 5 is going to use a dedicated controller which offloads a lot of the work from the CPU and local system memory).
      cdigan
      • ZFS RaidZ is NOT equivalent to traditional Raid5.

        ZFS is filesystem raid, NOT block raid, and there is a HUGE difference. With block raid, even if you have a small problem on your surviving set, you are screwed because you can't successfully resync at block level. With filesystem raid, your risk is limited to losing only the affected file(s) within the surviving set, AND that assumes that you lose the failing device completely. In the case of block raid, even a small problem on a device can cause that device to fail to sync and drop out. Additionally, filesystem raid checksums all files on a file by file basis. Block raid does not have this capability. And filesystem raid does routine scrubs to clean up data errors as they occur. A scrub on block raid will drop the whole device on an error and attempt a resync. Filesystem raid will ONLY resync the specific affected file(s). More CPU, for sure, but there is no comparison in terms of reliability. That said, frequent backups, preferably multiple backups, are the watchword.
        George Mitchell
    • Short answer: yes!

      Search on ZDNet or StorageMojo for ZFS. Very disappointed that Apple did not follow through on its plan to include ZFS in its server back in 2008.

      Robin
      R Harris
  • Gee

    Next you'll be telling us that the sky is blue, or water is wet! Gee, the things I didn't know.

    Thank you capt'n obvious.
    ccs9623
  • Misleading Title

    What an uninformative post. You talk briefly about the reliability of RAID5 but nothing about how to use RAID in 2014... no mention of SSDs or RAID10 or others... come on. Linkbait.
    tdan76
  • Raid 5 then mirror

    First - you might have the wrong letter for the size in "But drives intended for RAID typically have a higher spec: 10^15, which is one URE every 100GB or so. Much safer."

    Second - we use Raid 5 in the arrays, then we mirror the drives between arrays. Arrays are in different rooms. Gives us redundancy for room outages.
    Silent Observer
  • 2014.... RAID 6 or RAIDdp

    It is 2014 folks. RAID5 is old and ever since SATA, not that reliable. So along came dual parity and RAID6. Been available for quite some time now.
    Iozone
  • Redundancy

    Let's not forget the magic word in this digital age. Silent Observer has the right idea. But, if the place burns to the ground, where are we?
    SOP, people: off-site storage of removable media combined with cloud-based backups will give you the redundancy you need in a worst-case scenario.
    DPeer
  • IRL

    Does this happen in real life? I heard about this back in '09, but in the last 5 years I have never heard of a network administrator who has actually lost an array in real life because it would not rebuild due to UREs.

    Despite how much I love getting extra storage space, I am starting to become a fan of mirroring over striping due to the performance increase. It should be possible to duplicate a drive in a non-RAID environment even if there are a few UREs.

    I will also agree that turning RAID off and going JBOD with ZFS is the way to go. Everything is checksummed, data is repaired on the fly, and it can still resilver after a URE.
    fwarren
  • 10^15 and RAID5

    Isn't 10^15 bits over 113 TB?
    And cdigan and I are in sync. RAID6 of course as drive sizes get larger and larger.
    With a bad bit lingering in 113 TB of disk, you can calculate the odds of a RAID5 blow-out.
    I'm with cdigan, the threshold was about 3 TB when the odds got bad. Modest RAID5 (4+1) and 500 GB drives isn't too scary.
    alpharob1