The end of the RAID era

The post-RAID era has begun. While RAID arrays aren't going away, the growth is elsewhere, and corporate investment follows growth.

Why now?

Architecturally superior alternatives to RAID now exist. The post-RAID milestone was passed years ago.

  • The authors of the original 1988 RAID paper (Patterson, Gibson and Katz) all moved on long ago: Patterson to scale-out object storage and much more; Gibson to Panasas, a scale-out object storage company he co-founded; and Katz to Hadoop, among many other projects.
  • What are probably the fastest growing large storage infrastructures in the world - Google's and Amazon's - aren't based on RAID.
  • Major storage vendors including NetApp, HP, EMC and Hitachi have all invested in - and are selling - noRAID systems.
  • But the biggest reason? The math behind erasure codes improved after the RAID paper was written.

How erasure codes work

RAID uses erasure coding to create parity information that protects an array from 1 (RAID5) or 2 (RAID6) drive failures or uncorrectable read errors (UREs). But if you use SATA drives, RAID5 stopped working 3 years ago.

Erasure codes break up a data segment into n fragments, add m additional fragments, store the n+m fragments across different devices, and can recover the data from any n of them. In an 8-drive RAID5 stripe, the original data is divided into 7 fragments, an 8th fragment - the parity data - is calculated, and then any one of the 8 drives can fail without (theoretically) losing any data.
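To make the 7+1 case concrete, here is a minimal Python sketch of single-parity coding using XOR. It is an illustration only, not how any particular RAID controller lays out data: the function names (`encode_stripe`, `recover_fragment`) and the zero-padding scheme are my own assumptions.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode_stripe(data: bytes, n: int) -> list:
    """Split data into n equal fragments and append one XOR parity fragment.

    Hypothetical helper: pads with zero bytes so every fragment has the same length.
    """
    frag_len = -(-len(data) // n)              # ceiling division
    padded = data.ljust(frag_len * n, b"\0")
    fragments = [padded[i * frag_len:(i + 1) * frag_len] for i in range(n)]
    parity = reduce(xor_bytes, fragments)      # the (n+1)th fragment
    return fragments + [parity]

def recover_fragment(stripe: list, lost: int) -> bytes:
    """Rebuild the fragment at index `lost` by XORing all surviving fragments."""
    survivors = [f for i, f in enumerate(stripe) if i != lost]
    return reduce(xor_bytes, survivors)

data = b"RAID5 stripe across eight drives, seven data plus one parity"
stripe = encode_stripe(data, 7)                # 7 data fragments + 1 parity

lost = 3                                       # pretend drive 3 died
damaged = list(stripe)
damaged[lost] = None
rebuilt = recover_fragment(damaged, lost)
```

Because XOR parity yields only one extra fragment, this scheme survives exactly one loss. Surviving more requires codes that generate several independent extra fragments.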

The RAID5 problem is that with larger disks, rebuild times get dangerously long, and it becomes likely that a URE will turn up on another disk during the rebuild, killing it. Surviving 2 failures is the minimum today.
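A rough back-of-the-envelope calculation shows why. The sketch below estimates the chance of hitting at least one URE while reading every surviving drive during a rebuild. The numbers are assumptions, not figures from this post: an error rate of 1 unreadable bit per 10^14 bits read (a commonly quoted spec for consumer SATA drives), 2 TB drives, and independent bit errors.

```python
import math

def rebuild_ure_probability(surviving_drives: int,
                            drive_bytes: float,
                            ure_per_bit: float) -> float:
    """Probability of at least one URE while reading all surviving drives in full.

    Assumes independent bit errors: P = 1 - (1 - p)^bits.
    log1p/expm1 keep the arithmetic accurate for very small p.
    """
    bits_read = surviving_drives * drive_bytes * 8
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# Assumed example: 8-drive RAID5 set of 2 TB SATA drives, one drive failed,
# so the rebuild must read the 7 survivors end to end.
p_fail = rebuild_ure_probability(surviving_drives=7,
                                 drive_bytes=2e12,
                                 ure_per_bit=1e-14)
print(f"Chance the rebuild hits a URE: {p_fail:.0%}")
```

Under these assumptions the rebuild is more likely to fail than to succeed, which is why a second level of protection has become the floor.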

In the '90s a new form of erasure coding, called fountain or rateless erasure codes, was developed that enabled developers to create codes with an arbitrary level of redundancy - survive 4 failures? 10? Pick a number! Startups including Digital Fountain, Cleversafe and Amplidata sprang up to take advantage of these new codes.

This StorageMojo video explores the advantages of rateless codes, using Amplidata as an example. One key advantage: the redundancy needed to survive 4 failures is, they tell me, down to 50-60% of the data. Much better than the 3x replication that Amazon and Google use in their infrastructures, and very competitive with RAID6.
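The comparison is easy to check with arithmetic. In the sketch below, an n+m scheme stores m extra fragments for every n data fragments, so its redundancy overhead is m/n, and it survives the loss of any m fragments. The specific layouts are illustrative assumptions on my part (8+4 is not Amplidata's published configuration, and 6+2 is just one possible RAID6 width). Replication is modeled as 1 data copy plus 2 extra copies.

```python
def overhead(n: int, m: int) -> float:
    """Redundancy overhead of an n+m scheme, as a fraction of the user data."""
    return m / n

# (label, n data fragments, m redundant fragments) - assumed example layouts
schemes = [
    ("3x replication", 1, 2),
    ("RAID6, 6+2",     6, 2),
    ("Erasure, 8+4",   8, 4),
]

for label, n, m in schemes:
    print(f"{label:15s} survives {m} failures, overhead {overhead(n, m):.0%}")
```

Note that the erasure-coded layout in this example survives twice as many failures as the other two while needing a quarter of the redundancy of triple replication.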

The Storage Bits take

Redundant Arrays of Inexpensive Disks shook up a complacent industry 2 decades ago. But time and technology move on.

Make no mistake: RAID isn't going to disappear. But it is moving to the margins as its limitations and costs become clearer thanks to new alternatives.

Despite the industry investment in RAID, we now have better solutions. Properly priced and marketed, these solutions will drive the next big round of storage growth.

Courteous comments welcome, of course. I've been doing work for Amplidata. And for a quick but deeper intro to erasure coding for storage, check out Prof. Jim Plank's Erasure Codes for Storage Applications (pdf) presentation.