Chunks: the hidden key to RAID performance

Why do people think RAID means performance? George Ou, Technical Director at ZDNet and a fellow ZDNet blogger, has a great post about real-life RAID performance - hardware vs. software - plus some helpful comments about data layout, especially for MS SQL Server. As George notes, data layout can have a major impact on storage performance. But what about RAID itself? What is the theory behind RAID performance?

RAID = Redundant Array of Inexpensive Disks

RAID wasn't originally about performance; it was about cost. Some CompSci jocks at Berkeley wanted to use 5.25" drives for a bunch of cheap storage because they couldn't afford the then-common 9" drives. The small drives were cheaper but also less reliable, so they cobbled them together to create the first RAID array.

But as they worked out the details, they saw that RAID could have performance advantages for certain workloads.

The three pillars of RAID performance

  • Cache
  • Striping
  • Chunk size

Let's look at all three.

Cache

Cache is simply RAM, or memory, placed in the data path in front of a disk or disk array. You can read or write RAM about 100,000 times faster than a fast disk and about 300,000 times faster than a slow disk. Either way, reading or writing cache is a lot faster than going to disk.
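Those ratios are easy to sanity-check with back-of-the-envelope arithmetic. The latency figures below are illustrative assumptions, not measurements of any particular hardware:

```python
# Rough arithmetic behind the "100,000x / 300,000x" claim.
# Latency figures are illustrative assumptions, not benchmarks.
ram_latency_s = 100e-9        # ~100 ns for a RAM access
fast_disk_latency_s = 10e-3   # ~10 ms seek + rotation on a fast disk
slow_disk_latency_s = 30e-3   # ~30 ms on a slow disk

print(round(fast_disk_latency_s / ram_latency_s))  # 100000
print(round(slow_disk_latency_s / ram_latency_s))  # 300000
```

The exact multipliers depend entirely on the drives and RAM in question; the point is simply that the gap is five orders of magnitude.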

Most external array controllers are redundant as well, with some kind of load-balancing driver that keeps both controllers productive. In this case the cache has to be dual-ported: if a controller fails, the other controller can see all the pending writes of the failed controller and complete them.

Dual-ported cache, controller failover, dual server interfaces and failover drivers are all tricky to engineer and test, which is one reason why mid-range and high-end controllers are so expensive. But they sure speed writes up.

Striping for speed

Striping takes the virtual disk that the operating system sees and spreads it across several real, physical disks.

A RAID controller presents something that looks like a disk, like a C: drive, to the operating system, be it Windows, OS X or Linux. The RAID controller isn't presenting a real disk. It is presenting a group of disks and making them look like a single disk to your computer.

The advantage is that instead of a single disk's performance, you now have the performance of several disks. Instead of 50 I/Os per second (IOPS) on a 5400 RPM drive, you might have 150, 200 IOPS or more, depending on the number of drives. If you use fast 15k drives you might reach 900, 1,000 or more IOPS.

And instead of 1.5 gigabits per second bandwidth, you might have 4.5 or 6 Gb/sec. If you have a cache that you are in a hurry to empty, a nice fast stripe set is very helpful.
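A quick sketch of that scaling, in the ideal case where I/Os spread evenly and the controller is not the bottleneck. The per-drive figures are the ones used above; the ~180 IOPS for a 15k RPM drive is an assumption for illustration:

```python
# Ideal stripe scaling: aggregate IOPS and bandwidth grow roughly
# linearly with the number of drives, assuming an even I/O spread.
def aggregate(n_drives, iops_per_drive, gbps_per_drive):
    return n_drives * iops_per_drive, n_drives * gbps_per_drive

# ~50 IOPS and 1.5 Gb/s per 5400 RPM drive, as above.
print(aggregate(3, 50, 1.5))   # (150, 4.5)
print(aggregate(4, 50, 1.5))   # (200, 6.0)

# A fast 15k RPM drive might manage ~180 IOPS (assumed figure):
print(aggregate(5, 180, 1.5))  # (900, 7.5)
```

Real arrays fall short of linear scaling, which is exactly the point of the next section.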

The hard part: spreading I/Os smoothly across all the disks. If they all jam up on one disk, your costly storage system will be no faster than a single disk. That's where chunk size comes in.

Chunk size: the hidden key to RAID performance

Stripes go across disk drives. But how big are the pieces of the stripe on each disk? The pieces a stripe is broken into are called chunks. Is the stripe broken into 1 KB pieces? 1 MB pieces? Even larger? To get good performance you must have a reasonable chunk size.

So what is a reasonable chunk size? It depends on your average I/O request size. Here's the rule of thumb: big I/Os = small chunks; small I/Os = big chunks.

Do you do video editing or a lot of Photoshop work? Then your average request size will be large and your performance will be dominated by how long it takes to get the data to or from the disks. So you want a lot of bandwidth to move data quickly. To get a lot of bandwidth you want each disk to shoulder part of the load, so you want a small chunk size. What is small? Anywhere from 512 bytes (one block) to 8 KB.

If you are running a database and doing lots of small I/Os - say 512 bytes to 4 KB - then you want to maximize your IOPS, which ideally means sending each I/O to only one disk and spreading the I/Os evenly across the disks. What you don't want is a single I/O getting split across two disks, since waiting for two sets of heads will slow things down. So you want a large chunk size - 64 KB or more. A large chunk means most I/Os get serviced by a single disk, leaving the remaining disks free to service other I/Os.
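A minimal model makes the chunk-size rule concrete. This is a simplification, assuming plain striping (RAID 0 style, no parity) where chunk i lands on disk i mod N - real controllers are more involved:

```python
# Which disks does a single I/O touch, for a given chunk size?
# Simplified striping model: chunk i lives on disk i % n_disks.
def disks_touched(offset, length, chunk_size, n_disks):
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return sorted({chunk % n_disks for chunk in range(first, last + 1)})

# Small database read, big 64 KB chunks: one disk services it,
# leaving the other three free for other I/Os.
print(disks_touched(offset=128 * 1024, length=4 * 1024,
                    chunk_size=64 * 1024, n_disks=4))    # [2]

# Large 1 MB video read, small 8 KB chunks: all four disks
# contribute bandwidth in parallel.
print(len(disks_touched(0, 1024 * 1024, 8 * 1024, 4)))   # 4
```

Run the model against your own average request size and it shows why big I/Os favor small chunks (more disks in parallel) while small I/Os favor big chunks (one disk per request).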

However, many databases use their own strategies to gather I/Os to minimize I/O overhead. In that case you need to know what the database is actually doing to choose the right chunk size.

The Storage Bits take

RAID systems are complex and their operation is sometimes counter-intuitive. Take the time to understand your workload before configuring a RAID system for performance. Otherwise you could end up with an expensive, slow problem instead of a fast I/O solution.

Comments welcome, of course.

  • The "i" got redefined by the industry

    The "i" in RAID got redefined by the industry to mean "independent" because they got sick and tired of being questioned about their 500 to 1000 percent premium over identical hard drives on the market that don't carry name brand logos on the drives themselves.

    Nice post, thanks!

    The concept of using the application to split a database table can be applied to MySQL as well I believe. You can always do manual table splitting too though.
    • RAID industry is in sad shape

      The whole array business is under significant pressure as customers are slowly
      waking up to the fact that paying 5% for capacity and 95% for protection is whacked.
      R Harris
  • Still expensive and not really practical for end users.

    Yes, I get more storage, but take RAID 5: if more than one drive dies, I'm dead - all of my data is gone.

    RAID 1 is good - you take a performance hit, but you get a real-time copy. Of course the downside is that viruses or system changes can negate the backup, rendering it useless.

    RAID 0: good for speed, but if you lose a disk you're dead in the water. Great for log files or gaming directories.

    I like something I recently found called XRaid, which keeps each disk as its own entity; if one fails it can rebuild it like a RAID 5. If more than one dies, you're out of luck, but at least the data on the remaining drives lives on.

    The problem right now is that we have so much storage space that if you want backup, you either buy two of everything or buy ridiculously expensive backup devices. It's a lose-lose. Hence why I like the XRaid.

    Overall though, I have never had a RAID 5 die. I have lost drives in RAID 5 setups, but never the whole array. Had a scare once, but it turned out to be the controller.

    Essentially it all comes down to the risk you're willing to take vs. the money you're willing to spend.
    • I agree - RAID is for servers if it is for anybody

      And I don't think RAID will be that popular in ten years - copies of data will be the

      Why I think that is another post.

      Thanks for writing.

      R Harris
  • RE: Chunks: the hidden key to RAID performance

    Hello Robin:

    Nice article. Congratulations!

    What chunk size would you suggest for a RAID 5 array (4 disks) writing files of 600 KB to 4 MB each?

    I usually have 100 files being transferred to this array in a 2-3 second interval.

    Thank you!
  • logic of small chunk sizes?

    Hello Robin,

    Thanks for your article. I read it (and an almost identical article here) in an effort to optimize an upcoming stripe set I need to build.

    Two questions:

    1) I'm trying to understand the logic of using smaller chunk sizes for a large video file. Yes, we want to encourage the array to spread the data across multiple drives. But it's a very large file, so it will naturally span many chunks, essentially getting us the parallel I/O that we're seeking. Am I missing something?

    2) On the other hand for "lots of small I/Os", are you saying that if a stripe set is given two separate I/O requests, each smaller than 1 chunk, and each chunk on a different drive, that it can process both requests in parallel?

    Thanks in advance!