Trust Backblaze's drive reliability data?

Summary: TweakTown's tacit defense of industry silence on drive reliability, and its attack on the Backblaze report, fail on multiple levels. Here's why.


Let me begin by noting that I've never accepted money or services from Backblaze. I admire what they're doing, how they've bootstrapped the company and their willingness to ignore storage industry taboos.

So TweakTown's (TT) critique of Backblaze (BB) caught my attention. I worked for a drive manufacturer and have worked with multiple qualification and support teams, so I hoped TT would shed some light on this arcane topic. It didn't.

The first fail
The post title is "Dispelling Backblaze's HDD Reliability Myth - The Real Story Covered." But it isn't a myth: it's real-life data.

Then TT questions BB's motives. Of course a small company wants publicity.

That they do it by providing statistically significant information about their hard drive experience – something millions of consumers are hungering for – strikes me as a fair exchange. Why don't we stick to what they said rather than throw mud?

Sourcing
Backblaze is criticized for buying disk drives the way you and I do. They take advantage of pricing oddities and often purchase USB hard drives and remove the cases.

This does subject drives to extra handling. But since most BB drives appear to be bought this way, they all get the same handling, which should only affect the more delicate drives.

Enclosures and drive age
Backblaze's enclosures are also criticized: BB puts 45 drives in a 4U box, and the drive mountings have been improved over three generations. TweakTown says:

. . . we can see that the drives in use the longest suffer the highest failure rates. One likely reason is simple: these older drives are in revision 1.0 of their storage enclosures, which suffer from significant vibration issues that merited a redesign.

But here's a graph of AFR and age, by manufacturer, of BB's data (leaving out the Seagate drive with the 120% AFR to keep the graph readable). I'm not seeing the effect TweakTown is.

[Graph: AFR vs. drive age, by manufacturer, from Backblaze's data]

Vibration is an issue with drives, especially when seek rhythms synchronize and create harmonics. However, with large block writes as BB's main workload, the vast majority of seeks will be track-to-track, with very low mechanical impact.

BB acknowledges that its workload is unusual and doesn't suit every drive, and it leaves the results for such drives out of the report. But wouldn't you like to know if one vendor is extra sensitive to vibration?

Temperature
TweakTown further states:

Backblaze claims that drive temperature doesn't affect drive life. That is counter to the observations of many others, including drive manufacturers.

In fact, the large-scale Google and CMU studies of drives (links below) found that Backblaze is correct: temperatures have to be well above spec before drive life suffers.

I attribute this to accelerated life testing, which means operating drives at high temperature. Engineers aren't stupid: drives are designed to work at higher-than-spec temperatures to give marketing the numbers they want.

Workloads
Later, TweakTown asserts:

Backblaze procures the cheapest possible HDD on the market at all times, regardless of its workload rating, and then subjects them to a harsh environment that is virtually guaranteed to destroy the drive. This leads to higher failure rates than observed in the wild.

The aggregate AFRs BB reported are in the ballpark of the other large-scale studies. And the Google and Carnegie Mellon studies found almost no AFR difference between "enterprise" and "consumer" drives in 24/7 server use. At worst, BB can be accused of running accelerated life testing and finding that some drives do better than others.

TT also asserts that "Random data requires more movement, and thus creates more wear and tear on delicate HDD heads." Heads ride on air bearings over the platter, so it is more accurate to say that the head actuator assembly might sustain more wear, but again, there is no evidence from large scale studies that consumer drives can't handle frequent seeks.

Summary
TweakTown concludes with:

The data from Backblaze should not influence a purchasing decision by any consumer, regardless of what type of drive they are purchasing. . . . Even for the winners, the results aren't good; the failure rates are exponentially higher than those observed in the real-world.

Except, of course, that while some of the rates are way higher than those reported in published studies, most of them aren't. Further, BB is careful to give drive model numbers, and it may very well be that today, for example, Seagate is competitive in high-capacity drives.
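
Comparing AFRs across studies only makes sense if they are computed the same way. The sketch below assumes the common definition of AFR as failures per drive-year of operation; the function name and all the fleet numbers are hypothetical illustrations, not Backblaze's data or code. It also shows how a small, young population, like the Seagate model left off the graph above, can post an AFR above 100%.

```python
def annualized_failure_rate(failures: int, drive_days: float) -> float:
    """Return failures per drive-year of operation, as a percentage.

    Assumption: AFR = failures / (drive_days / 365) * 100, a common
    convention. This is an illustration, not Backblaze's published code.
    """
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365.0  # total exposure, in drive-years
    return 100.0 * failures / drive_years


if __name__ == "__main__":
    # Hypothetical fleet: 1,000 drives running a full year, 50 failures.
    print(f"{annualized_failure_rate(50, 1_000 * 365):.1f}%")  # 5.0%
    # Hypothetical small, young population: 20 drives for 90 days, 6 failures.
    # Only about 4.9 drive-years of exposure, so the AFR exceeds 100%.
    print(f"{annualized_failure_rate(6, 20 * 90):.1f}%")  # 121.7%
```

An AFR above 100% doesn't mean every drive failed more than once; it means failures came fast relative to the small amount of accumulated drive-time, which is also why rates for small populations swing so much.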

The Storage Bits take
I understand a test engineer's desire for controlled environments and workloads for testing. But that isn't the real world: some drives are busier, some see higher ambient temperatures, some come from a bad production run, and some get banged around in shipment.

But rather than bash Backblaze for giving consumers the benefit of their experience, TweakTown should be asking, as I do, for other major drive users to come clean. I'm looking at you, Google, Amazon and Microsoft.

Google says they want to organize the world's information, but when it comes to something they have unique expertise in, they clam up. Amazon sells disks, but do you see reviews from their maintenance team?

I'd much rather have the results of millions of drive years over many more models. But we don't because the folks who know won't talk.

So yes, as a consumer, I would look at Backblaze's results. If I were upgrading my arrays tomorrow, I'd make an extra effort to buy Hitachi per the Backblaze experience. What they found squares with what I've heard from insiders over the last 10 years.

TweakTown repeatedly objected to the media attention this post got. If other players had already spoken it wouldn't be an issue, would it? I'll take Backblaze's info over nothing any day.

Comments welcome, of course. How about it, readers: would you prefer no info to the limited data Backblaze released?

Topics: Storage, Amazon, Google, Hardware


Talkback

23 comments
  • What credentials does the TT blogger have?

    None, absolutely none.

    An editor, a restaurant manager, and a Boeing mechanic. How does that person qualify to dispel an organization that manages PBs of data and thousands of hard drives?

    Building a PC rig in your bedroom and writing for an enthusiast PC site does not make you an expert. If anything, he's shown that he's a good Internet googler.
    unredeemed
    • Redeemed

      Mr. Unredeemed, it wasn't difficult to figure out who you are. As a member of the review community, you should make such personal attacks by name and not from the shadows. Have the professional courtesy to use your name next time.

      Mr. Alcorn has been testing enterprise storage products for ten years and using his own money to purchase them as an enthusiast for seven.

      Maybe you or those in your camp would like to tell us more about Micron's NVMe solution? Maybe you would like to share with us a bit more on how multiclient NAS testing just fell in your lap?



      Mr. Harris, writer of "StorageMojo.com and ZDnet, the most widely read and highest ranked storage blogs in the world."

      How is that working out for you? Alexa shows SM at over 800K, so that's a wash. What are you doing there, maybe 200 readers a month? Judging by the Like, Tweet and InShare activity here, you may do a few thousand readers a month. If you want some real traffic, let me know, but we don't publish tall tales like your "Blu-ray is Dead" article from 2008. Maybe I'm just in the niche category, but it looks like the format is doing well enough. I would have to ask, though: how long has it been since you actually sat down and tested a piece of enterprise hardware? Looking at your articles, it seems to me the only thing you are actually capable of is bloviating about the storage industry instead of making any real contributions.

      The three of us are all writers swimming in the same pond and I'm sure a few lines were crossed. Personally I don't mind the scrutiny but I don't like anyone questioning our credibility. Calling us bloggers is a bit of a no-no in my book too.
      ChrisRamseyer
      • Chris, why not respond to my points? Is that too much to ask?

        Instead of telling me and my tiny number of readers how insignificant we are and questioning my credibility - something you mind when done to you - why not just tell us all where I'm wrong? Everyone would love to know.

        And please recall that I did not attack Mr. Alcorn, just his conclusions. I'm sure he means well and is a fine human being, but I don't agree with what he wrote and I pointed out where and why.

        Better yet, why don't you do something positive: join me in calling for large users of disk drives like Google, Amazon and Microsoft to reveal what they already track and know about drives? If enough IT consumers ask for it, maybe we'll get it.

        Robin
        R Harris
  • Thanks Robin!

    Thanks for being brave enough to publicize this data. During the past fifteen years I have definitely seen the quality of Seagate drives go downhill. I really can't comment on WD since I have hardly used any of them. But today's Seagate products don't seem nearly as durable as they used to be. I still have old 20GB Seagate drives that run flawlessly. On my most recent drive replacements, starting about three years ago, I moved from Seagate's AS series to NS drives, in my case Constellations. After making this move, I mentioned it to a local PC expert. His response was that, in the case of both Seagate and WD, he had given up on consumer-grade products entirely and was mostly using and recommending enterprise-grade products. I have been very happy with the Constellations; they can tolerate a lot more sector failures than the Barracuda consumer-grade drives. So far, after three years, they have only lost a few sectors, and some of them none. But, of course, that means forgoing the giant multi-terabyte products AND paying significantly more money. That said, I can easily believe and trust the Backblaze numbers. Of course the vendors will not be happy about that, because it may affect consumer buying patterns in ways that threaten their profits. Too bad. I expect they will learn to survive. At this point I plan to continue buying 500GB enterprise-grade drives, except for backup use, where I will use the giant consumer drives. I do have one 4TB Seagate Desktop drive at this point and am happy with it, but it is a backup drive and does not get heavy use.
    George Mitchell
  • I want the data

    The only two hard drives I've ever had fail were WD Green models. WD was good about replacing the first drive that went south, but when the replacement also failed within a year I gave up and purchased a drive from another line. I didn't conclude that there's something wrong with all WD drives, but that there might possibly be a problem with the particular Green model that kept failing. I've actually had good luck so far with Seagate drives, but my sample is so limited I can't really draw any general conclusions about brands. So real-world experience like Backblaze's is valuable to me, but I wouldn't base all future purchasing decisions just on their data. Still, I want to hear what they have to say.
    preilly2@...
  • Drives and Failures

    High-capacity modern drives, especially ones that spin at 7,200 RPM or faster, are very sensitive to handling. I took some training from Seagate. Just moving a new 10,000 RPM drive on a hard surface without padding reduced its life significantly. I have worked with large quantities of Fibre Channel and other enterprise drives for many years. Normal life span is about 5 years. Then every year you lose 10%, in a controlled environment with normal conditions. Personally, I think most of the failures are from "farming" the drives and lax handling. In fact, these pods should be populated after they are installed, never shipped with the drives already in them.
    My experience with SATA drives is more limited. But as Backblaze has found, the Hitachi gear is above the rest. Seagate has had some issues with its low-end drives but is the best at the Fibre Channel drives that I work with.
    Bill Kittle
  • Look Deeper.

    BB doesn't secure their drives into enclosures with bolts or screws, they simply drop them in. The weight of the vibrating drive is supported by the SATA connector!
    Drives are placed in groups of 5 on top of a SATA port multiplier board, so 5 lb of vibrating weight sits on top of a thin PCB.
    They literally use a rubber band to handle vibration. Previous incarnations did not even use the rubber band. This obviously isn't good for comparing reliability between different drive housings.
    They also admit that there are RMA'd and refurb drives in the test pool. How is that responsible?
    innocenct
    • Amazing that Hitachi does so well, isn't it?

      Since all drives get roughly the same treatment - shucking, mounting & workload - what you're saying is that the Hitachi drives are even better than they seem.

      Good point!
      R Harris
      • Did you read the Backblaze blog post?

        They disclosed using RMA'd drives and refurbished drives from Seagate, but none from Hitachi.
        Presumably, the Hitachi drives are also all installed in the newer chassis revisions, since they have a lower average age.
        innocenct
        • All hitachi drives are in pod 2.0 enclosures

          innocent: 1. We don't use many RMA drives from Hitachi, because hardly any of them fail, but there are a few here and there. 2. All of the Hitachi drives are in Pod 2.0 enclosures (rubber bands without clamp-down lids). They just work. Does that help you with your assumptions? - sean
          BBSean
          • Ethics 101

            sharris007 - Do you feel, as a representative of your company, that it is ethically/morally sound to intentionally use drives in workloads and environments they are not designed OR warrantied for, and then RMA them when they fail due to shoddy enclosures?
            1. Do you think that RMA/refurbished drives should be included in the reliability data?
            2. To answer the questions we would need to know which enclosures the Seagate drives were placed in. With three different revisions, and alterations specifically made to those revisions to address drive reliability concerns created by poor vibration mitigation, do you really feel that the AFR from each enclosure should be used as a basis for statistical comparisons?
            innocenct
          • Relation?

            Sean Harris, Director of Cloud Storage at Backblaze, any relation to the author of this piece, Robin Harris?
            innocenct
          • Luke, I am your father!

            Nice try, TT'er. But there's no relation, other than the name.

            Robin
            R Harris
  • Impossible to ignore.

    It is impossible to ignore the different chassis revisions, with the early versions having no insulation or mounting.
    It is impossible to make a comparison between drives held in storage pod 1.0:
    http://blog.backblaze.com/wp-content/uploads/2009/08/backblaze-storage-pod-partially-assembled-large.jpg

    and Storage Pod 3.0:
    http://blog.backblaze.com/2013/02/20/180tb-of-good-vibrations-storage-pod-3-0/
    Lester75
    • And you can stop reading now

      You hit on the most important piece - you can't compare apples and oranges and make any assertions or conclusions.

      I've read the various pieces and it's clear to me that the BB 'study' can only be considered anecdotal since there were no controls to ensure that different drives were tested in the same manner.

      If you have any background in research you will understand that alone invalidates the BB data.
      bostonBC
  • Old study links....

    The studies linked are from 2007. These studies aren't very relevant today.
    innocenct
    • How do you know?

      Do you have more recent data? Gee, why not?

      Can you point to any substantive reason - changes in the design and manufacture of drives - that would render them irrelevant? No.

      Robin
      R Harris
  • I'll keep buying WDC Blacks

    The comparison between Hitachi and Seagate is apples-versus-apples because the number of drives is almost the same. WDC, on the other hand, could easily be a little more or less reliable compared to the published results because of the order-of-magnitude fewer drives in the study. Backblaze compared like items: Seagate Barracuda and Hitachi Deskstar; it would have been unfair if the former was compared against an Ultrastar, but that was not the case.

    TT made a fuss over the fact that Seagate drives failed, were replaced under warranty, and then failed again. TT tried to spin this as bias, but it is actually a further indictment as to the reliability, or lack thereof, of Seagate drives. TT also made a fuss over vibration and temperature, yet neglected to mention that all drives were subjected to these conditions equally, so they became non-issues.

    TT wrote: "Even for the winners, the results aren't good; the failure rates are exponentially higher than those observed in the real-world." Given that no one else has published comparable data, this conclusion is nonsensical.

    Backblaze wrote: "These (green) drives are designed to be energy-efficient, and spin down aggressively when not in use. In the Backblaze environment, they spin down frequently, and then spin right back up. We think that this causes a lot of wear on the drive." But then Backblaze commented: "We wish we had more of the Western Digital Red 3TB drives." Does Backblaze understand that Reds are IntelliPower, so the platter speed will vary (but no IntelliPark, so no head unloading)?

    A WDC employee noted in the Backblaze blog comments that, so far, the Hitachi and WDC lines have remained separate. Therefore if someone buys HGST today, they are buying the same reliable drives as before.
    PC Cobbler
  • It's just not a scientific study

    As a product reviewer I understand the importance of keeping things apples to apples. While real world use is great, real world use in different worlds is not. There are just too many variables. It would be possible to use all of the data to come up with some very detailed reports but just lumping it all in together is not the right way to publish this data and have the world take it as gospel.

    I respect BB for writing the article and releasing the data, but go ahead and release the Excel doc that says what drives, what enclosures they came out of, what Pod version they were used in and so forth.

    When it comes to external enclosures, some make it easier to remove the drive than others. If you have to smack one with a hammer and screwdriver to break a tab, then the shock is going to have an impact on the drive, or so one would assume. I own around 300 HDDs and SSDs and have cracked open my fair share of external drives.

    Something else to consider is where the drives came from. I've purchased cases with 20 drives inside, drives from e-tailers and drives in retail stores. The packing from the manufacturer and through the supply chain matters. Next month I'm publishing an article that looks at user comments on failure rates (DOA, right out of the box) from two large e-tailers. It looks at the way the drives ship from the e-tailer. One has a very high failure rate, right out of the box, according to user comments. The other has a very low failure rate per the comments. I've purchased drives from both for years and it's not difficult to figure out why.

    Also, we know vibration has an impact on both performance and reliability. You can put a HDD on a desk and then in an enclosure, testing it both ways. The drive will perform a little better in the enclosure than it does on the desk, due to vibration. I proved this in an article last year. The test was with a single drive. I don't remember how many drives BB had per system, but vibration and harmonics are like little earthquakes as far as the drives are concerned; think about the scale. That said, does the BB article test drive reliability or a drive's ability to survive in poor conditions? All of the HDD manufacturers make specific drives for this type of environment and this specific workload.

    Something else, the BB drives run 24/7 and 5 years is the target. How many HDDs in external enclosures are designed for 24/7 operation?

    If I took a Chevy Volt to a track day I couldn't conclude that a Vette is a poor performer. At the same time I can't expect to put my family in a Vette and conclude that a Chevy van is a poor family vehicle.
    ChrisRamseyer
  • Good questions and discussion

    Co-founder and CEO of Backblaze here. The questions and discussion about drive reliability, usage, measurement approaches, etc. are fantastic and I'm glad to see them happening.

    For years people have been asking us to publish our experiences with the different drives since no one had published a large scale study and it's a question every consumer, IT director, and cloud storage engineer wants to know.

    We set out to build an inexpensive online backup service, not to design a hard drive study. Thus, our results are based on the drives we purchased, running in the Storage Pods we designed, and used the way they were needed by our service.

    I'm thrilled to see all the discussion that has come from us publishing our data, as it makes all of us smarter both to hear more people's experiences and data and to be challenged by the questions raised.

    One of the questions, "Does the version of Storage Pod affect reliability of the drives?" is certainly interesting. Since the drives are largely dispersed among all versions, and if anything, the Hitachis were in the older ones, I think the answer will be that it does not change the results dramatically. Having said that, we're going to dig into the data and see if there is enough to draw statistical conclusions from. And if there is, we'll publish the results for everyone to see.

    Gleb
    budmang