Disk drive life depends on . . . luck

TOPICS: Google, Hardware, Storage

What is the primary determinant of drive life? I've read the latest research and talked to insiders. There are so many variables that the best answer is just . . . luck. Why is that? Is there anything you can do?

Many variables - and some that aren't

The recent Carnegie Mellon University and Google research dispelled some popular myths about disk drives:

  • Enterprise drives are no more reliable than consumer drives. The extra money seems to go for margin and warranty costs. These are mass-produced products. There is no secret to making a disk last 3x.
  • SMART drive status reporting is pretty useless. If SMART is telling you there is a problem, there probably is a problem. But if it reports no problem, that means nothing.
  • Workload has little effect on drive life, so use it all you want. The one exception Google found is that a heavy workload increases infant mortality, so when you install a new drive, work it hard so you can replace it while it is still economical to do so.
  • Ambient temperature has very little effect on drive life until it gets above 40 °C (104 °F). Even then the effect is slight.

So what does affect drive life? The research shows that drives are much less reliable than vendors commonly claim. There are two major reasons for this discrepancy:

  1. Vendor numbers are based on accelerated testing, which means high-temperature operation. That just isn't a very good simulation of real life, especially real consumer life. But it may explain why drives aren't much affected by temperature.
  2. A high percentage of failed drives report "no trouble found" in vendor testing. This probably reflects the quality of the testing more than anything.

The top issues in drive life:

  1. Drive age. There is some infant mortality, but not much. The big issue is that once drives are three years old their annual failure rates skyrocket.
  2. Handling. Dropping a drive is a bad idea, even a couple of inches onto a table. I saw evidence in the 1990s that reducing drive handling to the minimum required to install it improved reliability by 20% or more. There have been many improvements in shock specs since, so this may be less valid, but it still makes sense. Drives are mechanical devices. Don't knock them around.
  3. Early production quality. Can't wait to get your hands on that new 4 TB, 15K drive? You could be buying a problem. The first three months of a new drive's production typically have a higher failure rate. After that the factory line settles down and quality goes up.
  4. Statistical variation. Google and CMU looked at 100,000 drives each in their studies. Most of us have very small sample sizes and don't keep very good records. But the data shows that drives fail for no apparent reason at all ages and in all environments. A drive can fail at any time without warning.
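
The small-sample point can be made concrete with a little arithmetic. The sketch below assumes a 3% annual failure rate (an illustrative figure, not one taken from either study) and independent failures. It compares what the owner of a handful of drives can observe with what a 100,000-drive fleet can.

```python
import math

# Illustrative assumption, not a figure from the Google or CMU studies.
ASSUMED_ANNUAL_FAILURE_RATE = 0.03


def zero_failure_probability(n_drives: int, afr: float) -> float:
    """Chance that none of n independent drives fails within a year."""
    return (1.0 - afr) ** n_drives


def standard_error(n_drives: int, afr: float) -> float:
    """Standard error of the failure rate observed over n drives in one year."""
    return math.sqrt(afr * (1.0 - afr) / n_drives)


for n in (4, 100, 100_000):
    p0 = zero_failure_probability(n, ASSUMED_ANNUAL_FAILURE_RATE)
    se = standard_error(n, ASSUMED_ANNUAL_FAILURE_RATE)
    print(f"{n:>7} drives: P(no failures this year) = {p0:.3f}, "
          f"standard error of observed rate = {se:.4f}")
```

Under these assumptions, the owner of four drives sees no failures at all in most years, and the uncertainty on the observed rate is larger than the rate itself. Personal experience with a few drives therefore says very little about any brand or model.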

That's where the luck comes in. Here at Chez Mojo I had four working disk drives at the beginning of last week. By the end of last week I had two. Different vendors, environments, enclosures, ages, everything.

It was just my bad luck. And normal statistical variation.
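
How unlucky is losing two of four drives in one week? A rough sketch, assuming independent failures and a hypothetical 5% annual failure rate per drive (both assumptions are mine; the real failure rates of these drives are unknown), uses the binomial distribution:

```python
from math import comb

# Hypothetical per-drive annual failure rate, chosen only for illustration.
ASSUMED_ANNUAL_FAILURE_RATE = 0.05


def weekly_failure_probability(afr: float) -> float:
    """Convert an annual failure probability into a per-week probability,
    assuming a constant failure rate across the 52 weeks of the year."""
    return 1.0 - (1.0 - afr) ** (1.0 / 52.0)


def prob_at_least(k: int, n: int, p: float) -> float:
    """Binomial tail: probability that at least k of n independent drives
    fail, when each fails with probability p."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


p_week = weekly_failure_probability(ASSUMED_ANNUAL_FAILURE_RATE)
print(f"Per-drive weekly failure probability: {p_week:.6f}")
print(f"P(at least 2 of 4 fail in a given week): {prob_at_least(2, 4, p_week):.2e}")
```

The independence assumption is the weak point. Drives that share a power event, an enclosure, or a bad production batch fail together more often than this model predicts, and a rare event for one owner in one week still happens regularly across millions of owners and many weeks.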

What about vendors? I think there are differences, but the conspiracy of silence among big drive consumers, including Google, means data is sparse. But I have some ideas on that for a future post. Stay tuned.

Comments welcome, of course. In a moment of brain cramp I left out the point about early drive production. I added it Friday morning.

  • Interesting Article!!

    I guess I just have bad luck with drives!!
  • Ah yes,

    I remember when I came to work one day to find 7 failed drives (out of an array of 20). I also found out that these drives were recalled by the manufacturer. No power interruption occurred, and no other system was affected. Now THAT was a long day . . .

    I also remember "stiction" - when drives got stuck and you needed to shake them just right to get them running again. Happened mostly with IBM and Quantum drives way back when . . .
    Roger Ramjet
    • Stiction

      Yep, Quantums were the popular drives for external boxes to use with Atari 1040ST computers, and the famous "one inch drop" was the expected way to start up the drive.
      big red one
      • Stiction

        I haven't seen a stiction problem for many years. In my experience, the old Seagate ST225 20MB MFM drives were among the most likely to develop this problem. The lubricant they used to use on the spindle would coagulate so the drive wouldn't spin up when started. The trick was to turn the drive upside down, give it a quick flick of the wrist, and let centrifugal force get the platters spinning again. Amazingly enough, I still come across working ST225 drives.
        • Stiction also caused by stuck read/write heads

          Maybe half of the drives with stiction were the result of stiffened motor bearing lubricant. Stiction in roughly the other half, at least with later drives, was caused by the read/write heads sticking to the platters. I was never really sure exactly why that happened--I think there were several causes: humidity getting into the drive, then drying; residual magnetism; etc. Sometimes when I removed a drive's cover to unstick the heads from the platters, all I'd have to do was to clean the read/write heads with a piece of paper dampened with anhydrous alcohol, slipped between the heads and the platters, and rubbed back and forth a few times. Not so easy in a multi-platter
          John Sawyer
          • Stuck on stiction

            Or a fine sandpaper worked for me. I couldn't resist drinking the alcohol.
    • Good ole days (?)

      This was a pretty frequent activity in our environment. The best story: I was working in a remote office, building a NetFrame (remember those?), and the boot kept failing. I pulled the drive out and did the 1-inch drop. The regional manager looked at me and immediately went into a panic. The junior tech I was with looked on in amazement. But the server then booted. It was pretty funny, even though we had already put in a 20-hour day. Ah, when it was that easy to fix a server problem.
      • The good ol days..

        My work included working with a Fairchild Sentry 500 test system. FIVE MINUTES to boot the d**n thing. First you started the pumps, then you got the platter up to speed, THEN you booted the system. Oh, that pump was for the "flying heads". And that disk sat on edge... and the cabinet was 4 feet tall. This was how Data General made the chips that went onto the MicroNova...
        Anybody else remember the WD "Click of Death"? I had to do a massive weekend swap of all those friggen drives.
        That massive recall started when people would sit down at their desks, turn on their computers and wait. And wait. And wait.
        Old Timer 8080
  • A quick off-then-on-again power

    issue seems to me to be the hardest thing on a hard drive. That's why every PC I build comes with a UPS or I don't sell it.

    In my experience, I've seen more failed Maxtor drives than any other brand. And it seems like Seagate drives fail less than any other brand.

    And I truly believe that quality SCSI drives will last longer than ATA drives. I have two Compaq drive arrays with several SCSI drives that are nine years old and one that is eleven. I don't have any ATA drives near that age.
    • Seconded.

      Re: Maxtor & Seagate.

      In the computer industry, my rule is to try to avoid any company whose first letter is "M". Sadly, there's at least one inescapable exception to that rule. :)
      • Seagate

        Western Digital, too. The others I don't touch anymore.
        • No WD for me

          I've used Seagate, WD and Maxtor, and the WD has been the one to fail. It WAS in a WD external box, so maybe it was their s/w that wrecked the drive. If you check WD's forum you find a lot of complaints about their USB drives, especially the MyBook.
          big red one
          • You Missed the Point

            The point being that end users simply don't have statistically meaningful exposure to the various brands under controlled conditions to determine relative reliability or the conditions that cause failure. I may have had one brand or another fail, but I have always had excellent warranty replacement, and I don't think my failure experience, or any individual user's experience with a few drives in an unscientific sample, has any validity as a generalization. By all means, buy according to your prejudices, but let's not confuse them with wisdom; there just isn't any meaningful supporting evidence.
          • THANK YOU

            I mean, I have a bias against Fujitsu drives (I have read a lot about Fujitsu being crap), but quite honestly, after reading that article, I have to reevaluate my position on that.

            I just cannot wait till they change drive technology (I seem to remember at one point a holographic 3D cube for storage being in the works, with almost no moving parts).

            But as an example, I have had LOTS of luck with the major brands (WD, Maxtor, Seagate, Deskstar (both Hitachi and IBM)) and have rarely... actually, I don't think I have had one of MY drives fail, so I must be chock full of good luck LOL. I have drives I no longer use from an old P2 dual-processor system (10 gig drive IIRC, still sitting on my desk) and have a string of drives in USB enclosures so I can access the data when I need it, all from my old computers. And the brands are so varied (OK, except Fujitsu LOL).

            So yes, people should quit showing their brand prejudice, because that is all it is; it is not based on fact.
          • SCSI

            I'm still running 5 1/4" full-height Seagate SCSI drives, 9 and 23 GB. I call them "My little egg fryers". They get so hot, you can literally fry eggs on them. I also have some 3.5" 4.5 and 9 GB IBM SCSI drives. They need major cooling or they just stop!
            I still remember working with MFM and RLL drives!!
            Man, I'm getting old!! LOL
            I had some failures with the IBM drives, and after I added extra cooling to the RAID enclosure those failures stopped.
          • emotional purchase

            It's an "emotional purchase": when someone plunks down a few hundred $$ and the drive fails inside a "user-determined lifetime", that person will feel they got burned by the vendor and won't go back to it. They then have a right to be prejudiced against that brand.
          • WD is my problem as well...

            ... so I have switched to Seagate and probably will try a Maxtor in the near future. I do hope the MyBook isn't all that faulty, since I already have two of them at home.

            I have decided to stick with 160 GB because the WD2500 (250 GB) burned up before I could grab any data - and that was my "archive" disk! It was basically used inside a Sony, an E-machine or an HP which did not run very long or often, but still it failed. That was just a month after a WD1200 (120 GB) bit the dust, though that seems to have been a simple "repartition", ... perhaps brought about by XP??? Don't know, but two WDs failing in one month, bought at least a year apart and neither more than three or four years old, was too much for me to trust WD any more.

            In contrast, my oldest "native" machine (bought new by me) is the HP XE-783 with a Maxtor 30 GB C:\, and it has just kept running longer than the WDs, so that's why I'm going to try the Maxtor(s). I am electing to use USB externals so that they don't run just because the computer(s) are turned on. I have a rack of DJ switches which control up to 8 outlets, and it is driven by one of those old "Power Control Centers" (remember them? They had switches for "Computer, Monitor, Printer, Aux 1, Aux 2", but they aren't made any more for what is termed "startup surge", so I had to get the DJ system from a music store in St. Joseph, MO).

            Can't imagine purchasing anything with more capacity, and most certainly not the 500 GB or 1.x TB units, because of the amount of time and access needed to transfer the data if a drive begins to fail.
    • But I wonder if anything will change

      now that Maxtor is owned by Seagate, and if so, in which direction.
      • WEllll....

        Honestly? Probably nothing at all, as per article.

        Quantum was absorbed by Maxtor (well, at least their HDD part; I have it on good authority they still do tape drives at the moment) and it really did not affect Maxtor one way or another.

        Likely the same thing for Seagate. It is like Compaq when it got HPed. Well, not quite, because Compaq's quality actually WAS bad (*shudders at how many he had to work on in those 3 or so years before the HP buyout), but now Compaq is the "business computer" and HP is the "consumer computer".

        Since I do not see a lot of Seagate/Maxtor advertisement, I have no idea if they will do the same thing.