How smart is SMART?

Most disk drives include a feature named SMART - Self-Monitoring, Analysis and Reporting Technology - intended to tell you if your drive is dying. Can you rely on it? Sadly, no. Here's why.

What is SMART? SMART is a protocol for passing information from a disk drive to the CPU. The protocol is part of the ATA and SCSI standards and is based on work by IBM, Seagate and others done in the '90s. The vendors generously placed their work in the public domain.

The protocol mandates a consistent structure for presenting drive data, but the data that gets measured is up to the drive vendor. Typically, SMART will present information on

  • head flying height
  • spin-up time
  • bad block count
  • seek time
  • drive calibration retries

and more. SMART looks at the trends in these and other measures to determine if the drive is headed for failure.
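
As a rough illustration of how a host might read such a report, here is a minimal Python sketch. It assumes the common convention in which each attribute carries a normalized value plus a vendor-set threshold, and is flagged once the value drops to or below that threshold. The attribute names and numbers are made up for illustration, real drives differ by vendor, and the sketch checks thresholds only, not trends.

```python
from dataclasses import dataclass


@dataclass
class Attribute:
    name: str
    value: int      # normalized value reported by the drive (assumed: higher is healthier)
    threshold: int  # vendor-chosen failure threshold (assumed convention)


def failing_attributes(attrs):
    """Return the names of attributes whose value is at or below the threshold."""
    return [a.name for a in attrs if a.value <= a.threshold]


# Hypothetical readings, not taken from any real drive.
readings = [
    Attribute("spin_up_time", 97, 21),
    Attribute("bad_block_count", 30, 36),
    Attribute("seek_time", 78, 30),
]

print(failing_attributes(readings))  # ['bad_block_count']
```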

Does SMART work? According to Google's review of 100,000 drives, the answer is a qualified no. They found that enough drives failed without any SMART warning to make SMART useless for predicting drive life. But they also found that if SMART said there was a problem, the drive was much more likely to fail.

So if SMART says you have a problem, you probably do. But if SMART says you don't have a problem, you can't trust it.
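
The asymmetry is easier to see with numbers. The short Python sketch below uses invented counts, not Google's figures, to separate two questions: how often a SMART warning is right (precision) and how many failures SMART warns about at all (recall).

```python
def precision_recall(flagged_and_failed, flagged_total, failed_total):
    """Precision: share of flagged drives that failed.
    Recall: share of failed drives that were flagged beforehand."""
    precision = flagged_and_failed / flagged_total
    recall = flagged_and_failed / failed_total
    return precision, recall


# Hypothetical fleet: SMART flagged 20 drives, 15 of those failed,
# and 50 drives failed in total. These numbers are illustrative only.
p, r = precision_recall(flagged_and_failed=15, flagged_total=20, failed_total=50)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=0.30
```

In this made-up fleet a warning is usually right, yet most failures arrive with no warning at all.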

Why doesn't SMART work better? Drives are complex pieces of equipment with many failure modes. The drive vendors decide which parameters are measured and where the failure thresholds are set. Since roughly 40% of the drives returned to vendors are NTF - No Trouble Found - vendors set the thresholds to ignore piddling errors. Stricter thresholds might catch more failing drives, but only at the cost of even more NTF returns.
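
The threshold trade-off can be sketched in a few lines of Python. The drive list below is entirely hypothetical: each entry is an error count plus whether that drive was really failing. Lowering the threshold catches more failing drives, but it also flags more healthy ones, which is what would show up as NTF returns.

```python
# (error_count, actually_failing) pairs; hypothetical data for illustration.
drives = [
    (2, False), (5, False), (9, False), (14, False),
    (12, True), (25, True), (40, True),
]


def flag_stats(drives, threshold):
    """Flag every drive with error_count >= threshold.
    Returns (failing drives caught, healthy drives flagged)."""
    flagged = [failing for errors, failing in drives if errors >= threshold]
    caught = sum(flagged)
    false_alarms = len(flagged) - caught
    return caught, false_alarms


for threshold in (30, 10, 4):
    caught, false_alarms = flag_stats(drives, threshold)
    print(f"threshold={threshold}: caught={caught} false_alarms={false_alarms}")
```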

Another issue is that many drive failures can't be predicted. SMART mostly looks at mechanical trends, but disk drives are also electronic. A cracked capacitor, power surge or interface failure can kill a drive even if the data is still safely on the disk.

Finally, there are problems in storing and interpreting the SMART data. The data is stored in a small amount of RAM, so a) the drive starts from scratch each time it is powered on and b) trends may be missed if the RAM fills up and is purged partway through an event.
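
To make both effects concrete, here is a toy Python model of a small volatile log. The class, its capacity and its purge-when-full behavior are assumptions made for illustration; this is not how any particular drive's firmware is known to work.

```python
class VolatileLog:
    """Toy model of a small in-RAM sample log (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.samples = []

    def record(self, value):
        # Assumed behavior: when the buffer is full, purge it and start over.
        if len(self.samples) >= self.capacity:
            self.samples.clear()
        self.samples.append(value)

    def power_cycle(self):
        # RAM contents do not survive a power-off.
        self.samples.clear()


log = VolatileLog(capacity=4)
for error_count in [1, 2, 3, 4, 5, 6]:  # a steadily worsening trend
    log.record(error_count)

visible = list(log.samples)        # only the tail of the trend survives the purge
log.power_cycle()
after_restart = list(log.samples)  # nothing survives a restart

print(visible, after_restart)  # [5, 6] []
```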

The Storage Bits take: The drive vendors are doing the best they can to build larger drives with higher data rates. The intentions behind SMART are good, but its limitations mean that a "good" drive can go "bad" without warning.

The really smart answer: only regular backups can protect your data from sudden drive failure. Accept no substitutes.

Comments welcome, of course.

Talkback

9 comments
  • I have to go with Google on this

    Through the thousands of hard drive failures that I have dealt with exactly one drive was diagnosed before failing by SMART.

    I remember when SMART was first introduced. After a few years I realized that SMART wasn't going to go anywhere useful.

    Diagnosing a hard drive by its sound is a lot more useful and accurate.
    dragosani
  • RE: How smart is SMART?

    So if vendors put the SMART data in flash memory, would that be a significant aid to detecting problems, since patterns could be tracked over a longer time, and a fault status retained in memory even after shutdown?

    If the status is in RAM, I guess if a fault is detected, unless you check the drive before restart, it might not show up next time?

    Then the best practice might be to check your drive status before shutdown, rather than at startup.
    grvaughan
  • Dittos

    We just lost a drive today on a computer we use for data analysis. It runs 24/7. No SMART warning.

    So I don't think the problem is that the data is being lost on a shutdown/reboot, although that could be a contributing factor. I believe mfg's really aren't measuring the right things or being strict enough with what they do measure.
    technojoe
  • SMART drive ECC to avoid warranty or sell defective hardware?

    After a temporary 3-day boot sector failure (Seagate drive) and several calls, Compaq-India attached to my PC and ran the Seagate drive software, which passed diagnostics. No service.
    I downloaded SiSoftware "System ANalyser, Diagnostic and Reporting Assistant", which shows e.g. 30k seek errors and 40k ECC data corrections.
    Compaq-India attached and saw my screenshots, but they will only consider the Seagate diags on board.
    Another 1-day boot sector failure.
    Compaq-India attached; diags twice "failed to run", then reran OK. All OK, go away.
    Seagate-India wouldn't help either; diags ran OK. No way to disable SMART capability.

    I think that Seagate software doesn't report problems per their "thresholds" and uses the ECC to make them run.
    Doo3
  • There was another article on lifespan of Hard Drives.

    I can't find the link now but the article mentioned the drives that were tested were OEM and non OEM brands.

    IBM branded hard drives had a failure rate of 1,000 per 1,000,000 tested.

    Whereas Hard drives used by HP,COMPAQ,DELL had a failure rate of 100,000 per 1,000,000 tested.

    So the blame falls squarely on the companies that manufactured these hard drives with cheap parts.
    inachu
  • THE PROBLEM IS NOT THE PARTS..

    It's the threshold... if in software you set a soft threshold, even if the drive is about to fail, it will not report it. That way they make sure it fails after the warranty has expired; no need to catch a pending HDD failure before the warranty expires...

    An ECC error recovery, for some of these companies, is a good read...

    Also, IBM sold their HDD unit to HITACHI; they are now HITACHI drives, and they are frankly better than Seagate drives with a 5-year warranty.... I have one that has been corrupting data and it's less than 6 months old.. but Seagate Diagnostics... can't catch the error... saying that it must be memory, but other drives do not corrupt data.. (I have 3 500GB HDDs in my SN27P2: 2 Seagates, 1 WD, so I know for sure 1 Seagate is defective.) But it did pass Seagate inspection, so no exchange...

    SMART was a good idea, as it was a technical achievement, but bean counters get the last say in this kind of matter.
    filizaragoza
    • Hitachi better than Seagate? Highly doubtful.

      The IBM/Hitachi genealogy of hard drives is not good at all. IBM's 60GXP and 75GXP hard drives were like Firestone tires on a late 1990s Ford Explorer. I have a Hitachi drive that was built about three years after those flawed IBM drives were issued. It makes the same, very concerning clicking noises that the IBM 60GXP drive that I owned often made. Needless to say I back it up very often and can't wait for the day when I can afford to upgrade my machine with some sort of RAID solution, which of course will be free of Hitachi drives.
      eljay001
    • Calling it like it is

      When the vendors advertise that SMART is to flag errors and reduce loss and risk and they hide errors it is
      FRAUD
      vendors get burned for less.
      When a drive has lots of errors and the vendor specifically hides it and software is designed to say the drive has no problems - it is
      FRAUD
      they should be called on it.
      Doo3
  • SMART, isn't.

    SMART is about as smart as a wicker hatch of a submarine.
    Dr. John