Leopard's Time Machine: Consider the 'bad and ugly' side of storage

Summary: A new article on potential problems with high-capacity hard disk drives may lead Mac OS X Leopard customers to rethink their storage needs.

Before tackling my Thanksgiving Day supper (which came at my nephew's apartment this year in the form of a steak instead of a turkey), I read through an article in the November ACM Queue about potential data reliability issues with high-capacity hard disks. The concerns over these new HDDs, which are certain to be a hit with storage-hungry, content-creating Mac users, made me rethink my backup and archive strategy as well as what kind of storage I will need for Leopard's Time Machine snapshots.

The article in question, Hard Disk Drives: The Good, The Bad and The Ugly, is a technical article by Jon Elerath, a manager of reliability engineering at enterprise storage company Network Appliance. It discusses a range of reliability issues inherent in drives using super-low-flying read/write heads and perpendicular magnetic recording technology to increase capacities. These technologies are now in use on desktop drives with a capacity of 1TB and in the highest-capacity notebook drive on the market.

The new technology required to achieve these capacities is not without concern. Are the failure mechanisms or the probability of failure any different from predecessors? Not only are there new issues to address stemming from the new technologies, but also failure mechanisms and modes vary by manufacturer, capacity, interface, and production lot.

Of course, this article is aimed at storage professionals and managers of RAID systems and servers. That market segment increasingly includes Mac customers; however, the lure of high-capacity storage is felt by a much larger group of Mac users, whose business and personal workflows involve large and valuable content files.

The article starts out with an interesting decision tree of potential read problems with HDDs. Some of these can be "operational," with an electrical cause or a misalignment between the head assembly and the data on the disk surface. The other branch of the tree concerns "latent" failures, which I found the most troubling.

Failures in which the data itself is still good and uncorrupted, such as those caused by impaired electrical, mechanical, or magnetic function, can be detected and accommodated more easily. But the next generation of high-capacity mechanisms could be more susceptible to data corruption. Worse, according to research described in the article, data corruption is the leading problem with HDDs.

Elerath says: "Hard-disk drives don't just fail catastrophically. They may also silently corrupt data."

As I read it, part of the problem is one of scale. Hard disk technology is really a miracle of manufacturing and engineering. However, issues that weren't much of a problem with lower-capacity media and earlier read/write head technologies may crop up when far more data is packed into the same small space. According to the paper, a complex mix of factors can increase the chance of "latent" defects.

Latent defects are the most insidious kinds of errors. These data corruptions are present on the HDD but undiscovered until the data is read.

Of course, drives have algorithms in firmware to attempt to recover missing and corrupted data. The mechanism looks "off-track" or around the place it thinks the data should be. In a RAID system, the controller reconstructs the missing data using the parity information. If you don't have redundancy (like most of us), then you can only hope that the drive can recover the data.
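Neither the article nor this post spells out how parity works. In single-parity schemes such as RAID 5, the parity block is the bytewise XOR of the data blocks in a stripe, so any one unreadable block can be recomputed from the others. The Python sketch below illustrates that idea under simplifying assumptions: equal-sized blocks, at most one bad block per stripe, and hypothetical function names (`make_parity`, `reconstruct`). A real controller does this in firmware or hardware, not in application code.

```python
from functools import reduce
from typing import List, Optional

def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks (zip stops at the shortest, so sizes must match)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_blocks: List[bytes]) -> bytes:
    """Parity block for one stripe: the XOR of all its data blocks."""
    return xor_blocks(data_blocks)

def reconstruct(stripe: List[Optional[bytes]], parity: bytes) -> List[bytes]:
    """Rebuild the one unreadable block (marked None) from the survivors plus parity."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one missing block")
    survivors = [block for block in stripe if block is not None]
    # XOR of the survivors and the parity cancels everything except the missing block
    rebuilt_block = xor_blocks(survivors + [parity])
    return [rebuilt_block if block is None else block for block in stripe]

stripe = [b"MAC ", b"DATA", b"SAFE"]
parity = make_parity(stripe)
print(reconstruct([b"MAC ", None, b"SAFE"], parity))  # [b'MAC ', b'DATA', b'SAFE']
```

Note that XOR parity can only rebuild a block the controller already knows is bad (for example, one the drive reports as unreadable); on its own it cannot tell which block in a stripe was silently corrupted.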

Depending on the size of the media defect, this may be a few blocks or hundreds of blocks. As the areal density of HDDs increases, the same physical size of defect will affect more blocks or tracks and require more time for re-creation of data. One tradeoff is the amount of time spent recovering corrupted data. A desktop HDD (most ATA drives) is optimized to find the data no matter how long it takes. In a desktop there is no redundancy and it is (correctly) assumed that the user would rather wait 60 seconds and eventually retrieve the data than have the HDD give up and lose data.
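To make the density point concrete, here is a rough worst-case count of how many sectors a single physical defect can touch. Every number in it is a hypothetical illustration, not a figure from Elerath's article: 512-byte (4,096-bit) sectors with ECC and formatting overhead ignored, and a 500 µm by 5 µm defect on a drive before and after both linear bit density and track density double.

```python
import math

SECTOR_BITS = 512 * 8  # data bits per 512-byte sector; ECC/servo overhead ignored

def worst_case_span(length: float, unit: float) -> int:
    """Max number of units a stretch of `length` can overlap when not aligned to unit boundaries."""
    return math.ceil(length / unit) + 1

def sectors_touched(defect_len_um: float, defect_width_um: float,
                    bits_per_um: float, track_pitch_um: float) -> int:
    """Approximate worst-case count of sectors hit by a rectangular defect.

    Assumes the defect touches the same number of sectors on every track it crosses.
    """
    along_track = worst_case_span(defect_len_um * bits_per_um, SECTOR_BITS)
    across_tracks = worst_case_span(defect_width_um, track_pitch_um)
    return along_track * across_tracks

older = sectors_touched(500, 5, bits_per_um=20, track_pitch_um=0.5)
denser = sectors_touched(500, 5, bits_per_um=40, track_pitch_um=0.25)
print(older, denser)  # 44 126
```

With these made-up figures, doubling density in both directions nearly triples the damage (44 to 126 sectors): the same speck on the platter now costs more data and more recovery time.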

But will the OS, the application, or even the user wait 60 seconds? Most applications will time out within that time. And the fact that the drive keeps trying to find the data doesn't mean it will eventually succeed; that's why servers use RAID.
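The mismatch between a drive that keeps retrying and an application that gives up can be sketched with a thread and a timeout. This is a toy model, not how Mac OS X's I/O stack actually works: `slow_drive_read` is a hypothetical stand-in for a drive retrying a bad sector, and the 60-second retry and the application's timeout are scaled down to fractions of a second so the example runs quickly.

```python
import concurrent.futures
import time
from typing import Optional

def slow_drive_read(retry_seconds: float) -> bytes:
    """Hypothetical drive that eventually recovers the sector after retrying."""
    time.sleep(retry_seconds)  # stands in for repeated off-track read attempts
    return b"recovered sector"

def app_read(retry_seconds: float, app_timeout: float) -> Optional[bytes]:
    """Return the data, or None if the application stops waiting first."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(slow_drive_read, retry_seconds)
    try:
        return future.result(timeout=app_timeout)
    except concurrent.futures.TimeoutError:
        return None  # the app has given up, even though the drive is still retrying
    finally:
        pool.shutdown(wait=False)  # don't make the caller wait for the retry to finish

print(app_read(retry_seconds=0.01, app_timeout=1.0))  # b'recovered sector'
print(app_read(retry_seconds=0.6, app_timeout=0.05))  # None
```

The drive's patience only helps if everything above it is just as patient; in the second call the data would have arrived, but nobody was still waiting for it.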

So, what does all this mean for Mac users? Here's my first take:

RAID Level 1. To protect against data rot, a single drive isn't good enough anymore. I suggest that users consider a mirrored array for a Time Machine drive. The hope is that at least one of the drives will store your data correctly and be able to retrieve it. This is the most expensive option; however, it is a simple and effective solution.
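One caveat on the mirror: a plain RAID 1 set only knows a copy is bad when the drive reports a read error. If a drive silently returns corrupted data, the controller has no way to tell which copy is right. File systems such as ZFS address this by storing a checksum for each block separately from the data. The toy Python class below sketches that combination under stated assumptions: two in-memory "drives", a SHA-256 digest per block held apart from the data, and a hypothetical `ChecksummedMirror` name. It is not Apple's RAID or Time Machine code.

```python
import hashlib
from typing import Dict, List, Optional

class ChecksummedMirror:
    """Toy two-way mirror that verifies every read against a stored SHA-256 digest."""

    def __init__(self) -> None:
        self.drives: List[Dict[int, bytes]] = [{}, {}]  # block number -> data, per drive
        self.digests: Dict[int, bytes] = {}  # kept apart from the data, like metadata

    def write(self, block: int, data: bytes) -> None:
        self.digests[block] = hashlib.sha256(data).digest()
        for drive in self.drives:
            drive[block] = data  # mirror: identical copy on each drive

    def read(self, block: int) -> Optional[bytes]:
        expected = self.digests[block]
        for drive in self.drives:
            data = drive.get(block)
            if data is not None and hashlib.sha256(data).digest() == expected:
                for other in self.drives:
                    other[block] = data  # repair any copy that failed the check
                return data
        return None  # every copy is missing or corrupt

mirror = ChecksummedMirror()
mirror.write(7, b"family photos")
mirror.drives[0][7] = b"family phot0s"  # simulate silent corruption on drive 0
print(mirror.read(7))  # b'family photos'
```

Without the stored digest, the read loop would simply return whichever copy it tried first, corrupted or not.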

Coddle your drives. With more data packed onto a platter, modern hard drives are more susceptible to physical problems. Make sure that you have a good, cushioned case for your notebook.

I'm always amazed that people will spend $1K or $2K on a notebook and then brag about the cheap bag they carry it in. And I don't put my notebook in an overhead bin anymore, not since a guy yanked his bag out and pulled my briefcase along with it.

This goes for desktop drives as well. Please treat 3.5-inch drives with respect and handle them with care — they are more fragile than notebook drives.

Talkback

15 comments
  • Apple is PERFECT and has already thought of this

    Just format your Time Machine drive to use ZFS. OS X's implementation of ZFS is far superior to Solaris' and will survive even a total physical failure of the disk.

    You know, the anti-Apple bias on ZDNet is unbelievable!! I might have to stop coming here if you guys don't stop finding flaws with Apple products. My fragile ego can't take the pain any more.
    NonZealot
    • A suggestion

      "You know, the anti-Apple bias on ZDNet is unbelievable!! I might have to stop coming here if you guys don't stop finding flaws with Apple products. My fragile ego can't take the pain any more."

      Maybe you should wander over to eWeek's Microsoft Watch then. Joe Wilcox makes sure it's all about Apple, all the time!
      Joe_Raby
    • i find it interesting...

      that you are the first poster so often.
      lostarchitect
      • When you're 15 years old

        pimple-faced and craving attention, you do what it takes...
        frgough
    • Clearly...

      You (as a presumed American) have little to be thankful for. Shouldn't you be still inhaling vast quantities of turkey or something? Or perhaps you've had one too many turkeys... Do turkeys in the US have dubious chemicals added to them?
      ego.sum.stig
      • One thing

        Thanksgiving is celebrated on Thursday. He posted on Friday.
        laura.b
    • ZFS

      From what I've read, Apple's implementation of ZFS is incomplete. Currently, ZFS-formatted drives are read-only. Apple plans to fully support ZFS in later releases.

      Ed
      ejsecco
  • Considering Drobo

    I saw the Drobo movie and it looks very nice, and it works with Time Machine.

    http://www.drobo.com/

    I can't wait until Time Machine works with Air Disk. Ubiquitous backup presence for all my Macs, tucked away in my closet. Very nice.
    YinToYourYang-22527499
    • Looks nice

      5 bills with no drives included is still a bit high for me though.
      DarthRidiculous
  • Some things to note . . .

    First of all, the ECC found in all SMART drives will fix the latent errors, making the numbers even smaller.

    Second of all, I'm not seeing new hard drives failing any more than old hard drives. Despite the "the sky is falling" attitude, it seems that hard drive manufacturers are doing a much better job than Jon Elerath gives them credit for. Most of the "problems" he claims hard drives have seem to be solved. Sure, hard drives can *theoretically* have those problems - but hard drive manufacturers seem to have designed their hard drives to minimize or fix those issues, and IMHO they are well ahead of the theoretical increased problems that larger capacities can bring.

    Sounds to me like a lot of FUD. I think the *actual* number of errors that happen is a lot lower than the *theoretical* number, because frankly the manufacturers have put a lot of effort into fixing the issues that he doesn't account for.
    CobraA1
  • Hmm...

    This is all good and dandy except what about those of us who can't afford two drives for a RAID array? I really look forward to when solid state storage becomes the norm as storing data on equipment that is constantly moving is just a recipe for disaster (as has been shown many many times already).

    In addition, I never store my laptop in the overhead compartment on airplanes. You just never know what kind of other luggage is in there and how your laptop will fit with all the other luggage as things shift around during the flight not to mention the dreaded possibility of some klutz dragging out your laptop as they reach for their luggage.

    - John Musbach
    John Musbach
  • Overkill. Here's why.

    David,

    Mirrored Time Machine drive? Overkill. Why?

    You already have one copy of your data on your system drive. You have a second copy of your data on the TM drive. So the risk is that you'll lose the data on the system drive AND that the TM drive will have a corrupted copy.

    That is very unlikely. The maintenance issues with a mirrored second drive are more likely to bite than the lost copy + corrupted copy scenario.

    Silent data corruption is a real problem. Two copies of your data, à la Time Machine, will handle the vast majority of them with the least hassle.

    Robin (Storage Bits and StorageMojo, http://storagemojo.com/)
    R Harris
    • You forgot the biggest reason why this is overkill

      "Mirrored Time Machine drive? Overkill. Why?"

      Exactly right. After all, as Robin taught us a few months ago, only Windows puts your data at risk! (http://blogs.zdnet.com/storage/?p=169)

      Robin seems to believe that people are safer on HPFS because NTFS is on more machines. I guess data corruption, like malware authors, only strike based on marketshare?

      snicker, smirk :)
      NonZealot
    • Depends if the backup's off or on site

      It might be worth it if your second backup was taken off-site for major disaster data recovery, like wildfires, hurricanes or Coppola's Colombian armed bandits. I've taken to backing up a second set of my home data and taking it to my office, then backing up a second set of my office data and taking it home.
      drprodny
  • Raid Server

    Watch for medium to large hard drives on sale. Grab 4 or 5. Stuff them in an old P2 or P3 case. Load the OS of your choice. Grin when people ask if you're worried about data loss.
    dmhunter@...
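R Harris's argument that a mirrored Time Machine drive is overkill is, at heart, a multiplication of probabilities. A minimal sketch, assuming failures are independent and using made-up yearly probabilities purely for illustration (none of these figures come from the article or the comments); the function name is hypothetical:

```python
def p_all_copies_bad(p_system_lost: float, p_backup_corrupt: float,
                     backup_copies: int = 1) -> float:
    """Chance the system drive is lost AND every backup copy is corrupt.

    Assumes each event is independent of the others, which a fire or flood would break.
    """
    return p_system_lost * p_backup_corrupt ** backup_copies

single = p_all_copies_bad(0.05, 0.01)                     # one Time Machine drive
mirrored = p_all_copies_bad(0.05, 0.01, backup_copies=2)  # mirrored Time Machine drive
print(f"{single:.6f} {mirrored:.6f}")  # 0.000500 0.000005
```

With these invented numbers the mirror shrinks an already small risk by a further factor of 100; whether that is worth the cost and upkeep of a second drive is the trade-off on which the post and Robin's comment disagree. drprodny's off-site suggestion targets the case the independence assumption ignores: one disaster taking out the computer and its backups together.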