50 ways to lose your data

Apologies to Paul Simon.

Disk drives are marvelous devices. Especially when they go "clunk" and stop working. I'm not kidding: at least you know your data is hosed. I prefer that to the silent data corruption you don't find out about until you can't access a file or your OS starts freezing. Or a RAID rebuild fails.

Silent data corruption is common

You just don't know it. Many low-end RAID controllers don't report problems, figuring you'll never notice. If you do notice, months later, what is the chance that you'll know it was the controller's fault?
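One way to notice silent corruption before it bites is to record a checksum for every file and re-check it later. The following is a minimal sketch of that idea, not a feature of any controller or tool mentioned here; the function names are made up for illustration. It cannot tell corruption from a legitimate edit, so it is most useful on files you don't expect to change, such as photos or archives.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map every file under root (by relative path) to its digest."""
    return {
        str(p.relative_to(root)): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or no longer match the saved manifest."""
    current = build_manifest(root)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

Save the manifest somewhere other than the disk you are checking, then run changed_files() every so often. Any file it lists that you know you didn't touch deserves a closer look.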

Backup is better than insurance

Insurance is designed to protect you against damaging but uncommon events. But data loss is very common. Backup isn't insurance. It is simple digital hygiene. You'll use it again and again.

What are disks made of?

Hard drives sit at the bottom of a stack of hardware and software that usually gets your data from your CPU to the disk and back. But there are a lot of places where things can go wrong.

Here's a partial list:

Media: those beautifully plated silver disks are subject to a couple of major problems:

  • Flipped bits: when a read-only track sits next to a frequently written track, the stray magnetic field from the writes weakens the magnetization of the read-only bits until your disk can't read them. Normally disk ECC corrects these errors, but not always.

    This is why disk fanatics periodically zero-out their disks and reload all their data. I'm not recommending this, just noting the practice.

  • Physical problems: a piece of dust, for example, can scratch the disk or create enough heat that the head stops reading momentarily. Depending on severity, the disk may remove that block from use or begin a death spiral into oblivion.

Wear out: disks have a lot of moving parts. In a 7200 RPM drive the disks spin 120 times per second, compared with roughly 8 times per second for a 500 RPM CD drive. After a few years the motor can start to go. It may become slightly erratic, so some bits get squeezed and others get smeared.

The arm that moves the heads can move dozens of times per second. When the bearings get loose it can go off track and corrupt data on adjacent tracks.

Electrical: if the drive power supply fails your drive will shut down. But if it is slowly degrading it can create extra heat or power surges that affect already marginal components. Component failures leading to sudden death are not seen by SMART reporting, which is one reason why SMART isn't much use.

Software: drives contain small computers that run on several hundred thousand lines of code. Is that code bug free? Need you ask? Among the more common bugs - and let's not get started on the less common ones - are:

  • New code that fixes a problem and accidentally breaks old code.
  • Putting the right data in the wrong place.
  • Phantom writes that are reported as written but, oops!, aren't.
  • Cache management bugs that munge data, or return correct data to the wrong place.
  • OK, this is less common, but sometimes the on-disk ECC miscorrects the data. ECC is software, right? How do you know it always works correctly? You don't.
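To see how a miscorrection can happen even when the code itself is bug free, here is a toy sketch using the textbook Hamming(7,4) code, which can repair exactly one flipped bit per 7-bit codeword. Real drives use far stronger codes than this, and nothing below reflects any vendor's firmware; the function names are invented for illustration. The point is only that when more bits flip than the code was designed for, the decoder can confidently "fix" the wrong bit and hand back bad data without complaint.

```python
def encode(data):
    """Encode 4 data bits into a 7-bit Hamming(7,4) codeword (positions 1..7)."""
    c = [0] * 8  # index 0 is unused so list indexes match bit positions
    c[3], c[5], c[6], c[7] = data
    c[1] = c[3] ^ c[5] ^ c[7]  # parity over positions 1, 3, 5, 7
    c[2] = c[3] ^ c[6] ^ c[7]  # parity over positions 2, 3, 6, 7
    c[4] = c[5] ^ c[6] ^ c[7]  # parity over positions 4, 5, 6, 7
    return c[1:]


def decode(codeword):
    """Assume at most one bit flipped, repair it, and return the 4 data bits."""
    c = [0] + list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    if syndrome:
        c[syndrome] ^= 1  # a nonzero syndrome names the bit to flip back
    return [c[3], c[5], c[6], c[7]]


def flip(codeword, *positions):
    """Return a copy of the codeword with the given 1-based positions flipped."""
    out = list(codeword)
    for pos in positions:
        out[pos - 1] ^= 1
    return out


data = [1, 0, 1, 1]
codeword = encode(data)
one_error = decode(flip(codeword, 3))      # single flip: repaired correctly
two_errors = decode(flip(codeword, 3, 5))  # double flip: wrong bit "repaired"
```

Here one_error equals the original data, while two_errors does not, and the decoder raises no error in either case.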

Bus controllers: whether managing IDE, ATAPI, SATA, SSA or FC, controllers are small computers running code. Bugs in controller code have corrupted data in the past and will no doubt do so again.

RAID controllers: again, small computers running code subject to bugs, as well as all manner of electrical, connector and cable problems. One insidious problem is corruption of RAID 5 parity data. It is pretty simple to check a file by reading it and matching the metadata. Checking parity data is much more difficult, so you typically won't see parity errors until a rebuild. Then, of course, it is too late.
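A small sketch shows why bad parity stays hidden until a rebuild. It assumes the simplest possible layout, three data blocks plus one XOR parity block, and ignores the striping and parity rotation a real RAID 5 controller performs; the block contents and names are made up for illustration.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks on three disks, with parity stored on a fourth.
d0, d1, d2 = b"backups!", b"are not ", b"optional"
parity = xor_blocks(d0, d1, d2)

# Disk 1 dies. XORing the survivors with good parity rebuilds it exactly.
rebuilt = xor_blocks(d0, d2, parity)

# Now suppose one bit of the parity block was silently corrupted earlier.
# Ordinary reads of d0, d1 and d2 never touch parity, so nothing looked wrong.
bad_parity = bytes([parity[0] ^ 0x01]) + parity[1:]
bad_rebuild = xor_blocks(d0, d2, bad_parity)
```

With good parity, rebuilt matches the lost block. With the corrupted parity, bad_rebuild comes back with its first byte wrong, and the rebuild itself reports nothing.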

The Storage Bits take

While this list is admittedly incomplete - and fewer than 50 if you're counting - I'm hoping it will help readers understand why backing up your data is worth the time and money. Modern data storage is a miracle of mass-produced high technology, but it isn't perfect. Disks will fail. Power will surge. Bugs will surface. You can't avoid them.

What you can avoid is losing your data. If you don't already have a cheap external USB drive, go buy one and at least store your documents and email on it. You won't regret it.
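If you would rather script the copy than drag folders around, a minimal sketch might look like the following. The function names are hypothetical and the source and destination paths are yours to supply. The check after each copy catches gross copy errors, but the read-back may be served from the operating system's cache, so it is not proof that the bits on the external drive are good.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a whole file in one read; fine for documents, not for huge files."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def backup_tree(source: Path, destination: Path) -> list[str]:
    """Copy every file under source into destination.

    Returns the relative paths whose copies did not match the originals.
    """
    failures = []
    for src in sorted(source.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source)
        dst = destination / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies contents plus timestamps
        if sha256_of(src) != sha256_of(dst):
            failures.append(str(rel))
    return failures
```

An empty list back from backup_tree() means every copy matched its original at the time of the copy. The originals stay where they were, so you end up with two copies rather than one.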

Next: some more ways our systems lose data and what vendors can do - and I know at least one of them is doing - to protect our data from silent data corruption.

Comments welcome, of course. As I was writing this a friend called me in a panic saying "I think my hard drive is going out!"

"Good thing you have it backed up" I said. Of course, he didn't. He's out buying a USB drive this very minute.

Topics: Data Centers, Hardware

Talkback

13 comments
  • 50 ways to lose your data

    As a veteran of the HDD industry (long in tooth I might add) I read all of your musings on data storage with interest and believe you seldom get it wrong. Your column above is no exception and serves the purpose of introducing the various possibilities of disk drive and software related data failure to a wide audience. These potential failures loom even larger as wider deployment of increasingly higher capacity drives is driven by more and more digital stuff (more eggs in a single basket). It is a testament to HDD designers and makers that HDDs work at all given the various mechanical and magnetic feats they must perform at incredibly high levels of precision.
    I especially appreciate your point that regular back-up really isn't the direct analogue to insurance that most people think of. Your use of the term "digital hygiene" is far more accurate in my opinion...who could easily dismiss the necessity of bathing or brushing their teeth? (Other than my old college roommate.) Out of curiosity I googled (what a great verb) "digital hygiene" and found only 549 hits. Most of these references were to anti-virus methods. Of the 549 references, only 39 also contained the phrase "back up", while 44 contained the word "backup". These findings suggest that your creative writing deserves a tip of the hat. More importantly, readers should note well that a regular routine of data backup is as important, if not more so, as replacing your car's engine oil or flossing.
    simplifried
    • Thank you!

      Gary,

      I appreciate the kudos. And I agree with you that disk drives are little short of modern miracles.

      If you haven't already, you might also enjoy my other site, StorageMojo (http://storagemojo.com/).

      Robin
      R Harris
  • The drive failures you are talking about

    can be reduced by just not shutting the drive off! Bearing wear on the spindles happens when you start a cold system up and shut down. Simple. I have drives that are still functional today... why? I firmly believe it's because they are not shut down.

    Some of my drives are 8 years old and still functional. As for the rest, crap shoot really. I have seen brand new drives die in less than 30 days (customer machines) and others last several years with nary a hitch. Ironically the ones that seem to last the longest are the ones that are never shut down, and on top of that these same PCs also seem to have fewer hardware issues in general. Just my observations... ]:)
    Linux User 147560
    • I Agree

      I never turn my computers off, and I have hard drives that are 5+ years old, and still running. It's less wear and tear on all the components to just leave them on. I recommend to everyone that asks that they leave them on.
      mail@...
    • Just pulled...

      ...a still working Western Digital Caviar 2250 (255.9MB) drive out of one of my old, old, old utility systems. It's been in continuous use since shortly after the born on date, 17 Nov 93. In nearly 14 years, it's probably been shut down a dozen times, for less than 3 weeks total, mostly during physical moves.
      Dr. John
    • Power supplies, too

      All hardware seems to work better if "constantly on" - from the legendary taxi doing a million kilometres to switchmode power supplies (now the technology of preference).

      I used to teach television servicing, and saw numbers of SMPSs explode (!) when turned on - the sudden application of (in Australia) a worst-case peak of some 340 volts to circuitry is the problem.

      This included a lecture where I said "And the SMPS is great for efficiency, blah, blah, but it does suffer from one problem..."

      (lecturer turns on television)

      BOOM!

      Recently, my workhorse PC (an old but useful Compaq Presario) power supply exploded at turn-on. On examination, it was a small DIP - maybe an opto-coupler.

      My RAID server is a twin to the workhorse, so I VERY deliberately left it turned on. When I finally did need to power-cycle it...

      BOOM!

      Fortunately, no other hardware (memory, motherboard, HDDs) was damaged - there was no dreaded "power surge of death". But these two failures do support the idea of leaving stuff turned on.

      Cheers,

      Ian.
      ianbatty
    • Cooling helps too...

      Hot drives die young. System builders don't often keep that in mind when choosing a case and assembling a system. My preference is to mount the drive in a 3.5" bay at the front of the case with a fan blowing outside air directly over the drive.
      57ford
    • Anecdotal evidence?

      My anecdotal evidence disagrees, though with a small sample size.

      I turn off computers at the end of the day or when not in use for a few hours. The oldest active drive I have is about 8 years old, the next is 6-7 years. Oldest historic ones were around 6 years.

      The most catastrophic problem I had was the complete death of a Micropolis SCSI drive which was handed down to me, so I don't know what it went through before. 2nd worst was an IBM Deathstar which collected bad sectors every now and then (I actually kept on using it as an "untrusted temp drive" for a few years).
      beau parisi
  • PLEASE clarify

    Please clarify the following lines:
    "If you don't already have a cheap external USB drive, go buy one and at least store your documents and email on it. You won't regret it."

    I'm sure everyone here knows that you mean "store A COPY of your documents and email on it."

    Some readers might mistakenly infer you intend for them to forgo the computer's internal drive and store their data on the USB hard drive. I've had people even brag to me about how they don't trust their computers' hard drives and therefore store everything on USB hard drives INSTEAD.

    My advice to users is always this: If the data is worth having, it's worth having (at least) TWICE. And the related, "NEVER trust your thumb drive with your only copy of ANYTHING you value."
    bmgoodman
  • ZFS file system helps

    Sun's ZFS file system keeps a checksum of each block of data, so it knows when there is file corruption, and can correct for it. It also has a different RAID mode called RAID-Z that is supposed to be more efficient. The trouble is ZFS is only available on Solaris 10 and FreeBSD 7.0 (still in beta), and has a license incompatibility with Linux.
    cjc5447
    • RE: 50 ways to lose your data

      @cjc5447 ZFS is an amazing filesystem! I wish I could use it as my everyday filesystem. The nice thing about ZFS is that it makes corrections while keeping the filesystem alive -- there is no need to unmount the device while it's being repaired. As for availability: there is some level of availability in Linux via FUSE but as I understand it, it's slow and has missing features. I would love it if ZFS could be used as an alternative to NTFS in Windows. Then it would have certainly gained in popularity.

      Another way to lose data not mentioned was through bad backups: in other words your file becomes silently corrupt but as you rotate your backups the original good file is lost! It may not happen often but if it's an important file that needs to be retrieved from backup 2, 6 or 52 weeks later, it's too late by then!

      Update: My apologies. I read this article thinking it was new and didn't realize it was 4 years old.
      stux1
  • RE: 50 ways to lose your data

    I totally agree with using backups, especially online backups. For me Safecopy backup, www.safecopybackup.com, is a perfect fit. I can back up all my files from both my Mac and PC with just one account. I can also back up my USB drives and share files as well. I'm very happy with it, and it's worth checking out.
    dobi2009
  • RE: 50 ways to lose your data

    Let's not forget random radiation bursts from the Sun can hit a PC and cause bit errors too. :)

    Our Sun is going through a particularly noisy period at the moment...
    ProfQuatermass