Why disks can't read - or write

We worry about a disk drive's "click of death" but the fact is that unrecoverable read errors are much more common. These errors are what remains after sophisticated signal processing fails to recover the data. What causes these errors? Many things.

No bit left behind

Ever look at a pretty external disk and wonder "what could go wrong?" If you must know. . . .

Why do we have errors?

Most SATA drives have a specified unrecoverable read error (URE) rate, or bit error rate (BER), of 1 in 10^14 bits read, or about 1 bad block in every 12 TB. These errors are sometimes latent - we don't know about them until we can't read the data.
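To put the 1-in-10^14 figure in perspective, here is a rough back-of-the-envelope sketch. It assumes bit errors are independent and evenly spread, which is a simplification of mine and not something a drive spec promises; the function names are purely illustrative.

```python
import math

BITS_PER_BYTE = 8
TB = 10**12          # decimal terabyte, as drive vendors count it
URE_RATE = 1e-14     # one unrecoverable error per 10^14 bits read (figure from the text)

def bytes_per_expected_error(rate_per_bit=URE_RATE):
    """Average number of bytes read for each unrecoverable error."""
    return 1 / rate_per_bit / BITS_PER_BYTE

def prob_at_least_one_ure(bytes_read, rate_per_bit=URE_RATE):
    """Chance of one or more UREs, ASSUMING independent bit errors."""
    bits = bytes_read * BITS_PER_BYTE
    # 1 - (1 - p)^n, computed in a numerically stable way for tiny p
    return -math.expm1(bits * math.log1p(-rate_per_bit))

print(bytes_per_expected_error() / TB)    # roughly 12.5 (TB per expected error)
print(prob_at_least_one_ure(1 * TB))      # chance of a URE while reading 1 TB
print(prob_at_least_one_ure(12 * TB))     # chance of a URE while reading 12 TB
```

Under these assumptions, reading 12 TB end to end has roughly a 60% chance of hitting at least one unreadable block. Real drives may do better or worse than the spec-sheet number.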

But these errors are what remains after very sophisticated servo and signal processing technology tries to recover the data. What causes these errors? Many things.

Hot head

Disks are permanently formatted at the factory with information that tells the read/write head where it is. These embedded servo tracks tell the r/w electronics where the head is - until they don't.

Like every other bit on a disk, servo tracks can be damaged or corrupted - and your formatting software or RAID array can't replace them. A grain of dust under the head can create a momentary flash of heat - thermal asperity in engineering lingo - that wipes out a few hundred bits. Or something harder and chunkier, say a chip of metal media plating, can scratch out those bits.

Either way, vital positioning information is lost - and so is your data.

Round and round we go and where we stop . . .

Disk drive tracks are not perfectly circular. At 120 revolutions per second - that is, 7200 rpm - it is a roller-coaster ride for the heads.

The head positioning system knows this and adapts, like we adapt to the motion of an escalator. But as disks shake from noise or vibration, or bearings in motors or actuators wear, the heads can't adapt as fast or as reliably. They start to lose the ability to lock onto a track long enough to read your data.

Which translates to losing your data.

Consumer SATA drives, the kind most of us use, will retry dozens of times, maybe more than 100, before giving up. But RAID drives and costly enterprise drives will quit after a few tries and declare the drive failed, the idea being that mission-critical systems need the best performance.
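The difference between the two retry policies can be sketched in a few lines. This is a toy model, not drive firmware: the retry counts are illustrative values picked to match "more than 100" and "a few", the `flaky_sector` helper is hypothetical, and the probability formula assumes each retry is an independent attempt.

```python
CONSUMER_RETRIES = 100    # "dozens, maybe more than 100" (illustrative value)
ENTERPRISE_RETRIES = 3    # "a few tries" (illustrative value)

def read_with_retries(read_once, max_retries):
    """Call read_once() until it returns data or the retry budget runs out.

    Returns (data, attempts); data is None if every attempt failed.
    """
    for attempt in range(1, max_retries + 1):
        data = read_once()
        if data is not None:
            return data, attempt
    return None, max_retries

def give_up_probability(success_chance, max_retries):
    """Chance that every attempt fails, ASSUMING independent attempts."""
    return (1.0 - success_chance) ** max_retries

def flaky_sector(failures_before_success):
    """Hypothetical sector that fails a fixed number of times, then reads."""
    remaining = failures_before_success
    def read_once():
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            return None
        return b"payload"
    return read_once

# A sector that needs 11 attempts: the short budget gives up, the long one succeeds.
print(read_with_retries(flaky_sector(10), ENTERPRISE_RETRIES))
print(read_with_retries(flaky_sector(10), CONSUMER_RETRIES))

# A marginal sector that reads correctly 5% of the time per attempt.
print(give_up_probability(0.05, ENTERPRISE_RETRIES))
print(give_up_probability(0.05, CONSUMER_RETRIES))
```

In this model the short budget abandons a marginal sector most of the time but returns quickly, while the long budget almost always gets the data at the cost of a long stall.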

So when your consumer drive can't find the data, it really can't find the data.

Other read problems can be due to electrostatic discharge - which is hard to prove - and damage to the drive electronics. So keep your fingers off the drive's circuit board!

Scribble, scribble, scribble

Sometimes reads aren't readable because the write, not the read, failed. Read-after-write checks slow drives down too much to be practical in most applications.
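For data that matters, an application can do its own version of a read-after-write check. The sketch below is an application-level analogue, not what happens inside the drive, and `write_and_verify` is a name made up for illustration. One loud caveat: the read-back will usually be served from the operating system's cache rather than the platter, so a pass here does not prove the bits landed correctly on the media.

```python
import hashlib
import os
import tempfile

def write_and_verify(path, data):
    """Write data, flush it toward the device, then read it back and compare."""
    expected = hashlib.sha256(data).digest()
    with open(path, "wb") as f:
        f.write(data)
        f.flush()               # push Python's buffer to the OS
        os.fsync(f.fileno())    # ask the OS to push its buffers to the device
    with open(path, "rb") as f:
        # NOTE: this read is likely satisfied from the OS page cache.
        actual = hashlib.sha256(f.read()).digest()
    return actual == expected

with tempfile.TemporaryDirectory() as tmp:
    ok = write_and_verify(os.path.join(tmp, "block.bin"), os.urandom(4096))
    print(ok)
```

Even this simple version reads every byte twice, which hints at why drives skip the check for routine writes.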

Media damage - scratches, pits, irregularities - can corrupt data. Hard particles scratch, while soft ones, like aluminum, smear across the surface, making magnetization more difficult.

Since the outer tracks move faster than the inner tracks, the speed difference may help account for the observed difference in outer-track error rates. Another possible cause: platter lubricant that builds up under centrifugal force and pushes the head away from the media.

The Storage Bits take

Disk drives are incredible machines. Thirty years ago a 500 MB drive was the size of a washing machine and cost $50,000. Now you can get a 1 TB drive that fits in a shirt pocket: 2000x the capacity, 25x the reliability, and 75x the data rate. All for 1/200th the price.

But as we put more data on a smaller machine, the impact of data loss grows. Users who rely on them should appreciate their limitations so they can protect themselves and their data.

Comments welcome, of course.

