MTBF is the most widely misunderstood concept in consumer storage. There's something better: AFR. Here's what AFR means and a handy MTBF-to-AFR conversion table.
The 200 year drive
I often see forum comments like “a 1.6 million hour MTBF means this drive will last 200 years!”
Disks have motors, bearings and lubricants that wear out or break down. There are plenty of 10-year-old drives that still work, but research from Google and CMU shows failure rates start climbing at 3 years.
So back 'em up, bunky.
MTBF and AFR
MTBF (Mean Time Between Failures) is a statistical measure relevant to large populations. If you only have a few drives, the reliability you see has little statistical significance.
But people still want to know what to expect from their disk drives. That's why the industry is moving to a more transparent number: the Annualized Failure Rate or AFR.
The AFR, or its cousin the Annualized Return Rate (ARR), gives a better view of the chance of a drive failure in any given year. The ARR takes into account that about 40% of all returned drives are NTF, or No Trouble Found, when tested.
Converting from MTBF to AFR/ARR
This table presents the MTBF, the AFR and the ARR - assuming a 40% NTF rate - so you know what to expect for a given MTBF.
The AFR is still a statistical construct, and it's only accurate for the first 3 years. After that, the AFR begins to climb.
If you buy a drive that fails two months later - as sometimes happens - you just got one of the 1 or so drives out of 100 that fail.
Poor you, but think of the happy owners of the 98 or 99 drives that didn't fail. When you have 40 or 50 of the same drives, you can start judging the accuracy of the vendor specs. Want to know the chance of failure over some number of years? Add up the annual AFRs or, more conservatively, the ARRs.
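Adding up the annual rates is a rule of thumb. A minimal Python sketch below compares that sum with the compounded probability 1 - (1 - AFR)^n, a swapped-in alternative that assumes each year's rate is the chance of failing that year given the drive survived so far. The function name and the flat 1% AFR are hypothetical, chosen to match the "1 or so drives out of 100" figure. For rates around 1% the two results are close, and the simple sum always comes out slightly higher, which is the conservative direction.

```python
def chance_of_failure(annual_rates, compound=True):
    """Probability that a drive fails at least once over len(annual_rates) years.

    annual_rates: one AFR (or ARR) per year, as fractions (0.01 == 1%).
    compound=False reproduces the simple "add them up" rule of thumb.
    """
    if not compound:
        # Simple sum: never lower than the compounded value, so it errs high.
        return sum(annual_rates)
    survive = 1.0
    for rate in annual_rates:
        # The drive must survive every year; each year's survival is (1 - rate).
        survive *= 1.0 - rate
    return 1.0 - survive


if __name__ == "__main__":
    three_years = [0.01, 0.01, 0.01]  # hypothetical 1% AFR in each year
    print(f"summed:     {chance_of_failure(three_years, compound=False):.4%}")
    print(f"compounded: {chance_of_failure(three_years):.4%}")
```

Because the AFR climbs after about 3 years, feeding a constant rate into either version for longer periods will understate the real risk.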
How MTBF is derived
Drive specs are based on accelerated life testing. A population of 1000 or so drives is run flat out at a high temperature for about 30 days, and the MTBF is calculated from the results.
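The exact vendor calculation isn't described here, so treat the following as a sketch of the textbook point estimate: MTBF is roughly the total device-hours divided by the number of failures observed. The function name, the failure count and the `acceleration_factor` (a hypothetical multiplier for how much harder a hot, flat-out test is than normal use) are assumptions for illustration, not figures from any vendor.

```python
def mtbf_point_estimate(drives, test_hours, failures, acceleration_factor=1.0):
    """Rough MTBF estimate: total (equivalent) device-hours divided by failures.

    acceleration_factor is a hypothetical multiplier converting hours at the
    elevated test temperature into equivalent hours at normal conditions.
    """
    if failures <= 0:
        raise ValueError("need at least one failure for a point estimate")
    device_hours = drives * test_hours * acceleration_factor
    return device_hours / failures


if __name__ == "__main__":
    # Hypothetical test: 1000 drives, 30 days flat out, 2 failures observed.
    print(mtbf_point_estimate(1000, 30 * 24, 2))  # 360000.0 hours
```

Note that 1000 drives running for 30 days accumulate only 720,000 device-hours, so million-hour MTBF figures depend heavily on the acceleration factor and on how few failures occur during the test.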
Want to calculate AFR and ARR from MTBF yourself?
AFR = 1-EXP(-8760/MTBF)
where MTBF is in hours and 8760 is the number of hours in a year.
ARR = AFR/0.6
where 0.6 is the fraction of returned drives with trouble found.
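A short Python sketch can generate a conversion table like the one described earlier from these two formulas. The function names and the list of MTBF values are illustrative choices. The 1-EXP(-8760/MTBF) form assumes a constant failure rate, which, as noted above, only holds for roughly the first 3 years.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def afr_from_mtbf(mtbf_hours):
    """AFR = 1 - exp(-8760 / MTBF), assuming a constant failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def arr_from_afr(afr, trouble_found=0.6):
    """ARR = AFR / 0.6, where 0.6 is the fraction of returns with trouble found."""
    return afr / trouble_found


if __name__ == "__main__":
    # MTBF values chosen for illustration.
    print(f"{'MTBF (h)':>10} {'AFR':>7} {'ARR':>7}")
    for mtbf in (300_000, 500_000, 750_000, 1_000_000, 1_200_000, 1_600_000):
        afr = afr_from_mtbf(mtbf)
        print(f"{mtbf:>10,} {afr:>7.2%} {arr_from_afr(afr):>7.2%}")
```

For example, a 1,000,000 hour MTBF works out to an AFR of about 0.87% and an ARR of about 1.45%.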
The Storage Bits take
AFR isn't perfect. Civilians will still draw the wrong conclusions, but at least they'll avoid the "200 year drive!" fallacy.
Disk drives are incredible. They work at nanometer accuracies while spinning 120 times a second. Next to a disk drive, a $35,000 Swiss watch has the precision of a pile driver.
30 years ago a 25,000 hour MTBF was a good thing - on a $20k drive - and now you get a 500,000 hour MTBF on a $60 drive, with 2000 times the capacity. Amazing.
If you back up your data you can keep your drives as long as you want. For critical apps I'd replace drives every three years.
Comments welcome, of course. The friendly folks at Seagate provided the formulas, but the conclusions are my own.