HDD warming: global data threat?

If there's one thing I hate, it's unsettled science. For instance: the effect of temperature on disk drives. Shorten their life or not? Most studies say no - including a new one - but Microsoft researchers disagree. Can’t we all just get along?
Written by Robin Harris, Contributor

The folks at Backblaze published a detailed blog post on the observed effect of temperature on disk drives. Like most studies, theirs didn't find one:

After looking at data on over 34,000 drives, I found that overall there is no correlation between temperature and failure rate.

But then they ruined it - damn you, Backblaze! - by linking to a study by Microsoft and UVA researchers who DID find an issue. That blew my day as I had to, you know, look at the data and THINK.

Hate that. But here goes.

The Backblaze data
Backblaze looked at 17 drive models from Seagate, WD, Hitachi and Toshiba. Author Brian Beach used a point-biserial correlation coefficient on drive average temperatures and whether drives failed.
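For readers who haven't met it, the point-biserial coefficient is just Pearson's correlation when one variable is continuous (average temperature) and the other is a yes/no flag (failed or not). Here is a minimal sketch of the calculation. The function name and the eight sample drives are made up for illustration and are not Backblaze's code or data.

```python
import math
from statistics import mean, pstdev


def point_biserial(values, flags):
    """Point-biserial correlation between a continuous variable and a 0/1 flag."""
    if len(values) != len(flags):
        raise ValueError("values and flags must have the same length")
    group1 = [v for v, f in zip(values, flags) if f]      # e.g. failed drives
    group0 = [v for v, f in zip(values, flags) if not f]  # e.g. surviving drives
    n1, n0, n = len(group1), len(group0), len(values)
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one drive in each group")
    spread = pstdev(values)  # population standard deviation of all values
    if spread == 0:
        raise ValueError("values have no variance")
    # Difference of group means, scaled by overall spread and group balance.
    return (mean(group1) - mean(group0)) / spread * math.sqrt(n1 * n0 / n**2)


# Hypothetical sample: average temperature (°C) and failure flag per drive.
avg_temp_c = [22.0, 24.5, 25.0, 26.5, 27.0, 28.0, 29.5, 30.0]
failed = [0, 0, 0, 1, 0, 0, 1, 0]
r = point_biserial(avg_temp_c, failed)
print(f"point-biserial r = {r:.3f}")
```

A value near zero means temperature tells you little about which drives failed. The sketch leaves out the significance test that decides whether a weak correlation is real or noise.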

He found one drive - a Seagate 1.5TB Barracuda LP - that had a weak but statistically significant correlation between failure rate and higher temperature. The Annual Failure Rate (AFR) doubled from cool drives to warm (above average temperature) drives. But because so many continued to work fine at any temperature, the correlation was weak.
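To make "AFR doubled" concrete, here is a rough sketch that assumes AFR is defined as failures per drive-year of operation, expressed as a percentage. The failure counts and drive-days below are invented for illustration. They are not the Seagate numbers from the post.

```python
def annual_failure_rate(failures, drive_days):
    """Failures per drive-year of operation, as a percentage (assumed definition)."""
    if drive_days <= 0:
        raise ValueError("drive_days must be positive")
    drive_years = drive_days / 365.0
    return 100.0 * failures / drive_years


# Hypothetical groups with equal time in service.
cool_afr = annual_failure_rate(failures=10, drive_days=100_000)  # about 3.65%
warm_afr = annual_failure_rate(failures=20, drive_days=100_000)  # about 7.3%
print(f"cool: {cool_afr:.2f}%  warm: {warm_afr:.2f}%  ratio: {warm_afr / cool_afr:.1f}x")
```

Even at the doubled rate in this made-up example, more than 90 percent of the warm drives survive the year, which is why a doubled AFR can still produce only a weak correlation.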

Two more models, a Seagate Barracuda 3TB and a Hitachi Deskstar, showed weaker correlations - but in opposite directions. The Hitachi failed slightly more often at 21°C than at 31°C, while the Seagate failed slightly more often at the higher temperature.

Oh great! Now too cold is bad too.

Microsoft/UVA study
The 2010 Microsoft study, Datacenter Scale Evaluation of the Impact of Temperature on Hard Disk Drive Failures by Sriram Sankar, Mark Shaw and Kushagra Vaid of Microsoft and Sudhanva Gurumurthi, U of Virginia, came to very different conclusions:

1) We show strong correlation between temperature observed at different location granularities and failures observed. . . .

2) Although average temperature shows a correlation to disk failures, we show that variations in temperature or workload changes do not show significant correlation to failures observed in drive locations.

3) We . . . show that Chassis design knobs (disk placement, fan speeds) have a larger impact than tuning Workload knobs (intensity, different workload patterns), on disk temperature.

4) With the help of Arrhenius based temperature models and the datacenter cost model, we . . . show that datacenter temperature control has a significant cost advantage over increased fan speeds.
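For context on point 4: an Arrhenius model says failure rates speed up exponentially with absolute temperature. Below is a minimal sketch of the standard Arrhenius acceleration factor. The 0.5 eV activation energy and the 30°C reference are placeholder assumptions for illustration, not parameters taken from the Microsoft/UVA paper.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV per kelvin


def arrhenius_acceleration(temp_c, ref_temp_c, activation_energy_ev):
    """Failure-rate multiplier at temp_c relative to ref_temp_c (Arrhenius model)."""
    temp_k = temp_c + 273.15
    ref_k = ref_temp_c + 273.15
    exponent = (activation_energy_ev / BOLTZMANN_EV_PER_K) * (1.0 / ref_k - 1.0 / temp_k)
    return math.exp(exponent)


# Assumed values: 0.5 eV activation energy, 30°C reference temperature.
for temp in (30, 35, 40, 45):
    factor = arrhenius_acceleration(temp, ref_temp_c=30, activation_energy_ev=0.5)
    print(f"{temp}°C: {factor:.2f}x the failure rate at 30°C")
```

The multiplier is 1.0 at the reference temperature and grows as the drive runs hotter. How fast it grows depends entirely on the activation energy chosen, so treat the output as a shape, not a prediction.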

Here are a couple of relevant tables from the study: [tables not reproduced]


Drive vendors have their say
Most drives today are spec'd at a 60°C (140°F) or even 70°C (158°F) operating temperature. Per the MS-UVA study, it is the average temperature, not variations in temperature, that affects drive life the most. If drives get really hot once in a while, not a big deal.

And hey, they say they'll operate, not that they'll last.

Reconciliation, to a point
Look at the data: Backblaze temps stop at 31°C, while the MS/UVA study showed that AFRs are relatively flat up to 33°C and then start climbing. Backblaze's drives never got warm enough to reach the range where MS/UVA saw trouble, so there's not much disagreement between Backblaze and MS/UVA.

The Storage Bits take
One of the most popular myths about disk drives is that they are very sensitive to temperature. That may have been true 20 years ago, but it is clearly less so now. The drive vendors seem unconcerned as well.

Given that most users have a few dozen drives at most, of mixed age, vendor and chassis, these statistical musings have little predictive value. If you are running a data center and have thousands of drives, you should do a more careful analysis of the tradeoff between energy costs and increased disk failures.

The hidden storage market - between the 3 drive vendors and 8 or so Internet giants - is driving storage requirements now, not PCs or the enterprise. These warehouse scale systems are designed to tolerate drive failures gracefully, much more so than most enterprise infrastructures.

Eyeballing the stats from these and other studies, most enterprises should aim for about 35°C (95°F) disk temps in temperature controlled data centers. Save money and reduce global warming.

Comments welcome, as always. Scientists, always picking at each other: feature, not a bug. People who say "settled science" don't understand science.
