
Contemplating Google's drive failure rates

Google has attracted a lot of attention with a new study that contradicts the accepted wisdom that hard drives are more likely to fail in warm conditions than cool ones. However, I don't think we ought to be switching off the datacentre air conditioners any time soon.
Written by Angus Kidman, Contributor

Google's study, imaginatively titled "Failure Trends in a Large Disk Drive Population", has drawn comment because of two conclusions it draws that go against conventional wisdom. It is widely believed that hard drives are more likely to fail after extended periods of continued use, and that warm temperatures are detrimental to their performance. The latter is certainly accepted wisdom in the datacentre community, where air conditioning management is just as important as capacity planning.

However, Google's study of its own hard drive failure rates -- drawn from the seemingly endless array of bog-standard drives it uses to index and cache Internet content -- suggests almost the opposite. Drives that are used only infrequently are just as likely to fail as those in continuous use, and temperature does not appear to influence drive failure rates anywhere near as much as we think. "Surprisingly, we found that temperature and activity levels were much less correlated with drive failures than previously reported," the study noted.
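To see what a finding like this rests on, here is a minimal sketch of the kind of aggregation involved: computing an annualised failure rate (AFR) per temperature band across a fleet. All figures and bands below are invented for illustration; they are not Google's data or methodology.

```python
# Toy illustration: annualised failure rate (AFR) by temperature band.
# Each fleet slice is (average temperature in °C, drive-years observed,
# failures observed). Every number here is made up.

slices = [
    (25, 12000.0, 240),   # cool-running drives
    (35, 15000.0, 270),   # moderate
    (45, 8000.0, 168),    # warm-running drives
]

afr_by_band = {}
for temp, drive_years, failures in slices:
    # AFR as a percentage: failures per drive-year of exposure
    afr_by_band[temp] = 100.0 * failures / drive_years

for temp in sorted(afr_by_band):
    print(f"{temp}°C: AFR {afr_by_band[temp]:.1f}%")
```

In this made-up fleet the warm band fails no more often than the cool one, which is the shape of result the study reported; with real data the exposure (drive-years) per band matters as much as the raw failure counts.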

The sheer volume of drives which Google churns through (the study covers more than 100,000 drives in total) means that its conclusions have more than a patina of scientific credibility. However, it doesn't mean they can be translated directly into most other business scenarios.

For one thing, Google uses its own internally developed file system on the drives. Secondly, the applications which it runs are quite different from those of a typical business. So directly translating its conclusions into operational guidelines for other enterprises might be risky.

To be fair, Google's researchers don't make such sweeping suggestions, but they do raise the possibility. In a discussion of their finding that drives older than three years are more susceptible to warm temperatures than newer models, they note: "This is a surprising result, which could indicate that datacentre or server designers have more freedom than previously thought when setting operating temperatures for equipment containing disk drives." Possibly, but if the datacentre is already working, why fix it?

One argument for doing so, of course, is to reduce the hideously large electricity bills associated with these facilities. But that would require careful calculations of its own: the "Google model" is based on instantly replacing failed drives as required, so what you save on electricity you might still end up spending on storage.
