The Universe hates your data

Storage is the most difficult problem in information technology. Why? Because entropy is always working to destroy our data. There's only one strategy that works.
Written by Robin Harris, Contributor

Why does data storage have to be so hard? Drive failures, bit rot, file system errors and more. CPUs and networks seem to just work - why not storage? Entropy, my friend, entropy. The Universe hates your data.

Really. Entropy refers to the universal tendency for systems to become less ordered over time.

For example, in an internal combustion engine the ignition of fuel drives an ordered set of actions: pistons move; valves open; the crankshaft rotates. But as the heat of ignition diffuses across the mechanical components, it becomes less useful: on a cold day it warms the car, but much of the fuel's energy escapes as waste heat and does no useful work.

In information theory, entropy measures how disordered - or unpredictable - a bit stream is. That's useful because ordered, low-entropy bit streams - say, a clear blue sky in a photo - are more compressible.
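
To make the compressibility point concrete, here is a minimal Python sketch (an illustration, not from the original column). It estimates entropy from byte frequencies alone, a simplification that ignores correlations between neighboring bytes, and uses Python's standard zlib module as the compressor. The "sky" buffer is a hypothetical stand-in for a uniform region of an image.

```python
import math
import os
import zlib
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Order-0 Shannon entropy of `data`, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # Sum of p * log2(1/p) over each distinct byte value.
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())

def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower = more compressible)."""
    return len(zlib.compress(data, 9)) / len(data)

# Hypothetical "clear blue sky": one byte value repeated, highly ordered.
sky = bytes([135]) * 100_000
# Random bytes: about as disordered as data gets.
noise = os.urandom(100_000)

for name, data in (("sky", sky), ("noise", noise)):
    print(f"{name}: {shannon_entropy(data):.2f} bits/byte, "
          f"compressed to {compression_ratio(data):.1%} of original")
```

The repeated byte scores 0 bits per byte and shrinks to a tiny fraction of its size; the random bytes score close to the 8-bit maximum and do not compress at all.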

Copies vs originals

But storage exists at the boundary of information theory and the physical world. In much of information theory - for example erasure codes - the goal is not maximum compression, but maximum reliability.
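
The simplest erasure code adds a single parity block: the XOR of all the data blocks, as in RAID 5. Lose any one block and the XOR of the survivors rebuilds it. The Python sketch below is illustrative only; it assumes equal-length blocks and tolerates exactly one loss, whereas production systems typically use stronger codes such as Reed-Solomon. The function names are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def encode(data_blocks: list[bytes]) -> list[bytes]:
    """Return the stripe: the data blocks plus one parity block (their XOR)."""
    return data_blocks + [xor_blocks(data_blocks)]

def recover(stripe: list[bytes], lost_index: int) -> bytes:
    """Rebuild the block at lost_index by XOR-ing every surviving block."""
    survivors = [b for i, b in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)

data = [b"The Univ", b"erse hat", b"es data!"]  # three equal-length blocks
stripe = encode(data)                 # three data blocks plus one parity block
rebuilt = recover(stripe, 1)          # pretend the drive holding block 1 failed
print(rebuilt)                        # b'erse hat'
```

The parity block costs one extra block of storage per stripe: reliability is bought with added bits, not saved ones.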

Networks commonly encode 8 bits of data as 10 bits (8b/10b encoding). The extra bits keep the signal balanced for clock recovery and let the receiver detect some transmission errors. Packet networks - most data networks - don't rely on 8b/10b encoding alone: the sender keeps copies of the data in buffers, and if the receiving node reports a problem, or never acknowledges the packet, the sender retransmits it. Networks work with copies - not originals.
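
A toy stop-and-wait retransmission scheme shows why buffered copies make networks forgiving. This is a minimal sketch, assuming a hypothetical channel that flips one random bit in half the frames; real protocols such as TCP add sequence numbers, timers and sliding windows. It appends a CRC-32 checksum from Python's zlib module, which catches every single-bit error, and resends the buffered copy until a clean frame arrives.

```python
import random
import zlib

def noisy_channel(frame: bytes, rng: random.Random, error_rate: float) -> bytes:
    """Hypothetical channel: flips one random bit with probability error_rate."""
    if rng.random() < error_rate:
        damaged = bytearray(frame)
        damaged[rng.randrange(len(damaged))] ^= 1 << rng.randrange(8)
        return bytes(damaged)
    return frame

def make_frame(payload: bytes) -> bytes:
    """Payload followed by its 4-byte CRC-32: the extra, redundant bits."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")

def frame_ok(frame: bytes) -> bool:
    """Receiver-side check: recompute the CRC and compare."""
    payload, crc = frame[:-4], frame[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == crc

def send(payload: bytes, rng: random.Random,
         error_rate: float = 0.5, max_tries: int = 50) -> tuple[bytes, int]:
    """Stop-and-wait: resend the buffered copy until the receiver accepts it."""
    buffered = make_frame(payload)  # sender holds this copy until acknowledged
    for attempt in range(1, max_tries + 1):
        received = noisy_channel(buffered, rng, error_rate)
        if frame_ok(received):      # receiver acknowledges a clean frame
            return received[:-4], attempt
        # Damaged frame: receiver rejects it, sender retransmits the copy.
    raise RuntimeError(f"gave up after {max_tries} attempts")

rng = random.Random(2024)  # seeded so the run is repeatable
payload, attempts = send(b"originals stay put; copies travel", rng)
print(payload, "delivered after", attempts, "attempt(s)")
```

The scheme works only because the sender still holds an intact copy; a disk holding the only original has nothing to resend.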

But in storage we don't have that option: we store originals. So entropy is an even bigger problem.

That's why all workable data protection strategies rely on adding bits. The extra bits may be in a data stream, as in 8b/10b encoding, or in copies of documents, such as backups. Or, most reliably, extra bits are added at every level of data transmission and storage.
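
Full copies are the bluntest way to add bits. As a minimal sketch, assuming three equal-length copies whose damage never hits the same bit in two copies, a bitwise two-out-of-three vote repairs them. This is a simple repetition code chosen for illustration, not how any particular backup product works; the damaged bytes below are hypothetical.

```python
def majority_vote(a: bytes, b: bytes, c: bytes) -> bytes:
    """Bitwise 2-of-3 vote: each output bit is whatever at least two copies hold."""
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))

original = b"back up your data"
copy1 = bytearray(original)
copy1[0] ^= 0x20          # hypothetical bit rot in copy 1
copy2 = bytearray(original)
copy2[5] ^= 0x01          # different hypothetical damage in copy 2
copy3 = bytes(original)   # intact copy

repaired = majority_vote(bytes(copy1), bytes(copy2), copy3)
print(repaired)           # b'back up your data'
```

Tripling the bits buys tolerance for scattered damage, but if two copies rot at the same spot, the vote picks the wrong answer.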

The bottom line is that data at rest is always vulnerable to entropic decay. Your data is never 100% safe.

The Storage Bits take

Techies love positive numbers: GHz; cores; data rates; access times. But data entropy is all about negativity: MTTF (mean time to failure); AFR (annualized failure rate); MTTDL (mean time to data loss); rebuild times. The numbers are squishy, and thinking about our data's mortality - and by extension, our own - isn't pleasant.
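
The negative numbers are linked by textbook approximations, sketched below in Python. The AFR conversion assumes a constant failure rate (exponentially distributed drive lifetimes). The mean time to data loss (MTTDL) formula is the classic estimate for a single-parity, RAID 5 style group of N drives: data is lost only if a second drive fails while the first is being rebuilt. It also assumes independent failures and rebuild times far shorter than the mean time to failure (MTTF). Real drives violate these assumptions, which is part of why the numbers are squishy. The MTTF, group size and rebuild times used here are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760

def afr_from_mttf(mttf_hours: float) -> float:
    """Annualized failure rate implied by an MTTF, assuming a constant failure rate."""
    return 1 - math.exp(-HOURS_PER_YEAR / mttf_hours)

def mttdl_single_parity(mttf_hours: float, n_drives: int, mttr_hours: float) -> float:
    """Classic single-parity estimate: MTTF^2 / (N * (N - 1) * MTTR).
    Assumes independent failures and MTTR much smaller than MTTF."""
    return mttf_hours ** 2 / (n_drives * (n_drives - 1) * mttr_hours)

mttf = 1_000_000  # hypothetical datasheet MTTF, in hours
print(f"AFR: {afr_from_mttf(mttf):.2%}")
for rebuild in (24, 48):  # hypothetical rebuild times, in hours
    years = mttdl_single_parity(mttf, 8, rebuild) / HOURS_PER_YEAR
    print(f"8-drive group, {rebuild} h rebuild: MTTDL ~ {years:,.0f} years")
```

Because rebuild time sits in the denominator, doubling it halves the estimated MTTDL. That is one reason ever larger, slower-to-rebuild drives make the data they hold more vulnerable.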

Yet storage industry scientists and engineers soldier on, creating ever denser - more ordered - storage devices and systems, while also devising data protection schemes to guard that ever more vulnerable data.

Some problems can be solved. Others can only be managed. Storage is, and always will be, among the latter.

So back up your data! The Universe is bigger than all of us and our storage systems.

Comments welcome, of course.
