The Universe hates your data

Summary: Storage is the most difficult problem in information technology. Why? Because entropy is always working to destroy our data. There's only one strategy that works.

Why does data storage have to be so hard? Drive failures, bit rot, file system errors and more. CPUs and networks seem to just work - why not storage? Entropy, my friend, entropy. The Universe hates your data.

Really. Entropy refers to the universal tendency for systems to become less ordered over time.

For example, in an internal combustion engine the ignition of fuel drives an ordered set of actions: pistons move; valves open; crankshaft rotates. But as the heat of ignition diffuses across the mechanical components it becomes less useful: on a cold day it warms the car, but much of the fuel's energy escapes as waste heat and does no useful work.

In information theory, entropy measures how disordered - or unpredictable - a bit stream is. That's useful because ordered, low-entropy bit streams - say, a clear blue sky in a photo - are more compressible.
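To make the compressibility point concrete, here is a minimal Python sketch. It assumes byte-level Shannon entropy as the measure and uses zlib as a stand-in compressor. The `blue_sky` and `noise` buffers are made-up illustrations, not real image data.

```python
import math
import os
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of data in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # Sum p * log2(1/p) over every byte value that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical inputs: a perfectly uniform "sky" and unpredictable noise.
blue_sky = bytes([200]) * 4096
noise = os.urandom(4096)

sky_entropy = shannon_entropy(blue_sky)
noise_entropy = shannon_entropy(noise)

# The ordered buffer compresses far better than the random one.
sky_compressed = len(zlib.compress(blue_sky))
noise_compressed = len(zlib.compress(noise))
```

The uniform buffer has zero entropy and shrinks to a few bytes. The random buffer sits close to 8 bits per byte and barely compresses at all.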

Copies vs originals

But storage exists at the boundary of information theory and the physical world. In much of information theory - for example erasure codes - the goal is not maximum compression, but maximum reliability.

Networks commonly encode 8 bits of data into 10 bits, which helps the receiver recover the clock and detect some transmission errors. Packet networks - most data networks - don't rely only on 8/10 encoding: they keep copies of the data in buffers. If the receiving node has a problem, the sender retransmits the packet. Networks work with copies - not originals.
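The buffer-and-retransmit idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular protocol works. The CRC32 check, the `flaky_link` channel that damages only the first attempt, and all function names are hypothetical.

```python
import zlib


def make_packet(payload: bytes) -> bytes:
    """Append a 4-byte CRC32 so the receiver can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_packet(packet: bytes):
    """Return the payload if the CRC matches, otherwise None."""
    payload, crc = packet[:-4], int.from_bytes(packet[-4:], "big")
    return payload if zlib.crc32(payload) == crc else None


def flaky_link(packet: bytes, attempt: int) -> bytes:
    """Hypothetical channel: flips one bit on the first attempt only."""
    if attempt == 0:
        damaged = bytearray(packet)
        damaged[0] ^= 0x01
        return bytes(damaged)
    return packet


def send_with_retry(payload: bytes, max_attempts: int = 3):
    """Send a payload, resending the buffered copy until it arrives intact."""
    buffered = make_packet(payload)  # the sender's copy, kept until delivery succeeds
    for attempt in range(max_attempts):
        received = check_packet(flaky_link(buffered, attempt))
        if received is not None:
            return received, attempt + 1
    raise IOError("packet lost after retries")


delivered, attempts = send_with_retry(b"hello, storage")
```

The first transmission arrives damaged and fails the check. The second succeeds because the sender still holds an intact copy to send again.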

But in storage we don't have that option: we store originals. So entropy is an even bigger problem.

That's why all workable data protection strategies rely on adding bits. The bits may be in a data stream as in 8/10 encoding, or they may be in copies of documents, such as backups. Or, most reliably, the extra bits are at every level of data transmission and storage.
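As one small illustration of protection by added bits, the sketch below uses simple XOR parity across equal-sized blocks, the idea behind single-parity RAID. It is an assumption chosen for brevity, not a scheme the article names, and the block contents are invented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks plus one extra block of parity bits.
data_blocks = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data_blocks)

# Suppose block 1 is lost: XOR the survivors with the parity to rebuild it.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

One extra block of parity lets any single lost block be rebuilt. Losing two blocks at once is beyond what this scheme can recover, which is why stronger codes and separate backups add still more bits.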

The bottom line is that data at rest is always vulnerable to entropic decay. Your data is never 100% safe.

The Storage Bits take

Techies love positive numbers: GHz; cores; data rates; access times. But data entropy is all about negativity: MTTF; AFR; MTTDL; rebuild times. The numbers are squishy and thinking about our data's mortality - and by extension, our own - isn't pleasant.

Yet storage industry scientists and engineers soldier on creating ever denser - more ordered - storage devices and systems. And, at the same time, creating data protection schemes to guard the ever more vulnerable data.

Some problems can be solved. Others can only be managed. Storage is, and always will be, among the latter.

So back up your data! The Universe is bigger than all of us and our storage systems.

Comments welcome, of course.

Topics: Storage, Data Centers, Hardware

About

Harris has been working with computers for over 35 years and selling and marketing data storage for over 30 in companies large and small. He introduced a couple of multi-billion dollar storage products (DLT, the first Fibre Channel array) to market, as well as many smaller ones. Earlier he spent 10 years marketing servers and networks....
