The Universe hates your data

Summary: Storage is the most difficult problem in information technology. Why? Because entropy is always working to destroy our data. There's only one strategy that works.

Why does data storage have to be so hard? Drive failures, bit rot, file system errors and more. CPUs and networks seem to just work - why not storage? Entropy, my friend, entropy. The Universe hates your data.

Really. Entropy refers to the universal tendency for systems to become less ordered over time.

For example, in an internal combustion engine the ignition of fuel drives an ordered set of actions: pistons move; valves open; crankshaft rotates. But as the heat of ignition diffuses across the mechanical components it becomes less useful: on a cold day it warms the car, but much of the fuel's energy escapes as waste heat and does no useful work.

In information theory entropy measures how disordered - or unpredictable - a bitstream is. That's useful because ordered, low-entropy bitstreams - say, a clear blue sky in a photo - are more compressible.
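To make that concrete, here is a minimal Python sketch. It estimates entropy from byte frequencies alone, which is a simplification, and compares the estimate with what zlib manages on the same bytes. The "blue sky" sample is a stand-in (one pixel value repeated), not real image data, and the function name is made up for illustration.

```python
import math
import os
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the empirical entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


# Stand-in for a clear blue sky: one pixel value repeated many times.
blue_sky = bytes([135, 206, 235]) * 10_000
# Stand-in for unpredictable data: bytes from the OS random source.
noise = os.urandom(30_000)

for name, sample in (("blue sky", blue_sky), ("noise", noise)):
    compressed = len(zlib.compress(sample))
    print(f"{name}: {shannon_entropy(sample):.2f} bits/byte, "
          f"{len(sample)} -> {compressed} bytes")
```

The ordered sample should shrink to a tiny fraction of its size, while the random sample barely compresses at all.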

Copies vs originals

But storage exists at the boundary of information theory and the physical world. In much of information theory - erasure codes, for example - the goal is not maximum compression, but maximum reliability.

Networks commonly encode 8 bits of data into 10 bits. The extra bits keep the signal balanced, let the receiver recover the clock, and make some transmission errors detectable. Packet networks - most data networks - don't rely only on 8/10 encoding: they keep copies of the data in buffers. If the receiving node has a problem they retransmit the packet. Networks work with copies - not originals.
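The buffer-and-retransmit idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real protocol: the packet layout (payload plus a CRC-32), the `flaky_link` channel that damages only the first attempt, and all function names are hypothetical.

```python
import zlib


def make_packet(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 so the receiver can detect corruption."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_packet(packet: bytes):
    """Return the payload if the CRC matches, otherwise None."""
    payload, crc = packet[:-4], int.from_bytes(packet[-4:], "big")
    return payload if zlib.crc32(payload) == crc else None


def flaky_link(packet: bytes, attempt: int) -> bytes:
    """Hypothetical channel: flips one bit on the first attempt only."""
    if attempt == 0:
        damaged = bytearray(packet)
        damaged[0] ^= 0x01
        return bytes(damaged)
    return packet


def send_reliably(payload: bytes, max_attempts: int = 3):
    """Keep the original in a buffer and resend until a clean copy arrives."""
    buffered = make_packet(payload)  # the sender's retained copy
    for attempt in range(max_attempts):
        received = check_packet(flaky_link(buffered, attempt))
        if received is not None:
            return received, attempt + 1
    raise IOError("gave up after repeated corruption")


data, tries = send_reliably(b"hello, entropy")
print(data, tries)
```

The sender can shrug off the damaged first attempt only because it still holds a good copy in its buffer.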

But in storage we don't have that option: we store originals. So entropy is an even bigger problem.

That's why all workable data protection strategies rely on adding bits. The bits may be in a data stream as in 8/10 encoding, or they may be in copies of documents, such as backups. Or, most reliably, the extra bits are at every level of data transmission and storage.
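Simple XOR parity, the idea behind single-parity RAID, shows how a few extra bits buy back lost data. The sketch below is a toy with made-up block contents, not a production erasure code: it can rebuild exactly one missing block, and only if you know which one is missing.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three equal-sized data blocks plus one parity block of extra bits.
data_blocks = [b"stor", b"age ", b"bits"]
parity = xor_blocks(data_blocks)

# Simulate losing block 1, then rebuild it from the survivors and the parity.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

The cost is one extra block for every three stored. Lose two blocks at once and this scheme can't help, which is why stronger codes add more bits.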

The bottom line is that data at rest is always vulnerable to entropic decay. Your data is never 100% safe.

The Storage Bits take

Techies love positive numbers: GHz; cores; data rates; access times. But data entropy is all about negativity: MTTF; AFR; MTTDL; rebuild times. The numbers are squishy and thinking about our data's mortality - and by extension, our own - isn't pleasant.
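For a feel of how those negative numbers relate, here is a rough Python sketch that turns an MTTF figure into an AFR. It assumes a constant failure rate (the exponential model) and independent drives, both of which real drives violate. The 1,000,000-hour MTTF is a hypothetical spec-sheet value, not a measurement.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days; leap years ignored


def afr_from_mttf(mttf_hours: float) -> float:
    """Annualized failure rate under a constant-failure-rate assumption."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mttf_hours)


def prob_any_failure(afr: float, drives: int) -> float:
    """Chance that at least one of `drives` independent drives fails in a year."""
    return 1.0 - (1.0 - afr) ** drives


afr = afr_from_mttf(1_000_000)  # hypothetical spec-sheet MTTF
print(f"AFR per drive: {afr:.2%}")
print(f"At least one failure among 100 drives: {prob_any_failure(afr, 100):.1%}")
```

A per-drive figure under one percent sounds reassuring until you multiply it across a rack.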

Yet storage industry scientists and engineers soldier on creating ever denser - more ordered - storage devices and systems. And, at the same time, creating data protection schemes to guard the ever more vulnerable data.

Some problems can be solved. Others can only be managed. Storage is, and always will be, among the latter.

So back up your data! The Universe is bigger than all of us and our storage systems.

Comments welcome, of course.

Topics: Storage, Data Centers, Hardware

Talkback

21 comments
  • RE: The Universe hates your data

    We don't need no stinkin' backups.

    We just need a p2p-ish network that will keep our data moving thru the aether, forever.
    tiredofpickingusernames
  • Shortcuts by disk and network makers are a factor

    In efforts to reduce costs, disk and network gear makers have ignored or abandoned many safeguards for data. The worst offenders don't simply cause data loss, they MASK errors, leaving the user with undetected data corruption that becomes impossible to recover from. Backing up bad data just exacerbates the problem.

    There are several forms of enhanced error-correction that could be implemented, but they have been shunned because they reduce net data capacity or throughput, or because the cost of implementation eats into the thin margin that many device makers operate on.

    There are few alternatives available to end-users other than to create archive copies of data and pray. And as long as cheap storage is more marketable than reliable storage, it will remain the dominant factor.
    terry flores
    • Some things never change

      @terry flores
      "And as long as cheap storage is more marketable than reliable storage, it will remain the dominant factor."

      And forever be the source of tears and regret for the unprepared and uninitiated. :(

      -- Pity those who don't learn to back up their backups. At least the vital stuff. You know the bit(s).
      klumper
  • RE: The Universe hates your data

    What is the factor of additional safety when implementing an SSD solution? Using it for both primary storage and critical data backup.
    weather.guy
    • RE: The Universe hates your data

      @weather.guy In my PC I use SSD in RAID 0 for the boot drive. I don't back up the boot drive. Should something happen to it I prefer to rebuild Windows. All other data is on rotating media. Important data is backed up to a set of 7 locations that rotate every day when PC is shut down. Once a month I do a full backup of the data drive to 1 of 3 HD. So I have 7 generations of important data and 3 generations of the rest.
      TrueDinosaur
  • It's worse than that, Jim

    You're thinking of a storage device as one unified entity - it's not. Each byte written to it has a variable decay rate. Which means even your *backups* are unreliable. So one-way backups are slowly going to decay unless you completely and totally wipe your backup device - low-level format it (to randomise the media) - and rewrite everything every time.

    Otherwise, you'll write new data (which has a longer survival time) while the old data will be further decayed. So you get part of your filesystem for example refreshed in the last backup - but then the older part dies and all the files vanish.

    That's why BOTH techniques, backup AND recovery bits (like PAR), have to be used.
    TheWerewolf
  • Arunabh Das

    What about non-mechanical storage? Flash drives should solve the problem of data decay, right? - Arunabh Das
    arunabhdas
    • RE: The Universe hates your data

      Short answer: No

      Long answer: Flash memory is based on quantum level storage of charge. This charge leaks away over time. While Flash used to be designed to store data for decades, manufacturers realized that customers don't buy flash based on data retention and have optimized for cost (smaller cells and multiple bits per cell (MLC)) in exchange for shorter data retention times. Since Flash memory is a commodity, cost is the main factor in market share.
      donaldrich
    • RE: The Universe hates your data

      @arunabhdas There's a reason they call them the LAWS of thermodynamics. Everything wears out over time.
      CobraA1
  • RE: The Universe hates your data

    Robin,
    Short of going with pen and paper, can you give examples of what type of media is "best" for long term archiving?
    Thanks.
    Bert
    riverab@...
    • RE: The Universe hates your data

      @riverab@...

      Punch cards? :-)
      TrueDinosaur
  • RE: The Universe hates your data

    There is no single "best" medium. You use multiple formats and devices. I have backups on hard drives, Zip disk, and even floppies. Every two weeks I make a bootable backup of my main hard drive, and really important files have multiple backups.
    GrizzledGeezer
  • How About a Case Study?

    Hey, Robin.

    I am enjoying your articles on backups and data retention. We obviously have a long way to go before we get to the longevity of carbon on goatskin. As a sequel to this article, how about writing an article that is a case study in which someone brings home their shiny new laptop that they will use with a digital camera and downloaded music. How ought this person go about performing backups and archives? Perhaps focus on non-cloud solutions. How does the end-user (self appointed admin) go about verifying backups and that the system really works? How many "disks" does he need? What media? How often to swap media? When the fateful day of /dev/hda failure arrives, (and we all know it's 'when,' not 'if') what pre-set steps should this person perform?

    I apologize if you've already written one of these. Just direct me to it and I'll shut up. I didn't go on a hunt before writing this note.

    Regards & Thanks --
    james
    amicalola
  • Stone Tablets...

    They seem to be the longest-lasting medium so far.
    Now we just need to build a fast reliable error correcting read/write mechanism.

    Then store them under an almost indestructible housing, like a pyramid built with 10 ton stones so as to protect them from the elements.

    Hey I think someone has already done that.
    dunn@...
    • RE: The Universe hates your data

      @dunn@... I tried that, but they are almost as hard to read as punch cards.
      Scrod
  • Good article and reading good comments

    I taught college courses in this field a number of years ago. I also gave a special lecture on the merits of various storage media, after which I kept my electronic notes. The article here does focus on some of my topic.
    If you're interested in reading further on my thoughts, go read the following original post for the discussion thread at
    http://www.tripadvisor.com/ShowTopic-g1-i12530-k2972197-Storing_your_Travel_photos_data_Long_Term-Travel_Gadgets_and_Gear.html
    DocTech
  • Software refresh

    Is there software that will run in the background and refresh storage media? I have used software to recover data from hard drives before. The only problem with the software is that your system must be down for hours.
    Leftie
    • RE: Software refresh

      @Leftie Yes, there is a way to refresh the storage media. This is called "scrubbing" by ZFS (http://docs.oracle.com/cd/E19082-01/817-2271/gbbxi/index.html)

      Basically, it tells ZFS to read all your data, then using a combination of the ZFS end-to-end checksums and mirroring, it can detect and repair errors in your data as they appear.
      pinglet
  • RE: The Universe hates your data

    I've been fighting against entropy all my life; gravity drags at me, my hair grays, and my data shreds. Assume failure, plan for the worst, back up everything. If I could back myself up, I'd do it.
    Scrod
  • I think it's just you.

    "Why does data storage have to be so hard? Drive failures, bit rot, file system errors and more."

    I think you just got a bad batch of drives once and never recovered since.

    Okay, on occasion stuff does happen.

    But, honestly - it seems rare enough that I don't worry about it. I have backups just in case, but honestly it really doesn't happen enough to make it an issue.

    Okay, I may have an issue once a year or so, for less than a day. Not really enough to worry over, sorry.
    CobraA1