The Universe hates your data

By Robin Harris | June 21, 2010, 7:59am PDT

Summary: Storage is the most difficult problem in information technology. Why? Because entropy is always working to destroy our data. There’s only one strategy that works.

Why does data storage have to be so hard? Drive failures, bit rot, file system errors and more. CPUs and networks seem to just work - why not storage? Entropy, my friend, entropy. The Universe hates your data.

Really.

Entropy refers to the universal tendency for systems to become less ordered over time.

For example, in an internal combustion engine the ignition of fuel drives an ordered set of actions: pistons move; valves open; crankshaft rotates. But as the heat of ignition diffuses across the mechanical components it becomes less useful: on a cold day it warms the car, but much of the fuel’s energy escapes as waste heat and does no useful work.

In information theory entropy measures how unpredictable - that is, how disordered - a bit stream is. That’s useful because ordered, low-entropy bit streams - say, a clear blue sky in a photo - are more compressible.
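The link between order and compressibility can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "sky" and "noise" buffers are hypothetical stand-ins for image data, the helper name is made up for this example, and zlib is just one convenient general-purpose compressor.

```python
import math
import os
import zlib
from collections import Counter


def shannon_entropy_bits_per_byte(data: bytes) -> float:
    """Empirical Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # Sum p * log2(1/p) over every byte value that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical stand-ins: a flat "blue sky" region versus random noise.
sky = bytes([135]) * 10_000   # one value repeated: highly ordered
noise = os.urandom(10_000)    # unpredictable bytes: highly disordered

for name, data in (("sky", sky), ("noise", noise)):
    entropy = shannon_entropy_bits_per_byte(data)
    compressed = len(zlib.compress(data))
    print(f"{name}: {entropy:.2f} bits/byte, {len(data)} -> {compressed} bytes")
```

The ordered buffer should shrink to a tiny fraction of its size, while the random one should barely compress at all.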

Copies vs originals
But storage exists at the boundary of information theory and the physical world. In much of information theory - for example erasure codes - the goal is not maximum compression, but maximum reliability.

Networks commonly encode 8 bits of data into 10 bits; the redundancy keeps the signal robust and lets the receiver detect many errors. Packet networks - most data networks - don’t rely only on 8/10 encoding: they keep copies of the data in buffers. If the receiving node has a problem, the sender retransmits the packet. Networks work with copies - not originals.

But in storage we don’t have that option: we store originals. So entropy is an even bigger problem.

That’s why all workable data protection strategies rely on adding bits. The bits may be in a data stream as in 8/10 encoding, or they may be in copies of documents, such as backups. Or, most reliably, the extra bits are at every level of data transmission and storage.
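To make "adding bits" concrete, here is a minimal sketch of one of the simplest redundancy schemes: a single XOR parity block, the idea behind parity-based RAID. It is not any particular product's implementation; the block contents and the `xor_blocks` helper are hypothetical, and real systems add checksums so they know which block went bad.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three equal-sized data blocks (hypothetical contents) plus one parity block.
data_blocks = [b"universe", b"hates___", b"yourdata"]
parity = xor_blocks(data_blocks)

# Simulate losing the second block, then rebuild it from the survivors.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)
```

One extra block buys recovery from the loss of any single block. Losing two at once is unrecoverable, which is why serious schemes layer more redundancy on top.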

The bottom line is that data at rest is always vulnerable to entropic decay. Your data is never 100% safe.

The Storage Bits take
Techies love positive numbers: GHz; cores; data rates; access times. But data entropy is all about negativity: MTTF; AFR; MTTDL; rebuild times. The numbers are squishy and thinking about our data’s mortality - and by extension, our own - isn’t pleasant.
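For a feel of how those negative numbers relate, the sketch below converts a mean time to failure (MTTF) into an annualized failure rate (AFR). It assumes a constant failure rate (an exponential model) and independent drives, which is exactly the kind of squishy simplification real drive populations don't obey; the one-million-hour MTTF and the function names are hypothetical.

```python
import math

HOURS_PER_YEAR = 8760  # 24 * 365


def afr_from_mttf(mttf_hours: float) -> float:
    """Annualized failure rate, assuming a constant (exponential) failure rate."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mttf_hours)


def chance_of_any_failure(afr: float, drives: int) -> float:
    """Probability that at least one of `drives` independent drives fails in a year."""
    return 1.0 - (1.0 - afr) ** drives


afr = afr_from_mttf(1_000_000)  # hypothetical spec-sheet MTTF of one million hours
print(f"AFR per drive: {afr:.2%}")
print(f"At least one failure among 100 drives: {chance_of_any_failure(afr, 100):.1%}")
```

Even a drive with an impressive-sounding MTTF gives a large array better-than-even odds of at least one failure per year under these assumptions.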

Yet storage industry scientists and engineers soldier on creating ever denser - more ordered - storage devices and systems. And, at the same time, creating data protection schemes to guard the ever more vulnerable data.

Some problems can be solved. Others can only be managed. Storage is, and always will be, among the latter.

So back up your data! The Universe is bigger than all of us and our storage systems.

Comments welcome, of course.

Robin Harris has been messing with computers for over 30 years and selling and marketing data storage for over 20 in companies large and small.

Disclosure

Robin Harris

Robin Harris is president of TechnoQWAN, a consulting and analyst firm in northern Arizona. He also writes StorageMojo.com, a blog that accepts advertising from companies in the storage industry, and has a 25-year history with IT vendors. He has many industry contacts, many of whom are friends and all of whom he has opinions about. Robin has relationships with many companies in the technology industry. Every company he writes about may have sought to influence his opinion through carefully crafted marketing messages and self-serving white papers; gifts ranging from desk calendars, t-shirts and lunches to trips; and analyst or consulting assignments. He also invests in some technology companies. He may accept payment for services in stock as well. Robin discloses financial investments in or client relationships with companies named in Storage Bits.

To help readers sort out the gold from the dross in his writings, Robin tries to communicate his reasons as clearly as he can. If you agree, you are intelligent and discerning. If you disagree, well, you disagree. In all cases, Robin encourages readers to subject everything they read, see or hear on the internet or from politicians to some simple questions:

* What assumptions are implicit in the world view and judgments of the author?
* What, if any, is the factual basis for the opinions the author expresses?
* Is it reasonable, logical and clear?

Your critical faculties: use ‘em or lose ‘em!

Biography

Robin Harris

Harris has been messing with computers for over 30 years and selling and marketing data storage for over 20 in companies large and small. He introduced a couple of multi-billion dollar storage products (DLT, the first Fibre Channel array) to market, as well as many smaller ones. Earlier he spent 10 years marketing servers and networks. After leaving corporate life he founded TechnoQWAN, a consulting and analyst firm. He also developed StorageMojo into one of the top storage industry blogs.

Robin writes, consults, coaches and lives among the mountains of northern Arizona.

28 Comments

RE: The Universe hates your data
tiredofpickingusernames 21st Jun 2010
We don't need no stinkin' backups.

We just need a p2p-ish network that will keep our data moving thru the aether, forever.

In efforts to reduce costs, disk and network gear makers have ignored or abandoned many safeguards for data. The worst offenders don't simply cause data loss, they MASK errors, leaving the user with undetected data corruption that becomes impossible to recover from. Backing up bad data just exacerbates the problem.

There are several forms of enhanced error-correction that could be implemented, but they have been shunned because they reduce net data capacity or throughput, or because the cost of implementation eats into the thin margin that many device makers operate on.

There are few alternatives available to end-users other than to create archive copies of data and pray. And as long as cheap storage is more marketable than reliable storage, it will remain the dominant factor.

Some things never change
klumper 21st Jun 2010
@terry flores
"And as long as cheap storage is more marketable than reliable storage, it will remain the dominant factor."

And forever be the source of tears and regret for the unprepared and uninitiated. :(

-- Pity those who don't learn to back up their backups. At least the vital stuff. You know the bit(s).

RE: The Universe hates your data
weather.guy 21st Jun 2010
How much additional safety does an SSD solution provide when it is used for both primary storage and critical data backup?

RE: The Universe hates your data
TrueDinosaur Updated - 22nd Jun 2010
@weather.guy

In my PC I use SSDs in RAID 0 for the boot drive. I don't back up the boot drive; should something happen to it, I prefer to rebuild Windows. All other data is on rotating media. Important data is backed up to a set of 7 locations that rotate every day when the PC is shut down. Once a month I do a full backup of the data drive to 1 of 3 hard drives. So I have 7 generations of important data and 3 generations of the rest.

It's worse than that, Jim
TheWerewolf 21st Jun 2010
You're thinking of a storage device as one unified entity - it's not. Each byte written to it has a variable decay rate, which means even your *backups* are unreliable. So one-way backups are slowly going to decay unless you completely wipe your backup device - low-level format it (to randomise the media) - and rewrite it every time.

Otherwise, you'll write new data (which has a longer survival time) while the old data will be further decayed. So you get part of your filesystem for example refreshed in the last backup - but then the older part dies and all the files vanish.

That's why BOTH techniques, backup AND recovery bits (like PAR), have to be used.

Arunabh Das
arunabhdas 21st Jun 2010
What about non-mechanical storage? Flash drives should solve the problem of data decay, right? - Arunabh Das

RE: The Universe hates your data
donaldrich 22nd Jun 2010
Short answer: No

Long answer: Flash memory is based on quantum-level storage of charge. This charge leaks away over time. While Flash used to be designed to store data for decades, manufacturers have realized that customers don't buy flash based on data retention, and have optimized for cost (smaller cells and multiple bits per cell, or MLC) in exchange for shorter data retention times. Since Flash memory is a commodity, cost is the main factor in market share.

RE: The Universe hates your data
CobraA1 22nd Jun 2010
@arunabhdas There's a reason they call them the LAWS of thermodynamics. Everything wears out over time.

RE: The Universe hates your data
riverab@... 22nd Jun 2010
Robin,
Short of going with pen and paper, can you give examples of what type of media is "best" for long term archiving?
Thanks.
Bert

RE: The Universe hates your data
TrueDinosaur 22nd Jun 2010
@riverab@...

Punch cards? :)

RE: The Universe hates your data
GrizzledGeezer 22nd Jun 2010
There is no single "best" medium. You use multiple formats and devices. I have backups on hard drives, Zip disk, and even floppies. Every two weeks I make a bootable backup of my main hard drive, and really important files have multiple backups.

How About a Case Study?
amicalola 22nd Jun 2010
Hey, Robin.

I am enjoying your articles on backups and data retention. We obviously have a long way to go before we get to the longevity of carbon on goatskin. As a sequel to this article, how about writing an article that is a case study in which someone brings home their shiny new laptop that they will use with a digital camera and downloaded music. How ought this person go about performing backups and archives? Perhaps focus on non-cloud solutions. How does the end-user (self appointed admin) go about verifying backups and that the system really works? How many "disks" does he need? What media? How often to swap media? When the fateful day of /dev/hda failure arrives, (and we all know it's 'when,' not 'if') what pre-set steps should this person perform?

I apologize if you've already written one of these. Just direct me to it and I'll shut up. I didn't go on a hunt before writing this note.

Regards & Thanks --
james

Stone Tablets...
dunn@... 22nd Jun 2010
They seem to be the longest-lasting medium so far.
Now we just need to build a fast, reliable, error-correcting read/write mechanism.

Then store them in a nearly indestructible housing, like a pyramid built with 10-ton stones, to protect them from the elements.

Hey I think someone has already done that.

RE: The Universe hates your data
Scrod 22nd Jun 2010
@dunn@... I tried that, but they are almost as hard to read as punch cards.

I taught college courses in this field a number of years ago. I also gave a special lecture on the merits of various storage media, after which I kept my electronic notes. The article here touches on some of the same topics.
If you're interested in reading more of my thoughts, see the original post of the discussion thread at
http://www.tripadvisor.com/ShowTopic-g1-i12530-k2972197-Storing_your_Travel_photos_data_Long_Term-Travel_Gadgets_and_Gear.html

Software refresh
Leftie 22nd Jun 2010
Is there software that will run in the background that will refresh storage media? I have used to recover data from hard drives before. The only problem with the software is your system must be down for hours.

RE: The Universe hates your data
Scrod 22nd Jun 2010
I've been fighting against entropy all my life; gravity drags at me, my hair grays, and my data shreds. Assume failure, plan for the worst, back up everything. If I could back myself up, I'd do it.

I think it's just you.
CobraA1 22nd Jun 2010
"Why does data storage have to be so hard? Drive failures, bit rot, file system errors and more."

I think you just got a bad batch of drives once and never recovered since.

Okay, on occasion stuff does happen.

But, honestly - it seems rare enough that I don't worry about it. I have backups just in case, but honestly it really doesn't happen enough to make it an issue.

Okay, I may have an issue once a year or so, for less than a day. Not really enough to worry over, sorry.

RE: The Universe hates your data
Tom62 28th Jun 2010
Wouldn't holographic data storage be an option?
