100 year data preservation

By Robin Harris | September 22, 2010, 6:51am PDT

A 350 year old copy of Shakespeare is about as readable as a new one. But a 35 year old floppy? Preserving data is essential to digital civilization, but how? Here’s a new approach.

I’m at the Storage Networking Industry Association’s Storage Developers Conference in Silicon Valley. Sam Fineberg, HP Distinguished Technologist, gave a talk on long-term digital data preservation. These are my notes.

The problem
SNIA surveyed businesses about their data retention requirements. 68% of organizations needed to preserve data for 100 years or longer.

Data is fragile. Threats include:

  • Media/hardware obsolescence. Even if you have an 8 inch floppy drive, there may not be hardware capable of running the software required to read it, let alone the application to open the files on the floppy.
  • Software/format obsolescence. Remember WordStar?
  • Lost context/metadata. A document’s contents may appear mundane, but if it is from the President to the Secretary of State, its context makes it important.
  • Disaster
  • Human error
  • Media fault
  • Attack

Preserving bits is hard
Saving 1 PB for 50 years, with a 50% chance of damage, requires a bit half-life on the order of 10^17 years. That isn’t achievable for large data sets.
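
The arithmetic behind a half-life on the order of 10^17 years is short enough to sketch. Assume, for illustration, that every bit decays independently with the same half-life H. Then the chance that all N bits survive T years is 2^(-N*T/H), and setting that to 50% gives H = N*T. The function name, the independence model and the choice of 1 PB = 10^15 bytes are assumptions made for this sketch, not something from the talk:

```python
import math

def required_bit_half_life(n_bits, years, p_all_survive=0.5):
    """Half-life (in years) each bit needs so that all n_bits survive
    `years` with probability p_all_survive, assuming independent decay."""
    # P(all survive) = 2 ** (-n_bits * years / H)  =>  solve for H
    return -n_bits * years / math.log2(p_all_survive)

PETABYTE_BITS = 8 * 10**15  # 1 PB taken as 10**15 bytes (assumption)

half_life = required_bit_half_life(PETABYTE_BITS, 50)
print(f"{half_life:.1e} years")  # prints 4.0e+17 years
```

Under these assumptions the answer is about 4 x 10^17 years, tens of millions of times the age of the universe (roughly 1.4 x 10^10 years).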

There is no simple technical fix: we can’t predict change but know it will occur. Processes are key. Processes for data preservation must evolve to get us to the next step. Standards make it easier, but aren’t the whole answer.

What to preserve?
Bits? Applications? Context?

Is it even possible to preserve everything? For example, with an old book: the content? Paper wear? Political context? Bookplate? Where it falls open?

We will lose information moving from physical to digital. And we can’t know what future generations will consider valuable. For example, scientists collect old hollow metal buttons because they contain air samples from when the buttons were made. Who dreamed 150 years ago that would be valuable?

Preservation must facilitate the storage of objects, map to a wide variety of devices and technologies, and be resilient.

SIRF’s up
SIRF: Self-contained Information Retention Format. SIRF is the digital equivalent of a physical container that archivists already know how to manage. SIRF containers hold preservation objects, a catalog and an object that labels the SIRF container.

SIRF maintains referential integrity, links between objects and context. Any SIRF compliant app can read and interpret the objects. Objects are migrated easily.
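
To make the container idea concrete, here is a minimal sketch of a container that holds objects, a catalog and a label, and that can check both its checksums and the links between objects. It is not the SIRF format itself: the class names, the fields and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    object_id: str
    sha256: str                                       # checksum recorded when the object was added
    related_ids: list = field(default_factory=list)   # links to context objects

@dataclass
class Container:
    label: dict                                   # describes the container itself
    catalog: dict = field(default_factory=dict)   # object_id -> CatalogEntry
    objects: dict = field(default_factory=dict)   # object_id -> bytes

    def add(self, object_id, data, related_ids=()):
        """Store an object and record its checksum and links in the catalog."""
        self.objects[object_id] = data
        digest = hashlib.sha256(data).hexdigest()
        self.catalog[object_id] = CatalogEntry(object_id, digest, list(related_ids))

    def check(self):
        """Return a list of problems: missing objects, bad checksums, dangling links."""
        problems = []
        for oid, entry in self.catalog.items():
            data = self.objects.get(oid)
            if data is None:
                problems.append(f"{oid}: missing object")
            elif hashlib.sha256(data).hexdigest() != entry.sha256:
                problems.append(f"{oid}: checksum mismatch")
            for rid in entry.related_ids:
                if rid not in self.catalog:
                    problems.append(f"{oid}: dangling link to {rid}")
        return problems

# Hypothetical usage: a memo plus the context object that makes it important.
box = Container(label={"name": "demo container"})
box.add("context", b"From the President to the Secretary of State")
box.add("memo", b"Lunch on Tuesday?", related_ids=["context"])
print(box.check())
```

Keeping the catalog inside the container is the point of the design: whoever migrates the container later can verify the contents without any outside database.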

Use cases
A couple of use cases show some of the problems:

  • Legal holds and e-discovery. In civil suits the parties are required to preserve all requested documents - a legal hold - under threat of severe penalties. But not all documents must be disclosed; attorney-client emails, for example, are excluded. How can all documents be preserved and the right ones selected for disclosure?
  • Biomedical info. Medical images are needed for patient history. But what if the patient was 12 years old and now is an adult? How do we protect their privacy and ensure that only the “right” adults now get access to it?

The Storage Bits take
Massive data loss can threaten civilization. The burning of the ancient Library of Alexandria, destroying hundreds of thousands of handwritten books, contributed to Europe’s Dark Ages as knowledge of ancient art, science and math was lost. The little recovered through Muslim scholars helped create the Enlightenment, but how much more was lost?

But the threat of digital data loss is far larger. Cheap storage and sophisticated data mining allow us to derive value from datasets that once we couldn’t even afford to collect, let alone analyze.

This is important work.

Comments welcome, of course.

Disclosure

Robin Harris

Robin Harris is president of TechnoQWAN, a consulting and analyst firm in northern Arizona. He also writes StorageMojo.com, a blog which accepts advertising from companies in the storage industry, and has a 25 year history with IT vendors. He has many industry contacts, many of whom are friends and all of whom he has opinions about. Robin has relationships with many companies in the technology industry. Every company he writes about may have sought to influence his opinion through carefully-crafted marketing messages and self-serving white papers, gifts ranging from desk calendars, t-shirts, lunches and trips as well as analyst or consulting assignments. He also invests in some technology companies. He may accept payment for services in stock as well. Robin discloses financial investments in or client relationships with companies named in Storage Bits. To help readers sort out the gold from the dross in his writings, Robin tries to communicate his reasons as clearly as he can. If you agree, you are intelligent and discerning. If you disagree, well, you disagree. In all cases, Robin encourages readers to subject everything they read, see or hear on the internet or from politicians to some simple questions:

  • What assumptions are implicit in the world view and judgments of the author?
  • What, if any, is the factual basis for the opinions the author expresses?
  • Is it reasonable, logical and clear?

Your critical faculties: use ‘em or lose ‘em!

Biography

Harris has been messing with computers for over 30 years and selling and marketing data storage for over 20 in companies large and small. He introduced a couple of multi-billion dollar storage products (DLT, the first Fibre Channel array) to market, as well as many smaller ones. Earlier he spent 10 years marketing servers and networks. After leaving corporate life he founded TechnoQWAN, a consulting and analyst firm. He also developed StorageMojo into one of the top storage industry blogs.

Robin writes, consults, coaches and lives among the mountains of northern Arizona.

57 Comments

RE: 100 year data preservation
Vesicant Updated - 22nd Sep 2010
>A 350 year old copy of Shakespeare is about as readable as a new one

Sure, assuming there is a copy in the first place. How many books have been lost or destroyed or simply disintegrated in the last 350 years? And how many people today can read Old English in the original (I'm thinking of Beowulf, not Shakespeare)? That 350 year old Shakespeare folio only exists because somebody recognized it had extraordinary value and took extraordinary measures to preserve it. The same thing applies to digital, too. If you're that worried about it, you could start chiseling ones and zeros into stone tablets.
0 Votes
RE: 100 year data preservation
R Harris 22nd Sep 2010
@Vesicant
There were a couple of elements to the preservation of Shakespeare's first folio. One, it was printed so there were many copies, which makes the survival of at least one much more likely. Two, the folios were expensive and purchased by rich people who had the room to store them and who took care of them.

England hasn't suffered an invasion since 1066 and, overall, war damage hasn't been large. These also helped preserve many British cultural treasures.

Robin
0 Votes
RE: 100 year data preservation
unclefixer@... 22nd Sep 2010
@R Harris Good points, but in addition to 1066, Germany bombed the bejesus out of England in WWII...
Good points, though-
0 Votes
German Bombing in WWII doesn't count.
Bill4 Updated - 24th Sep 2010
@unclefixer As everyone who has read the authoritative British history text 1066 And All That knows, history came to an end in 1918.
0 Votes
RE: 100 year data preservation
Maarek 22nd Sep 2010
What's the most universal file format and what type of media will last the longest for digital storage? CD/DVD right now has the longest life when storing data. If you dig deep enough online, you can find this media that guarantees life of 50 years or more. The cheap-o media will last 7, less if not in climate controlled areas.

The format? Well, TXT files are the most universal so far, like Courier New is the default font in most books next to Times New Roman. SQL & MySQL data are stored in TXT files that list commands to restore data in their proper manner. These files are normally stored on TAPE drives for quick backups or RAID arrays for redundancy.

IMO, someone from Iron Mountain would have the best answer to these questions.
0 Votes
RE: 100 year data preservation
R Harris 22nd Sep 2010
@Maarek Sure, .txt files will preserve text, but what about the typography and design of a document? PDF/A, which is a PDF with the fonts included, is probably the best we have today. But since no digital format is over 60 years old - and those are obsolete - we can be pretty sure we'll have to keep migrating from old formats to new ones every so often.

Robin
0 Votes
RE: 100 year data preservation
zaghy2zy 22nd Sep 2010
@R Harris

Can't we just save them as JPEGs? I guess it's much more universal right now (and probably later on)... And probably later on we'll have much more advanced OCR tech to take out the text in the JPEGs... I guess that can be a much more viable format...

However, I do think we could also include the reader in data preservation. Maybe not the actual reader, but its source code so that they could just compile it or revise it later to read formats we have today. We'll probably have more advanced OSs that couldn't run our current programs anymore but probably if they have the source code, they could compile the program themselves to make it work later on. Right?
0 Votes
RE: 100 year data preservation
Liliana Pubill 22nd Sep 2010
@Maarek Absolute nonsense, that is, if you lock it up in a metal box away from disasters and do not touch it for 100 years. CDs and DVDs are extremely vulnerable. They're one of the cheapest forms of useless storage. The format alone will be a problem eventually.
0 Votes
RE: 100 year data preservation
JeffLS Updated - 23rd Sep 2010
@Maarek
Storage 100+ years is an extremely interesting problem. It cannot be solved with some simple technology, but will require a heavy dose of people and process as well.

While some particular media (e.g., CD/DVD) may (or may not) have a long life, you'd also need to store one or more devices that can read such media, one or more controllers compatible with that device, one or more systems compatible with the controllers, display devices, software to read it, OS to run it all, etc, etc, etc.

Even microfilm/microfiche, which has a long shelf-life, becomes a problem. You need to keep plenty of the readers around, plus bulbs, power supplies, etc.

TXT files? Sure, if you don't care about formatting. Of course, TXT is based on a binary format, mostly ASCII - what if ASCII doesn't survive another 50 years?

I've been through many of these scenarios with companies over the years. It is a truly sticky problem. Some have opted to print multiple copies and store them in different vaults - the theory being that we should still be able to read them with our eyes.... though, who knows, maybe we'll lose that capacity too.

Context will continue to be a real problem too. Even that 350 yr old copy of Shakespeare has lost all context.
0 Votes
Constant copy & conversion?
jhimes 22nd Sep 2010
Well, you could have copied the 8" floppy data to 3.5" and then later archived to tape / hard drive / CD / DVD / blu-ray and then copy / convert at a later date to whatever the future holds.

The applications to access or read the old stuff may be a challenge where the conversion to PDF may not help (source code, images in their native format that may not fit on a PDF page & so on).

Then there is still print, film, microfiche & lots of other non-digital formats.

Yikes!
0 Votes
RE: 100 year data preservation
Cardhu 23rd Sep 2010
@jhimes

Data conversion is unreliable. The US government has discovered this problem with CAD technical drawings for ships and aircraft having lifespans of 20, 40, 60, and even more years. The data conversion process does not consistently maintain correct scale and dimension, rendering these crucial technical documents highly unreliable for enhancements, maintenance, and repairs.

Yikes! indeed. You are absolutely correct. Print, film, and microfiche suffer from the limitations of their materials. Unless stored in inert environments, each of these materials oxidizes and degrades. Much of our film heritage has been lost through this process.

The pell-mell advance of technology has gained high volume and ease of data exchange at the expense of longevity. But longevity is a crucial customer requirement that is not getting the attention it deserves.
0 Votes
100 year data preservation
robcurr 22nd Sep 2010
If you are talking about preserving data beyond a possible collapse of a civilization then the only true way to go about it is to include some sort of primer which does not rely upon any cultural contexts or languages and definitely does not involve jpeg decompression or knowledge of any hardware or software. Similar to the story 'Contact' where an alien civilization transmitted instructions for building a time/space machine to us via radio waves. The primer used basic concepts of mathematics to teach the language of the actual instructions. Even then you have to assume that any future society would need to be adequately sophisticated in order to decipher the original information.
0 Votes
RE: 100 year data preservation
pptcrafter 22nd Sep 2010
Very interesting subject and discussion. Leads to the question "Is 100 years enough?"
0 Votes
Convert to "New" format - whatever that is
oldbaritone 22nd Sep 2010
I started with 8" floppies on S-100 systems, and I'm still doing this -

When a new format comes along and moves into common use, copy your essential data from its current media to the new one.

Over the years, I've migrated data from 8" to 5" to 1/4" tape backup (60 Megs on a single cartridge! Wow!...) to CDs, to DVDs, and I'm about to migrate some to DVD-double layers.

Generally, the new format holds much more data than its predecessor, and it seems like something new comes along every 5 years or so.

You may even stumble across something entertaining - like the life-size picture of Neil Armstrong on the moon, 11x17 multi-strike 16x, or the old "Bug Zapper"

;-)
0 Votes
RE: 100 year data preservation
storyofmylife 22nd Sep 2010
This is our area of focus. We want your life stories preserved, b/c we hear too much heartbreak in our line of work about lost or destroyed legacies. Journals, pictures, back them up - put them out there, use a service like ours or others but PLEASE don't just throw them out there and assume that a company will be around. We've already acquired the assets (content: data) from 3 companies that have gone under. Others will follow. Thanks for bringing this important conversation out for discussion. We often joke with people not to leave what your descendants are reading about you up to the search engines, but if you WANT to preserve your stuff it's important to think about how.
0 Votes
This Just Scratches the Surface
Brett.Chapman@... 22nd Sep 2010
This just barely scratches the surface.
Think about source code - do you just save the text based source? What about the compiler? If I need a Windows XP system to run the compiler would I be able to get hardware to install it? Would I even be able to activate an XP license in 25 years?
Also, where do you physically store all this data? Companies demand higher returns per square foot in their facilities each year. Data store of legacy info is probably very low in the priority list. Also, you need to store this data in at least two separate physical locations to guard against disasters.
While we are talking about storing data off site, who do you trust? What if a contracting service loses your data? The service could very well be too expensive for smaller companies and will only get more expensive in the future if liabilities increase.
You could spend a lot of time struggling with this issue in an IT department.
0 Votes
RE: 100 year data preservation
s.malandrin Updated - 22nd Sep 2010
Well I am a doctor and a microbiologist and I think that the Nature's example can spread some lights about how to preserve data. Think to this: Nature remembers how to make a human being since 4 milllions years ago, when appeared the first man. More, Nature remembers how to make a living being since 25 billions of years ago, when appeared the first living molecules... How nature stored this information without losing them for such a long times? First copies: Nature made lots of copies of valuable info (DNA), to cover the entire world. Second evolution: valuable info were kept alive with constant evolution of info, formats and containers, each copy similar to father but somehow different and actual. Third safety: Nature buried valueble info under a cover of nonsense or missense info, so that the risk of losing information by casual or systematic attacks to the system have only few chance to result effective (DNA valuable info aka "genes" are fragmented and dispersed in a context of nonsense or missense DNA). The loss of byte can't be avoided, so my question is: how many byte can you lose from your systems without an actual loss of information? (Tanks @ nfordzdn for critics)
0 Votes
RE: 100 year data preservation
nfordzdn 22nd Sep 2010
@s.malandrin -- I think that your message answers itself, in a way: it is filled with misspellings, misusage of words ("loosing" for "losing"), and grammatical errors, but we can still read it and understand your intent. (This is not meant to belittle your writing. I am a one-language person, so anyone who can communicate in more than one - even badly - is a step ahead of me.)
0 Votes
RE: 100 year data preservation
Liliana Pubill Updated - 22nd Sep 2010
@nfordzdn -- Hmm, the writing of prescriptions can barely be read at all by people like you and I. Who can decipher the writing of a doctor? But, they ARE saving lives nevertheless. Doctors spell like this because they can, they do not need to prove themselves to anyone, because they already have a degree.
0 Votes
RE: 100 year data preservation
JOHN_TUOHY Updated - 23rd Sep 2010
@nfordzdn Actually Malandrin's monograph is quite grammatically correct with only a small spelling error; it is an example of a slightly different format from what you're used to. As a practising scientist, he/she is expressing him/herself in point format whereas you are expressing yourself in essay format. If you wish to point out syntactical mistakes, please do not bury full stops within brackets

Personally I do not agree with Malandrin:
(1) We cannot afford to allow data integrity to 'evolve'. We require absolute integrity unless we bury the data in lossy images.
(2) With reference to 'Junk DNA', we cannot simply bury the real data within an enormous amount of junk data in order to dilute the statistical probability of corruption. It would be more efficient to use checksums & parity bits.
0 Votes
RE: 100 year data preservation
AdamSmithau 22nd Sep 2010
Interesting read, I was having a discussion with a Museum Curator a few weekends ago about this and he was asking the same questions. The points which are raised throughout this thread mirror our discussions and we ended up with two points; Firstly, we can build DNA like redundancy into the data we want to retain - be it through parity, RAID or the like to be able to reconstruct the information when single messages are damaged however as mentioned by @robcurr we also need to be aware that the devices needed to read these also need to be available.

His solution was simple and spelled out in the way that only someone with a museum and a big chequebook could; hardcopies outlining either the formats and hardware to be stored with the recording devices; or skip the reconstruction and leave printouts and models of the information for the archaeologists from the future to examine without needing the engineers to rebuild old technologies.
0 Votes
RE:100 year... the other face of the coin
s.malandrin 22nd Sep 2010
I'm worried with this thought: are we sure we need to remember more than what we remembered till now? A brain that is unable to select valuable info and at the same time forget all the thousands of sensation coming each second from all the sensitive cell of the body, is quickly overcome and become unable to relate with the outer world: in a word we call it "autism". What is the risk for a world unable to forget? Are we sure that the burning of the ancient Library of Alexandria only lead to Europe's Dark Ages? Or this event was the beginning of a process rebuilding of human knowledge through destruction of previous, culminated with Enlightenment? Changes happens, loss of data too: so changes happens because of loss of data?
0 Votes
RE: 100 year data preservation
wmscarpenter 22nd Sep 2010
Don't think about 100 year preservation.

Think one year preservation, repeat next year for 100 years.

Knowing what is supported now and what is essentially certain to be supported next year is not difficult.

You need to decide what to preserve at least annually so the above process shouldn't be a great increase in effort.
0 Votes
RE: 100 year data preservation
twaynesdomain 22nd Sep 2010
You missed some: Molecular travel. Sun's effects. Earth's effects. Gamma infiltration. Etc.
0 Votes
it's about time...
josephmartins 22nd Sep 2010
It only took SNIA how long to get its collective head out of its nether region and take the long-term preservation issue more seriously? SNIA should have set to work on this problem, in collaboration with ohhhh I don't know maybe ARMA, a decade ago.

Volumes upon volumes of research have already been shared on this subject from universities and other orgs around the world.

I cannot help but wonder how far SNIA will proceed, again, without involving/engaging subject matter experts outside the storage industry for assistance...as it failed to do back when it "defined" ILM. Strangely, the storage industry acts as if this is a new problem to be solved. It's not.

As an aside...most of the data shown above is from last year's SNIA SIRF presentation. I assume SNIA simply rehashed it at this year's SDC. Has it made any progress?
0 Votes
Value of Data
sboverie 22nd Sep 2010
On one hand it is good that a 350 year old copy of Shakespeare's work is readable and in good condition; on the other hand, there are millions of copies, in many languages of the same work but recently published.

The 350 year old book would have more value if it was the only source left of Shakespeare's work. That Shakespeare is still continuously published shows that this information is still viewed as relevant.

We also live in a time where entire languages are becoming extinct. Some languages are preserved in recordings and include dictionaries and commentaries from the people who spoke those languages; but many disappear and are lost. The Rosetta Stone provided a key to translating Egyptian hieroglyphics by having enough Greek and Latin (?) to make the old script make sense. Even though hieroglyphs can be translated, we are not sure how the words were pronounced.

There will always be some data loss and some of that data really is not important enough to preserve more than one person's lifetime. Some information does get preserved but remains a mystery, Rosebud.
0 Votes
At times, even short term is difficult
rketchum@... 22nd Sep 2010
Say you don't have the funds to get some of your data storage updated. A motherboard goes out, old 8", 5 1/4", 3 1/2", LS-120, Zip drive, Tape drive or tapes unusable. What happens to the data? I recently disposed of a Jaz drive. Can't read any of the stack of QIC-80, 3020 backup tapes. The tape drive works, but reading 3.2 GB tapes requires a 2MB floppy accelerator card. The card is made by Pacific Data (for Exabyte) and the drivers that are with the software only work for DOS, Win 3.1 or Win 95.

A prime example of more things to come. Pencil and paper work the best, if you can still read my writing and have enough rooms in your climate controlled area to store everything.
0 Votes
I am not sure what they are planning....
Liliana Pubill Updated - 22nd Sep 2010
But CDs and DVDs are junk. Not only are they prone to scratches that will render the data unreadable immediately, but they're prone to cracking, breaking and burning in a fire. So they're just as vulnerable as most books (except for water based substances, but not solvents).

Most recently I scratched a CD pretty badly, just attempting to remove it from its case. While attempting to open a 30 CDR pack, somehow when I removed the plastic the case slipped off my hands and when it hit the floor 30 discs scattered and were severely scratched before they were even burned. Call it carelessness, ten years ago this would have never happened to me, but 10 years older now I see my reflexes are not as good as they used to be. How prone are we to drop a CD/DVD? We do it everyday, people do it all the time! I have a 250GB Firewire drive, and while a bit larger, it holds the information of about 80 DVDs and I've never dropped it. I know it won't last forever but temporarily, it is way more durable than any of my movie DVDs. Unless they come up with actual metal discs that can't be destroyed by water, fire, being scratched by a pen or a fingernail, and/or won't split in half so easily, then I will have to say data is always going to be vulnerable to disasters.
0 Votes
Plan now
tony@... 22nd Sep 2010
As an example, I have pictures from the first Casio digital camera. They were stored in a proprietary format - basically JPEG with some metadata. Realising that reading these was going to be difficult, I batch converted them to something else (TIF) but then set up some virtual machines (VMs) so that I could preserve old operating systems and software as a means of still being able to go back to these things.
The biggest problem with all of this is when you need hardware drivers - VMs tend to rely on the hardware of the underlying host, so I recently had to reinstall XP on a spare machine because the great HP film scanner that I have has no later drivers for it, and none of my SCSI cards have Win7 drivers either.
I do think that the JPEG file format is likely to be supported for a very long while. Not the best - I scanned most of my pictures some years ago into Flashpix, as this was lossless. So far, I can still run Photoshop with a plugin.
0 Votes
RE: 100 year data preservation
Doug_Dame@... 22nd Sep 2010
Random food for thought:

* there is ZERO information, in any format or structure, that I now own or control, that I personally will need or even want to re-use in 100 years.

* Shakespeare's works exist 350 years later BECAUSE they were recognized, in their time, as being "keepers", and that judgment has been re-affirmed more or less continuously ever since.

* An unknown but presumably large amount of other materials produced in Shakespeare's time has disappeared without a trace. Unless you are a historian or an academic, chances are 99% (my guess) you cannot name a single "lost document" created by any person in the world during Shakespeare's lifetime. Let alone estimate any putative "loss to civilization and human progress" resulting from the loss of such a document.

* the "value half-life" of most of today's "information" is short.

* The amount of valuable but unique information is very small. It perhaps is mainly found in original works of art and documents of historic provenance. In terms of actual "information worth keeping," in today's digital world, most everything worth saving for the long term is already saved in massively parallel fashion.

* Any argument based on an implicit premise that all information is worth going to some special effort to save for possible new/better use in 100+ years fails any kind of rational cost/benefit analysis. Any expectation or requirement that we should save copies of everything just in case is essentially "a tax on today" that will never pay off.

* Darwinism works for information too. By and large, the useful and good will survive, the marginal and obsolete will fade into the mists of time. It's the natural process, and there's no compelling evidence civilization needs to invest a lot of time & effort into doing more.
0 Votes
Convert the old format to new format and save it all on proven technology with repository and version no. use OAIS module.
http://www.exlibrisgroup.com/category/RosettaOverview
0 Votes
RE: 100 year data preservation
unclefixer@... 22nd Sep 2010
Being in this business, I wouldn't trust ANY of the media or hardware available currently to store data for that long... just because of having seen how they've been doing so far.
I've found notebooks that I've tossed in a box, stored in a shed through YEARS of harsh seasons, and those crappy notebooks have held up better than hard drives the same age-
I think they have a ways to go....
In the meantime, I vote that all important data should be preserved by being written or printed and stuffed into crappy spiral notebooks, and kept in a shed! :)
www.dfwsupergeek.com
0 Votes
Storage
gbravin 22nd Sep 2010
Very good problem issue. The same applies to other sectors. For example very few turntables can convert my vinyls to digital music. No reader can convert music recorded in cassettes (8 tracks, do you remember them?)
0 Votes
RE: 100 year data preservation
Galdang 22nd Sep 2010
loosed, loos·ing, loos·es
v.tr.
1. To let loose; release: loosed the dogs.
2. To make loose; undo: loosed his belt.
3. To cast loose; detach: hikers loosing their packs at camp.
4. To let fly; discharge: loosed an arrow.
5. To release pressure or obligation from; absolve: loosed her from the responsibility.
6. To make less strict; relax: a leader's strong authority that was loosed by easy times.
0 Votes
RE: 100 year data preservation
Galdang 22nd Sep 2010
merriam-webster:

losing
adj
Definition of LOSING
1
: resulting in or likely to result in defeat
2
: marked by many losses or more losses than wins
0 Votes
RE: 100 year data preservation
chauvinemmons 23rd Sep 2010
Let's just start with Microsoft Office: after Office 2003 I find it unusable. Office 2007, 2010 and now 2014 in the works.
GOOD LUCK. I'm done, computered out.
Back to pencil and paper for me.
I will sell you my copy of Office 2007 cheap, it's worthless to me.
0 Votes
RE: 100 year data preservation
jendrisko 23rd Sep 2010
Good and interesting article - however the paragraph about burning of the Alexandrian library and ... rediscovering small part of the ancient knowledge through Muslim scholars forgets "the Mission of the Irish Monks" - see http://en.wikipedia.org/wiki/How_the_Irish_Saved_Civilization

This suggests that:
- you need dedicated people/community to preserve information/knowledge in difficult times; technologically you do the best possible and hope that it is enough...
- also "Zeitgeist" can contribute to forgetting even still available information, what about "political correctness" and the elite's view served through massmedia...?

[page 3, 4] For, as the Roman Empire fell, as all through Europe matted, unwashed barbarians descended on the Roman cities, looting artifacts and burning books, the Irish, who were just learning to read and write, took up the great labor of copying all of Western literature - everything they could lay their hands on. These scribes then served as conduits through which the Greco-Roman and Judeo-Christian cultures were transmitted to the tribes of Europe, newly settled amid the rubble and ruined vineyards of the civilization they had overwhelmed. Without this Service of the Scribes, everything that happened subsequently would have been unthinkable. Without the Mission of the Irish Monks, who single-handedly re-founded European civilization throughout the continent in the bays and valleys of their exile, the world that came after them would have been an entirely different one - a world without books. And our own world would never have come to be.

[page 5] Many historians fail to mention it entirely, and few advert to the breathtaking drama of this cultural cliffhanger. This is probably because it is easier to describe stasis (classical, then medieval) than movement (classical to medieval). It is also true that historians are generally expert in one period or the other, so that analysis of the transition falls outside their - and everyone's? - competence. At all events, I know of no single book now in print that is devoted to the subject of the transition, nor even one in which this subject plays a substantial part.
http://www.doyletics.com/arj/howtheir.htm
0 Votes
RE: 100 year data preservation
GreyTech 23rd Sep 2010
Microfilm is still the only 100-year archival medium. It doesn't rely on any advanced technology. It does still need to be held in reasonable environmental conditions.

That is clearly not very useful for the storage of large databases where the importance is the ability to analyse them.

Clearly what is needed is a suitable replacement for paper and ink. We need to be able to read it with whatever technology is current in 100 years. The human eye is still the best long term bet. As Dr. Malandrin suggested DNA has retained the building blocks of life but not without evolving and changing and sometimes corrupting. To archive what we have now we need a mechanism to retain information without change or corruption and optical storage as in paper and ink is still the most retentive.
0 Votes
RE: 100 year data preservation
ugisv 23rd Sep 2010
Nice thread. I am the kind of person who enjoys reading and thinking about such futuristic issues.
I would like to agree with Doug_Dame that Darwinism works for information too, and I am excited by s.malandrin's suggestion to learn from nature how to preserve information, and by the question he/she raised about the risk for a world that is unable to forget.
Here follows some "random food for thought" from me:
* To preserve information while this planet is hospitable to our civilization, I think we should not care too much about how to remember every single piece of information we produce; we could relatively passively rely on Darwinian rules and be confident that all the substantial information will survive.
* To preserve information after this or other planets, or even the whole Universe, become hostile to the current form of human beings, we have to exploit our survival instinct more intensively and, as the Bible teaches us (I still love this book), work more on changing ourselves than on changing the environment and other forms of life and beings. This means that we have to evolve our own evolutionary processes more proactively. In that case, storing the relevant information we have produced so far is a sub-issue of the larger issue of keeping human beings able to survive in different environments and to cooperate with other forms of life and beings.
0 Votes
RE: 100 year data preservation
esimel Updated - 23rd Sep 2010
s.malandrin is probably closest to reality. The way I see it, the only way to assure preservation is to use a dynamic system that periodically monitors the stored information (including the very important technical preservation metadata associated with each data file) and carries out the necessary migrations for each data type when the time comes. Forget media; that is the least of the problems. Much more important is to assure that we, or the proper machine, understand the formats and can reproduce them accurately. And the only way to do it is to keep on migrating, forever and ever... Because this will become a very sophisticated and expensive process, I believe that in the future we'll be relying on some kind of auto-replicating network of centralized official "heritage keeper" systems, or something of the kind, that does the task for us.
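The periodic monitoring described above could be sketched roughly as a checksum audit. This is only a minimal illustration, not any real "heritage keeper" system: the function names (`sha256_of`, `build_manifest`, `audit`) and the idea of keeping a manifest of SHA-256 digests are assumptions made for the example, and the format-migration step is left out entirely.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's relative path under root to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the archive on disk against a previously stored manifest."""
    current = build_manifest(root)
    return {
        # Files recorded earlier that no longer exist.
        "missing": sorted(set(manifest) - set(current)),
        # Files whose bits differ from what was recorded.
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        # Files that appeared since the manifest was taken.
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

Run on a schedule, an audit like this only says when bits have changed or gone missing; restoring from another copy and migrating formats would still be separate steps.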
0 Votes
RE: 100 year data preservation
dpbaird 23rd Sep 2010
I'm writing this just before I go back to working on our family history. It includes a Foreword with advice to the readers. I'm storing documents in PDF/A format and images as JPEGs, indexed with simple HTML files. My advice to our descendants is COPY, COPY, COPY. I admit that I can only do so much to anticipate the future. (I got my start with an 031 Key Punch in 1953.) Business records have the problems of data definitions (record formats may be the same, but the significance of the data contents may not be known) and application-dependent data formats, as well as recording media. As the saying goes, PLAN Ahead.
0 Votes
RE: 100 year data preservation
josephmartins 23rd Sep 2010
@dpbaird The challenge is that many companies do not plan ahead as they should. There's simply no incentive to do so. Employees make short-sighted decisions to achieve short term goals at the expense of long-term impact (thanks to poorly constructed incentives). As they see it, the future mess is some other poor schmuck's responsibility. They'll have long since moved on to ruin another organization.
0 Votes
RE: 100 year data preservation
roodyg 23rd Sep 2010
Personally, I say "good riddance".

Seriously, like an attic or basement that has seen one too many generations, we all have way too much digital "junk". It costs time, energy, and money to maintain and preserve, and 99% of it should have been forgotten years ago.

The other 1% probably isn't that critical either. It would be a shame if Shakespeare's works were lost forever, but it wouldn't end life as we know it.

And really, do the children in 2450 really need all the collected blog works of anyone?

Save the planet, delete almost everything.
0 Votes
Paper is still the best way.
vbprgrmr@... 23rd Sep 2010
All of my old computer programs, short stories and a novel were on 8" floppies. I lost them digitally because I forgot to transfer them as I updated computers over the years. But thankfully I did print them out. No sign of brittleness or discoloration yet. So if the writings are important to you or society, then Shakespeare's way is the best. Archive them to paper, or even better, to acid-free archival paper. Why trust a digital solution when the original way has served us well?
0 Votes
RE: 100 year data preservation
littlemas2 23rd Sep 2010
To me this seems like it will develop a whole industry. I am not an IT professional, but I already see the need for the ability to have virtual machines available on modern OS's. I had to spend quite a bit of time converting old WP documents for my Dad. I chose to store them on Google Docs in .doc format, because I figured if the standards change, Google will make conversion that much easier in the future.

I think for file storage and conversions, large cloud companies might be the answer. Instead of everyone having to be able to run a WinXP or Windows 98 to run a specific program such as an old WordPerfect, if I could simply upload my file to an on-line conversion service it would be much easier.

As far as hardware, I again think that the answer might be an industry that springs up to retain or re-manufacture old equipment. What we need is technology libraries that have the equivalent of microfilm readers.
0 Votes
RE: 100 year data preservation
Daddy Tadpole 23rd Sep 2010
In reply to some remarks about the lifetime of microfilm, it's currently rated at several hundred years. Silver on glass or on a well-characterised polymer would be the obvious choice, but Ilford claim dye technology is equally reliable.

Photography has the advantage of being readable with simple optics (no drive needed).

However, as others have commented, little attention has been paid to recoverable coding. Some images could be recorded as such (one image per colour). Other kinds of data would require the coding algorithm to be indicated in text form on the same film as the data.
0 Votes
The best solution:
LittleM 23rd Sep 2010
Piracy.
0 Votes
RE: 100 year data preservation
Doug_Dame@... 23rd Sep 2010
While the Library of Alexandria and the Irish Monks are very significant historically, we need to recognize the very changed historical circumstances between then and now:

Then: copies of books made by hand. Very few copies exist.
Now: Copies of electronic media can be made in a few seconds in most cases. Hundreds to millions of copies exist of most things deemed to be useful.

Then: The "body of scientific knowledge" was small, concentrated in the brains of a small number of researchers, and growing slowly.
Now (in order): Large, many, almost too fast to keep up.

Then: One guy with an out of the box idea could jump knowledge forward significantly ... e.g. the Eureka moment.
Now: near-simultaneous discovery is the norm. e.g., even going back a hundred years as far as telephones, radios, flight. There's now tremendous redundancy (and therefore resilience) in the discovery process in almost every field.

Therefore the likelihood of a disaster that wipes out a significant chunk of human knowledge (without also wiping out all humans), or sets us back x-generations, is many orders of magnitude lower now than it was in any era before the invention of the printing press, aka the first practical, large-scale, information redundancy machine.

Unless, of course, a humongously massive extra-terrestrial electromagnetic pulse (EMP) knocks out all existing electronic equipment and storage media.
0 Votes
Inherent Reliability
amicalola 23rd Sep 2010
I believe Robin is looking at inherent reliability of the medium. Here is an interesting general article on the topic, that contains the comment: A piece of paper can last for centuries left alone in a dry, dark room. Nothing created by a computer has that kind of inherent longevity - nothing like it in fact. Computers and their contents only survive by the active and ongoing help of human beings.

Left alone, magnetic fields decay, etc. Perhaps the transient nature of the electronic medium, its faster decay rate, will begin to catch up with us? I don't think it will, but one never knows.
0 Votes
RE: 100 year data preservation
Lerianis10 19th Oct 2010
All of these problems and issues will disappear once we move to SSD drives totally, which is coming in the near future.

What I am more worried about is older games (which ARE part of our culture) not being able to be 'played correctly' on newer OS's. Saving pages of a book is easy: JPEG it and done.
Saving games and other cultural stuff where the idiots want to DRM it out the wazoo is much harder.
0 Votes