Your points are good and in principle I agree with all of them. I would however like to raise some specific issues that make digitally stored data very vulnerable.
(I also wish to raise some very specific issues about the shortsighted approach to standards that has plagued digital optical storage development. I will, however, post these comments later (hopefully tomorrow) as I've run out of time here.)
You're right, information always degrades. ...That's entropy for you! Sometimes I wish I'd never heard of Ludwig Boltzmann or Claude Shannon and that I could live in blissful ignorance of entropy's consequences; nevertheless, entropy and information loss are central to my earlier post. In essence, archiving data and ensuring its long-term integrity and survival requires, as physical laws dictate, that work be committed to keep that order.
Moreover, that slide from order into disorder and randomness is exacerbated when users don't actively participate to counter the process, although there are notable exceptions, as you correctly point out: 'We know these were important accomplishments only because somebody believed they should be preserved.'
I'd add, however, that what a bygone culture considers valuable and worth keeping for posterity may be totally different from what interests the culture that unearths the information centuries later. Today, our archaeologists have far wider interests in Roman civilization than anything the Romans would have deliberately kept for posterity. This of course is the great conundrum of data storage: how much information can be discarded before the data is effectively editorialized (or distorted to a point where full reconstruction is not possible), and how much does one keep before being overwhelmed? Information theory, coding theory and the laws of entropy may say one thing; practical reality, however, will most probably be vastly different.
When referring to the ephemeral nature of digital storage, I was endeavoring to make two key points:
(a) That for every given technology there are limits on the storage density, and the closer we push towards these limits the greater the tendency for stored data to 'leak away', and;
(b) That modern electronic data is almost exclusively stored in machine-readable form; that it exists in this non-human-readable form makes it more vulnerable. Machine obsolescence, stored format obsolescence, use of proprietary and unpublished formats, the widespread lack of interest in data storage standards and the lack of a wide or universal commitment to open standards, together with the fact that we humans cannot physically see the ageing process (as we would with, say, a book or work of art), often combine to stop us intervening and protecting our data until it's too late.
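Because we cannot see machine-readable data age, any check on it has to be done by machine too. What follows is only a minimal sketch of one common approach, a checksum manifest that is recorded while the data is known to be good and re-checked later. The function names (file_digest, build_manifest, verify_manifest) are my own inventions for illustration, and SHA-256 is an assumed choice; it is not a description of any particular archiving product.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file under root (by relative path) to its digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root, manifest):
    """Return the sorted names of files that are missing or have changed."""
    root = Path(root)
    damaged = []
    for name, digest in manifest.items():
        p = root / name
        if not p.is_file() or file_digest(p) != digest:
            damaged.append(name)
    return sorted(damaged)
```

The manifest has to be kept somewhere other than the drive it describes, and re-running verify_manifest only tells you that something has decayed; repairing it still needs a second copy.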
Storage Density and Granularity
---------------------------------------
Papyrus both has longevity of physical form and is human-readable; that's to say, it has an excellent biological interface. These properties confer storage durability, but its coarse granularity means its storage density is intrinsically very low. Nevertheless, deterioration of the physical media through time, its imperfect translation into modern languages, together with an incomplete understanding of its original idiomatic or contextual meaning within the culture in which it was written, all ensure that its contents are subject to entropic information loss.
In contrast, modern digital data storage exhibits intrinsic differences, fine granularity for instance; thus we expect, and do see, different issues and problems with this type of storage technology than with the much older papyrus example. I'll discuss issues with modern high-density hard disk drives first, as they are a prime example of such storage (although optical storage, with its similar issues, isn't far behind).
The Hard Disk Drive: an Example of Problematic Data Storage
-----------------------------------------------------------------------------
As I see it, these are the key issues:
1. As mentioned, information stored on disk drives cannot be interpreted directly by biological sensors, eyes etc.; thus our inability to interact directly with the data makes it more vulnerable (it's an out-of-sight, out-of-mind issue).
2. Moreover, disk drive data cannot be interpreted without a considerable amount of sophisticated state-of-the-art technology: precision drive mechanics, head amplifiers, data separators, error-correction electronics etc. Even then, this 'raw' data needs the additional processing of a PC and all the paraphernalia that goes with it including software, word processors, spreadsheets etc. before a human can correctly interpret it. Only one aspect of this 'decoding' chain has to go wrong and the data cannot be interpreted at all.
3. Disk drives have a very short life in the grand scheme of things, and so do the PC and its 'interpretation' software. Unless owners take active steps to ensure the continuing viability of the data, it will be lost and/or its integrity violated.
4. Although hard disks have error correction, they do not have redundancy built in (error correction is necessary for drives to actually function the way they do, but it is useless if the drive electronics fail). For example, there are no drives on the market with inbuilt redundancy such as dual head actuators, dual electronics, dual power supplies etc., all of which would operate totally independently of each other. Nor are there any drives made which would enable the user to trade bit density and speed for increased reliability, let alone also have dual-hardware redundancy.
4.1 Here's an illustration everyone should understand. A 7200RPM 1TB drive made with dual actuators, dual independent electronics and the ability to lower the storage capacity to say 300GB (by resetting the manufacturer's low-level formatting), and perhaps even the ability to lower its rotational speed to say 5600RPM, would be a much more reliable product, and I believe a very saleable one at that (albeit a more expensive one). Imagine a Windows setting where you could set a drive reliability factor based on the type of data you are storing. There is no fundamental technical limitation preventing the manufacture of such a drive. That such drives do not exist backs up my assertion that our here-and-now society doesn't care too much about data integrity or its longevity.
4.2 That not many users care about data integrity or data archiving also backs up the similar point made by Robin Harris:
'The one remaining piece is for hard drive vendors to get serious about building archive-quality hard disks. I love their technology, but they aren't the most forward looking group.'
Drive manufacturers aren't the most forward looking of groups because no one demands that they be. The only thing most buyers seem to care about is raw data capacity, and manufacturers respond accordingly by cramming in as many gigabits per square inch of platter area as state-of-the-art engineering permits. The common attitude is one of 'hang reliability, we'll worry about that if or when the disk crashes.'
4.3 It's a paradox that users have such a cavalier attitude to storing their data on vulnerable disk drives and that they don't give a damn about data archiving (backups) either, yet a high level of paranoia exists about viruses and other data security issues. These positions are almost diametrically opposed.
4.4 I have just about come to the conclusion that subtle marketing by disk drive manufacturers together with an underreporting of drive failures accounts for the former attitude, while lots of hype and FUD from antivirus manufacturers and the billion-dollar security industry accounts for the high profile of the latter. (I'm not saying that viruses and security issues don't exist, of course they do; but my own practical experience has been that hard disk failures--of which I've had many--have been, by far, a much greater threat to my data than have viruses and other security threats.)
5. As mentioned, hard disks sell on data density. However, if one examines the raw data coming directly off a modern disk drive head with an oscilloscope, one is amazed at what one sees; recognizing data amongst the noise is nigh on impossible. In traditional engineering terms we'd say that the signal-to-noise ratio is about zilch. With such high data densities, it is only through the most sophisticated heads, head amplifier design and clever data separation techniques that these drives actually work at all. Trusting one's data to such fine margins without any hardware redundancy or proper backup procedures is quite a gamble, yet many--probably most--of us operate this way most of the time, and we do so with little or no thought as to the consequences.
6. Because of the fiercely competitive nature of hard disk manufacture, manufacturers hide the exact nature of the way your data is laid down on the platter surface; it is both very proprietary and secret. This used not to be the case. For example, Shugart Associates/Corporation produced beautifully clear and concise reference-grade manuals for its 8" floppy disk drives. These manuals were so good that I still keep a set on my office bookshelf to show staff how things were and still ought to be done. Today, Seagate Technology, having long since morphed from its former Shugart identity, not only provides precious little information about its hard disks, but has also been caught out obfuscating information over firmware bugs in its 7200.11 hard disk drives. My, my, how things have changed for the worse over the past 20 or so years.
6.1 Proprietary and secret error correction and low-level formatting systems, whether justifiable or not, are a dangerous threat to data integrity. Not only do they obfuscate the actual storage process and thus thwart any independent analysis of a drive's dependability or long-term reliability, but they can also significantly complicate data recovery processes.
6.2 Drive manufacturers don't advertise the fact that when you commit your data to their disk drives, the only entity that knows about the storage process is the manufacturer, and it is not going to tell anyone--let alone a long-suffering user whose disk has crashed--how these storage processes work. We consumers still buy hard disk drives without even a whimper of complaint (as we've been conditioned by marketing propaganda to think that the current configuration is the only way in which these devices can exist). Moreover, the technical press, ZDNet et al, rarely give time to such 'mundane' matters as hard disk reliability; security holes in M$'s Internet Explorer are much more juicy and exciting news.
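To put a rough number on the density-for-reliability trade suggested in point 4.1, here is a toy illustration only. It assumes the crudest possible scheme, storing every bit three times and taking a majority vote on readback, and it assumes bit errors are independent. Real drives use far more sophisticated (and, as noted, undisclosed) codes, so none of this describes an actual product; the function names are hypothetical.

```python
def encode(bits, copies=3):
    """Store each bit `copies` times, cutting usable density to 1/copies."""
    return [b for b in bits for _ in range(copies)]


def decode(stored, copies=3):
    """Recover each bit by majority vote over its group of copies."""
    return [
        1 if sum(stored[i:i + copies]) * 2 > copies else 0
        for i in range(0, len(stored), copies)
    ]


def majority_failure_probability(p):
    """Chance that a 3-copy majority vote is wrong when each copy
    is independently misread with probability p (2 or 3 copies bad)."""
    return 3 * p**2 * (1 - p) + p**3


stored = encode([1, 0, 1])
stored[4] ^= 1                      # simulate one misread copy
recovered = decode(stored)          # the vote still yields [1, 0, 1]
```

Under these assumptions, giving up two thirds of the capacity turns a 1-in-100 raw error rate into roughly 3 in 10,000, which is the kind of choice I'd like to be able to make for data that matters.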