Long-term personal data storage
Summary: You can stick a newspaper clipping in a folder and read it in 50 years. Not so with digital content: both the media AND the format can become unreadable.
You can stick a newspaper clipping in a folder and read it in 50 years. Not so with digital content: both the media AND the format can become unreadable. With so much of the world's data - and yours - in digital form, more people wonder: how do I keep my pictures, music, videos, documents and more around for decades? Here's how.
The proper mindset
Your data is valuable. Storage is cheap. Scrimping on capacity to save a few bucks is silly. If money is a real problem, plan to copy your most important data first. In a few months, when storage is cheaper, buy some more.
Remember, you will soon forget about the cost of the storage, but you may never forgive yourself for losing irreplaceable family or legal files.
One word, my friend: copies
Neatness is one of the most common causes of data loss. You get the new external drive - or worse, RAID array - copy everything to it and then delete the originals. The drive or array goes south - and your data goes with it.
A RAID array is NOT a substitute for a data archive. RAID arrays break, and all too often it takes only a single mistake - oops, pulled the wrong disk! - and your data is gone forever.
Cheap optical disks can slowly scramble your data. Hard drives crash. Even if your data is readable, if your application can't read it you are still out of luck.
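One way to catch slow scrambling before it spreads to every copy is to record a checksum for each file when you archive it, then re-check later. Here is a minimal sketch, assuming Python 3.9 or later; the function names (sha256_of, build_manifest, find_damage) are made up for illustration and are not from any particular backup tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_damage(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose contents no longer match."""
    damaged = []
    for relative, expected in manifest.items():
        candidate = root / relative
        if not candidate.is_file() or sha256_of(candidate) != expected:
            damaged.append(relative)
    return damaged
```

Build the manifest when you write the archive, keep it alongside the data as well as somewhere else, and run find_damage each time you spin up a drive or load a disc. A checksum only tells you a copy has gone bad; you still need a good copy elsewhere to restore from.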
Unnecessary neatness
Instead of "everything in its place and a place for everything" you want "every thing in every place." The best policy is several copies across different media, preferably in different locations.
Storage is cheap. Use lots.
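The "every thing in every place" policy can be sketched as a small script that copies a file to several destinations, compares each copy byte for byte, and never deletes the original. This is only an illustration: copy_everywhere is a hypothetical helper, and the destination folders stand in for whatever drives or mount points you actually have.

```python
import filecmp
import shutil
from pathlib import Path


def copy_everywhere(source: Path, destinations: list[Path]) -> list[Path]:
    """Copy source into every destination folder and verify each copy.

    Returns the paths of the verified copies. The original is left in place.
    """
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also tries to preserve timestamps
        # shallow=False forces a byte-by-byte comparison of the contents.
        if not filecmp.cmp(source, target, shallow=False):
            raise OSError(f"copy at {target} does not match the original")
        copies.append(target)
    return copies
```

The point of the design is what it leaves out: there is no "move" and no cleanup step, so a failed drive takes one copy with it rather than the only copy.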
Privacy and encryption
Encryption is a good idea only if you know you can remember a password for 10 years. Otherwise use physical security - locked doors - rather than encryption. Also, encryption is an application: are you sure that app will be around in 10 years?
If you must encrypt, you are likely better off with a free, open-source tool like TrueCrypt. Or check out 10 free open source encryption utilities.
A better bet than encryption for most of us: save data onto dual-layer DVDs - 8GB each - and store in a safe deposit box. Good density, high bandwidth and a disaster tolerant location.
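To get a feel for how many discs that means, here is a rough back-of-the-envelope calculation. It uses the 8GB-per-disc figure above and assumes you burn two full sets; the function name and the default of two copies are my assumptions, not a rule.

```python
import math

DISC_CAPACITY_GB = 8  # dual-layer DVD figure quoted in the article


def discs_needed(data_gb: float, copies: int = 2,
                 capacity_gb: float = DISC_CAPACITY_GB) -> int:
    """Number of discs needed to hold `copies` full sets of the data."""
    discs_per_set = math.ceil(data_gb / capacity_gb)
    return discs_per_set * copies
```

For a hypothetical 100GB collection, discs_needed(100) comes to 26 discs: 13 per set, two sets. Real-world usable capacity is a little different once file system overhead and awkward file sizes are counted, so treat the result as a lower bound.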
Data and media formats
Once you buy into the multiple copy strategy you are ready to consider the next issues. The first problem is picking data formats for maximum longevity. A file format is indicated by the file extension, such as .doc or .pdf, that follows the file name.
Documents
For documents simple formats are best. An original copy printed on good quality, acid-free paper will last the longest and be readable by anyone.
Text formats are best for digital documents. ASCII text - from programs like Notepad or TextEdit - has been around for decades. Text files don't preserve complex formatting or graphics, but if it is content you want to preserve - like the typescript of my great-grandmother's Civil War letters - plain text files are likely to be readable 100 years from now.
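If you want to confirm that a "plain text" file really is plain ASCII before you archive it, a few lines of code will do. This is a sketch under simple assumptions: non_ascii_lines is a made-up helper name, and it only reports where non-ASCII bytes appear, it doesn't try to guess the encoding.

```python
from pathlib import Path


def non_ascii_lines(path: Path) -> list[int]:
    """Return the 1-based numbers of lines that contain non-ASCII bytes."""
    flagged = []
    with path.open("rb") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.isascii():
                flagged.append(number)
    return flagged
```

An empty list means the file is pure ASCII. Flagged lines usually hold curly quotes or accented characters; those are fine to keep, but note the encoding (UTF-8, for example) alongside the file so a future reader doesn't have to guess.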
If you must preserve formatting, Rich Text Format (.rtf) documents are likely to be readable for a couple of decades. RTF is a Microsoft creation that uses text-based formatting - analogous to HTML - that is relatively easy to decode.
If preserving formatting, graphics and images is important, the Portable Document Format (PDF) is today's best bet. PDF is now an ISO standard, so reading and writing software can be developed without royalties to Adobe. The downside is that Adobe keeps adding features to their PDF software - Acrobat - which could create compatibility issues down the road.
Audio
The MP3 format is the best bet for the long haul. MP3 is widely supported and playable on most every media player.
The iTunes native AAC format can be converted to MP3 by right-clicking and selecting "create MP3" in the contextual menu (I don't own iTunes music, so that may not work with DRM'd AAC files).
When ripping, save at the highest MP3 quality - 320 kbps - which few ears can tell from uncompressed audio. It takes a little more space today, but in a year you won't notice the difference. Storage is cheap.
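How much more space? The arithmetic is simple enough to check yourself. The sketch below assumes a constant bit rate and counts 1 MB as a million bytes; the CD figure is the standard 44.1 kHz, 16-bit stereo rate, which I've added for comparison.

```python
def megabytes_per_hour(kilobits_per_second: float) -> float:
    """Storage used by one hour of audio at a constant bit rate.

    Uses decimal units: 1 kbps = 1000 bits/s, 1 MB = 10**6 bytes.
    """
    bits = kilobits_per_second * 1000 * 3600
    return bits / 8 / 1_000_000


# Uncompressed CD audio: 44,100 samples/s x 16 bits x 2 channels.
CD_AUDIO_KBPS = 44_100 * 16 * 2 / 1000  # 1411.2 kbps
```

At 320 kbps an hour of music takes 144 MB, against about 58 MB at 128 kbps and roughly 635 MB for uncompressed CD audio. Going from 128 to 320 costs you under 90 MB per hour of music.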
Pictures
There aren't any perfect solutions for pictures. Portable Network Graphics (PNG - pronounced "ping") is lossless and probably the best bet, but it doesn't have the widest software support. If the pictures are really important, print on acid-free paper with pigment inks and store in a cool, dry and dark place.
Avoid using proprietary formats from apps such as Photoshop, Illustrator or Autodesk. Those files are specialized and application dependent: no app, no picture.
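Since the extension tells you the format, you can sweep an archive folder for application-dependent files before you put it away. This is a rough sketch: the extension table is a short, illustrative list that I've assumed (.psd for Photoshop, .ai for Illustrator, .dwg for Autodesk's AutoCAD), and app_dependent_files is a hypothetical helper, so extend both for the software you actually use.

```python
from pathlib import Path

# Illustrative and incomplete: extension -> application that owns the format.
APP_DEPENDENT = {
    ".psd": "Photoshop",
    ".ai": "Illustrator",
    ".dwg": "Autodesk AutoCAD",
}


def app_dependent_files(root: Path) -> list[tuple[str, str]]:
    """Return (relative path, application) pairs for app-dependent files."""
    found = []
    for path in sorted(root.rglob("*")):
        app = APP_DEPENDENT.get(path.suffix.lower())
        if path.is_file() and app:
            found.append((path.relative_to(root).as_posix(), app))
    return found
```

Anything the sweep turns up is a candidate for exporting to a more widely supported format, such as PNG or PDF, and archiving that export next to the original.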
Video
Video is tough: none of the digital formats have been around that long; the file sizes are large; and creating copies is non-trivial. I produce videos, but for this I spoke to an expert.
A 100,000 hour video archive
I spoke to Sam Gustman, the Chief Technology Officer of the USC Shoah Foundation Institute for Visual History and Education. The Foundation Archive
. . . contains nearly 52,000 visual history testimonies of survivors and other witnesses of the Holocaust videotaped in 56 countries and in 32 languages.
They have over 100,000 hours of video on 235,000 tapes. Taping started some 20 years ago, so they've been dealing with the media and format issues for years.
USCSF is transferring all their original tapes - many in now-obsolete formats like BetaSP and VHS - to the 75Mbit motionJPEG2000 format. MJPEG2000 is the format chosen by the Library of Congress and is the basis for the format used in the Digital Cinema Initiative. Translation: it has massive dollars and content repositories behind it.
In addition they are also making copies of all tapes in 5Mbit MPEG-2, Flash, QuickTime and Windows Media. The latter are heavily compressed for serving over the web and dispersing copies to other sites.
The complete archive requires 8,000 terabytes of capacity on 2 high-end Sun StorageTek tape silos - each costing about a million bucks. Every 3 years they copy everything to new tapes to ensure preservation.
They also maintain a set of tapes at an offsite repository in Pennsylvania - just in case the Big One hits LA.
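The numbers above are easy to sanity-check. The sketch below estimates storage from hours of video and bit rate, assuming a constant bit rate, decimal terabytes, and that "75Mbit" and "5Mbit" mean megabits per second. Those are my assumptions; the Foundation's own accounting may differ.

```python
def terabytes(hours: float, megabits_per_second: float) -> float:
    """Storage for `hours` of video at a constant bit rate (1 TB = 10**12 bytes)."""
    bytes_total = megabits_per_second * 1_000_000 / 8 * 3600 * hours
    return bytes_total / 1e12
```

On those assumptions, terabytes(100_000, 75) gives 3,375 TB for one full-quality copy, and terabytes(100_000, 5) adds 225 TB for one MPEG-2 copy. The rest of the 8,000 TB presumably covers additional copies, the other web formats and overhead, though I don't have a breakdown.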
Update: For more about magnetic media see The bell tolls for your magnetic media by Jason Perlow. End update.
What should you do?
The basic idea is the same for personal video content: make 2 copies; keep them in different places; use a common format; and plan on making copies every few years.
Some specific tips:
- Media. Taiyo Yuden says it expects its DVD media to last more than 50 years ". . .under proper conditions like humidity, temperature, [no] direct sun light and recording status." T-Y media is well-regarded among video pros.
- Format. Motion JPEG is clearly a good choice if available. Otherwise your OS vendor's most common format - Windows Media or Apple's QuickTime - is a reasonably safe choice.
- Hard drives? Disk drives aren't as cheap per GB as bulk DVD media, but they are convenient. AFAIK drive vendors don't spec drives for archiving, but Copan Systems, a disk archive vendor that spins disks down for extended periods, spins them up monthly for "disk aerobics" and sees a seven year disk life. Disk vendors I've spoken to have said - unofficially - that an annual spin up, copy, erase and rewrite should be fine.
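The annual spin up and rewrite is the kind of chore that is easy to forget, so it helps to let a script nag you. The sketch below is one way to do it: the 365-day refresh interval follows the unofficial vendor advice above, while the three-year replacement age is simply an assumed default, and needs_attention is a hypothetical helper name.

```python
from datetime import date


def needs_attention(purchased: date, last_refreshed: date, today: date,
                    refresh_days: int = 365, replace_years: float = 3) -> str:
    """Return "replace", "refresh" or "ok" for one archive drive."""
    age_years = (today - purchased).days / 365.25
    if age_years >= replace_years:
        return "replace"  # old enough to retire, whatever its refresh history
    if (today - last_refreshed).days >= refresh_days:
        return "refresh"  # due for a spin up, copy, erase and rewrite
    return "ok"
```

Keep one row per drive (purchase date, last refresh date) in a text file, run the check every month or so, and adjust the thresholds to suit your own tolerance for risk.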
What does Robin do?
I maintain 3 backup systems on my multi-terabyte Mac Pro: hourly Time Machine file backups; daily Carbon Copy Cloner runs for bootable backups; and an offsite cloud backup. I replace disks at 3 years of age or sooner.
I also make DVDs with completed video projects for archive. Never used them, hope I never will, but there they are.
The Storage Bits take
We're living in a digital age. Maintaining digital data is more complex than sticking paper in a file folder. But the rewards - easy search, massive capacity, multi-formats - are worth it.
Comments welcome, of course.
Talkback
How about flash memory?
No Good at all ...
Most Flash Drives are MLC, because it is substantially cheaper, but not nearly as reliable or fast as SLC (I paid $150 for the cheapest SLC 8 Gig when the MLC was $50) ...
Neither actually last very long - I've had at least two occasions where I was forced to reformat my Flash drive due to corruption - the O/S on multiple machines wouldn't recognize it ...
Flash drives are a convenient way to transfer files from one location to another, but should never be used for "permanent" storage.
Ludo
Ludo is correct: flash is not an archive medium
slowest, MLC flash is today about $1.75/GB vs $0.10/GB for
the cheapest disk. 8x Taiyo Yuden DVD-R can be had in 100
packs on line for $0.07/GB or less on sale.
HTH,
Robin
I think our ears were ringing yesterday, Robin
RE: Long-term personal data storage
DVDs? Are you kidding?
I have decided to store all my irreplaceable data online (Amazon S3), along with two copies at home, one on the main hard drive, and the other on a removable hard drive as backups to the main hard drive.
DVDs, as part of a balanced diet . . .
keep my completed videos on DVD and hard disk, as well
as on an Internet backup provider.
That said, not all "name brand" media is alike. T-Y seems
to have earned its reputation for quality media.
Robin
DVD's are NOT archival in the traditional sense
5 times?
Backup software for removable hard drives
Evaporating data on writeable optical disks
Nowadays, I burn everything TWICE on optical disks. I've had pretty good luck with Verbatim brand media. (Most of my lost data was on Taiyo Yuden.)
I have a friend who is trying to recover plans he drew in the '80s. He has good backups of the CAD files, he has good backups of the CAD program, what he doesn't have is the MS-DOS machine which is needed to run 'em.
BTW: I somehow have more confidence in the continuing support from open software projects than I do with proprietary software. If worst comes to worst with open source I can always take the old code and recompile and putz with it if need be to get it to work with the current OS & hardware. Proprietary software? You are just up the creek.
RE: MSDOS machine
Edit the two files: (back them up first)
CONFIG.NT
AUTOEXEC.NT
they're in the 'Windows\System32' folder
Then proceed to edit the original files to load ANSI.SYS, HIMEM.SYS etc.
WOW16 will kick in when he tries to run his program.
Much of this info was removed from XP, even though it is still supported on all the 32bit versions.
Amazon s3
Low-energy lightbulbs
Anyone reading this know anything more?
UV
Some halogen lamps also produce a noticeable UV output.
I haven't heard of UV from LEDs unless they were specifically designed to produce UV.
RE: UV
Any idea of the sort of UV power from low-energy Bulbs?
Re: UV
The problem of UV from fluorescent bulbs is a lot harder to solve, even with CFL's.
In reality though, it isn't just UV that threatens photos and paper. If anything printed is valuable to you, the best way to store it is in the dark of a box or cabinet.
Each revision to the PDF spec is backward-compatible
Surely an Acrobat document created in 2009 will be readable by Adobe Reader in 2040. Perhaps you mean that Adobe is layering feature upon feature that is not supported by the ISO standard for PDF. Thus, if someone in 2040 were to use a non-Adobe reader to view PDFs, a reader based on the 2032 ISO standard, some of the 2009 features--embedded video, rotatable 3D vectors, dissolves--won't be rendered. Good point.
Backward compatible!!!
Besides, backward compatibility only adds more and more head-aches for the software developers. The legacy stuff then soon becomes the cause of head-aches for them. That then means more bugs in the software, more crashes, more differences between "supposed" backward compatibility and real life.
I therefore never see backward compatibility as a long-term solution to file formats.
Zip 100 disks and the Clinton-Bush-Obama economic depression
still that I can't read. My DVD drive has died, so I transfer
coveted files over the net to other systems, and, when I
can, burn CDs, because they're what I have... at least until
the end of the Clinton-Bush-Obama economic depression.