Long-term personal data storage


You can stick a newspaper clipping in a folder and read it in 50 years. Not so with digital content: both the media AND the format can become unreadable. With so much of the world's data - and yours - in digital form, more people wonder: how do I keep my pictures, music, videos, documents and more around for decades? Here's how.

The proper mindset

Your data is valuable. Storage is cheap. Scrimping on capacity to save a few bucks is silly. If money is a real problem, plan to copy your most important data first. In a few months, when storage is cheaper, buy some more.

Remember, you will soon forget about the cost of the storage, but you may never forgive yourself for losing irreplaceable family or legal files.

One word, my friend: copies

Neatness is one of the most common causes of data loss. You get the new external drive - or worse, a RAID array - copy everything to it and then delete the originals. The drive or array goes south - and your data goes with it.

A RAID array is NOT a substitute for a data archive. RAID arrays break, and all too often a single mistake - oops, pulled the wrong disk! - is all it takes for your data to be gone forever.

Cheap optical disks can slowly scramble your data. Hard drives crash. And even if the media is still readable, if no application can open the file format you are still out of luck.

Unnecessary neatness

Instead of "everything in its place and a place for everything" you want "every thing in every place." The best policy is several copies across different media, preferably in different locations.

Storage is cheap. Use lots.

Privacy and encryption

Encryption is a good idea only if you know you can remember a password for 10 years. Otherwise use physical security - locked doors - rather than encryption. Also, encryption is an application: are you sure that app will be around in 10 years?

If you must encrypt, you are likely better off with a free, open-source tool like TrueCrypt. Or check out 10 free open source encryption utilities.

A better bet than encryption for most of us: save data onto dual-layer DVDs - about 8.5GB each - and store them in a safe deposit box. Good density, high bandwidth and a disaster-tolerant location.
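To get a feel for how many discs that means, here is a back-of-the-envelope sketch in Python. It assumes roughly 8.5GB (decimal) per dual-layer disc, leaves 5% headroom on each disc, and counts two full copies; the function name discs_needed and those defaults are illustrative choices, not settings from any burning tool.

```python
# Approximate capacity of one dual-layer DVD in decimal bytes (assumption: ~8.5GB).
DL_DVD_BYTES = 8_500_000_000


def discs_needed(total_bytes: int, copies: int = 2, fill_percent: int = 95) -> int:
    """Number of dual-layer DVDs needed to hold `copies` full copies of a collection."""
    usable = DL_DVD_BYTES * fill_percent // 100  # leave some headroom on each disc
    per_copy = -(-total_bytes // usable)         # ceiling division, integers only
    return per_copy * copies


print(discs_needed(200 * 10**9))  # a 200GB collection, two copies -> 50
```

Integer arithmetic avoids float rounding at disc boundaries; change copies if you keep a third set somewhere else.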

Data and media formats

Once you buy into the multiple-copy strategy you are ready to consider the next issues. The first problem is picking data formats for maximum longevity. A file's format is usually indicated by its extension, such as .doc or .pdf, at the end of the file name.
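Before picking archive formats it helps to know which formats you actually have. A minimal Python sketch, assuming extensions are a good-enough proxy for format (they are only a naming convention, so a mislabeled file will be miscounted); count_extensions and extension_inventory are hypothetical helper names.

```python
from collections import Counter
from pathlib import Path
from typing import Iterable


def count_extensions(paths: Iterable[str]) -> Counter:
    """Count file names by lower-cased extension; '' means no extension."""
    return Counter(Path(p).suffix.lower() for p in paths)


def extension_inventory(root: str) -> Counter:
    """Walk every file under root (a leading ~ is expanded) and tally its extension."""
    base = Path(root).expanduser()
    return count_extensions(str(p) for p in base.rglob("*") if p.is_file())
```

For example, extension_inventory("~/Documents").most_common(10) lists the ten most common extensions in that folder, a quick way to spot application-specific formats that may need converting.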

Documents

For documents, simple formats are best. A copy printed on good-quality, acid-free paper will last the longest and be readable by anyone.

Text formats are best for digital documents. ASCII text - from programs like Notepad or TextEdit - has been around for decades. Text files don't preserve complex formatting or graphics, but if it is content you want to preserve - like the typescript of my great-grandmother's Civil War letters - plain text files are likely to be readable 100 years from now.
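To confirm that a file really is plain ASCII before you archive it, a short check can flag every byte outside printable 7-bit ASCII plus tab, carriage return and line feed. This is a sketch with a deliberately strict rule chosen for illustration: UTF-8 text with accented letters is also long-lived plain text, but this check will flag it. The name ascii_problems is hypothetical.

```python
# Whitespace control characters that are normal in plain text files.
ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return


def ascii_problems(data: bytes, limit: int = 5) -> list:
    """Return (offset, byte) pairs for the first `limit` bytes that are not
    printable 7-bit ASCII or ordinary whitespace."""
    problems = []
    for offset, b in enumerate(data):
        # 0x7F (DEL) and anything above it are outside printable ASCII;
        # bytes below 0x20 are control characters unless whitelisted.
        if b > 0x7E or (b < 0x20 and b not in ALLOWED_CONTROLS):
            problems.append((offset, b))
            if len(problems) == limit:
                break
    return problems
```

Use it as ascii_problems(Path("letters.txt").read_bytes()) with pathlib's Path and a file name of your own; an empty list means the file is clean ASCII.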

If you must preserve formatting, Rich Text Format (.rtf) documents are likely to be readable for a couple of decades. RTF, a Microsoft creation, encodes formatting as plain-text control words - analogous to HTML tags - so it is relatively easy to decode.
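To show what "relatively easy to decode" means in practice, here is a minimal Python sketch that pulls the plain text out of simple RTF. It is not a full RTF parser: it ignores Unicode escapes (\uN), tables and most special groups, and the SKIP_DESTINATIONS list is an assumption covering only a few common metadata groups.

```python
import re

# Groups whose contents are metadata, not body text (assumed, incomplete list).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info", "pict"}

# Alternatives: control word (+ optional numeric arg, one delimiting space),
# hex escape \'hh, control symbol, brace, run of plain text, raw newlines.
TOKEN = re.compile(
    r"\\([a-zA-Z]+)(-?\d+)? ?|\\'([0-9a-fA-F]{2})|\\(.)|([{}])|([^\\{}\r\n]+)|[\r\n]+",
    re.S,
)


def rtf_to_text(rtf: str) -> str:
    out = []
    saved = []        # skip flag saved at each opening brace
    skipping = False  # True while inside a group we are ignoring
    for m in TOKEN.finditer(rtf):
        word, _arg, hexcode, symbol, brace, text = m.groups()
        if brace == "{":
            saved.append(skipping)
        elif brace == "}":
            skipping = saved.pop() if saved else False
        elif word is not None:
            if word in SKIP_DESTINATIONS:
                skipping = True
            elif not skipping and word in ("par", "line"):
                out.append("\n")
            elif not skipping and word == "tab":
                out.append("\t")
        elif hexcode is not None:
            if not skipping:
                out.append(bytes([int(hexcode, 16)]).decode("cp1252", errors="replace"))
        elif symbol is not None:
            if symbol == "*":
                skipping = True  # \* marks an optional group readers may ignore
            elif not skipping and symbol in "\\{}":
                out.append(symbol)  # escaped literal backslash or brace
        elif text is not None and not skipping:
            out.append(text)
        # Raw newlines match no group and are dropped, as RTF readers do.
    return "".join(out)
```

For example, rtf_to_text(r"{\rtf1\ansi{\fonttbl\f0\fswiss Helvetica;}\f0\pard Hello {\b world}.\par}") returns "Hello world.\n": the font table is skipped and the bold markup disappears, leaving the words.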

If preserving formatting, graphics and images is important, the Portable Document Format (PDF) is today's best bet. PDF is now an ISO standard (ISO 32000-1), so software that reads and writes it can be developed without paying royalties to Adobe. The downside is that Adobe keeps adding features to its PDF software - Acrobat - which could create compatibility issues down the road.
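One way to gauge how exposed an archive is to newer PDF features is to check which version each file declares. PDF files begin with a header such as %PDF-1.4, and this sketch reads it. It is a rough check only: from PDF 1.4 on, a document can also declare a later version inside the file, which this sketch does not look for. Scanning the first 1024 bytes rather than only the first line is an assumption meant to tolerate stray bytes before the header.

```python
import re
from typing import Optional

# The header looks like "%PDF-1.7"; capture the "major.minor" part.
PDF_HEADER = re.compile(rb"%PDF-(\d+\.\d+)")


def pdf_header_version(data: bytes) -> Optional[str]:
    """Return the version declared in a PDF's header, or None if none is found."""
    match = PDF_HEADER.search(data[:1024])  # assumption: header sits near the start
    return match.group(1).decode("ascii") if match else None
```

Passing just the first 1024 bytes of each file (open(path, "rb").read(1024)) is enough, and tallying the results shows how many of your PDFs depend on newer versions.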

Audio

The MP3 format is the best bet for the long haul: it is widely supported and playable on almost every media player.

The iTunes native AAC format can be converted to MP3 by right-clicking and selecting "create MP3" in the contextual menu (I don't own iTunes music, so that may not work with DRM'd AAC files).

When ripping, save at the highest MP3 quality - 320 kbps - which few ears can tell from uncompressed audio. It takes a little more space today, but in a year you won't notice the difference. Storage is cheap.
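The space cost is easy to estimate: a constant-bitrate file needs bitrate × duration ÷ 8 bytes. A small Python sketch compares a 4-minute song at 128 kbps, at 320 kbps and as uncompressed CD audio (1,411.2 kbps, that is 44,100 samples/s × 16 bits × 2 channels). It ignores tags and container overhead, and variable-bitrate encodes will differ.

```python
def audio_megabytes(bitrate_kbps: float, seconds: float) -> float:
    """Size in decimal megabytes of a constant-bitrate stream (no tag/container overhead)."""
    bits = bitrate_kbps * 1000 * seconds
    return bits / 8 / 1_000_000


for label, kbps in [("MP3 128 kbps", 128), ("MP3 320 kbps", 320), ("CD audio (PCM)", 1411.2)]:
    print(f"{label:>15}: {audio_megabytes(kbps, 4 * 60):5.1f} MB for a 4-minute song")
```

That works out to about 3.8 MB, 9.6 MB and 42.3 MB respectively: 320 kbps costs 2.5 times the space of 128 kbps, yet still under a quarter of the uncompressed size.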

Pictures

There aren't any perfect solutions for pictures. Portable Network Graphics (PNG - pronounced "ping") is lossless and probably the best bet, but it doesn't have the widest software support. If the pictures are really important, print them on acid-free paper with pigment inks and store the prints in a cool, dry and dark place.

Avoid proprietary formats from applications such as Photoshop and Illustrator, or from vendors such as Autodesk. Those files are specialized and application dependent: no app, no picture.
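PNG has a useful property for archives: every chunk in the file carries a CRC-32 checksum, so the kind of silent corruption cheap optical discs can cause is detectable. A minimal Python sketch walks the chunks and checks each CRC. It verifies the file's structure, not that the image decodes, and png_chunks_ok and make_chunk are hypothetical names.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunks_ok(data: bytes) -> bool:
    """True if data has the PNG signature, every chunk CRC matches, and it ends at IEND."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        end = pos + 8 + length
        if end + 4 > len(data):
            return False  # truncated chunk data or CRC
        (stored_crc,) = struct.unpack(">I", data[end:end + 4])
        # The CRC covers the type and data fields, not the length field.
        if zlib.crc32(chunk_type + data[pos + 8:end]) != stored_crc:
            return False
        if chunk_type == b"IEND":
            return True
        pos = end + 4
    return False  # ran out of data before the IEND chunk


def make_chunk(chunk_type: bytes, body: bytes) -> bytes:
    """Build one chunk with a correct CRC (handy for making test data)."""
    crc = zlib.crc32(chunk_type + body)
    return struct.pack(">I", len(body)) + chunk_type + body + struct.pack(">I", crc)
```

For instance, PNG_SIGNATURE + make_chunk(b"IHDR", bytes(13)) + make_chunk(b"IEND", b"") passes the check (it is structurally valid, though not a viewable image), while flipping any single byte inside a chunk makes it fail.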

Video

Video is tough: none of the digital formats have been around that long, the files are large, and making copies is non-trivial. I produce videos, but for this I spoke to an expert.

A 100,000-hour video archive

I spoke to Sam Gustman, the Chief Technology Officer of the USC Shoah Foundation Institute for Visual History and Education. The Foundation Archive

. . . contains nearly 52,000 visual history testimonies of survivors and other witnesses of the Holocaust videotaped in 56 countries and in 32 languages.

They have over 100,000 hours of video on 235,000 tapes. Taping started some 20 years ago, so they've been dealing with the media and format issues for years.

USCSF is transferring all its original tapes - many in now-obsolete formats like BetaSP and VHS - to 75 Mbit/s Motion JPEG 2000. Motion JPEG 2000 is the format chosen by the Library of Congress and is the basis for the format used by the Digital Cinema Initiatives (DCI). Translation: it has massive dollars and content repositories behind it.

In addition, they are making copies of all tapes in 5 Mbit/s MPEG-2, Flash, QuickTime and Windows Media. These copies are heavily compressed for serving over the web and distributing to other sites.

The complete archive requires 8,000 Terabytes of capacity on 2 high-end Sun StorageTek tape silos - each costing about million bucks. Every 3 years they copy everything to new tapes to ensure preservation.

They also maintain a set of tapes at an offsite repository in Pennsylvania - just in case the Big One hits LA.
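Bitrate drives capacity directly, which makes numbers like these easy to sanity-check. A rough Python sketch, assuming decimal units (1 Mbit/s = 1,000,000 bits/s, 1 TB = 10^12 bytes) and constant bitrate with no container overhead; terabytes is a hypothetical helper.

```python
def terabytes(bitrate_mbps: float, hours: float, copies: int = 1) -> float:
    """Decimal terabytes needed for `hours` of constant-bitrate video."""
    bytes_per_hour = bitrate_mbps * 1_000_000 / 8 * 3600
    return bytes_per_hour * hours * copies / 1e12


ARCHIVE_HOURS = 100_000  # "over 100,000 hours", from the article

for label, mbps in [("75 Mbit/s Motion JPEG 2000", 75), ("5 Mbit/s MPEG-2", 5)]:
    print(f"{label}: {terabytes(mbps, ARCHIVE_HOURS):,.0f} TB per copy")
```

At 75 Mbit/s one hour takes about 33.75 GB, so one master copy of 100,000 hours is roughly 3,375 TB, and the 5 Mbit/s MPEG-2 set adds about 225 TB per copy. The sketch does not try to reproduce the Foundation's 8,000 TB total, which the article doesn't break down.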

Update: For more about magnetic media, see "The bell tolls for your magnetic media" by Jason Perlow. End update.

What should you do?

The basic idea is the same for personal video content: make 2 copies; keep them in different places; use a common format; and plan on making copies every few years.

Some specific tips:

  • Media. Taiyo Yuden says it expects its DVD media to last more than 50 years ". . .under proper conditions like humidity, temperature, [no] direct sun light and recording status." T-Y media is well-regarded among video pros.
  • Format. Motion JPEG is clearly a good choice if available. Otherwise, your OS vendor's most common format - Windows Media or Apple's QuickTime - is a reasonably safe choice.
  • Hard drives? Disk drives aren't as cheap per GB as bulk DVD media, but they are convenient. AFAIK drive vendors don't spec drives for archiving, but Copan Systems, a disk archive vendor that spins disks down for extended periods, spins them up monthly for "disk aerobics" and sees a seven-year disk life. Disk vendors I've spoken to have said - unofficially - that an annual spin-up, copy, erase and rewrite should be fine.
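Whatever the media, the routine of keeping two copies and recopying every few years only works if you check that the copies still match. A minimal Python sketch, assuming both copies are mounted as ordinary directories: it builds a checksum manifest of each tree and reports files that are missing, extra or changed. The function names are hypothetical, and using SHA-256 is an assumption; the article doesn't prescribe a checksum.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large videos never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: str) -> dict[str, str]:
    """Map each file's path relative to root (forward slashes) to its SHA-256."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): sha256_file(p)
        for p in base.rglob("*")
        if p.is_file()
    }


def compare_copies(original: str, copy: str) -> dict[str, list[str]]:
    """Report files missing from, extra in, or different in `copy`."""
    a, b = manifest(original), manifest(copy)
    return {
        "missing": sorted(a.keys() - b.keys()),
        "extra": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

Running compare_copies on your two archive volumes (say, a hypothetical /Volumes/Archive-A and /Volumes/Archive-B) before and after each annual recopy tells you whether anything was lost or silently altered; three empty lists mean the copies agree.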

What does Robin do?

I maintain 3 backup systems on my multi-terabyte Mac Pro: hourly Time Machine file backups; daily Carbon Copy Cloner runs for bootable backups; and an offsite cloud backup. I replace disks at 3 years of age or sooner.

I also make DVDs with completed video projects for archive. Never used them, hope I never will, but there they are.

The Storage Bits take

We're living in a digital age. Maintaining digital data is more complex than sticking paper in a file folder. But the rewards - easy search, massive capacity, multi-formats - are worth it.

Comments welcome, of course.


About

Harris has been working with computers for over 35 years and selling and marketing data storage for over 30 in companies large and small. He introduced a couple of multi-billion-dollar storage products (DLT, the first Fibre Channel array) to market, as well as many smaller ones. Earlier he spent 10 years marketing servers and networks.
