Long-term personal data storage

You can stick a newspaper clipping in a folder and read it in 50 years. Not so with digital content: both the media AND the format can become unreadable. With so much of the world's data - and yours - in digital form, more people wonder: how do I keep my pictures, music, videos, documents and more around for decades? Here's how.

The proper mindset

Your data is valuable. Storage is cheap. Scrimping on capacity to save a few bucks is silly. If money is a real problem, plan to copy your most important data first. In a few months, when storage is cheaper, buy some more.

Remember, you will soon forget about the cost of the storage, but you may never forgive yourself for losing irreplaceable family or legal files.

One word, my friend: copies

Neatness is one of the most common causes of data loss. You get the new external drive - or worse, RAID array - copy everything to it and then delete the originals. The drive or array goes south - and your data goes with it.

A RAID array is NOT a substitute for a data archive. RAID arrays break, and all too often it takes just a single mistake - oops, pulled the wrong disk! - and your data is gone forever.

Cheap optical disks can slowly scramble your data. Hard drives crash. Even if your data is readable, if your application can't read it you are still out of luck.

Unnecessary neatness

Instead of "everything in its place and a place for everything" you want "every thing in every place." The best policy is several copies across different media, preferably in different locations.

Storage is cheap. Use lots.
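If you are comfortable with a little scripting, here is a minimal Python sketch of the "every thing in every place" idea: copy a file to several destinations and confirm each copy matches the original by checksum. The function names and the example folders are my own illustration, not a particular backup product, and the script deliberately never deletes the original.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_everywhere(source: Path, destinations: list[Path]) -> dict[Path, bool]:
    """Copy source into each destination folder and verify each copy.

    Returns a mapping of copied file path -> True if its checksum
    matches the original. The original file is left untouched.
    """
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also preserves timestamps
        results[target] = sha256_of(target) == expected
    return results
```

For example, `copy_everywhere(Path("letters.txt"), [Path("/Volumes/ArchiveA"), Path("/Volumes/ArchiveB")])` would return one entry per copy; the volume names here are hypothetical. Any entry that comes back `False` is a copy you should not trust.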

Privacy and encryption

Encryption is a good idea only if you know you can remember a password for 10 years. Otherwise use physical security - locked doors - rather than encryption. Also, encryption is an application: are you sure that app will be around in 10 years?

If you must encrypt, you are likely better off with a free, open-source tool like TrueCrypt. Or check out 10 free open source encryption utilities.

A better bet than encryption for most of us: save data onto dual-layer DVDs - 8GB each - and store in a safe deposit box. Good density, high bandwidth and a disaster tolerant location.
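To get a feel for how many discs that means, here is a rough Python calculation. It uses the 8GB-per-disc figure above; the default of two copies is my assumption, in keeping with the multiple-copy advice, and the function name is just for illustration. Real files rarely pack perfectly onto discs, so treat the result as a lower bound.

```python
import math


def discs_needed(total_gb: float, disc_gb: float = 8.0, copies: int = 2) -> int:
    """Estimate how many discs are needed to hold `copies` full sets of the data.

    Assumes data can be split evenly across discs, so this is a lower bound.
    """
    discs_per_copy = math.ceil(total_gb / disc_gb)
    return discs_per_copy * copies


# 100GB of photos and documents, two full sets
print(discs_needed(100))  # 13 discs per set, 26 in total
```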

Data and media formats

Once you buy into the multiple copy strategy you are ready to consider the next issues. The first problem is picking data formats for maximum longevity. A file format is indicated by the file extension, such as .doc or .pdf, that follows the file name.

Documents

For documents, simple formats are best. An original copy printed on good quality, acid-free paper will last the longest and be readable by anyone.

Text formats are best for digital documents. ASCII text - from programs like Notepad or TextEdit - has been around for decades. Text files don't preserve complex formatting or graphics, but if it is content you want to preserve - like the typescript of my great-grandmother's Civil War letters - plain text files are likely to be readable 100 years from now.

If you must preserve formatting, Rich Text Format (.rtf) documents are likely to be readable for a couple of decades. RTF is a Microsoft creation that uses text-based formatting - analogous to HTML - that is relatively easy to decode.

If preserving formatting, graphics and images is important, the Portable Document Format (PDF) is today's best bet. PDF is now an ISO standard, so reading and writing software can be developed without royalties to Adobe. The downside is that Adobe keeps adding features to their PDF software - Acrobat - which could create compatibility issues down the road.

Audio

The MP3 format is the best bet for the long haul. MP3 is widely supported and playable on most every media player.

The iTunes native AAC format can be converted to MP3 by right-clicking and selecting "create MP3" in the contextual menu (I don't own iTunes music, so that may not work with DRM'd AAC files).

When ripping, save to the highest MP3 quality - 320 kbps - that few ears can tell from uncompressed. Takes a little more space today, but in a year you won't notice the difference. Storage is cheap.
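How much is "a little more space"? A quick Python calculation gives a rough idea. I am assuming "uncompressed" means standard CD audio (44.1 kHz, 16-bit, stereo) and using decimal megabytes; the function name is just for illustration.

```python
def megabytes_per_minute(kbps: float) -> float:
    """Convert a bitrate in kilobits per second to megabytes per minute."""
    bits_per_minute = kbps * 1000 * 60
    return bits_per_minute / 8 / 1_000_000


# CD audio: 44,100 samples/s x 16 bits x 2 channels = 1,411.2 kbps
CD_AUDIO_KBPS = 44_100 * 16 * 2 / 1000

for label, rate in [("128 kbps MP3", 128), ("320 kbps MP3", 320), ("CD audio", CD_AUDIO_KBPS)]:
    print(f"{label}: {megabytes_per_minute(rate):.2f} MB per minute")
```

By this estimate a 320 kbps MP3 takes about 2.4 MB per minute, against roughly 10.6 MB per minute for uncompressed CD audio and just under 1 MB per minute at 128 kbps.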

Pictures

There aren't any perfect solutions for pictures. Portable Network Graphics (PNG - pronounced "ping") is lossless and probably the best bet, but it doesn't have the widest software support. If the pictures are really important, print on acid-free paper with pigment inks and store in a cool, dry and dark place.

Avoid using proprietary formats from apps such as Photoshop, Illustrator or Autodesk. Those files are specialized and application dependent: no app, no picture.

Video

Video is tough: none of the digital formats have been around that long; the file sizes are large; and creating copies is non-trivial. I produce videos, but for this I spoke to an expert.

A 100,000 hour video archive

I spoke to Sam Gustman, the Chief Technology Officer of the USC Shoah Foundation Institute for Visual History and Education. The Foundation Archive

. . . contains nearly 52,000 visual history testimonies of survivors and other witnesses of the Holocaust videotaped in 56 countries and in 32 languages.

They have over 100,000 hours of video on 235,000 tapes. Taping started some 20 years ago, so they've been dealing with the media and format issues for years.

USCSF is transferring all their original tapes - many in now-obsolete formats like BetaSP and VHS - to the 75Mbit Motion JPEG 2000 format. MJPEG2000 is the format chosen by the Library of Congress and is the basis for the format used in the Digital Cinema Initiative. Translation: it has massive dollars and content repositories behind it.

In addition they are also making copies of all tapes in 5Mbit MPEG-2, Flash, QuickTime and Windows Media. The latter are heavily compressed for serving over the web and dispersing copies to other sites.

The complete archive requires 8,000 Terabytes of capacity on 2 high-end Sun StorageTek tape silos - each costing about a million bucks. Every 3 years they copy everything to new tapes to ensure preservation.
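As a back-of-the-envelope check on the scale, here is a small Python calculation of what 100,000 hours of video occupies at the bitrates mentioned above. It assumes a constant bitrate and decimal terabytes, and ignores audio and container overhead. I don't have a breakdown of the 8,000 Terabyte figure, so this only shows the order of magnitude; the function name is illustrative.

```python
def archive_terabytes(hours: float, megabits_per_second: float) -> float:
    """Estimate storage in decimal terabytes for video at a constant bitrate."""
    total_bits = hours * 3600 * megabits_per_second * 1_000_000
    return total_bits / 8 / 1e12


print(archive_terabytes(100_000, 75))  # one set of 75Mbit masters
print(archive_terabytes(100_000, 5))   # one set of 5Mbit MPEG-2 copies
```

Under these assumptions a single set of 75Mbit masters comes to roughly 3,375 Terabytes and a set of 5Mbit copies to about 225 Terabytes, so multiple copies plus the web formats add up quickly.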

They also maintain a set of tapes at an offsite repository in Pennsylvania - just in case the Big One hits LA.

Update: For more about magnetic media see The bell tolls for your magnetic media by Jason Perlow. End update.

What should you do?

The basic idea is the same for personal video content: make 2 copies; keep them in different places; use a common format; and plan on making copies every few years.

Some specific tips:

  • Media. Taiyo Yuden says it expects its DVD media to last more than 50 years ". . .under proper conditions like humidity, temperature, [no] direct sun light and recording status." T-Y media is well-regarded among video pros.
  • Format. Motion JPEG is clearly a good choice if available. Otherwise your OS vendor's most common format - Windows Media or Apple's QuickTime - is a reasonably safe choice.
  • Hard drives? Disk drives aren't as cheap per GB as bulk DVD media, but they are convenient. AFAIK drive vendors don't spec drives for archiving, but Copan Systems, a disk archive vendor that spins disks down for extended periods, spins them up monthly for "disk aerobics" and sees a seven year disk life. Disk vendors I've spoken to have said - unofficially - that an annual spin up, copy, erase and rewrite should be fine.
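Whatever media you choose, the periodic spin-up is also a good moment to check that the files still read back correctly before you recopy them. Here is a minimal Python sketch of that check: record a checksum for every file when you create the archive, then compare against it at each refresh. This is one way to do it under my own assumptions, not a vendor procedure; the function names and the JSON manifest file are hypothetical, and the manifest should be kept somewhere other than the drive it describes.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def save_manifest(manifest: dict[str, str], manifest_path: Path) -> None:
    """Write the manifest as JSON so it can be read back years later."""
    manifest_path.write_text(json.dumps(manifest, indent=2))


def load_manifest(manifest_path: Path) -> dict[str, str]:
    """Read a manifest previously written by save_manifest."""
    return json.loads(manifest_path.read_text())


def find_damage(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or no longer match the manifest."""
    damaged = []
    for relative_path, expected in manifest.items():
        candidate = root / relative_path
        if not candidate.is_file() or file_digest(candidate) != expected:
            damaged.append(relative_path)
    return damaged
```

An empty list from `find_damage` means every file still matches what you originally stored. Anything listed should be restored from one of your other copies before you erase and rewrite.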

What does Robin do?

I maintain 3 backup systems on my multi-terabyte Mac Pro: hourly Time Machine file backups; daily Carbon Copy Cloner runs for bootable backups; and an offsite cloud backup. I replace disks at 3 years of age or sooner.

I also make DVDs with completed video projects for archive. Never used them, hope I never will, but there they are.

The Storage Bits take

We're living in a digital age. Maintaining digital data is more complex than sticking paper in a file folder. But the rewards - easy search, massive capacity, multi-formats - are worth it.

Comments welcome, of course.

Topics: Data Centers, Hardware, Storage

About

Robin Harris has been a computer buff for over 35 years and selling and marketing data storage for over 30 years in companies large and small.

Talkback

68 comments
  • How about flash memory?

    How does flash memory compare to optical and HDD storage, as far as longevity is concerned?
    pjotr123
    • No Good at all ...

      Flash Drives use one of two types of NAND Memory, SLC (Single-Level Cell) and MLC (Multi-Level Cell) ...

      Most Flash Drives are MLC, because it is substantially cheaper, but not nearly as reliable or fast as SLC (I paid $150 for the cheapest SLC 8 Gig when the MLC was $50) ...

      Neither actually lasts very long - I've had at least two occasions where I was forced to reformat my Flash drive due to corruption - the O/S on multiple machines wouldn't recognize it ...

      Flash drives are a convenient way to transfer files from one location to another, but should never be used for "permanent" storage.

      Ludo
      Ludovit
    • Ludo is correct: flash is not an archive medium

      And, in addition, there is the cost of flash. The cheapest,
      slowest, MLC flash is today about $1.75/GB vs $0.10/GB for
      the cheapest disk. 8x Taiyo Yuden DVD-R can be had in 100
      packs on line for $0.07/GB or less on sale.

      HTH,

      Robin
      Robin Harris
  • I think our ears were ringing yesterday, Robin

    http://blogs.zdnet.com/perlow/?p=9364
    jperlow
  • RE: Long-term personal data storage

    I signed up to Carbonite - offsite backup over the internet which makes me feel much more secure on top of my two local copies, one of which is on a portable drive that I always take on vacation with me.
    kde_uk
  • DVDs? Are you kidding?

    I wouldn't trust a writable DVD, let alone a dual layered DVD with ANY important data. I have already had writable DVDs fail that are only a year or two old, and these were name brand media, and no, they weren't stored improperly. I'm finding DVDs to be just as bad as 3.5" floppies were. CDs aren't much better - I've got tons of those that can't be read too.

    I have decided to store all my irreplaceable data online (Amazon S3), along with two copies at home, one on the main hard drive, and the other on a removable hard drive as backups to the main hard drive.
    WindowWasher
    • DVDs, as part of a balanced diet . . .

      Note that you shouldn't rely on any 1 medium for backup. I
      keep my completed videos on DVD and hard disk, as well
      as on an Internet backup provider.

      That said, not all "name brand" media is alike. T-Y seems
      to have earned its reputation for quality media.

      Robin
      Robin Harris
    • DVD's are NOT archival in the traditional sense

      DVD's WILL erase after about 5 reads. It is a crystal media. My theory is that the heat of the LASER slightly remelts the crystal during a read. HP states somewhere that it is archival for 20 or 30 years if read less than 5 times.
      BilboRT
      • 5 times?

        Wow... you must have a hardware issue. I have DVDs that have been read hundreds of times and not had any issue. I haven't experienced DVDs going bad unless left in direct sunlight or scratched...
        notlehs
    • Backup software for removable hard drives

      I too am using disconnected hard drives, as I have had problems with DVDs. I was glad the author found information on how we need to refresh the data on the drive. I was wondering if there is software to copy, erase, and rewrite the data? I could write a perl script to do it, but an application would be better.
      richard.christensen@...
    • Evaporating data on writeable optical disks

      I've lost a fair amount of data, including several audio projects on CD-R disks which have become unreadable.

      Nowadays, I burn everything TWICE on optical disks. I've had pretty good luck with Verbatim brand media. (Most of my lost data was on Taiyo Yuden.)

      I have a friend who is trying to recover plans he drew in the '80s. He has good backups of the CAD files, he has good backups of the CAD program, what he doesn't have is the MS/DOS machine which is needed to run 'em.

      BTW: I somehow have more confidence in the continuing support from open software projects than I do with proprietary software. If worst comes to worst with open source I can always take the old code and recompile and putz with it if need be to get it to work with the current OS & hardware. Proprietary software? You are just up the creek.
      CodeCurmudgeon
      • RE: MSDOS machine

        I can't see why he couldn't use a 32bit Windows OS to run his DOS stuff, unless it depends on a version of DOS before V5.0.

        Edit the two files: (back them up first)
        CONFIG.NT
        AUTOEXEC.NT

        they're in the 'Windows\System32' folder

        Then proceed to edit the original files to load ANSI.SYS, HIMEM.SYS etc.

        NTVDM (the NT Virtual DOS Machine) will kick in when he tries to run his program.

        Much of this info was removed from XP, even though it is still supported on all the 32bit versions.
        V@...
    • Amazon s3

      How much is it and how much do you like using it?
      Christian_<><
  • Low-energy lightbulbs

    Actually, since I learned that energy-saving lightbulbs emit UV, I'm concerned that some of my recordable media (DVD/CD) has already been damaged.

    Anyone reading this know anything more?
    V@...
    • UV

      Fluorescent lights (be they regular tubes or compact) produce UV. The mercury vapor in them emits primarily in the ultraviolet spectrum, which is converted into visible light by the fluorescence of the phosphor coating. The phosphor coating inside the tubes should convert nearly all of it to visible light, but some small amount always escapes.

      Some halogen lamps also produce a noticeable UV output.

      I haven't heard of UV from LEDs unless they were specifically designed to produce UV.
      CodeCurmudgeon
      • RE: UV

        Well I would never have expected halogens to emit UV. Thought they were at the opposite end of the light-spectrum for emissions.

        Any idea of the sort of UV power from low-energy Bulbs?
        V@...
        • Re: UV

          Halogens have a sufficiently broad spectrum that they do indeed emit UV. However, UV filters are very effective at removing it, good enough that some art museums use halogen MR16's with UV disks, and an MR16 UV filter disk is cheap.

          The problem of UV from fluorescent bulbs is a lot harder to solve, even with CFL's.

          In reality though, it isn't just UV that threatens photos and paper. If anything printed is valuable to you, the best way to store it is in the dark of a box or cabinet.
          esobocinski
  • Each revision to the PDF spec is backward-compatible

    Robin wrote: "The downside is that Adobe keeps adding features to their PDF software - Acrobat - which could create compatibility issues down the road."

    Surely an Acrobat document created in 2009 will be readable by Adobe Reader in 2040. Perhaps you mean that Adobe is layering feature upon feature that is not supported by the ISO standard for PDF. Thus, if someone in 2040 were to use a non-Adobe reader to view PDFs, a reader based on the 2032 ISO standard, some of the 2009 features--embedded video, rotatable 3D vectors, dissolves--won't be rendered. Good point.
    paul613
    • Backward compatible!!!

      It is NOT backward compatible. It is "supposed" to be backward compatible. So far, the two may not have been any different, but in 2040, I doubt if this will be the case.

      Besides, backward compatibility only adds more and more headaches for the software developers. The legacy stuff then soon becomes the cause of headaches for them. That then means more bugs in the software, more crashes, more differences between "supposed" backward compatibility and real life.

      I therefore never see backward compatibility as a long-term solution to file formats.
      alokgovil
  • Zip 100 disks and the Clinton-Bush-Obama economic depression

    Yah, tell me about it. I've got some data on Zip 100 disks
    still that I can't read. My DVD drive has died, so I transfer
    coveted files over the net to other systems, and, when I
    can, burn CDs, because they're what I have... at least until
    the end of the Clinton-Bush-Obama economic depression.
    Professor8