The future of data storage: Coding information in DNA

Scientists coded an entire book into DNA, giving a glimpse of a future in which a thumb-sized device could store as much information as the entire internet.
Written by Laura Shin, Contributor

We love the Internet because it puts information right at our fingertips.

But when faced with the reality of downloading a movie or uploading photos, we're quickly reminded just what a pain it is to deal with all this data.

Enter the future.

Harvard researchers just announced that they have encoded the contents of a whole book in DNA and then read the text back.

The breakthrough, published in the journal Science, points to a future in which "a device the size of your thumb could store as much information as the whole Internet," said Harvard University molecular geneticist George Church, the project's senior researcher.

The experiment

The researchers put Church's forthcoming book on genetics, called "Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves," into DNA.

They started with two kinds of information. They had the 0s and 1s that composed the digital version of the book.

They also had the four chemical bases with which DNA encodes its genetic instructions: adenine (A), guanine (G), cytosine (C) and thymine (T).

The Wall Street Journal reports:

Next, on paper, they translated the zeros into either the A or C of the DNA base pairs, and changed the ones into either the G or T. Then, using now-standard laboratory techniques, they created short strands of actual DNA that held the coded sequence—almost 55,000 strands in all. Each strand contained a portion of the text and an address that indicated where it occurred in the flow of the book.
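The scheme described in that passage can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' actual software: the block size, the address width, and the rule for choosing between the two candidate bases (here, simply avoiding a repeat of the previous base) are all hypothetical choices made for the example.

```python
# Illustrative sketch only. Assumptions not taken from the article:
# 16 payload bits and 8 address bits per strand, and a "never repeat
# the previous base" rule for picking between the two candidate bases.
ZERO_BASES = "AC"  # a 0 becomes A or C
ONE_BASES = "GT"   # a 1 becomes G or T


def bits_to_dna(bits: str) -> str:
    """Map each bit to a base, never repeating the previous base."""
    out = []
    for bit in bits:
        first, second = ZERO_BASES if bit == "0" else ONE_BASES
        out.append(second if out and out[-1] == first else first)
    return "".join(out)


def dna_to_bits(dna: str) -> str:
    """Reverse the mapping: A or C reads as 0, G or T reads as 1."""
    return "".join("0" if base in ZERO_BASES else "1" for base in dna)


def encode_strands(text: str, payload_bits: int = 16, address_bits: int = 8) -> list[str]:
    """Split the text into short strands, each prefixed with its address."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    starts = range(0, len(bits), payload_bits)
    if len(starts) > 2 ** address_bits:
        raise ValueError("text too long for the chosen address width")
    return [
        bits_to_dna(f"{index:0{address_bits}b}" + bits[start:start + payload_bits])
        for index, start in enumerate(starts)
    ]


def decode_strands(strands: list[str], address_bits: int = 8) -> str:
    """Read strands in any order and reassemble the text by address."""
    blocks = {}
    for strand in strands:
        bits = dna_to_bits(strand)
        blocks[int(bits[:address_bits], 2)] = bits[address_bits:]
    joined = "".join(blocks[index] for index in sorted(blocks))
    data = bytes(int(joined[i:i + 8], 2) for i in range(0, len(joined), 8))
    return data.decode("utf-8")


strands = encode_strands("Regenesis")
print(strands)
print(decode_strands(list(reversed(strands))))
```

Because every strand carries its own address, the strands can be read back in any order and still be reassembled, which is why the last line decodes a reversed list.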

What they ended up with was a viscous liquid that held a billion copies of the book, could comfortably fit into a test tube and could last for centuries without requiring, say, extreme cold or tremendous energy to preserve it, unlike some other experimental forms of storage.

"You can drop it wherever you want, in the desert or your backyard, and it will be there 400,000 years later," Church said in a press release.

The group did, however, purposely avoid putting the DNA inside a living cell, where the information would make up only a small fraction of the whole cell, creating what Church calls "wasted space." He added, "But more importantly, almost as soon as a DNA goes into a cell, if that DNA doesn't earn its keep, if it isn't evolutionarily advantageous, the cell will start mutating it, and eventually the cell will completely delete it."


Other researchers have previously stored information in DNA. Some have created micro-organisms that carry the song "It's a Small World (After All)" in their DNA.

The first synthetic cell, created in 2010, carried in its DNA the coded names of its inventors, who included the genomics pioneer Craig Venter, plus three literary quotations and a website address.

But the recent Harvard experiment was on a much larger scale. The book has 53,426 words, 11 illustrations and a JavaScript computer program, making it 600 times bigger than any previous amount of data encoded in DNA. That is roughly as much data as a 3.5-inch floppy computer disk can hold.

Potential applications and challenges

While we're not going to be storing the whole internet on devices the size of our thumbs any time soon, DNA could someday give us a stable, long-term way to store information such as medical files, financial records, books, photographs and videos.

Before we get there, however, scientists would need to address the fact that reading and writing in DNA takes longer than it does with other media. This particular book took several days to "write" and then even longer to read back.

The stored data "is sequential, like a magnetic tape, where you have to spool through stuff to get at the data," bioengineer Sriram Kosuri of the Wyss Institute for Biologically Inspired Engineering at Harvard told the Wall Street Journal.

On top of that, the DNA was very expensive to synthesize and sequence. But Church says that cost has been dropping and should continue to do so.

The Journal reports, "Already, the production costs of generating raw, unassembled DNA sequence data, such as might be used to archive data, have dropped from $10,000 per million base pairs of DNA in 2001 to about 10 cents per million base pairs in 2012, according to the National Human Genome Institute."
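To put those two figures in perspective, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted above; the assumption that the cost fell at a steady exponential rate between 2001 and 2012 is ours, added purely to get an average pace.

```python
import math

# Figures quoted above, in dollars per million base pairs.
cost_2001 = 10_000.00
cost_2012 = 0.10
years = 2012 - 2001

# Overall drop between the two quoted data points.
fold_drop = cost_2001 / cost_2012

# Assumption: a steady exponential decline over the whole period.
annual_factor = fold_drop ** (1 / years)
halving_months = 12 * years / math.log2(fold_drop)

print(f"Overall drop: about {fold_drop:,.0f}-fold")
print(f"Average drop per year: about {annual_factor:.1f}-fold")
print(f"Average time for the cost to halve: about {halving_months:.0f} months")
```

The real decline was not necessarily smooth, so the per-year and halving figures are averages rather than a description of any single year.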

The group has filed a patent on its technique.


via: The Wall Street Journal, Eurekalert

photo: JohnGoode/Flickr

This post was originally published on Smartplanet.com
