DNA is so flexible that it can be used to create everything from an amoeba to a human, a dinosaur to a dandelion. And it's so compact that the DNA needed to build each of these life forms fits inside a single cell just micrometers across.
And while DNA has existed for billions of years as a way of storing the blueprints for life, now researchers are beginning to explore its potential as a storage medium for digital data.
Some of tech's biggest names are already looking into DNA's potential for long-term data storage. Microsoft, for example, announced earlier this year that it had worked with the University of Washington (UW) to store and retrieve 200MB of data on DNA, including a music video by OK Go and the Universal Declaration of Human Rights.
"We got to thinking that there's a trend in storage -- storage media is not growing the way we want it to. We know DNA is a very dense storage medium, and we know it has other properties that make it very interesting, like durability, and we got to this question of whether we should look at this as a potential storage medium," James Bornholt, a graduate student working on DNA storage in the department of computer science and engineering at the University of Washington, told ZDNet.
As you'd expect of a storage medium found in cells too small to be seen with the naked eye, DNA offers one major advantage over conventional tape or SSD storage: density. According to University of Washington researchers, if DNA were used as the storage medium, all the data held in Facebook's recently built cold storage datacenter in Oregon could fit into the space of a sugar cube.
Another advantage: DNA is pretty much the oldest known storage medium there is. Stored in the right way, it can still be read thousands of years after its creation by those who have the algorithms to decode it.
Built from just four bases (adenine, guanine, cytosine, and thymine), which hang off a sugar-phosphate backbone and pair with the bases on the opposite strand through hydrogen bonds, DNA is deceptively simple. But by reading the order in which one base follows another, scientists have already begun to encode and decode information, much as conventional computer storage uses a sequence of ones and zeroes.
Because DNA has four bases and binary has only two digits, encoding digital data in DNA is relatively simple: each base can stand for two bits, so pairs of zeros and ones are translated into cytosine, thymine, guanine, or adenine.
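As a rough illustration of the two-bits-per-base idea, here is a minimal Python sketch. The particular assignment of bit pairs to bases (00 to A, 01 to C, 10 to G, 11 to T) is a hypothetical choice made for this example, not the mapping used by UW, Microsoft or ETH Zurich.

```python
# Hypothetical mapping chosen for illustration: each pair of bits becomes one base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string at two bits per base (four bases per byte)."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse the mapping: bases back to bit pairs, then every 8 bits back to a byte."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

A mapping this naive can produce long runs of a single base (a zero byte becomes AAAA), which are error-prone to synthesise and sequence, so practical schemes typically add constraints on top of the basic translation.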
Like traditional storage, DNA storage uses built-in redundancy to help maintain the integrity of the data in the event that some of it is lost during reading or writing, said Dr Robert Grass, a lecturer in the department of chemistry and applied biosciences at the Swiss research institute ETH Zurich, who works on DNA storage.
"You over-represent your data. Take a simple example: if you want to represent a triangle, if I give you the length of the three sides, it's perfectly defined. Now if I give you the length of the three sides and one of the angles, it's over-defined. If you lose one of those data points, you can still reconstruct the data, whatever data point you lost," he said. DNA storage works in a similar way, over-representing the data it holds to the extent that, if some data points are lost, the data can still be reconstructed.
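The over-representation Grass describes can be sketched with the simplest possible redundancy scheme: one extra parity chunk that is the XOR of all the data chunks, so any single lost chunk can be rebuilt from the rest. This single-parity code is swapped in purely for illustration and is not the scheme used by ETH Zurich; real DNA storage systems rely on stronger error-correcting codes that can survive many losses and errors. The function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def add_parity(chunks: List[bytes]) -> List[bytes]:
    """Append one parity chunk: the XOR of all (equal-length) data chunks."""
    return chunks + [reduce(xor_bytes, chunks)]


def recover(stored: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the data when at most one stored chunk is missing (marked None)."""
    missing = [i for i, chunk in enumerate(stored) if chunk is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost chunk")
    stored = list(stored)
    if missing:
        # XOR of every surviving chunk (data and parity) equals the lost chunk.
        present = [chunk for chunk in stored if chunk is not None]
        stored[missing[0]] = reduce(xor_bytes, present)
    return stored[:-1]  # drop the parity chunk, keep only the data


stored = add_parity([b"DNA ", b"stor", b"age!"])
stored[1] = None        # simulate a strand lost during reading
print(recover(stored))  # [b'DNA ', b'stor', b'age!']
```

Like the triangle with one extra measurement, one extra chunk lets the data survive the loss of any one piece, but not two.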
However, several pitfalls still need to be overcome before DNA storage can be used commercially.
"The downside of DNA is that it is not as stable as we would like to have it," Grass said. "DNA is a vulnerable chemical: if you left it on the table, after a few days it would have oxidised and decayed, and your information would be lost."
The DNA sequence can be disrupted by chemical agents, temperature, or other factors, and corruption in the DNA means corruption in the data. As a result, the way in which DNA is stored is key to how long it can remain viable and how likely errors are to develop. Dehydration is one way to prolong its life; another is to encase the DNA in a protective shell modelled on bone, a technique developed by a team led by Grass at ETH Zurich.
"We know by looking at fossils that, if stored correctly, DNA can last hundreds of thousands of years. By investigating this ancient fossil DNA and thinking about how it is in the bone, where it's usually found, we generated a material that simulates the conditions in which DNA is found in ancient fossils," Grass said. "It's encapsulated in inorganic materials -- we use glass. It's very well protected from degradation."
But because of its biological nature, DNA can still throw up unexpected behaviour during the storage process.
"Algorithmically, [DNA encoding] works great, but the way that DNA works is non-uniform. This is where entire chunks of data we had written had gone missing entirely. Sometimes, it was glitches in the process because we are still new to this; sometimes, more scarily, we had written DNA sequences that just happened to be, for example, complementary to each other: they bind to each other and you can't read them any more... Problems like this, we're just scratching the surface of," Bornholt said.
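One way an encoder might screen for the problem Bornholt describes is to check whether a stretch of one sequence is the reverse complement of a stretch of another, since two strands pair up base by base when one reads as the complement of the other in the opposite direction. The sketch below is a simplified, hypothetical check: it only catches exact complementary runs of a chosen length `k` (the default of 8 is an arbitrary assumption), whereas real binding also depends on partial matches, temperature and chemistry.

```python
# A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """The sequence that would pair with `seq`, read in the opposite direction."""
    return seq.translate(COMPLEMENT)[::-1]


def shares_complementary_run(a: str, b: str, k: int = 8) -> bool:
    """True if some length-k stretch of `a` could pair perfectly with a stretch of `b`."""
    rc_b = reverse_complement(b)
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    # A stretch of `a` pairs with a stretch of `b` exactly when it appears in rc(b).
    return any(rc_b[i:i + k] in kmers_a for i in range(len(rc_b) - k + 1))


print(reverse_complement("GATTACA"))                             # TGTAATC
print(shares_complementary_run("GATTACAGG", "CCTGTAATC", k=7))  # True
print(shares_complementary_run("GATTACAGG", "GATTACAGG", k=7))  # False
```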
In a future where our data is stored on DNA, could computer viruses make the leap from digital to biological, or vice versa? With current DNA storage technologies, that's not likely to be a security threat: the DNA chains used for storage are only hundreds of bases long, whereas biological viruses are typically made of chains thousands to tens of thousands of bases long. "It's pretty difficult to manufacture a viable virus -- we're not too worried about that," Bornholt added.
But the difficulties of handling DNA are not the only reason DNA storage is likely to remain niche for some time to come.
The chemical processes used to synthesise DNA and then read it back, which draw on techniques such as the polymerase chain reaction (PCR) and genome sequencing, are very time-consuming. Storing a megabyte of data initially took University of Washington scientists a week, UW's Bornholt said, and more work is needed in both biology and computer science to scale up the technology before DNA becomes a practical storage medium.
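To put "a megabyte a week" in perspective, a quick back-of-the-envelope calculation turns it into a write rate. The sketch assumes a megabyte means 10^6 bytes; with 2^20 bytes the rate is about 5% higher.

```python
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds

# Assumption: "a megabyte" = 10**6 bytes, written over exactly one week.
rate = 10**6 / SECONDS_PER_WEEK

print(f"{rate:.2f} bytes/s")     # 1.65 bytes/s
print(f"{rate * 8:.1f} bits/s")  # 13.2 bits/s
```

That is roughly 13 bits per second, which helps explain why the throughput of DNA storage, not just its density, is the hurdle.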
DNA storage techniques are also expensive, and they become more expensive the longer you want your DNA to remain readable: the more copies of each storage strand are made, the longer the data will last, but the cost rises in proportion to the number of copies.
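The trade-off between copies and cost can be illustrated with a toy probability model. It assumes, hypothetically, that every copy of a strand decays independently with the same survival probability, that the data is recoverable only if each strand keeps at least one intact copy, and that cost scales linearly with the number of copies; none of these figures come from the researchers.

```python
def prob_strand_survives(p_copy: float, copies: int) -> float:
    """Chance that at least one of `copies` independent copies is still intact."""
    return 1 - (1 - p_copy) ** copies


def prob_file_survives(p_copy: float, copies: int, strands: int) -> float:
    """Chance that every strand of a file keeps at least one intact copy."""
    return prob_strand_survives(p_copy, copies) ** strands


# Hypothetical inputs: each copy has a 50% chance of surviving, the file spans
# 1,000 strands, and each copy costs one arbitrary unit.
COST_PER_COPY = 1.0
for copies in (1, 5, 10, 20):
    cost = copies * COST_PER_COPY
    print(copies, cost, round(prob_file_survives(0.5, copies, 1000), 4))
```

In this model the cost grows linearly with the number of copies while the chance of losing a strand shrinks geometrically, which is why extra copies buy longevity at a price.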
Consequently, DNA is unlikely to dislodge hard disks as our storage medium of choice, but it may one day replace tape for archives that need to last tens or hundreds of years.
"The archiving of information or the storage of really important information where you want to be really sure you never lose it in your life, or in the company's life, that's where the storage of DNA has its real advantages and where we see the first future of DNA storage," ETH Zurich's Grass said.