Researchers from Columbia University and the New York Genome Center have devised a new coding system, dubbed DNA Fountain, which is capable of stuffing 215 petabytes of data onto one gram of DNA.
That's about 100 times more than previous researchers have stored on DNA, and was achieved by customizing an algorithm for streaming video on a smartphone, Science Daily reports.
DNA holds promise for data storage because of its superior density to tape, disk, and optical media. It can also store information for thousands of years if it's kept in the right conditions.
While computers store information as ones and zeros, researchers have devised various algorithms for encoding data in DNA's four nucleotide bases: adenine (A), guanine (G), cytosine (C), and thymine (T). Using this approach, Microsoft last year claimed a record by storing 200MB of data, including a music video, on synthetic DNA strands.
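To make the idea of mapping ones and zeros onto four bases concrete, here is a minimal Python sketch. The particular assignment of bit pairs to bases is an assumption chosen for illustration; it is not the scheme used by Microsoft or by the DNA Fountain authors, and it ignores the biochemical constraints a real encoder has to respect.

```python
# Illustrative two-bits-per-base mapping (an assumption, not a published scheme).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse the mapping: every four bases become one byte again."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


print(encode(b"Hi"))          # CAGACGGC
print(decode("CAGACGGC"))     # b'Hi'
```

Each base carries two bits here, which is the ideal-world ceiling; practical schemes give some of that up to cope with errors.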
DNA Fountain was created by Yaniv Erlich, a computer science professor at Columbia Engineering, who's also a core member of the NYGC, and Dina Zielinski, an associate scientist at NYGC.
The 2MB compressed file the pair wrote to DNA included the graphical operating system KolibriOS, an old French film, a $50 Amazon gift card, a computer virus, and a Pioneer plaque. It also included the 1948 study, A Mathematical Theory of Communication, by Bell Labs information theorist Claude Shannon, in a nod to his pioneering work on encoding, noise and decoding in information transmission.
The researchers approached the challenge through Shannon's information theory, under which the maximum capacity each nucleotide could reach in an ideal world is two bits. However, as with communications, DNA storage capacity is reduced by various noise factors.
"DNA storage is basically a communication channel," write Erlich and Zielinski. "We transmit information over the channel by synthesizing DNA oligos. We receive information by sequencing the oligos and decoding the sequencing data. The channel is noisy due to various experimental factors, including DNA synthesis imperfections, PCR dropout, stutter noise, degradation of DNA molecules over time, and sequencing errors."
They say DNA Fountain, so named because it uses fountain codes, which are used for video streaming to mobile devices, "approaches the Shannon capacity while providing robustness against data corruption".
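The appeal of a fountain code is that the receiver can rebuild the original from almost any sufficiently large set of encoded pieces, so lost pieces do not matter. The Python sketch below is a toy illustration of that idea, not the authors' implementation: the helper names (`split`, `make_droplet`, `peel_decode`) and the degree distribution are hypothetical, and the step of turning each droplet into a DNA sequence is left out.

```python
import random


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split(data: bytes, size: int) -> list[bytes]:
    """Cut data into equal-size chunks, zero-padding the last one."""
    padded = data + bytes(-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def chunk_indices(seed: int, k: int) -> list[int]:
    """Re-derive which chunks a droplet combines from its seed alone."""
    rng = random.Random(seed)
    degree = rng.choice([1, 1, 2, 2, 2, 3, 4])  # toy distribution (assumption)
    return rng.sample(range(k), min(degree, k))


def make_droplet(seed: int, chunks: list[bytes]) -> tuple[int, bytes]:
    """A droplet is a seed plus the XOR of the chunks that seed selects."""
    payload = bytes(len(chunks[0]))
    for i in chunk_indices(seed, len(chunks)):
        payload = xor_bytes(payload, chunks[i])
    return seed, payload


def peel_decode(droplets, k: int):
    """Peeling decoder: return the k chunks, or None if more droplets are needed."""
    pending = [(set(chunk_indices(s, k)), p) for s, p in droplets]
    solved: dict[int, bytes] = {}
    progress = True
    while progress and len(solved) < k:
        progress = False
        still_pending = []
        for indices, payload in pending:
            # Remove the contribution of chunks we already know.
            known = [i for i in indices if i in solved]
            for i in known:
                payload = xor_bytes(payload, solved[i])
            indices = indices - set(known)
            if len(indices) == 1:
                solved[indices.pop()] = payload  # one unknown left: solved
                progress = True
            elif indices:
                still_pending.append((indices, payload))
        pending = still_pending
    return [solved[i] for i in range(k)] if len(solved) == k else None
```

In use, the sender keeps producing droplets from fresh seeds, and the receiver calls `peel_decode` on whatever arrived until it returns the chunks. It does not matter which droplets went missing, only that enough got through.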
According to Science Daily, they used DNA Fountain to generate 72,000 DNA strands, or oligos, which were sent to Twist Bioscience, the DNA synthesis firm that supplied Microsoft's synthetic DNA.
According to the paper, they achieved an information density of 1.57 bits per nucleotide, just shy of Shannon capacity. That density translates to 215 petabytes per gram of DNA and is 60 percent higher than in previous studies.
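For a sense of scale, the reported density can be set against the ideal-world ceiling of two bits per nucleotide mentioned earlier. The short Python check below uses only the figures quoted in this article; the function name is hypothetical, and the comparison is against the noise-free ideal rather than the practical Shannon capacity the paper refers to.

```python
import math


def fraction_of_ideal(bits_per_nt: float, alphabet_size: int = 4) -> float:
    """Share of the noise-free ceiling, log2(alphabet_size) bits per symbol."""
    return bits_per_nt / math.log2(alphabet_size)


# Four bases give log2(4) = 2 bits per nucleotide in an ideal world.
print(math.log2(4))
# 1.57 bits per nucleotide is the density reported for DNA Fountain.
print(round(fraction_of_ideal(1.57), 3))
```

By that measure, DNA Fountain reaches a little under four-fifths of what a perfect, error-free channel would allow.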
Despite more evidence of DNA's superior storage density, disk still has one major advantage: cost.
As the researchers highlight, DNA storage in this study cost $3,500 per megabyte. However, they see the cost falling with improvements to DNA synthesis chemistry, as well as "quick-and-dirty oligo synthesis methods" that consume less machine time.