The Promise of DNA-based Storage
…or How to Put the Internet in a Van…
One of the foremost issues facing science and industry today is storing the ever-increasing amounts of data that are created globally. Google’s Eric Schmidt claims that every two days humanity generates as much information as it did from the dawn of civilization up until 2003. There may be a way to address this challenge thanks to the world’s oldest storage medium – DNA. Studies conducted by pioneering researchers George Church, a professor of genetics at Harvard Medical School, and Ewan Birney, associate director of the European Bioinformatics Institute (EBI), show DNA-based storage to be remarkably effective and efficient.
Delivering the keynote talk for the EUDAT 2nd Conference in October of last year, Ewan Birney discussed the exciting activity that is occurring at the intersection of biology and big data science. In an interview with ISGTW, Birney shares additional details about his DNA encoding projects, working with source material like Shakespeare’s sonnets, an excerpt from Martin Luther King’s ‘I have a dream’ speech, a PDF of Watson and Crick’s famous paper describing the double helix structure of DNA, a picture of the EBI, and a piece of code that explains the encoding procedure.
The beauty of DNA as a storage mechanism, according to Birney, is that it’s electricity-free, incredibly dense, and stable. DNA that’s over 700,000 years old has been recovered. “You’ve just got to keep it cold, dry and in the dark,” Birney told ISGTW.
Birney goes on to explain that the technology to read and write DNA has existed since bacteria were first genetically engineered in 1973. A 2003 project, led by Pak Chung Wong from the Pacific Northwest National Laboratory, transferred encrypted text into DNA by converting each character into a base-4 sequence of numbers, each corresponding to one of the four DNA bases (adenine, cytosine, thymine, and guanine – also known by the abbreviations A, C, T, and G). Bacteria were considered to be an optimal host because they replicate quickly, generating multiple copies of the data in the process, and if a mutation occurs within an individual bacterium, the remaining bacteria will still retain the original information.
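To make the base-4 idea concrete, here is a minimal Python sketch of turning text into a string of bases and back. It is an illustration only, not the scheme Wong's team or Birney's team actually used: the digit-to-base mapping (0 = A, 1 = C, 2 = G, 3 = T), the use of UTF-8 bytes, and the function names encode_to_dna and decode_from_dna are all assumptions made for this example, and the encryption step is left out. Each 8-bit byte splits into four 2-bit digits, so every byte becomes exactly four bases.

```python
# Assumed mapping from base-4 digits to DNA bases: 0->A, 1->C, 2->G, 3->T.
# This ordering is arbitrary and chosen only for illustration.
BASES = "ACGT"


def encode_to_dna(text: str) -> str:
    """Convert text to a DNA-style string, four bases per UTF-8 byte."""
    data = text.encode("utf-8")
    return "".join(
        BASES[(byte >> shift) & 0b11]  # take two bits at a time, high bits first
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode_from_dna(strand: str) -> str:
    """Reverse encode_to_dna: read four bases at a time and rebuild each byte."""
    if len(strand) % 4 != 0:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        byte = 0
        for base in strand[i:i + 4]:
            # BASES.index raises ValueError on any character outside A, C, G, T
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return out.decode("utf-8")


if __name__ == "__main__":
    strand = encode_to_dna("Hi")
    print(strand)                   # CAGACGGC
    print(decode_from_dna(strand))  # Hi
```

A practical scheme would also need redundancy and error correction, since a plain mapping like this has no protection against a misread or mutated base.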
Live DNA is not without problems, though. Fast replication rates threaten to compromise data over long periods of time. There is also a risk that the inserted DNA could interfere with the host bacteria’s normal cellular processes, destabilizing the bacterial genome. As Geoff Baldwin, a reader in biochemistry at Imperial College London, UK, explains, “This does not bode well for the use of bacteria as a mass data storage device.”
Researchers proposed using ‘naked’ DNA instead, since living cells are not necessary for DNA to remain intact. Unlike DNA stored in bacteria, naked DNA doesn’t require genetic manipulation to insert it safely into a host. Birney and his team encoded computer files totaling 739 kilobytes of unique data – including all 154 of Shakespeare’s sonnets – into DNA code, synthesized the DNA, sequenced it and reconstructed the files with over 99 percent accuracy.
With the current high costs of reading and writing DNA, this technology is not yet suitable for mass storage. It is, however, already economically viable for very long-term (1,000 years or more) applications, such as nuclear site location data and other governmental, legal and scientific archives that must be kept for long periods but are infrequently accessed. Furthermore, the researchers note that current trends are reducing DNA synthesis costs at a pace that should make DNA-based storage cost-effective for long-term archiving (~50-year periods) within a decade.
“DNA is remarkable,” observes Birney, “just one gram of DNA can store about a petabyte’s worth of data, and that’s with the redundancy required to ensure that it’s fully error tolerant. It’s estimated that you could put the whole internet into the size of a van! You can also copy trivially. The only problem at the moment is cost: it’s prohibitively expensive to write DNA. Nevertheless, this technology is expected to come down in price dramatically over the coming years. The only question is: how quickly will it come down in price?”