The idea of storing digital data in DNA dates back to the 1960s, and recent developments have addressed some of the obstacles to implementing it at scale. Now, researchers at the Technion-Israel Institute of Technology and the Interdisciplinary Center Herzliya have reached another major milestone, using new techniques to store data at a density of 10 petabytes per gram of DNA. The researchers used a striking comparison to illustrate the scale of this achievement: theoretically, all of the data currently stored on YouTube could fit in a single teaspoon.
In DNA-based storage, binary code is translated into sequences of the four nucleotides (denoted A, G, C and T) that constitute DNA, and those sequences are then synthesized into actual DNA molecules. To read the data back, the DNA is put into a sequencing machine. While sequencing has improved swiftly over the last decade, synthesis is still cumbersome and expensive.
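The translation step described above can be sketched in a few lines of Python. This is a minimal illustration only: the two-bits-per-nucleotide table and the function names `encode` and `decode` are assumptions made for this example, not the mapping used by the researchers, and real schemes add constraints and error correction on top.

```python
# Hypothetical 2-bits-per-nucleotide table, chosen only for illustration.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Translate bytes into a nucleotide string, two bits per letter."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Translate a nucleotide string back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)          # CAGACGGC
    print(decode(strand))  # b'Hi'
```

With four letters, each nucleotide carries two bits, so one byte needs four letters and therefore four rounds of synthesis in this simplified picture.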
In their breakthrough, the researchers effectively expanded the alphabet beyond the four core building blocks by using unique combinations of the original four as additional letters. This allows more information to be stored in each letter and, crucially, makes the writing process more efficient. Using the new technique, the number of synthesis rounds per unit of information was reduced by 20 percent, and the researchers are hopeful that a 75 percent reduction is possible in the near future.
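A rough way to see why a larger alphabet saves synthesis rounds is to count the bits each letter can carry. The sketch below rests on simplifying assumptions that are not stated in the article: one letter is written per synthesis round, there is no error-correction overhead, and a combined letter is modeled as a way of splitting a fixed number of equal parts (the hypothetical `resolution` parameter) among the four nucleotides. The function names are illustrative as well.

```python
from math import comb, log2


def composite_alphabet_size(resolution: int) -> int:
    """Count the ways to split `resolution` equal parts among A, C, G and T.

    This is the stars-and-bars count C(resolution + 3, 3); resolution 1
    gives back the ordinary four-letter alphabet.
    """
    return comb(resolution + 3, 3)


def bits_per_letter(alphabet_size: int) -> float:
    """Ideal information carried by one letter of the alphabet, in bits."""
    return log2(alphabet_size)


def cycle_reduction(alphabet_size: int) -> float:
    """Fractional saving in synthesis rounds per bit versus four letters."""
    return 1 - bits_per_letter(4) / bits_per_letter(alphabet_size)


if __name__ == "__main__":
    for resolution in (1, 2, 4, 6):
        size = composite_alphabet_size(resolution)
        print(resolution, size, round(bits_per_letter(size), 2),
              round(cycle_reduction(size), 3))
```

Under this idealized model, a 75 percent saving would correspond to an alphabet of about 256 distinguishable letters, since each letter would then carry eight bits instead of two. The article does not state the alphabet sizes the researchers used, so these numbers only illustrate the principle.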
“The current synthesis and sequencing processes are inherently redundant,” said Professor Zohar Yakhini of the Technion Faculty of Computer Science, who helped guide the research, “because each molecule is produced in large numbers and is read in multiple copies during sequencing. The method we developed leverages this redundancy to increase the effective number of letters well over the original four letters, making it possible for us to encode and write each unit of information in fewer cycles of synthesis.”
The researchers also built advanced error-correction mechanisms around the new technique, improving the detection of errors.
“Thanks to the use of error-correction codes that are tailored to the unique encoding we created, we were able to perform highly efficient coding and to successfully recover the information,” said Leon Anavy, lead researcher and a student in the Technion Faculty of Computer Science. “When working in a system consisting of millions of parts (molecules), even one-in-a-million events occur, which can disrupt the reading. Careful coding allowed us to overcome these problems.”
To conduct the research, the team drew on a variety of resources, ranging from Twist Bioscience to Technion’s Genome Center to funding from the European Commission. The researchers are hopeful that this development constitutes one more major step toward at-scale use of synthetic DNA.
About the research
The research referenced in this article was published as “Data storage in DNA with fewer synthesis cycles using composite DNA letters” in the September 2019 issue of Nature Biotechnology. The research article was written by Leon Anavy, Inbal Vaknin, Orna Atar, Roee Amit and Zohar Yakhini.
The original press release highlighting this research can be found at this link.