In the race for increasingly dense data storage solutions, DNA-based storage is surely one of the most curious – and a team of North Carolina State University (NCSU) researchers just brought it two steps closer to being a practical, scalable reality.
Over the past few years, researchers have become more successful at encoding data into genomes – most famously, perhaps, in 2017, when a team of genomics researchers from Harvard encoded a short movie into the genomes of living bacteria, where it replicated and was successfully extracted and decoded.
“DNA systems are attractive because of their potential information storage density; they could theoretically store a billion times the amount of data stored in a conventional electronic device of comparable size,” said James Tuck, an associate professor of electrical and computer engineering at NC State. “But two of the big challenges here are, how do you identify the strands of DNA that contain the file you are looking for? And once you identify those strands, how do you remove them so that they can be read – and do so without destroying the strands?”
These are precisely the problems that Tuck and other researchers at NCSU sought to solve. To tackle the file identification challenge, they used two nested primer-binding sequences: the system first identifies the strands that contain the first primer-binding sequence, then identifies the subset of those strands that also contain the second. “This increases the number of estimated file names from approximately 30,000 to approximately 900 million,” Tuck said.
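The jump from roughly 30,000 to roughly 900 million file names is what one would expect if each of the two nested sequences can be chosen independently from a pool of about 30,000 candidates, since 30,000 × 30,000 = 900,000,000. The short Python sketch below illustrates that arithmetic and the two-pass lookup. It is only a toy model: the independence assumption, the function names, and the dictionary layout of a "strand" are all hypothetical and are not taken from the paper.

```python
# Toy model of nested file naming. The ~30,000 figure is from the article;
# treating the two primer-binding positions as independent choices from the
# same pool is an assumption made for illustration.
SINGLE_LEVEL_NAMES = 30_000


def nested_name_count(names_per_level: int, levels: int = 2) -> int:
    """Number of distinct file names when each level is chosen independently."""
    return names_per_level ** levels


def select_file(strands, outer, inner):
    """Two-pass lookup: filter on the outer sequence, then on the inner one."""
    first_pass = [s for s in strands if s["outer"] == outer]
    return [s for s in first_pass if s["inner"] == inner]


# Hypothetical pool of strands; "outer" and "inner" stand in for the two
# primer-binding sequences, "payload" for the stored data.
strands = [
    {"outer": "A", "inner": "x", "payload": "file 1"},
    {"outer": "A", "inner": "y", "payload": "file 2"},
    {"outer": "B", "inner": "x", "payload": "file 3"},
]

print(nested_name_count(SINGLE_LEVEL_NAMES))  # 900000000
print([s["payload"] for s in select_file(strands, "A", "y")])  # ['file 2']
```

Note that strands 1 and 3 share the same inner sequence but are still distinguishable, because the outer pass has already separated them.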
For the extraction challenge, the researchers moved away from existing techniques, which make many copies of the relevant DNA strands so that their signal overwhelms the rest of the sample. That makes extraction easy, but at the cost of efficiency. Instead, they attached small molecular tags to the primers used to identify targeted strands. The primer finds the targeted strand, copies it, and leaves the copy attached to the molecular tag. Then magnetic microbeads designed to bind to specific tags collect the tagged strands, allowing them to be retrieved with a magnet.
“This system allows us to retrieve the DNA strands associated with a specific file without having to make many copies of each strand, while also preserving the original DNA strands in the database,” said Albert Keung, assistant professor of chemical and biomolecular engineering at NCSU.
The researchers call the combined techniques DNA Enrichment and Nested Separation, or DENSe. They hope to scale up DENSe and test it with larger databases, but anticipate that cost may be a major limitation.
About the paper
The paper discussed in this article, “Driving the Scalability of DNA-Based Information Storage Systems,” was published in the May 2019 issue of ACS Synthetic Biology. It was written by Kyle J. Tomek, Kevin Volkel, Alexander Simpson, Austin G. Hass, Elaine W. Indermaur, James M. Tuck and Albert J. Keung.