Even as storage density reaches new heights, many researchers have their eyes set on a paradigm shift in high-density information storage: storing data in the four nucleotides (A, T, G and C) that constitute DNA, a method that promises millions of times greater efficiency than current data storage techniques. While DNA-based data storage has been demonstrated several times, a number of hurdles remain before it can move from a proof of concept to a scalable technology ready for production and regular use. Now, a team of researchers at the University of Texas at Austin has cleared one of those hurdles, improving the reliability of DNA data retrieval even when the strands are damaged.
“We need a way to store this data so that it is available when and where it’s needed in a format that will be readable,” said Stephen Jones, a research scientist who worked on the project. “This idea takes advantage of what biology has been doing for billions of years: storing lots of information in a very small space that lasts a long time. DNA doesn’t take up much space, it can be stored at room temperature and it can last for hundreds of thousands of years.”
DNA, however, is prone to errors, and some of them, such as a nucleotide being inserted or dropped, shift the entire sequence that follows. That makes them much more disruptive than simple missing data in traditional data storage media. As a result, prior DNA data storage experiments stored many copies of the data so that the retrieval program could compare the duplicates against each other to find errors.
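To see why a shifted sequence is so much worse than a single bad symbol, consider a toy example. The sketch below is not the researchers' encoding. It assumes a naive, hypothetical mapping of two bits per nucleotide (the `BASES` table and the `encode`, `decode` and `count_wrong_bytes` helpers are invented for illustration) and compares a misread nucleotide with a lost one.

```python
# Hypothetical 2-bits-per-nucleotide mapping, for illustration only:
# A=00, C=01, G=10, T=11. This is NOT the scheme used in the study.
BASES = "ACGT"


def encode(data: bytes) -> str:
    """Turn each byte into four nucleotides, most significant bits first."""
    return "".join(
        BASES[(byte >> shift) & 0b11]
        for byte in data
        for shift in (6, 4, 2, 0)
    )


def decode(strand: str) -> bytes:
    """Read nucleotides four at a time; a trailing partial group is dropped."""
    out = bytearray()
    usable = len(strand) - len(strand) % 4
    for i in range(0, usable, 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | BASES.index(base)
        out.append(value)
    return bytes(out)


def count_wrong_bytes(original: bytes, recovered: bytes) -> int:
    """Count positions that differ, treating missing bytes as wrong."""
    differing = sum(a != b for a, b in zip(original, recovered))
    return differing + abs(len(original) - len(recovered))


message = b"The Wizard of Oz"
strand = encode(message)

# Substitution: one nucleotide is misread as a different one.
replacement = "A" if strand[5] != "A" else "C"
substituted = strand[:5] + replacement + strand[6:]

# Deletion: one nucleotide is lost, so everything after it shifts left.
deleted = strand[:5] + strand[6:]

print("wrong bytes after substitution:", count_wrong_bytes(message, decode(substituted)))
print("wrong bytes after deletion:", count_wrong_bytes(message, decode(deleted)))
```

In this toy setup the substitution corrupts only the byte it lands in, while the deletion pushes every later nucleotide out of position and can garble the rest of the message. That is the failure mode that earlier experiments worked around by storing many duplicate copies.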
“The key breakthrough [in this research] is an encoding algorithm that allows accurate retrieval of the information even when the DNA strands are partially damaged during storage,” said Ilya Finkelstein, an associate professor of molecular biosciences and one of the study’s authors.
“We found a way to build the information more like a lattice,” Jones said. “Each piece of information reinforces other pieces of information. That way, it only needs to be read once.” Furthermore, they explained, their technique helps them prioritize certain kinds of information and avoid problematic or error-prone sections of DNA.
To test their storage approach, the researchers stored a copy of The Wizard of Oz (translated into Esperanto), then subjected it to high temperatures and extreme humidity, damaging the DNA strands. Finally, they retrieved the information – successfully, and with high accuracy.
“We tried to tackle as many problems with the process as we could at the same time,” said Hawkins, who was recently with UT’s Oden Institute for Computational Engineering and Sciences. “What we ended up with is pretty remarkable.”