With 3 billion base pairs of DNA on hand, it’s no wonder that genes are able to program nearly every detail of our physical makeup, from constructing organs to fighting off disease. But how can a system so vast find the right operating manual for one body part, and ignore all the data meant for another?
It turns out that the secret is folding. While we’re used to viewing DNA in a long, untangled double helix, a genome looks more like an impossibly complex knot when it’s inside of a cell’s nucleus. But this genome jumble has much more rhyme and reason than first meets the eye.
Scientists have found that when a portion of the genome is gathered into a specific shape, the segments involved can be turned on or off to best serve the task at hand. And now, by using NVIDIA GPUs, researchers from Baylor College of Medicine, Rice University, MIT and Harvard University have mapped this sophisticated system of genome folds in unprecedented detail.
Among the folded shapes that the team was able to identify was the “3D loop,” where two sections of DNA that are usually far apart snap together.
Under the leadership of Erez Aiden, assistant professor of genetics at Baylor and assistant professor of computer science and computational and applied mathematics at Rice, the team has unraveled roughly 10,000 loops that the human genome folds into.
“Our maps of looping have revealed thousands of hidden switches that scientists didn’t know about before,” said co-first author Miriam Huntley, a doctoral student at the Harvard School of Engineering and Applied Sciences (SEAS), in a Harvard press release. “In the case of genes that can cause cancer or other diseases, knowing where these switches are is vital.”
With these maps, scientists hope they can uncover clues to cell function that could help combat complex diseases such as cancer.
Of course, obtaining such a high-resolution map of 3 billion base pairs didn’t come without computational challenges. Using HPC clusters and custom algorithms, the team set to work, but soon realized that CPUs alone wouldn’t get them to their goal.
“Ordinary computer CPUs are not well-adapted for the task of loop detection,” said Suhas Rao, a researcher at Baylor’s Center for Genome Architecture. To identify the special places in the genome where loops can form, Rao said, the team had to turn to NVIDIA GPUs to get the job done.
“We faced a real challenge because we were asking, ‘How do each of the millions of pieces of DNA in the database interact with each of the other millions of pieces?’” said Huntley. “Most of the tools that we used for this paper we had to create from scratch because the scale at which these experiments are performed is so unusual.”
Among these custom tools, which included algorithms and data structures, Rao noted that the data-visualization tools created by co-authors Neva Durand and James Robinson played a vital role in the research.
The results of the team’s study were published in the December 2014 issue of the journal Cell.