Methylation changes how DNA behaves while leaving its overall structure the same. The process is central to many normal, essential biological functions, but errors in methylation are associated with a wide variety of diseases – and standard sequencing tools fail to detect those changes. Now, researchers from the New Jersey Institute of Technology (NJIT) and Children’s Hospital of Philadelphia (CHOP) have developed a tool to predict where DNA methylation will occur.
The software, called Deep6mA (after N6-methyladenine, or 6mA, an important form of DNA modification), uses a neural network to predict methylation sites on strands of DNA. The researchers developed and trained the neural network using supercomputer resources allocated through an umbrella grant from the Extreme Science and Engineering Discovery Environment (XSEDE). Through the grant, the researchers have major allocations on a wide range of supercomputers – such as Bridges at the Pittsburgh Supercomputing Center, Stampede2 at the Texas Advanced Computing Center and Comet at the San Diego Supercomputer Center – for deep learning projects involving high-throughput genomic data.
“We are very pleased that NSF-supported artificial intelligence-focused computational capabilities contributed to advance this important research,” said Amy Friedlander, acting director of the NSF’s Office of Advanced Cyberinfrastructure.
Zhi Wei, a computer science professor at NJIT and co-author on the study, says that Deep6mA offers a number of benefits, including automation of various functions, integration of flanking genes, visualization improvements and more.
“Previously, methods developed to identify methylation sites in the genome could only look at certain nucleotide lengths at a given time, so a large number of methylation sites were missed,” said Hakon Hakonarson, director of the Center for Applied Genomics at CHOP and another co-author of the study. “We needed a better way of identifying and predicting methylation sites with a tool that could identify these motifs throughout the genome that are potentially disease-causing.”
When testing Deep6mA on a series of representative organisms, the researchers found that it identified methylation sites at a granular level – down to a single nucleotide – and allowed them to visualize patterns that were out of reach for previous models. The authors, however, acknowledge that Deep6mA has drawbacks.
“One limitation is that our proposed prediction is purely based on sequence information,” Wei said. “Whether a candidate is a 6mA site or not will also depend on many other factors. Methylation, including 6mA, is a dynamic process, which will change with cellular context.” The researchers are looking to integrate other factors, such as gene expression, into future iterations of the tool.
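To make "purely based on sequence information" concrete, the sketch below shows one common way a sequence-only model's inputs might be prepared: take a fixed window of bases around each adenine (the base that 6mA modifies) and turn it into a numeric one-hot encoding. This is an illustration, not the Deep6mA code; the function names `one_hot` and `candidate_windows`, the `flank` parameter, the window size and the encoding are all assumptions made for the example.

```python
# Illustrative sketch only: NOT the Deep6mA implementation.
# The helper names, window size and encoding are assumptions.
BASES = "ACGT"


def one_hot(window):
    """Encode a DNA string as a list of 4-element 0/1 vectors.

    Any character outside A/C/G/T (such as the 'N' padding) becomes all zeros.
    """
    return [[1 if base == b else 0 for b in BASES] for base in window]


def candidate_windows(sequence, flank=3):
    """Yield (position, window) for every adenine in the sequence.

    Each window holds `flank` bases on either side of the adenine;
    the ends of the sequence are padded with 'N'.
    """
    sequence = sequence.upper()
    padded = "N" * flank + sequence + "N" * flank
    for i, base in enumerate(sequence):
        if base == "A":
            # The slice is centred on the adenine at 0-based position i.
            yield i, padded[i:i + 2 * flank + 1]


# Each example pairs a candidate position with its encoded window;
# a classifier would take these encodings as input.
examples = [(pos, one_hot(win)) for pos, win in candidate_windows("GATCAG", flank=2)]
```

Nothing in such an input reflects the cell type or conditions the DNA came from, which is why a sequence-only predictor cannot capture the cellular context Wei describes.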