Dec. 20, 2017 - Jian Peng, NCSA Faculty Fellow and assistant professor in the Department of Computer Science at Illinois, and graduate student Yang Liu, also of the Department of Computer Science, have achieved a major breakthrough in protein structure prediction using deep learning, with data processed on NCSA’s Blue Waters supercomputer. The work was published in the journal Cell Systems.
Peng’s research explores a more accurate function for evaluating predicted protein structures through the development of the deep learning tool DeepContact. DeepContact automatically leverages local information and multiple features to discover patterns in contact map space and embeds this knowledge within the neural network. When later predicting contacts for new proteins, DeepContact uses what it has learned about structure and contact map space to impute missing contacts and remove spurious predictions, leading to significantly more accurate inference of residue-residue contacts.
Essentially, this tool converts hard-to-interpret coupling scores into probabilities, moving the field toward a consistent process to assess contact prediction across diverse proteins.
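To make the idea of turning raw scores into probabilities concrete, here is a minimal sketch in Python. It is not DeepContact's method, which the release describes as a neural network. It swaps in a much simpler technique, a one-variable logistic calibration fitted by gradient descent, and runs it on made-up toy numbers. The function names, the scores, and the contact labels are all assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_calibration(scores, labels, lr=0.1, steps=5000):
    """Fit p(contact | score) = sigmoid(a * score + b) by gradient descent.

    Hypothetical helper: a plain logistic calibration, standing in for
    the far richer model described in the article.
    """
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = 0.0
        grad_b = 0.0
        for s, y in zip(scores, labels):
            err = sigmoid(a * s + b) - y  # gradient of the log loss
            grad_a += err * s
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def to_probability(score, a, b):
    # Map a raw coupling score to a value in (0, 1).
    return sigmoid(a * score + b)


# Toy data, invented for this sketch: raw coupling scores for residue
# pairs and whether each pair is truly in contact (1) or not (0).
toy_scores = [0.05, 0.10, 0.20, 0.35, 0.50, 0.80, 1.10, 1.50]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]

a, b = fit_calibration(toy_scores, toy_labels)
probs = [to_probability(s, a, b) for s in toy_scores]

for s, p in zip(toy_scores, probs):
    print(f"score={s:.2f} -> estimated contact probability={p:.2f}")
```

The point of the sketch is only that a probability is directly comparable across proteins, whereas a raw coupling score has no fixed scale. A real system would learn this mapping from many proteins and many input features rather than from a single score.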
Applying existing protein structure prediction algorithms and sampling techniques generates a massive dataset, which is then processed at scale on the Blue Waters supercomputer. Based on this dataset, Peng hopes to develop a new structure motif-based deep neural network to assess the structural quality of predictions and to strengthen existing structure prediction algorithms.
Peng’s team, iFold, was top-ranked at the 12th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP12) last year. “We greatly improved the prediction accuracy for protein residue contact,” said Peng. “We believe that the improved contact prediction will further help us get closer to the ultimate goal of protein folding.”
Proteins perform their biological functions when they coil and fold into specific three-dimensional shapes. When a protein misfolds, it malfunctions, which can result in diseases such as Alzheimer’s disease. Peng’s research will use DeepContact to improve models for protein folding, which will facilitate a paradigm shift in protein structure prediction.
Peng plans to collaborate with NCSA affiliate Dr. Matthew Turk, using NCSA’s high-performance CPU and GPU resources and building more efficient distributed implementations to accelerate both structure generation and the training of deep neural networks.
Earlier this year, NCSA was awarded a $2.7 million grant from the National Science Foundation for deep learning research, which included Peng as a co-PI.
About the National Center for Supercomputing Applications
The National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign provides supercomputing and advanced digital resources for the nation’s science enterprise. At NCSA, University of Illinois faculty, staff, students, and collaborators from around the globe use advanced digital resources to address research grand challenges for the benefit of science and society. NCSA has been advancing one third of the Fortune 50® for more than 30 years by bringing industry, researchers, and students together to solve grand challenges at rapid speed and scale.
About the Blue Waters Project
The Blue Waters petascale supercomputer is one of the most powerful supercomputers in the world, and is the fastest sustained supercomputer on a university campus. Blue Waters uses hundreds of thousands of computational cores to achieve peak performance of more than 13 quadrillion calculations per second. Blue Waters has more memory and faster data storage than any other open system in the world. Scientists and engineers across the country use the computing and data power of Blue Waters to tackle a wide range of challenges. Recent advances that were not possible without these resources include computationally designing the first set of antibody prototypes to detect the Ebola virus, simulating the HIV capsid, visualizing the formation of the first galaxies and exploding stars, and understanding how the layout of a city can impact supercell thunderstorms.