Protein simulations have dominated the supercomputing conversation of late as supercomputers around the world race to simulate the viral proteins behind COVID-19 as accurately as possible and model potential bindings in the hopes of finding a therapeutic to treat the disease. Even in many supercomputer-powered protein simulations, however, the computer work is still a stepping stone to lab work, which tends to be more accurate. Now, new work powered by Extreme Science and Engineering Discovery Environment (XSEDE) supercomputers is challenging that assumption, with researchers at Michigan State University producing predictions that rival the most precise lab measurements.
The researchers, Michigan State University’s Lim Heo (a postdoctoral fellow) and Michael Feig (his advisor), used molecular dynamics simulations to refine existing predictions made by other research groups. Many of those predicted structures, it turned out, still left the proteins in strained, higher-energy forms, and thus did not represent the proteins truly at rest.
“Proteins carry out biological functions. You can measure and analyze those functions, but to really understand them you have to look at the details of how proteins operate,” Feig told XSEDE’s Ken Chiacchia in an interview. “We know a lot of structures from experiments, but haven’t [done this, for example, with] most of the proteins in bacteria. We need to fill the gap by generating models quickly and efficiently.”
The Michigan State duo took a set of 27 predicted protein structures (generated by researchers at the 13th Critical Assessment of Techniques for Protein Structure Prediction competition in 2018) and used molecular dynamics to simulate those predicted structures over a longer time period, essentially allowing them to settle into their most natural states.
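To give a rough sense of what "settling" means here, the following is a toy sketch, not the researchers' method or software: three beads joined by springs start in a strained arrangement and are evolved with damped dynamics until the strain energy drains away. The function names, the spring model, and every parameter value are assumptions chosen for illustration; real protein refinement uses full atomistic force fields and far longer simulations.

```python
import numpy as np

def potential_energy(x, k=1.0, rest=1.0):
    """Harmonic bond energy of a 1D chain of beads at positions x."""
    stretch = np.diff(x) - rest
    return 0.5 * k * np.sum(stretch ** 2)

def forces(x, k=1.0, rest=1.0):
    """Force on each bead: the negative gradient of potential_energy."""
    stretch = np.diff(x) - rest
    f = np.zeros_like(x)
    f[:-1] += k * stretch  # a stretched bond pulls the left bead to the right
    f[1:] -= k * stretch   # and pulls the right bead to the left
    return f

def relax(x0, steps=2000, dt=0.01, friction=1.0):
    """Damped dynamics with unit masses (illustrative parameters only)."""
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for _ in range(steps):
        a = forces(x) - friction * v  # spring forces plus velocity damping
        v += a * dt                   # update velocities first
        x += v * dt                   # then positions (semi-implicit Euler)
    return x

# Hypothetical strained starting structure: one bond compressed, one stretched.
start = [0.0, 0.4, 2.5]
end = relax(start)
print("energy before:", potential_energy(np.array(start)))
print("energy after: ", potential_energy(end))
```

In this toy, running more steps lets the chain settle closer to its rest geometry, which loosely mirrors the idea that a longer simulation gives a predicted structure more time to relax.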
After refining the predicted structures, the researchers found that accuracy had improved across the board, by between three and 30 percent for every structure. On the higher end of this range, the predicted structures came close to the accuracy of X-ray-based lab analysis of proteins.
An example of one of the resulting structures (red and blue) superimposed on the laboratory measurement (yellow and pink).
Of course, simulating those protein structures over a long period requires immense computing power. For that, the research team turned to the GPU nodes on a pair of supercomputers allocated through XSEDE: the Pittsburgh Supercomputing Center’s Bridges system, which delivers 1.35 peak petaflops, and the San Diego Supercomputer Center’s Comet system, which delivers 2.76 peak petaflops.
“Our modeling efforts are compute-intensive,” Feig explained. “XSEDE resources provided a significant amount of the resources we need to achieve significant model improvements. In brief, we can improve protein structures more with more computing time. Having access to XSEDE resources in addition to our local resources allowed us to improve structures [more] than what we could have done otherwise.”
Next, the researchers aim to optimize their simulations so that they can run for longer simulated periods and increase accuracy even further.