August 21, 2012

Modeling Proteins at Supercomputing Speeds on Your PC

Robert Gelber

A group of researchers at the University of California, San Diego (UCSD) has established a new approach to simulating molecular behavior. By running an enhanced sampling algorithm on a GPU-equipped desktop, the team was able to capture protein behavior that previously required millisecond-scale simulations. Until now, that kind of research required Anton, a multimillion-dollar supercomputer purpose-built for molecular modeling. HPCwire spoke with project members Ross Walker and Romelia Salomon-Ferrer about their research.

A primary challenge in the study of protein dynamics is simulating interactions over long enough time periods. “The problem we’ve always had is that the biological timescale is really at the high-microsecond/low-millisecond time scale,” said Walker. “That’s where most of the interesting large-scale motions in proteins are occurring.”

He went on to explain that conventional CPU clusters can simulate about 50 nanoseconds per day. Hybrid systems (those accelerated by GPUs) perform somewhat better, reaching around 75 to 100 nanoseconds per day. But even that falls an order of magnitude short of a single microsecond.

Beyond that point, the simulations hit a wall: adding hardware no longer extends the timescales researchers can reach. According to Walker, the primary bottleneck is interconnect technology. Additional GPUs could be added to the nodes, he said, but they would only help if system bandwidth were doubled and latency cut in half.

This dilemma prompted D.E. Shaw Research (a company founded by hedge fund billionaire David Shaw) to pursue drug discovery through molecular dynamics and, ultimately, to build the Anton supercomputer. The system consists of specialized ASICs and a custom torus interconnect. With this architecture, Anton can outperform traditional supercomputers by two to three orders of magnitude, simulating up to 25 microseconds per day.
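To put these throughput figures side by side, here is a small back-of-the-envelope calculation in Python. It uses only the rates quoted in this article, taking the top of the 75 to 100 ns/day range for GPU-accelerated systems; the dictionary and function names are illustrative, not taken from any real tool.

```python
# Simulation throughput figures quoted in the article, in nanoseconds per day.
# The GPU-hybrid entry uses the top of the quoted 75-100 ns/day range.
RATES_NS_PER_DAY = {
    "CPU cluster": 50.0,
    "GPU-accelerated hybrid": 100.0,
    "Anton": 25_000.0,  # 25 microseconds per day
}

NS_PER_MICROSECOND = 1_000
NS_PER_MILLISECOND = 1_000_000


def days_to_simulate(target_ns: float, rate_ns_per_day: float) -> float:
    """Wall-clock days needed to cover target_ns of simulated time."""
    return target_ns / rate_ns_per_day


def speedup_over(baseline: str, system: str) -> float:
    """How many times more simulated time per day `system` delivers than `baseline`."""
    return RATES_NS_PER_DAY[system] / RATES_NS_PER_DAY[baseline]


if __name__ == "__main__":
    for name, rate in RATES_NS_PER_DAY.items():
        us_days = days_to_simulate(NS_PER_MICROSECOND, rate)
        ms_days = days_to_simulate(NS_PER_MILLISECOND, rate)
        print(f"{name:>24}: 1 us in {us_days:8.2f} days, 1 ms in {ms_days:10.1f} days")
    for baseline in ("CPU cluster", "GPU-accelerated hybrid"):
        print(f"Anton vs {baseline}: {speedup_over(baseline, 'Anton'):.0f}x")
```

At 100 ns per day, a single microsecond takes about 10 days and a millisecond roughly 10,000 days (over 27 years), while Anton's quoted 25 microseconds per day covers a millisecond in 40 days. The resulting 250x to 500x ratios fall within the two to three orders of magnitude cited for Anton.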

While Shaw’s design has obvious benefits in speed and accuracy, its proprietary approach makes gaining access to an Anton machine rather difficult. For academic researchers, only a single machine is available, at the Pittsburgh Supercomputing Center (PSC).

So the UCSD team looked instead at changing the algorithms, so that the work could run on ordinary commodity hardware. “Do we really have to stick with the equations we’ve been using for the past 30 years?” asked Walker. “Could we try and act smarter with these equations and tailor them for specific things we want to look at?”

They developed a technique called accelerated molecular dynamics (aMD), which improves how efficiently a simulation samples the conformational space of a given protein. The technique grew out of a collaboration with the Howard Hughes Medical Institute (HHMI) and UCSD professor Andrew McCammon, a co-author of the research. According to an official statement, the group ran an aMD simulation on a desktop equipped with just a pair of NVIDIA GTX 580s.
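The article does not spell out how aMD works. As a rough illustration only, the Python sketch below implements the boost potential in the form commonly published for aMD: whenever the potential energy V drops below a threshold E, a non-negative boost dV = (E - V)^2 / (alpha + E - V) is added, which raises energy basins and lets the simulation cross barriers more often. Averages from the boosted run are then reweighted by exp(beta * dV) to recover the unbiased behavior. The function names, the example threshold and alpha values, and the 300 K default temperature are hypothetical choices for illustration, not the UCSD team's settings or any production code's implementation.

```python
import math

# Boltzmann constant in kcal/(mol*K), a common energy unit in biomolecular MD.
KB_KCAL_PER_MOL_K = 0.0019872041


def amd_boost(v: float, e_threshold: float, alpha: float) -> float:
    """Boost dV added to potential energy v (standard aMD form, illustrative).

    dV = (E - V)^2 / (alpha + E - V) when V < E, otherwise 0.
    Larger alpha gives a smaller boost, keeping the surface closer to the original.
    """
    if v >= e_threshold:
        return 0.0
    gap = e_threshold - v
    return gap * gap / (alpha + gap)


def boosted_potential(v: float, e_threshold: float, alpha: float) -> float:
    """Modified potential V* = V + dV seen by the accelerated simulation."""
    return v + amd_boost(v, e_threshold, alpha)


def reweighted_average(observables, boosts, temperature_k: float = 300.0) -> float:
    """Unbiased estimate of <A> from frames sampled on the boosted surface.

    Each frame is weighted by exp(beta * dV). The largest boost is subtracted
    before exponentiating to avoid overflow; this cancels in the ratio.
    """
    beta = 1.0 / (KB_KCAL_PER_MOL_K * temperature_k)
    max_dv = max(boosts)
    weights = [math.exp(beta * (dv - max_dv)) for dv in boosts]
    return sum(w * a for w, a in zip(weights, observables)) / sum(weights)


if __name__ == "__main__":
    # Hypothetical values in arbitrary energy units.
    E, ALPHA = -100.0, 20.0
    for v in (-140.0, -110.0, -90.0):
        print(f"V = {v:7.1f}  dV = {amd_boost(v, E, ALPHA):6.2f}  "
              f"V* = {boosted_potential(v, E, ALPHA):8.2f}")
```

For example, with E = -100 and alpha = 20, a configuration at V = -140 receives a boost of 1600/60 (about 26.7), while one at V = -90 sits above the threshold and is left unchanged. The boosted energy never rises above E, so deep basins are flattened toward the threshold without creating new minima.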

The researchers analyzed bovine pancreatic trypsin inhibitor (BPTI), a relatively small molecule as proteins go. It took around 10 days of computation to capture 500 nanoseconds of protein dynamics, 2,000 times shorter than the millisecond-scale simulation performed on Anton. Even so, the aMD run accurately represented all the distinct structural states returned by the much longer supercomputer simulation. While the UCSD team used Fermi-based GPUs for the run, Walker and Salomon-Ferrer said a Kepler-generation unit, such as the K10, would speed up processing by about 30 percent.

The most obvious advantage to this approach is its ability to perform accurate protein simulations on thousand-dollar desktop systems. That opens up this type of research to thousands of scientists, rather than just those select few with custom-built supercomputers at their disposal.