This post was contributed by Christian Kniep, Sr. Developer Advocate for HPC and AWS Batch at AWS, Carsten Kutzner, staff scientist at the Max Planck Institute for Biophysical Chemistry, and Vytautas Gapsys, project group leader at the Max Planck Institute for Biophysical Chemistry in Göttingen.
Early stage drug discovery identifies promising compounds and optimizes them for further development into new drugs within pharmaceutical research. What used to be a slow, exclusively manual process involving laborious chemical synthesis is now accelerated by Computer Aided Drug Design (CADD). In the past, the most widely used technique was molecular docking. While fast, molecular docking is less accurate than molecular dynamics (MD) simulations, in which the dynamics of the protein-ligand interaction are simulated explicitly in atomic detail. That accuracy, combined with compute that can scale on demand, makes MD-based CADD a viable option.
In this blog post, we describe how early stage drug discovery can be accelerated, while also optimizing cost, by running two popular open-source packages at scale on AWS: GROMACS, which performs molecular dynamics simulations, and pmx, a free-energy calculation package from the Computational Biomolecular Dynamics Group at the Max Planck Institute in Germany.
Accelerating early-stage drug discovery with CADD
In the MD-based drug discovery process, simulations of tens of thousands of potential compounds help select candidates for the next stage, pre-clinical development. That stage is much more costly, so only a few hundred compounds can be tested there. Applying good selection criteria in the initial discovery phase is therefore key to focusing on the right candidates for the later clinical development stage, which considers only a handful of compounds.
Historically, early stage simulations were run on static, shared on-premises compute infrastructure. Today, more research institutions are turning to the AWS Cloud for a scalable alternative that is cost-effective and delivers faster turnaround times.
Optimizing for cost and runtime
In collaboration with the team from Max Planck, we ran an ensemble of twenty thousand molecular dynamics simulations that evaluated free energies for drug-like compounds binding to target proteins. We completed it in 3 days using an innovative setup that leverages several AWS Regions at once. Spreading the work across Regions produced a result that was both low cost and fast. Optimizing both of these variables is important if we are to cure disease faster.
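Simple arithmetic shows why spreading an ensemble across Regions can shorten time-to-result without raising cost. The Python below is a minimal model that rests on several assumptions. All jobs are identical and independent, and each instance runs one job at a time. Every Region offers the same hourly price. Spot interruptions, data transfer, and queueing overheads are ignored. The Region labels, capacities, prices, and per-job runtime are hypothetical placeholders. They are not values from the run described here.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    name: str              # hypothetical label, not a real AWS Region code
    instances: int         # assumed number of instances available concurrently
    price_per_hour: float  # assumed USD per instance-hour


def split_jobs(n_jobs, regions):
    """Assign identical jobs one at a time to the Region whose wave count grows least."""
    counts = {r.name: 0 for r in regions}
    for _ in range(n_jobs):
        best = min(regions, key=lambda r: math.ceil((counts[r.name] + 1) / r.instances))
        counts[best.name] += 1
    return counts


def estimate(n_jobs, hours_per_job, regions):
    """Return (time_to_result_hours, total_cost_usd, jobs_per_region)."""
    counts = split_jobs(n_jobs, regions)
    by_name = {r.name: r for r in regions}
    # Each instance runs one job at a time, so a Region needs ceil(jobs / instances) waves;
    # the slowest Region determines when the whole ensemble finishes.
    waves = max(math.ceil(c / by_name[n].instances) for n, c in counts.items())
    # Cost depends only on instance-hours consumed, not on how many run in parallel.
    cost = sum(c * hours_per_job * by_name[n].price_per_hour for n, c in counts.items())
    return waves * hours_per_job, cost, counts


one_region = [Region("region-a", 1000, 0.50)]
three_regions = [
    Region("region-a", 1000, 0.50),
    Region("region-b", 1000, 0.50),
    Region("region-c", 1000, 0.50),
]
for label, regions in (("1 Region", one_region), ("3 Regions", three_regions)):
    hours, cost, _ = estimate(20_000, 2.0, regions)
    print(f"{label}: {hours:.0f} h to result, ${cost:,.0f}")
```

With these placeholder numbers, adding two more Regions cuts the estimated time-to-result from 40 to 14 hours, and the total cost stays at $20,000. Under the uniform-price assumption, cost depends only on total instance-hours. In practice, prices and capacity differ by Region and over time, so the cost side of the trade-off has to be optimized explicitly.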
Instance selection through generic benchmarks
On-premises HPC compute environments tend to be homogeneous with respect to the CPU and GPU resources in each node. They also tend to use the highest-end CPUs and multiple GPUs available at the time of procurement, to pack as much compute as possible into the data center.
On AWS, however, there is a much larger selection of instance types (currently over four hundred), each optimized for different use cases. Because our ensemble run comprises ~20k jobs, the goal is not to maximize the performance of an individual simulation. Instead, we aim to minimize the time-to-result for the complete ensemble while keeping costs as low as possible. To identify which AWS instances offer a good price-to-performance ratio with GROMACS, the team from Max Planck tested dozens of instance types using a set of benchmark simulation systems.
The data showed that the “usual” high-performance CPU or GPU instances were not actually the best choice for a single simulation when price is taken into account. That’s because GROMACS achieves its best performance with a particular ratio of CPU to GPU resources. The analysis showed that the graphics-optimized G4dn instance family ranks highest in performance-to-price. For a deeper dive into our single-node benchmarking, please visit the blog post Running GROMACS on GPU instances: single-node price-performance.
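The performance-to-price metric can be made concrete with a short calculation. When each job simulates a fixed amount of time, the cost of the ensemble scales with dollars spent per simulated nanosecond. One way to rank instances is therefore by nanoseconds per dollar, (ns/day) / (24 × $/hour). The sketch below assumes that metric. The instance labels and numbers are hypothetical placeholders and are not the Max Planck benchmark results.

```python
from dataclasses import dataclass


@dataclass
class Benchmark:
    instance: str         # hypothetical instance label, not a real EC2 type
    ns_per_day: float     # GROMACS throughput for one benchmark system (placeholder)
    usd_per_hour: float   # hourly instance price (placeholder)

    @property
    def ns_per_dollar(self) -> float:
        # (ns/day) / (24 h/day * $/h) = simulated nanoseconds per dollar spent
        return self.ns_per_day / (24.0 * self.usd_per_hour)


def rank_by_value(results):
    """Sort benchmarks from best to worst performance-to-price."""
    return sorted(results, key=lambda b: b.ns_per_dollar, reverse=True)


results = [
    Benchmark("big-cpu", 100.0, 4.00),
    Benchmark("multi-gpu", 200.0, 12.00),
    Benchmark("single-gpu", 80.0, 0.80),
]
for b in rank_by_value(results):
    print(f"{b.instance:<11} {b.ns_per_day:6.1f} ns/day  {b.ns_per_dollar:5.2f} ns/$")
```

In this toy example, the slowest entry ("single-gpu") ranks first and the fastest ("multi-gpu") ranks last. Ranking by ns per dollar instead of raw ns/day is the reason. This matches the reasoning above: for a large ensemble, value per dollar matters more than peak single-simulation speed.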
Read the full blog post to learn more about how the team benchmarked protein-ligand systems and optimized workload price-performance with a mix of Spot and On-Demand Instances.