Molecular dynamics (MD) is a simulation method for studying the physical movement of atoms and molecules by computing their trajectories as the system evolves over time. MD simulations are used across domains such as materials science, biochemistry, and biophysics, and are typically run in one of two broad ways: as a single simulation scaled out across many compute instances, or as an ensemble of many independent simulations. The importance of MD came to the fore in recent efforts toward a SARS-CoV-2 vaccine, where MD applications such as GROMACS helped researchers identify molecules that bind to the spike protein of the virus and block it from infecting human cells.
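To make "computing trajectories over time" concrete, here is a toy sketch of the core loop an MD code repeats millions of times: compute forces, then advance positions and velocities by one small timestep. It assumes a single particle on a hypothetical harmonic spring in reduced units and uses the velocity Verlet integrator; this is an illustration only, not how GROMACS is implemented (GROMACS computes many-body force fields, and its default `md` integrator is leap-frog).

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance one particle with velocity Verlet; return the (x, v) trajectory."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update (kick)
        x = x + dt * v_half                # full-step position update (drift)
        f = force(x)                       # forces at the new positions
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


K = 1.0  # hypothetical spring constant (reduced units)
M = 1.0  # hypothetical particle mass (reduced units)


def harmonic_force(x):
    return -K * x


def energy(x, v):
    return 0.5 * M * v * v + 0.5 * K * x * x


traj = velocity_verlet(1.0, 0.0, harmonic_force, M, dt=0.01, n_steps=1000)
e0 = energy(*traj[0])
drift = max(abs(energy(x, v) - e0) for x, v in traj) / e0
# With K = M = 1 the exact solution is x(t) = cos(t); compare at t = 10
print(f"x(10) = {traj[-1][0]:.4f} (exact {math.cos(10.0):.4f}), max energy drift {drift:.1e}")
```

The small, bounded energy drift is why symplectic integrators of this family are the usual choice for long MD runs.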
The time scales of interest for a simulated system are typically on the order of nanoseconds (ns) to microseconds (µs). We therefore measure the performance of an MD simulation in nanoseconds of simulated time per day of wall-clock time (ns/day). Simulations run for hours, and sometimes days, to reach meaningfully long timescales and gain insight into the final conformation of a molecule. MD applications are typically tightly coupled workloads: the system of atoms is divided into multiple domains to achieve parallelism, and a significant amount of information is exchanged between domains.
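The ns/day metric is simple arithmetic: simulated time (number of steps times the timestep) divided by the wall-clock time it took, scaled to one day. The sketch below assumes a hypothetical 2 fs timestep and a hypothetical one-hour run; the numbers are illustrative, not results from this study.

```python
SECONDS_PER_DAY = 86_400


def ns_per_day(n_steps: int, timestep_fs: float, wall_clock_s: float) -> float:
    """Nanoseconds of simulated time produced per day of wall-clock time."""
    simulated_ns = n_steps * timestep_fs * 1e-6  # 1 fs = 1e-6 ns
    return simulated_ns * SECONDS_PER_DAY / wall_clock_s


def days_to_reach(target_ns: float, rate_ns_per_day: float) -> float:
    """Wall-clock days needed to simulate target_ns at a given ns/day rate."""
    return target_ns / rate_ns_per_day


# Hypothetical run: 500,000 steps of 2 fs (1 ns simulated) in one hour of wall-clock time
rate = ns_per_day(n_steps=500_000, timestep_fs=2.0, wall_clock_s=3_600.0)
print(f"{rate:.1f} ns/day; reaching 1 µs takes {days_to_reach(1_000.0, rate):.1f} days")
```

At this hypothetical rate, a microsecond-scale trajectory would take over a month, which is why raising ns/day matters.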
You can reduce time to results by parallelizing a single simulation across multiple compute instances. This approach requires a high-performance interconnect that keeps inter-process communication overhead low, so that the simulation scales out close to linearly. When the goal is instead an averaged understanding of the system across multiple parameters, we run hundreds of copies of the simulation in parallel. This ensemble approach depends mainly on throughput: how many simulations a high performance computing (HPC) system can complete in a given timeframe. This blog post details how different compute instance configurations perform on a given problem, and makes architectural recommendations for different types of simulations based on the performance and price characteristics of the instances tested.
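The two usage patterns are judged by different numbers. For a scaled-out single simulation, the usual measure is parallel efficiency (speedup divided by instance count, where 1.0 means perfectly linear scaling); for an ensemble, it is completed runs per day. A minimal sketch, assuming hypothetical ns/day figures and run lengths rather than measured results:

```python
def speedup(rate_single: float, rate_multi: float) -> float:
    """How many times faster the multi-instance run is (ns/day ratio)."""
    return rate_multi / rate_single


def parallel_efficiency(rate_single: float, rate_multi: float, n_instances: int) -> float:
    """Fraction of ideal linear scaling achieved (1.0 = perfectly linear)."""
    return speedup(rate_single, rate_multi) / n_instances


def ensemble_runs_per_day(total_instances: int, instances_per_run: int, hours_per_run: float) -> float:
    """Completed runs per day when independent copies run side by side."""
    concurrent_runs = total_instances // instances_per_run
    return concurrent_runs * 24.0 / hours_per_run


# Hypothetical numbers, not measured results:
eff = parallel_efficiency(rate_single=20.0, rate_multi=140.0, n_instances=8)
runs = ensemble_runs_per_day(total_instances=64, instances_per_run=1, hours_per_run=6.0)
print(f"8-instance efficiency: {eff:.1%}; ensemble throughput: {runs:.0f} runs/day")
```

Efficiency below 1.0 reflects communication overhead between domains, which is what a low-latency interconnect aims to minimize; ensemble throughput, by contrast, is largely insensitive to the interconnect because the runs do not communicate.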
This post presents a study of the performance and price of running GROMACS, a popular open-source MD application, across a variety of Amazon Elastic Compute Cloud (Amazon EC2) instance types. We also assess how other service components, such as high-performance networking, aid the overall performance of the system. We compare performance across EC2 instance types to arrive at optimal configurations for both single-instance and multi-instance HPC systems. The post focuses only on CPU-based EC2 instance types; the effects of using GPU-based instances are not covered here and will be the subject of a future blog post.
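One common way to combine performance and price is cost per simulated nanosecond: the daily cost of the instances divided by the ns/day they deliver. The sketch below assumes hypothetical configurations and hourly prices (not real EC2 pricing or results from this study); the `Config` class is illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Config:
    name: str
    instances: int
    price_per_instance_hour: float  # USD, hypothetical
    ns_per_day: float               # throughput of one simulation on this config

    def cost_per_ns(self) -> float:
        """USD spent per nanosecond of simulated time."""
        daily_cost = self.instances * self.price_per_instance_hour * 24.0
        return daily_cost / self.ns_per_day


configs = [
    Config("1 instance", 1, 3.00, 24.0),
    Config("4 instances", 4, 3.00, 80.0),
]
for c in configs:
    print(f"{c.name}: {c.ns_per_day:.0f} ns/day, ${c.cost_per_ns():.2f} per simulated ns")
```

In this made-up example the 4-instance configuration is faster but costs more per simulated nanosecond, because it scales sublinearly; which configuration is "optimal" depends on whether time to results or cost matters more for the workload.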
After reading this post, you will have a better understanding of how GROMACS performs on AWS and will be able to narrow down configuration choices to match your requirements.
Learn how to optimize the price performance of running GROMACS on AWS by reading the full blog here.