Running cost-effective GROMACS simulations using Amazon EC2 Spot Instances with AWS ParallelCluster

This post was contributed by Santhan Pamulapati, Solutions Architect, and Sean Smith, Sr. Solutions Architect at AWS.

GROMACS is a popular open-source software package designed for simulations of proteins, lipids, and nucleic acids. To run these simulations effectively, users may need access to a high-performance computing (HPC) environment. AWS provides such an environment through AWS ParallelCluster, and in previous workshops we've shown examples of running GROMACS simulations on AWS ParallelCluster.

By using Amazon EC2 Spot Instances, you can run GROMACS simulations cost-effectively and save up to 90% compared to On-Demand prices. The trade-off is that Spot Instances can be interrupted at any time with a 2-minute warning.
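To make the savings concrete, the following Python sketch compares the compute cost of one run at On-Demand and Spot prices. The per-node hourly prices, node count, and run length are hypothetical placeholders rather than published AWS prices; real Spot prices vary by instance type, Availability Zone, and time.

```python
def run_cost(price_per_node_hour: float, nodes: int, hours: float) -> float:
    """Compute cost of a run: price per node-hour x number of nodes x wall-clock hours."""
    return price_per_node_hour * nodes * hours


def savings_pct(on_demand_price: float, spot_price: float) -> float:
    """Percentage saved by paying the Spot price instead of the On-Demand price."""
    return 100.0 * (on_demand_price - spot_price) / on_demand_price


if __name__ == "__main__":
    # Hypothetical prices in USD per node-hour; look up current prices for your instance type.
    on_demand, spot = 3.00, 0.90
    nodes, hours = 4, 48.0  # hypothetical 2-day run on 4 nodes
    print(f"On-Demand: ${run_cost(on_demand, nodes, hours):.2f}")
    print(f"Spot:      ${run_cost(spot, nodes, hours):.2f}")
    print(f"Savings:   {savings_pct(on_demand, spot):.0f}%")
```

Because both costs scale with node-hours, the percentage saved depends only on the price ratio. Work that has to be repeated after an interruption adds node-hours to the Spot side, which is why preserving progress matters for long runs.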

Depending on their complexity, simulations can run for hours or even multiple days. The longer a simulation runs, the more likely it is to be interrupted, so it becomes critical to preserve its progress as it runs. Simulation checkpointing is a way to achieve this: by combining native GROMACS checkpointing with Spot Instances, you can resume a simulation after an interruption and lose minimal progress. Spot Instances also offer an alternative way to scale simulations, compared with an accelerated graphics processing unit (GPU) approach.
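As a minimal sketch of the resume logic, the following Python helper builds an mdrun command line that writes a checkpoint every -cpt minutes and continues from an existing checkpoint with -cpi when one is present. It assumes an MPI build of GROMACS installed as gmx_mpi and input files that share a base name set through -deffnm (for example md.tpr, which makes the checkpoint md.cpt); the helper name and default values are illustrative.

```python
import os
from typing import List


def mdrun_command(deffnm: str, checkpoint_minutes: float = 15,
                  gmx: str = "gmx_mpi") -> List[str]:
    """Build a GROMACS mdrun command that resumes from a checkpoint if one exists.

    -deffnm sets the default base name for all input/output files.
    -cpt sets the checkpoint interval in minutes.
    -cpi names the checkpoint file to continue from.
    """
    cmd = [gmx, "mdrun", "-deffnm", deffnm, "-cpt", f"{checkpoint_minutes:g}"]
    checkpoint = f"{deffnm}.cpt"  # checkpoint name produced by -deffnm
    if os.path.exists(checkpoint):
        # An earlier, interrupted attempt left a checkpoint: continue from it.
        cmd += ["-cpi", checkpoint]
    return cmd


if __name__ == "__main__":
    print(" ".join(mdrun_command("md", checkpoint_minutes=5)))
```

With a checkpoint interval of C minutes, an interruption costs at most roughly C minutes of simulation progress plus restart time, so a shorter interval trades a little extra checkpoint I/O for less lost work.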

Solution Overview

In this post, we show how combining GROMACS checkpointing with Spot pricing and Slurm scheduling produces a cost-effective solution for running GROMACS simulations. We built the solution with GROMACS installed on AWS ParallelCluster.
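One way these pieces can fit together is sketched below: a Python function that renders a Slurm batch script which marks the job as requeueable with #SBATCH --requeue, so Slurm can put it back in the queue after a node failure such as a Spot interruption, and which restarts mdrun from its checkpoint when one exists. The partition name, the gmx_mpi binary, the md base name (an md.tpr input in the submission directory), and the 15-minute checkpoint interval are assumptions; whether a job is actually requeued also depends on the cluster's Slurm configuration.

```python
def sbatch_script(job_name: str, partition: str, nodes: int, tasks_per_node: int,
                  deffnm: str = "md", checkpoint_minutes: int = 15,
                  gmx: str = "gmx_mpi") -> str:
    """Render a requeueable Slurm batch script that resumes GROMACS from a checkpoint."""
    checkpoint = f"{deffnm}.cpt"
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        # Make the job eligible for requeueing, e.g. after a node failure.
        "#SBATCH --requeue",
        "",
        "# Resume from the checkpoint if an earlier attempt left one behind.",
        'CPI=""',
        f"if [ -f {checkpoint} ]; then",
        f'  CPI="-cpi {checkpoint}"',
        "fi",
        # $CPI is left unquoted so it expands to zero or two arguments.
        f"srun {gmx} mdrun -deffnm {deffnm} -cpt {checkpoint_minutes} $CPI",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(sbatch_script("gromacs-spot", "spot", nodes=2, tasks_per_node=36))
```

Save the output to a file and submit it with sbatch. Because a requeued job runs the same script again, it picks up the checkpoint instead of starting the simulation over.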

Prerequisites

The following prerequisites are recommended:

An AWS account with appropriate user privileges in a test environment
Familiarity with AWS ParallelCluster deployment as outlined in the AWS HPC workshop
Familiarity with running GROMACS simulations in a Linux environment

Solution Walkthrough

In this section, we show an example setup to get started with GROMACS using checkpointing and Spot Instances. This solution works for both single-node and multi-node runs. The sample setup involves the following steps:

Create and deploy AWS ParallelCluster with Spot Instances

AWS ParallelCluster orchestrates and manages HPC clusters in the AWS Cloud. In this post, we use AWS ParallelCluster to deploy a Slurm-based cluster with one head node and multiple Spot Instance worker nodes. To create and deploy AWS ParallelCluster with Spot Instances, refer to the AWS ParallelCluster documentation. The following is an example template that uses the SPOT capacity type…
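As a rough, hypothetical illustration of the key setting, the Python sketch below renders a minimal ParallelCluster 3 YAML configuration with one Slurm queue whose CapacityType is SPOT. The Region, subnet IDs, key pair name, instance types, and queue and compute resource names are placeholders, and only a small subset of the available configuration keys is shown.

```python
def spot_cluster_config(region: str, head_subnet: str, compute_subnet: str,
                        key_name: str, instance_type: str = "c5n.18xlarge",
                        max_count: int = 4) -> str:
    """Render a minimal ParallelCluster 3 config with one Slurm queue on Spot capacity."""
    lines = [
        f"Region: {region}",
        "Image:",
        "  Os: alinux2",
        "HeadNode:",
        "  InstanceType: c5.xlarge  # placeholder head node size",
        "  Networking:",
        f"    SubnetId: {head_subnet}",
        "  Ssh:",
        f"    KeyName: {key_name}",
        "Scheduling:",
        "  Scheduler: slurm",
        "  SlurmQueues:",
        "    - Name: spot",
        "      CapacityType: SPOT  # compute nodes are launched as Spot Instances",
        "      Networking:",
        "        SubnetIds:",
        f"          - {compute_subnet}",
        "      ComputeResources:",
        "        - Name: spot-nodes",
        f"          InstanceType: {instance_type}",
        "          MinCount: 0  # scale to zero when idle",
        f"          MaxCount: {max_count}",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(spot_cluster_config("us-east-1", "subnet-aaaa", "subnet-bbbb", "my-key"))
```

After saving the output as cluster-config.yaml and replacing the placeholders, the cluster can be created with pcluster create-cluster --cluster-name gromacs-spot --cluster-configuration cluster-config.yaml.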

Read the full blog to learn more. Reminder: You can learn a lot from AWS HPC engineers by subscribing to the HPC Tech Short YouTube channel, and following the AWS HPC Blog channel.