Previous versions of AWS ParallelCluster enabled HPC clusters in a single Amazon EC2 Availability Zone.
With AWS ParallelCluster 3.4.0, you can now create clusters that use multiple Availability Zones in a Region. This gives you more options to provision computing capacity for your HPC workloads.
AWS ParallelCluster helps you build HPC clusters that can elastically scale to the size of your computing workload. The Amazon EC2 capacity in a single Availability Zone is sufficient for many customers’ HPC workloads. But larger scale-out workloads can require even bigger resource pools. Customers running such workloads have asked for the ability to combine Amazon EC2 capacity across Availability Zones.
For example, Electronic Design Automation (EDA) jobs typically spawn thousands of independent, loosely coupled tasks with no inter-process communication. That means they don’t need to be restricted to a single Availability Zone. Instead, they can benefit from combining Amazon EC2 Spot capacity from multiple Availability Zones. Monte Carlo simulations in the Financial Services sector often follow a similar pattern.
Some customers manage access to multiple Availability Zones by creating an HPC cluster in each. However, it can be helpful to be able to manage these multi-Zone resources in a single HPC cluster. That way, common aspects like user management, job accounting, and software installation only have to be done once, while the cluster has access to instances running in multiple Availability Zones.
Tightly coupled workloads like Computational Fluid Dynamics (CFD) and Weather Modeling need many homogeneous instances connected by high-performance networking. These workloads can still benefit from a cluster that launches instances across multiple Availability Zones, but you need to take care to restrict any given multi-node job to instances in a single Availability Zone.
Using Multi-Availability Zone Clusters
In November 2022, AWS ParallelCluster 3.3 added the ability to specify multiple instance types in a Slurm job queue as a way to aggregate compute capacity. AWS ParallelCluster 3.4 adds to this flexibility by letting you associate multiple subnets, each in a different Availability Zone, with a job queue. ParallelCluster can then launch instances of the requested type(s) in the requested Availability Zone(s) as it scales up that queue.
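As a rough illustration of what this can look like, here is a minimal sketch of the Scheduling section of a cluster configuration. It is a fragment rather than a complete file (a real configuration also needs sections such as Region, Image, and HeadNode). The queue names, instance types, counts, and subnet IDs are placeholders chosen for illustration, not values from this post, and the second queue reflects one possible way to keep tightly coupled jobs inside a single Availability Zone.

```yaml
# Sketch only: queue names, instance types, counts, and subnet IDs are
# hypothetical placeholders. Replace them with values from your own account.
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    # Loosely coupled work (e.g. EDA or Monte Carlo tasks): Spot capacity
    # drawn from subnets in several Availability Zones.
    - Name: multi-az-spot
      CapacityType: SPOT
      ComputeResources:
        - Name: compute
          # Multiple instance types per compute resource (3.3 and later).
          Instances:
            - InstanceType: c5.xlarge
            - InstanceType: c5a.xlarge
          MinCount: 0
          MaxCount: 100
      Networking:
        # Multiple subnets per queue (3.4.0 and later); each placeholder
        # subnet is assumed to be in a different Availability Zone.
        SubnetIds:
          - subnet-11111111
          - subnet-22222222
          - subnet-33333333
    # Tightly coupled work (e.g. CFD): a queue with a single subnet, so every
    # node of a multi-node job lands in the same Availability Zone.
    - Name: single-az
      CapacityType: ONDEMAND
      ComputeResources:
        - Name: compute
          Instances:
            - InstanceType: c5n.18xlarge
          MinCount: 0
          MaxCount: 16
      Networking:
        SubnetIds:
          - subnet-11111111
```

With a layout like this, users would submit independent tasks to the multi-zone queue and multi-node jobs to the single-zone queue. Check the AWS ParallelCluster documentation for the constraints that apply to queues with multiple subnets or instance types before relying on it.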
To get started with multi-zone clusters, you’ll need ParallelCluster 3.4.0 or higher. You can follow this online guide to help you upgrade. Next, edit your cluster configuration as described in the examples below and the AWS ParallelCluster documentation. Finally, create a cluster using the new configuration.
Let’s look at two examples…
Read the full blog to learn more. Reminder: You can learn a lot from AWS HPC engineers by subscribing to the HPC Tech Short YouTube channel, and following the AWS HPC Blog channel.