Running HPC workloads, like computational fluid dynamics (CFD), molecular dynamics, or weather forecasting, typically involves a lot of moving parts. You need hundreds or thousands of compute cores, a job scheduler to keep them fed, a shared file system that’s tuned for throughput or IOPS (or both), loads of libraries, a fast network, and a head node to make sense of it all. These are just the table stakes, too, because when you move to the cloud, you’re expecting to do more ambitious things – most likely because you’re a researcher with a problem to solve and a lab full of colleagues waiting for the answer.
Since 2018, AWS ParallelCluster has simplified the orchestration of HPC environments and helped researchers and engineers tackle some of the most ambitious problems facing the world today. Watching customers discover what “infrastructure as code” means in the context of HPC has really propelled us to find new ways to delight them. When a single shell command can create a complex thing like an HPC cluster, and a Lustre file system, and a visualization studio, it leads to more people trying cloud than ever before, and they’re asking us for new functionality.
So today we’re announcing AWS ParallelCluster 3. Customers, systems integrators, and other builders have told us they want to build end-to-end “recipes” for HPC, spanning the whole gamut from infrastructure to middleware, libraries, and runtime codes. They also explained their need for an API-like interface so they can interact with ParallelCluster programmatically to create interfaces and services for their users. As we’re known for doing, we worked backwards from this feedback, using thousands of conversations with customers to create what we’re showing you today.
There are a lot of changes you’ll notice – large and small. Here are some highlights before we dive deeper later in this post:
- A new flexible AWS ParallelCluster API – This simplifies building solutions and interfaces on top of ParallelCluster, or including your cluster’s lifecycle as part of a pipeline. We’ve also changed the CLI to match, so scripted or event-driven workflows are easy.
- Build custom AMIs with EC2 Image Builder – Support for custom AMIs in ParallelCluster has grown from a niche capability in 2018 into a mainstream process. With the introduction of EC2 Image Builder, we now have a way to automate that process without anyone needing to invent the automation. This makes clusters using custom AMIs faster to scale because it front-loads the image-creation stage. It improves reliability too, and you’ll find it easier to stay patched and harder to mess up your security posture.
- A new configuration file format – ParallelCluster configurations now use a YAML format, and each one defines just one cluster. Along with several other changes, we think this will make it easier to keep your cluster configurations organized and readable.
- Simplified network configuration options – We’ve streamlined support for networking to enable the use of private, pre-existing Route 53 zones and provided more flexibility in how we use Elastic IPs.
- Finer-grained IAM permissions – We’ve changed how we handle permissions. You can specify an IAM role or an instance profile, and you can do that separately for the head node and the compute nodes. We also support IAM permissions boundaries at creation time for organizations that require specific limits on the roles they apply.
- Runtime customization scripts – You can now tweak the pre- and post-install scripts separately for the compute nodes on a live, running cluster, and they’ll take effect when you issue the ‘pcluster update-cluster’ command.
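To give a feel for several of these changes at once, here is a minimal sketch of the new YAML format: one file, one cluster, with per-node IAM settings and a compute-side post-install script. The subnet IDs, key pair, role ARN, and S3 path are all illustrative placeholders, not values from this post.

```yaml
# Illustrative ParallelCluster 3 configuration (one file = one cluster).
# All resource identifiers below are placeholders.
Region: us-east-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: c5.xlarge
  Networking:
    SubnetId: subnet-1234567890abcdef0     # placeholder
  Ssh:
    KeyName: my-keypair                    # placeholder
  Iam:
    InstanceRole: arn:aws:iam::123456789012:role/MyHeadNodeRole  # placeholder
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: compute
      Networking:
        SubnetIds:
          - subnet-1234567890abcdef0       # placeholder
      CustomActions:
        OnNodeConfigured:
          Script: s3://my-bucket/post-install.sh  # placeholder post-install script
      ComputeResources:
        - Name: c5-nodes
          InstanceType: c5.4xlarge
          MinCount: 0
          MaxCount: 64
```

Because the head node and each queue carry their own Iam and CustomActions sections, the finer-grained permissions and per-node customization scripts described above fall naturally out of the file layout.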
These features simplify initial cluster setup and make clusters easier to organize and reproduce, saving customers time as they build out custom environments. Read the full blog to learn about the latest features released in ParallelCluster 3.
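As a quick taste of how the revamped CLI pairs with the new configuration format, the sketch below walks a cluster through its lifecycle. The cluster name and file path are illustrative; the commands mirror the API operations, so the same calls fit naturally into scripts and pipelines.

```shell
# Create a cluster from a YAML configuration file (name and path are illustrative)
pcluster create-cluster \
  --cluster-name my-cluster \
  --cluster-configuration cluster-config.yaml

# Check on its status (output is JSON, convenient for scripting)
pcluster describe-cluster --cluster-name my-cluster

# Push changes (e.g. updated compute-node customization scripts) to the live cluster
pcluster update-cluster \
  --cluster-name my-cluster \
  --cluster-configuration cluster-config.yaml

# Tear everything down when the work is finished
pcluster delete-cluster --cluster-name my-cluster
```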