High Performance Computing (HPC) on AWS uses the scale of cloud computing to achieve optimal HPC price/performance. With AWS you can right-size your services to meet exactly the capacity you need, without overprovisioning or compromising on capacity. It’s easy to choose services that meet your existing workload needs, and as your demands change, you can quickly shift to the service option that meets your new requirements. You can also run multiple service options concurrently, which helps you reduce costs while maintaining performance. Keep in mind that in many cases performance does not mean brute-force benchmark numbers, but better time-to-results.
Two AWS services play a key role in driving efficiency and cost optimization for HPC workloads: AWS Batch and Amazon EC2 Spot Instances. AWS Batch is a fully managed service that enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources (e.g., CPU- or memory-optimized instances) based on the volume and specific resource requirements of the batch jobs submitted. With AWS Batch, there is no need to install and manage batch computing software or the server clusters that run your jobs, so you can focus on analyzing results and solving problems. AWS Batch plans, schedules, and executes your batch computing workloads across the full range of AWS compute services.
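To make "specific resource requirements of the batch jobs submitted" concrete, here is a minimal Python sketch of how a job might declare its vCPU and memory needs when it is submitted. The parameter names follow the AWS Batch SubmitJob API as exposed by boto3; the job name, queue name, job definition, and command are hypothetical placeholders, and the queue and job definition are assumed to exist already.

```python
def build_submit_job_request(job_name, job_queue, job_definition,
                             vcpus, memory_mib, command):
    """Build keyword arguments for the AWS Batch SubmitJob API.

    The vCPU and memory values are what AWS Batch uses to decide how
    much compute capacity to provision for the job.
    """
    return {
        "jobName": job_name,
        "jobQueue": job_queue,            # hypothetical, must already exist
        "jobDefinition": job_definition,  # hypothetical, must already exist
        "containerOverrides": {
            "command": list(command),
            "resourceRequirements": [
                # The API expects the values as strings.
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},  # in MiB
            ],
        },
    }


def submit(request):
    """Send the request to AWS Batch and return the new job ID."""
    # Imported lazily so that building a request needs no AWS setup.
    import boto3

    return boto3.client("batch").submit_job(**request)["jobId"]


# Example request; every name below is a placeholder.
request = build_submit_job_request(
    job_name="align-sample-1",
    job_queue="hpc-queue",
    job_definition="aligner:1",
    vcpus=4,
    memory_mib=8192,
    command=["run.sh", "sample-1"],
)
```

Calling `submit(request)` requires AWS credentials and the referenced resources; building the request does not, which makes it easy to inspect or unit-test before anything is launched.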
Amazon EC2 Spot Instances offer spare AWS compute capacity at a discount of up to 90% compared with On-Demand pricing. Because Spot Instances can be reclaimed with a two-minute warning, they are ideal for fault-tolerant applications. One of the first questions that veteran AWS HPC users ask is how much of a given workload is “Spot-friendly.” Spot-friendly HPC workloads are those that can tolerate the interruption or termination of a few of the cores in your HPC cluster. Typical examples are loosely coupled or embarrassingly parallel workloads such as genome sequencing or algorithmic trading. If a workload is interrupted, AWS Batch will automatically spin up another Spot Instance from the instance types you’ve specified. Running Spot Instances on AWS Batch lets you take advantage of everything AWS Batch has to offer, at a fraction of the cost.
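As a rough illustration of what a Spot-friendly setup can look like, the Python sketch below builds two requests: a managed AWS Batch compute environment backed by Spot Instances, and an array job whose tasks are retried if they fail, for example after a Spot reclamation. The parameter names follow the boto3 `create_compute_environment` and `submit_job` calls; all resource names, IDs, and the default limits are assumptions chosen for illustration, not recommendations.

```python
def build_spot_compute_environment(name, subnets, security_group_ids,
                                   instance_role, max_vcpus=256):
    """Build keyword arguments for a managed, Spot-backed compute environment."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            "type": "SPOT",
            # Prefer Spot capacity pools that are less likely to be interrupted.
            "allocationStrategy": "SPOT_CAPACITY_OPTIMIZED",
            "minvCpus": 0,          # scale down to zero when the queue is empty
            "maxvCpus": max_vcpus,  # assumed ceiling; tune for your workload
            "instanceTypes": ["optimal"],
            "subnets": list(subnets),
            "securityGroupIds": list(security_group_ids),
            "instanceRole": instance_role,
        },
    }


def build_array_job_request(job_name, job_queue, job_definition,
                            task_count, attempts=3):
    """Build a SubmitJob request for an embarrassingly parallel array job."""
    if not 2 <= task_count <= 10000:
        raise ValueError("AWS Batch array size must be between 2 and 10000")
    if not 1 <= attempts <= 10:
        raise ValueError("AWS Batch retry attempts must be between 1 and 10")
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        # Each child task is independent, so losing one instance
        # only affects the tasks that were running on it.
        "arrayProperties": {"size": task_count},
        # Failed tasks are retried up to this many attempts in total.
        "retryStrategy": {"attempts": attempts},
    }


def apply(environment, job):
    """Create the environment and submit the job (needs AWS credentials)."""
    import boto3  # imported lazily; not needed just to build the requests

    batch = boto3.client("batch")
    batch.create_compute_environment(**environment)
    # A job queue attached to the environment must exist before submitting.
    return batch.submit_job(**job)["jobId"]
```

The retry count is the main design choice here: each task should be idempotent, or checkpoint its progress, so that a retried task can safely start over or resume after an interruption.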
If you have a Spot-friendly HPC workload, this video by my colleague Chad Schmutzer explains how to use AWS Batch to submit batch jobs and run them on EC2 Spot Instances.