We routinely test a lot of HPC applications to make sure they work well on AWS, because we think that’s the only benchmark that matters. Hardware improvements from generation to generation are always important, but more often it’s software where we find gains for customers.
Last summer, our performance teams alerted us to a discrepancy in runtimes when a single application ran with several different MPIs on the same hardware. At a high level, codes running with Open MPI were coming in a little slower than those using other commercial or open-source MPIs. This concerned us, because Open MPI is the most popular MPI option for customers using AWS Graviton (our home-grown Arm64 CPU), and we don’t want customers working on that processor to be disadvantaged.
In today’s post, we’ll explain how deep our engineers had to go to solve this puzzle (spoiler: it didn’t end with Open MPI). We’ll show you the outcomes from that work with some benchmark results, including a real workload.
Along the way, we’ll give you a peek at how our engineers work in the open-source community to improve performance for everyone, and you’ll get a sense of the valuable contributions from the people in those communities who support these MPIs.
And – of course – we’ll show you how to get your hands on this so your code just runs faster – on every CPU (not just Graviton).
First, some architecture
If you’ve been paying close attention to the posts on this channel, you’ll know that the Elastic Fabric Adapter (EFA) is key to much of the performance you see in our posts. EFA works underneath your MPI and is often (we hope) just plain invisible to you and your application…
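If you want to confirm that EFA is actually present under your MPI, libfabric ships a small utility called `fi_info` that lists the fabric providers it can see. Here’s a minimal sketch, assuming libfabric (installed with the EFA driver package) is on your `PATH` — on a non-EFA instance the command simply reports nothing, so we add a fallback message:

```shell
# Query libfabric for the EFA provider. On an EFA-enabled instance this
# prints one or more "provider: efa" entries; elsewhere, fi_info finds
# nothing and we fall back to a short explanatory message.
fi_info -p efa 2>/dev/null || echo "EFA provider not found (not an EFA instance, or libfabric not installed)"
```

If you see `provider: efa` entries in the output, your MPI can reach the adapter through libfabric without you having to configure anything else.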