When we started designing our Elastic Fabric Adapter (EFA) several years ago, I was sceptical about its ability to support customers who run all the difficult-to-scale, “latency-sensitive” codes – like weather simulations and molecular dynamics.
Partly, this was due to these codes emerging from traditional on-premises HPC environments with purpose-built machine architectures. These machine designs are driven by every curve ball and twist in some complicated algorithms. This is “form meets function”, though maybe “hand meets glove” is more evocative. It’s common in HPC to design machines to fit a code, and just as common to find that, over time, the code comes to fit the machine. My other hesitation came from the HPC hardware community (of which I’m a card-carrying member) having become very focused in recent years on the impact of interconnect latency on application performance. Over time, we came to assume it was a gating factor.
But boy, was I wrong. It turns out EFA rocks at these codes. Ever since we launched EFA, we’ve been learning new lessons about whole classes of codes that are more constrained by throughput than by latency. The main thing we learned is that single-packet latency, as measured by micro-benchmarks, is a distraction when trying to predict code performance on an HPC cluster.
Today, I want to walk you through some of the decisions we made and give you insight into how we often solve problems differently than others do.
Latency isn’t as important as we thought
Latency isn’t irrelevant, though. It’s just that many of us in the community overlooked the real goal, which is for MPI ranks on different machines to exchange chunks of data quickly. Single-packet latency measures are only a good proxy for that when your network is perfect, lossless, and uncongested. Real applications send large chunks of data to each other. Real networks are busy. What governs “quickly” is whether the hundreds or thousands of packets that make up that data exchange arrive intact, and whether they arrive soon enough (for the next timestep in a simulation, say) that they’re not holding up the other ranks.
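To make that distinction concrete, here’s a minimal sketch (not from the original post; the message size, iteration count, and two-rank setup are illustrative assumptions) that times both things on the same pair of ranks: a classic one-byte ping-pong, which is what single-packet latency numbers report, and an exchange of a large buffer that the network breaks into many packets, which is closer to what an application actually waits on.

```c
/* Illustrative sketch: single-packet latency vs. whole-message exchange.
 * Assumes exactly two MPI ranks; sizes and iteration counts are arbitrary. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    const int peer  = 1 - rank;   /* partner rank, assuming 2 ranks total */
    const int iters = 100;

    /* 1) Ping-pong with a single byte: this is the number a
     *    latency micro-benchmark quotes.                      */
    char byte = 0;
    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else {
            MPI_Recv(&byte, 1, MPI_CHAR, peer, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, peer, 0, MPI_COMM_WORLD);
        }
    }
    double half_rtt_us = (MPI_Wtime() - t0) / (2.0 * iters) * 1e6;

    /* 2) Exchange an 8 MiB buffer, which the fabric carries as
     *    hundreds or thousands of packets. The rank can only move
     *    on to its next timestep once the whole buffer is in.     */
    const size_t nbytes = 8u * 1024 * 1024;      /* illustrative size */
    char *sendbuf = calloc(nbytes, 1);
    char *recvbuf = calloc(nbytes, 1);
    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        MPI_Sendrecv(sendbuf, (int)nbytes, MPI_CHAR, peer, 1,
                     recvbuf, (int)nbytes, MPI_CHAR, peer, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    double exchange_ms = (MPI_Wtime() - t0) / iters * 1e3;

    if (rank == 0) {
        printf("single-byte ping-pong (half round trip): %8.2f us\n", half_rtt_us);
        printf("8 MiB exchange (per iteration)         : %8.2f ms\n", exchange_ms);
    }

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```

Run with something like `mpirun -np 2` across two nodes and the second number is the one that tracks application performance: it depends on how quickly the whole stream of packets gets delivered intact, not on how fast one lonely packet can cross an idle, lossless network.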
Read the full blog to learn how AWS built its own network interface optimized for the scale of the cloud.