This week cloud leader Amazon Web Services announced production availability of Elastic Fabric Adapter (EFA) across multiple AWS regions. Unveiled at re:Invent 2018 as a preview release, EFA has since been “put through its paces on a variety of tightly-coupled HPC workloads, providing us with valuable feedback and helping us to fine-tune the final product,” stated AWS Chief Evangelist Jeff Barr in a blog post.
“[EFA] is ready to support demanding HPC workloads that need lower and more consistent network latency, along with higher throughput, than is possible with traditional TCP communication. This launch lets you apply the scale, flexibility, and elasticity of the AWS Cloud to tightly-coupled HPC apps and I can’t wait to hear what you do with it. You can, for example, scale up to thousands of compute nodes without having to reserve the hardware or the network ahead of time,” wrote Barr.
Barr provides a brief overview of EFA:
An Elastic Fabric Adapter is an AWS Elastic Network Adapter (ENA) with added capabilities (read my post, Elastic Network Adapter – High Performance Network Interface for Amazon EC2, to learn more about ENA). An EFA can still handle IP traffic, but also supports an important access model commonly called OS bypass. This model allows the application (most commonly through some user-space middleware) [to] access the network interface without having to get the operating system involved with each message. Doing so reduces overhead and allows the application to run more efficiently.
AWS customer CFD Direct has been testing EFA and recently shared some of its benchmarking results, showing strong and weak scaling. For simulating external aerodynamics around a car (97 million total cells), super-linear scaling was achieved past 200 cores, gradually declining to linear scaling at 1,008 cores (about 100,000 simulation cells per core). The simulation of flow over a weir with hydraulic jump (1,008 cores and 100M cells) scales at between 67 percent and 72.6 percent, depending on the “data write” setting.
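For readers who want to check the arithmetic behind figures like these, the short Python sketch below computes cells per core and the usual textbook definitions of strong- and weak-scaling efficiency. It is only an illustration: the function names are hypothetical, the timing values in the usage lines are made up, and CFD Direct may define its scaling percentages differently. Only the cell and core counts come from the results quoted above.

```python
def cells_per_core(total_cells: int, cores: int) -> float:
    """Average number of simulation cells handled by each core."""
    return total_cells / cores


def strong_scaling_efficiency(t_ref: float, cores_ref: int,
                              t_n: float, cores_n: int) -> float:
    """Fixed problem size: measured speedup divided by ideal speedup.

    1.0 is linear scaling; values above 1.0 are super-linear.
    """
    speedup = t_ref / t_n
    ideal_speedup = cores_n / cores_ref
    return speedup / ideal_speedup


def weak_scaling_efficiency(t_ref: float, t_n: float) -> float:
    """Fixed work per core: reference run time divided by scaled run time."""
    return t_ref / t_n


if __name__ == "__main__":
    # Figures quoted in the article: 97 million cells on 1,008 cores.
    print(round(cells_per_core(97_000_000, 1008)))

    # Made-up timings, purely to show how the functions are called.
    print(strong_scaling_efficiency(t_ref=100.0, cores_ref=36,
                                    t_n=3.5, cores_n=1008))
    print(weak_scaling_efficiency(t_ref=10.0, t_n=14.0))
```

The first call works out to roughly 96,000 cells per core, consistent with the "about 100,000" cited for the car aerodynamics case.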
EFA can be used with c5n.18xlarge and p3dn.24xlarge instances in all regions where those instances are available. Amazon said it will be expanding support to additional EC2 instance types, specifically for the largest sizes of “n” instances of any given type, and for bare metal instances.
More information: https://aws.amazon.com/blogs/aws/now-available-elastic-fabric-adapter-efa-for-tightly-coupled-hpc-workloads/