Supercomputers are the essential tools we need to conduct research, enable scientific discoveries, design new products, and develop self-learning software algorithms. Supercomputing leadership means scientific leadership, which explains the investments made by many governments and research institutes to build faster and more powerful supercomputing platforms.
The heart of a supercomputer is the network that connects the compute elements together, enabling parallel and synchronized computing cycles. Over the past decades, multiple proprietary HPC network technologies have been created, and many of them have disappeared. InfiniBand, an industry standard developed in 1999, continues to show a strong presence in the high-performance computing market and to expand into deep learning and cloud infrastructures. Back in 2003, it connected one of the top three supercomputers. According to the November 2019 Top500 list, it connects six of the top ten supercomputers in the world. InfiniBand has been chosen to connect several Exascale programs around the world, one of the world’s most powerful meteorological supercomputers at the European Centre for Medium-Range Weather Forecasts – ECMWF (to be deployed this year), the world-leading supercomputing platforms at Meteo France and Eni, and many more.
As a standards-based interconnect, InfiniBand enjoys continuous development of new capabilities, better performance, and high scalability. It has demonstrated 96% network utilization with arguably the most advanced adaptive routing capabilities (source: “The Design, Deployment, and Evaluation of the CORAL Pre-Exascale Systems”), and delivers leading performance for the most demanding compute-intensive applications.
As mentioned in the previous three articles on “Super-Connecting the Supercomputers,” published in HPCwire [1],[2],[3], InfiniBand technology can be divided into three main pillars: connectivity, network, and communication. The connectivity pillar refers to the elements of the interconnect infrastructure, such as topologies. The network pillar refers to network functions such as transport and routing. And the communication pillar refers to co-design elements related to communication frameworks such as MPI, SHMEM/PGAS, and more. The first two pillars were discussed in the previous articles; the third pillar is discussed in this one.
The early focus of InfiniBand technology development was to offload network functions from the CPU to the network. With the new efforts in the co-design approach, the new generation of smart InfiniBand solutions expands offload capabilities to include the execution of data algorithms within the network. These additional capabilities, referred to as In-Network Computing engines, allow users to run algorithms as the data is being transferred within the system’s high-performance interconnect, rather than waiting for the data to reach the CPU. In-Network Computing transforms the data center interconnect into a “distributed CPU” and “distributed memory”, or, in other words, an I/O Processing Unit (IPU). The combination of CPUs, GPUs, and IPUs serves as the basis for the next generation of data center and edge computing architectures. The first generation of IPUs is already used in leading HPC and deep learning data centers, has been integrated into multiple MPI and deep learning frameworks, and has demonstrated accelerated performance with a variety of compute- and data-intensive applications.
HDR 200G InfiniBand technology provides innovative In-Network Computing engines that accelerate application performance, such as the Scalable Hierarchical Aggregation and Reduction Protocol (SHARP), smart hardware-based MPI Tag Matching and rendezvous protocol offloads, and more.
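To make the idea behind SHARP concrete, the sketch below models hierarchical aggregation in plain Python. This is purely a conceptual illustration under assumed simplifications: the real protocol runs in switch ASICs on in-flight packets, not in host software, and the `fanout` parameter and helper names here are hypothetical. The point it demonstrates is that each switch reduces the partial results from its children and forwards a single value up the tree, so the full data set never has to travel to a host CPU for reduction.

```python
# Conceptual sketch of SHARP-style in-network aggregation (illustrative only;
# the real aggregation is performed by switch hardware, not host code).
from functools import reduce

def switch_aggregate(children_values, op):
    """A switch combines the partial results arriving from its child ports."""
    return reduce(op, children_values)

def sharp_allreduce(host_values, fanout, op=lambda a, b: a + b):
    """Reduce host values up a switch tree, then broadcast the result down."""
    level = list(host_values)
    while len(level) > 1:
        # Group values by the switch they feed into at this level of the tree.
        level = [switch_aggregate(level[i:i + fanout], op)
                 for i in range(0, len(level), fanout)]
    result = level[0]
    return [result] * len(host_values)  # every host receives the final value

# Eight hosts contribute one partial sum each; the tree returns the total.
print(sharp_allreduce([1, 2, 3, 4, 5, 6, 7, 8], fanout=4))  # → [36, 36, ..., 36]
```

The tree structure is why the latency advantage grows with node count: the number of aggregation steps scales with the depth of the switch tree rather than with the number of endpoints.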
Figure 1 showcases the performance advantage of InfiniBand SHARP. MPI AllReduce latency measurements, performed on the InfiniBand Dragonfly+ supercomputer at the University of Toronto, demonstrate seven times higher performance with InfiniBand SHARP (using HPC-X MPI) versus software MPI (which executes MPI AllReduce on the host CPU).
Figure 2 compares MPI AllReduce latency between Ethernet RoCE (RDMA), InfiniBand, and InfiniBand with SHARP. InfiniBand technology was designed for high performance and scalability, whereas Ethernet was designed more for enterprise applications. As demonstrated, even before adding its smart In-Network Computing engines, InfiniBand delivers 1.5 times lower latency than Ethernet RoCE. With SHARP, InfiniBand demonstrates 4 times higher performance than Ethernet RoCE.
Figure 3 showcases the performance advantages of InfiniBand SHARP for deep learning applications – GNMT (neural machine translation) and VAE (variational auto-encoder). The measurements were performed on InfiniBand-connected DGX systems, comparing InfiniBand without SHARP and InfiniBand with SHARP. The performance of InfiniBand SHARP for data reduction operations increases application performance by nearly 20 percent in both cases.
Figure 4 showcases the performance advantages of another type of In-Network Computing engine – hardware-based MPI Tag Matching. The MVAPICH team at Ohio State University has already demonstrated a 35% improvement in MPI Eager protocol latency. Recently, the team presented yet another advantage of the InfiniBand Tag Matching hardware engine: Tag Matching enabled nearly 100 percent overlap between computation and communication for MPI Iscatterv operations across 256 nodes, while without InfiniBand Tag Matching the overlap was less than 25% at a 1MB message size.
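The matching logic that the hardware engine offloads can be sketched in software. The model below is an assumed simplification for illustration (class and method names are hypothetical, not an MPI API): incoming messages are matched against a queue of posted receives on (source, tag), with wildcards for MPI_ANY_SOURCE/MPI_ANY_TAG, and messages that arrive before a matching receive are parked on an unexpected-message queue. When the NIC performs this search instead of the CPU, the host can keep computing while messages arrive, which is the source of the compute/communication overlap shown in Figure 4.

```python
# Conceptual sketch of MPI tag matching (illustrative; the HDR InfiniBand
# engine performs this lookup in NIC hardware, freeing the host CPU).
from collections import deque

ANY = object()  # stands in for MPI_ANY_SOURCE / MPI_ANY_TAG wildcards

class TagMatcher:
    def __init__(self):
        self.posted = deque()      # receives posted before a message arrived
        self.unexpected = deque()  # messages that arrived before a receive

    @staticmethod
    def _match(recv, msg):
        src_ok = recv["source"] is ANY or recv["source"] == msg["source"]
        tag_ok = recv["tag"] is ANY or recv["tag"] == msg["tag"]
        return src_ok and tag_ok

    def post_recv(self, source, tag):
        """Post a receive; returns the payload if a parked message matches."""
        recv = {"source": source, "tag": tag}
        for msg in self.unexpected:          # search unexpected messages first
            if self._match(recv, msg):
                self.unexpected.remove(msg)
                return msg["data"]
        self.posted.append(recv)             # otherwise queue the receive
        return None

    def deliver(self, source, tag, data):
        """An incoming message is matched against posted receives in order."""
        msg = {"source": source, "tag": tag, "data": data}
        for recv in self.posted:
            if self._match(recv, msg):
                self.posted.remove(recv)
                return data                  # matched: payload lands in place
        self.unexpected.append(msg)          # no match yet: park the message
        return None

m = TagMatcher()
m.deliver(source=3, tag=7, data="early")     # message arrives before receive
print(m.post_recv(source=ANY, tag=7))        # wildcard receive matches it
```

In software, this list traversal consumes CPU cycles on every arrival; offloading it to the adapter is what lets nonblocking collectives such as MPI_Iscatterv progress while the application computes.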
The suite of InfiniBand In-Network Computing engines described in this article is part and parcel of the HDR InfiniBand technology and solution, and does not exist in any other network, whether Ethernet or proprietary (the latter sometimes marketed as “HPC Ethernet”). Beyond InfiniBand’s extremely low latency advantage over Ethernet and proprietary networks, and its advanced adaptive routing and congestion control mechanisms, its In-Network Computing technology, which transforms the InfiniBand network into an IPU, is the main reason for the growing use of InfiniBand in supercomputing, deep learning, and large-scale cloud platforms.
References:
[1] https://www.hpcwire.com/2019/06/10/super-connecting-the-supercomputers/