The amount of data being generated and consumed in all areas of real-time processing, simulation, and AI is placing heavy networking demands on data center server and storage infrastructures. In addition, supercomputing centers are increasingly opening their data centers to multitudes of users, many from outside their organizations. At the same time, the world’s cloud service providers are beginning to offer more supercomputing services to their millions of customers.
The requirements of today’s supercomputing centers and public clouds are converging: both must deliver the greatest possible performance for next-generation high performance computing (HPC), AI, and data analytics workloads, while also securely isolating those workloads and responding to the varying demands of user traffic.
During GTC Fall 2021 and SC21, NVIDIA announced NVIDIA Quantum-2, the next generation of its InfiniBand networking platform, which offers the extreme performance, broad accessibility, and strong security needed by cloud computing providers and supercomputing centers. NVIDIA Quantum-2 combines software-defined networking, In-Network Computing acceleration, remote direct memory access (RDMA), and the fastest speeds and feeds, with significant advances over the previous InfiniBand generation. It doubles the network speed to 400Gb/s and triples the number of network ports compared with the previous generation. It accelerates performance by 3X and reduces the need for data center fabric switches by 6X, while cutting data center power consumption and data center space by 7 percent each. And that’s just the beginning.
Data center operators can also use NVIDIA Quantum-2’s cloud-native supercomputing architecture, which applies an advanced telemetry-based congestion control system to provide performance isolation, so users get guaranteed and repeatable application performance regardless of spikes in user numbers or workload demands on the shared resources. A nanosecond-precision timing system integrated into NVIDIA Quantum-2 can synchronize distributed applications, such as database processing, helping to reduce the overhead of wait and idle times. NVIDIA Quantum-2 is a cloud-native HPC and AI platform that delivers uncompromised performance on an infrastructure that meets cloud services requirements.
Among the first to plug into the NVIDIA Quantum-2 InfiniBand platform are Texas A&M University and Mississippi State University. Texas A&M’s new ACES supercomputer will use NVIDIA Quantum-2 to connect researchers to a mix of five accelerators from four vendors and ensure that a single job on ACES can scale up using all the computing cores and accelerators. Funded by the U.S. National Oceanic and Atmospheric Administration (NOAA), Mississippi State’s new system will help keep the university at the leading edge in HPC and will conduct work for NOAA’s missions, as well as research for MSU. These intense workloads rely on the high bandwidth, low latency, and In-Network Computing acceleration that only InfiniBand can provide to make their research possible.
NVIDIA Quantum-2 InfiniBand will also be used in the new Atos and NVIDIA Excellence AI Lab (EXAIL). Atos will develop an exascale-class BullSequana X supercomputer to help advance European computing technologies, education, and research. The lab’s first research projects will focus on five key areas enabled by advances in HPC and AI: climate research, healthcare and genomics, hybridization with quantum computing, edge AI/computer vision, and cybersecurity. We look forward to the new system and its capabilities for advancing vital research to address pressing global challenges surrounding climate change.
Finally, as reported in the November 2021 TOP500 Supercomputers list, the number of InfiniBand-connected systems grew 14 percent year-over-year, representing 179 systems, including seven of the TOP10. The list highlights not only InfiniBand’s continued leadership as the most used interconnect for HPC platforms but also the emergence of cloud-native supercomputing systems.
Additionally, Microsoft’s InfiniBand-accelerated Azure supercomputer, ranked tenth on the list, is the first TOP10 showing for a cloud-based system. As more cloud systems make the TOP500 list, leveraging the NVIDIA Quantum-2 InfiniBand networking platform and its cloud-native supercomputing architecture, the future of HPC is starting to look a bit “cloudy.”
Brian Sparks, Sr. Director, HPC and InfiniBand Marketing, NVIDIA
Brian Sparks is a senior marketing and corporate communications executive with over 20 years of experience in the HPC, hyperscale and cloud data center markets. Brian has previously held Marketing Working Group Chair positions in the InfiniBand Trade Association (IBTA) and the OpenFabrics Alliance (OFA) and is the current Marketing Working Group Chair for the Unified Communication Framework (UCF) Consortium. Brian holds a B.A. degree in Communications from San Jose State University.