Nvidia to Offer a ‘1 Exaflops’ AI Supercomputer with 256 Grace Hopper Superchips

By Tiffany Trader

May 28, 2023

We in HPC sometimes roll our eyes at the term “AI supercomputer,” but a new system from Nvidia might live up to the moniker: the DGX GH200 AI supercomputer. Announced tonight (mid-day Monday in Taiwan) at Computex in Taipei, the newest addition to Nvidia’s burgeoning portfolio ties together 256 Grace Hopper Superchips over a two-level fabric of NVLink Switches (96 at the first level, 36 at the second) to provide over 1 exaflops of FP8 AI performance (or nearly 9 petaflops of FP64 performance). The system further touts 144TB of unified memory, 900 GB/s of GPU-to-GPU bandwidth and 128 TB/s of bisection bandwidth. Nvidia is readying the product for end-of-year availability, and notes its Grace Hopper Superchips have entered full production.
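As a quick sanity check, those headline numbers fall out of simple arithmetic over the published per-H100 figures. A back-of-the-envelope sketch (the per-GPU values of roughly 3.96 petaflops FP8 with sparsity and 34 teraflops FP64 vector are assumptions drawn from the H100 datasheet, not from this announcement):

```python
# Back-of-the-envelope check of the DGX GH200 headline performance specs.
# Per-GPU values are assumed from Nvidia's published H100 SXM figures.
fp8_pflops_per_gpu = 3.958   # FP8 Tensor Core, with sparsity
fp64_tflops_per_gpu = 34.0   # FP64 vector
gpus = 256

print(f"FP8 total:  ~{fp8_pflops_per_gpu * gpus / 1000:.2f} exaflops")    # ~1.01
print(f"FP64 total: ~{fp64_tflops_per_gpu * gpus / 1000:.1f} petaflops")  # ~8.7
```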

Nvidia CEO Jensen Huang shows off the Grace Hopper Superchip live at Computex (May 29, Taipei)

“We’re building it now,” said Nvidia CEO Jensen Huang during the livestreamed Computex keynote. “Every component is in production.”

“We [designed] DGX GH200 as a new tool for the development of the next generation generative AI models and capabilities,” said Ian Buck, VP and general manager of hyperscale and HPC at Nvidia, in a press briefing held earlier.

Hyperscalers Google, Meta and Microsoft – not surprisingly – have already queued up to kick the proverbial tires. AWS is not on that early access list.

“[These hyperscalers] will be the first to get access to the DGX GH200 to understand the new capabilities of Grace Hopper, and the multi-node NVLink that allows all those GPUs to work together as one,” said Buck.

One of the system’s top attributes is its 144TB of addressable memory, a combination of the HBM memory of the Hopper GPUs and the LPDDR5X memory of all the Grace CPUs in the network.

Figure shows DGX memory over generations. The DGX GH200 combines the memory of the Grace CPUs (built on Arm “Neoverse V2” cores), each with 480GB of LPDDR5X, and the Hopper GPUs, each with 96GB of HBM3 in this variant. Source: Nvidia.
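The 144TB figure is straightforward to verify from the per-Superchip capacities in the chart (a minimal sketch):

```python
# How the 144TB unified-memory figure is reached: 256 Superchips, each
# pairing 480GB of CPU-attached LPDDR5X with 96GB of GPU-attached HBM3.
lpddr5x_gb, hbm3_gb, superchips = 480, 96, 256
total_gb = (lpddr5x_gb + hbm3_gb) * superchips
print(total_gb, "GB =", total_gb / 1024, "TB")  # 147456 GB = 144.0 TB
```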

Charlie Boyle, vice president of Nvidia’s DGX systems, highlighted the significance of this leap in memory (see chart above). “For some applications, one of our [customers’] historic issues has been ‘my GPU acceleration is fantastic, but my application working space has to fit into GPU memory.’ We have customers with massive applications that they need to accelerate that are much bigger than the 640 gigabytes of memory provided by the Nvidia DGX H100 [equipped with two Intel CPUs and eight H100 GPUs].”

“The foundation of the DGX GH200 is being able to connect both via hardware, in our NVLink Switch system, and software, with all of the CUDA primitives and all of our communication libraries, so that customers can operate that system,” said Boyle. “Because there’s 256 discrete computers, there’s 256 operating systems running in there, but our software works together with all of that so that you can launch a single job on the entire memory space using all the GPU capacity, and either get a job done that you couldn’t do previously or highly accelerate [one you could].”
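Nvidia did not detail that software path, but the programming model Boyle describes resembles today’s multi-node collective frameworks. As an illustration only (PyTorch with the NCCL backend, not Nvidia’s DGX GH200 stack; the script name and launch parameters are hypothetical), a single logical job spanning many discrete nodes might look like this:

```python
# Illustrative only: one logical job across many discrete GPU nodes,
# using PyTorch's NCCL backend (not Nvidia's DGX GH200 software stack).
# Hypothetical launch, one process per Grace Hopper node:
#   torchrun --nnodes=256 --nproc_per_node=1 \
#            --rdzv_backend=c10d --rdzv_endpoint=<head-node>:29500 \
#            allreduce_sketch.py
import os
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")     # NCCL rides the GPU fabric
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)
    rank, world = dist.get_rank(), dist.get_world_size()
    # Every rank contributes a tensor; one all-reduce sums it across all
    # GPUs, treating the whole cluster as a single compute/memory space.
    x = torch.full((1024,), float(rank), device="cuda")
    dist.all_reduce(x, op=dist.ReduceOp.SUM)
    if rank == 0:
        print(f"world={world}, x[0]={x[0].item()}")  # sum of ranks 0..world-1

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```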

To illustrate the potential speedups, Nvidia shared internal benchmarking projections showing improvements ranging from 2.2x (for the 1-trillion-parameter GPT-3 workload) to 6.3x (for the 40TB Distributed Join).

Performance comparisons going from a DGX H100 cluster with InfiniBand to a DGX GH200 fully NVLinked system. For each workload, the number of GPUs is the same for grey and green columns. Source: Nvidia.

Boyle compared the introduction of the fully NVLink-connected DGX GH200 to the arrival of NVLink in 2016. “For NVLink, which debuted with Pascal, we said every system beyond this is going to have this technology in it, and future Nvidia systems, future DGX systems, will also have this external NVLink capability.”

Nvidia is presenting the DGX GH200 as a 256-GPU system, which is the fully configured version. But customers can buy in at 32, 64 or 128 nodes, and can upgrade at any point along the way. “If somebody started with a 32-, they can buy another 32-; all the switching is already there,” said Boyle. “You plug in a few cables, you’ve got 64-, and up and up.”

The fully configured (256-GPU) Nvidia DGX GH200 has 96 L1 NVLink Switches and 36 L2 NVLink Switches. Source: Nvidia.

The DGX GH200 is Nvidia’s first multi-rack DGX system. Each rack houses 16 Grace Hopper GH200 nodes, and the 256 nodes fill 16 racks, as shown in the article-header image above. Bigger systems are also possible by using InfiniBand to connect multiple DGX GH200s.

In fact, Nvidia is building one such mega-system – its own DGX GH200-based AI supercomputer, named Helios (a name predicted by HPCwire) – to advance research and development and power the training of very large AI models. The system connects four DGX GH200 systems – for a total of 1,024 Grace Hopper Superchips – using Nvidia’s Quantum-2 InfiniBand networking, and Nvidia plans to bring it online by the end of the year. Helios will provide about 4 exaflops of AI performance (FP8) and, while it’s not the intended use case, would deliver roughly 34.8 petaflops of theoretical peak FP64 performance. That would put it in the running for a spot in the top decile of the Top500 list, should Nvidia opt to submit to the twice-yearly ranking.
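Those Helios figures are a linear scale-up of the single-system numbers (a sketch, assuming the aggregates simply add across the four InfiniBand-connected systems; real delivered performance across the InfiniBand hops will be workload-dependent):

```python
# Helios as linear aggregation of four DGX GH200 systems (an assumption;
# delivered performance across InfiniBand will vary by workload).
per_system_fp8_ef = 1.0     # ~1 exaflops FP8 per 256-GPU DGX GH200
per_system_fp64_pf = 8.7    # ~8.7 petaflops FP64 peak per system
systems = 4                 # 4 x 256 = 1,024 Grace Hopper Superchips
print(f"FP8:  ~{per_system_fp8_ef * systems:.0f} exaflops")    # ~4
print(f"FP64: ~{per_system_fp64_pf * systems:.1f} petaflops")  # ~34.8
```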

No HGX version of the new DGX has been announced yet, but it sounds like one is in the works. Nvidia indicated that, similar to the HGX H100 design – which underpins the DGX H100 and which Nvidia makes available to hyperscalers and other system partners to customize to their specifications – the DGX GH200 will likewise be available in an “HGX” form, though the company is not making a specific announcement at this time.

On that point, Buck said, “all of these hyperscalers have their own system designs and datacenter designs and further optimize and take what we build in DGX as a blueprint in general, to build their own servers and optimize further for their datacenter infrastructure,” adding, “the components and building blocks and pieces inside of DGX will be made available to these hyperscalers so that they can take it and expand on the design for their custom datacenters and server designs. That product is known as HGX.”

Separately, Nvidia is announcing the modular-focused MGX server specification. MGX is “an open, flexible and forward-compatible system reference architecture for accelerated computing,” said Buck.

(Informally, the ‘D’ in DGX stands for datacenter, the ‘H’ for hyperscale, the ‘C’ for cloud – there’s also an Omniverse-targeted OVX – and now ‘M’ for modular, though this usage is not officially sanctioned by Nvidia’s PR department.)

The modular architecture was created to standardize server designs – pertaining to the mechanical, thermal and power aspects of servers – which may be outfitted with GPUs, CPUs and DPUs from Nvidia and others (both x86 and Arm). “By standardizing, the different components become replaceable, interchangeable and future-proofed for forthcoming Nvidia products and others. With the new MGX reference architecture, we see the possibility to create a new design in as little as two months at a fraction of the cost,” said Buck, who noted that the design process as it exists today can take as long as 18 months.

MGX will support the following form factors:

  • Chassis: 1U, 2U, 4U (air or liquid cooled)
  • GPUs: Full Nvidia GPU portfolio including the latest H100, L40, L4
  • CPUs: Nvidia Grace CPU Superchip, GH200 Grace Hopper Superchip, x86 CPUs
  • Networking: Nvidia BlueField-3 DPU, ConnectX-7 network adapters
Three MGX server designs. Source: Nvidia.
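To make the “replaceable, interchangeable” idea concrete, here is a hypothetical sketch of validating a server build against the MGX menu above (the option names and checker are illustrative only, not an actual Nvidia tool or API):

```python
# Hypothetical illustration of MGX-style standardization: a build is a
# pick from each standardized slot. Option names mirror the list above,
# but this checker is illustrative, not an actual Nvidia tool or API.
MGX_OPTIONS = {
    "chassis": {"1U", "2U", "4U"},
    "cooling": {"air", "liquid"},
    "gpu":     {"H100", "L40", "L4"},   # latest of the full GPU portfolio
    "cpu":     {"Grace CPU Superchip", "GH200 Grace Hopper Superchip", "x86"},
    "network": {"BlueField-3 DPU", "ConnectX-7"},
}

def validate(build: dict) -> bool:
    """True if every slot is filled with a supported MGX option."""
    return all(build.get(slot) in opts for slot, opts in MGX_OPTIONS.items())

# Example: a 2U liquid-cooled H100 server on a Grace CPU Superchip.
print(validate({
    "chassis": "2U", "cooling": "liquid", "gpu": "H100",
    "cpu": "Grace CPU Superchip", "network": "BlueField-3 DPU",
}))  # True
```

Swapping the value in one slot – say, the CPU – leaves the rest of the design untouched, which is the point of a standardized reference architecture.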

If you’re wondering how MGX is different from HGX, Nvidia provided this explanation: “MGX differs from Nvidia HGX in that it offers flexible, multi-generational compatibility with Nvidia products to ensure that system builders can reuse existing designs and easily adopt next-generation products without expensive redesigns. In contrast, HGX is based on an NVLink-connected multi-GPU baseboard tailored to scale to create the ultimate in AI and HPC systems.”

In other words, HGX is just a baseboard, while MGX is a full reference architecture.

MGX is compatible with the Open Compute Project and Electronic Industries Alliance server racks and is supported by Nvidia’s full software stack, including Nvidia AI Enterprise.

ASRock Rack, ASUS, GIGABYTE, Pegatron, QCT and Supermicro have begun to implement MGX into their product design process. Two such products were announced today with planned availability in August: QCT’s S74G-2U system is based on the Nvidia GH200 Grace Hopper Superchip, and Supermicro’s ARS-221GL-NR system employs the Nvidia Grace CPU Superchip.

Another launch partner, SoftBank, is relying on MGX to craft custom servers that will be deployed in its hyperscale datacenters across Japan. The designs SoftBank created using the blueprints will help it dynamically allocate GPU resources in multi-use settings, for example supporting both generative AI and 5G workloads.

Nvidia also reiterated that the GH200 Grace Hopper Superchip is in full production. In total, Nvidia says it now has more than 400 system configurations based on its latest CPU and GPU architectures – including Nvidia Grace, Nvidia Hopper and Nvidia Ada Lovelace – targeting the demand for generative AI. The systems all align with Nvidia’s software stack, including Nvidia AI Enterprise, Omniverse and the RTX platform.

No external system wins have yet been announced for the DGX GH200, but several systems using the same GH200 Grace Hopper Superchips have been announced previously. Alps, the new Swiss supercomputing infrastructure at CSCS, is slated to debut the hybrid Arm-GPU architecture, while the U.S. prepares for the arrival of its first Grace Hopper system, “Venado,” at Los Alamos National Laboratory. Grace Hopper Superchips will also power the new Shaheen III supercomputer at KAUST. All three supercomputers are being built by HPE and are expected to be fully operational and available to researchers next year.

Nvidia declined to answer a question about the power draw of the DGX GH200 system, and likewise didn’t provide pricing info, but noted that all DGX products are sold through partners who help set final customer prices. Regarding TDP, if you take the DGX H100 as a proxy and multiply its max power consumption of 10.2kW by 32 (to get to 256 GPUs), that comes out to 326.4kW. We’ll update when the actual power spec becomes available.
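Spelled out, that proxy estimate looks like this (a rough upper bound only; it ignores architectural differences between Grace Hopper nodes and the x86-based DGX H100):

```python
# Proxy power estimate: scale the DGX H100's published max power (10.2kW
# for 8 GPUs) by the GPU-count ratio. A rough upper bound, per the article.
dgx_h100_max_kw = 10.2
gpus_per_dgx_h100 = 8
target_gpus = 256
print(f"~{dgx_h100_max_kw * target_gpus / gpus_per_dgx_h100:.1f} kW")  # ~326.4
```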
