Nvidia Serves Up Its First Arm Datacenter CPU ‘Grace’ During Kitchen Keynote

By Tiffany Trader

April 12, 2021

Today at Nvidia’s annual spring GPU Technology Conference (GTC), held virtually once more due to the pandemic, the company unveiled its first ever Arm-based CPU, called Grace in honor of the famous American programmer Grace Hopper. The announcement of the new Arm CPU follows Nvidia’s 2019 declaration of intent to fully embrace Arm and its September 2020 bid to acquire Arm for $40 billion.

Grace is expected to debut in 2023 with two HPC centers leading the way. The Swiss National Supercomputing Centre (CSCS) and the U.S. Department of Energy’s Los Alamos National Laboratory are the first to announce plans to build Grace-powered supercomputers in partnership with HPE and Nvidia.

Using future-generation Arm Neoverse cores and next-generation Nvidia NVLink interconnect technology, Grace has been designed for tight coupling with Nvidia GPUs to power the very largest AI and HPC workloads, according to Nvidia.

In his third virtual GTC “kitchen keynote,” Nvidia CEO Jensen Huang said the chip, combined with Nvidia’s GPUs and high-performance networking from its Mellanox division, gives Nvidia “the third foundational technology for computing, and the ability to re-architect every aspect of the datacenter for AI.”

Using fourth-generation NVLink technology, Grace enables 900 GB/s of bidirectional bandwidth between the CPU and GPU, driving significantly higher aggregate bandwidth than today’s standard servers (about 30x higher, says Nvidia). The new architecture also provides cache coherence with a single memory address space, unifying system and HBM GPU memory to simplify programmability.

“Grace highlights the beauty of Arm,” Huang said. “Their IP model allowed us to create the optimal CPU for this application, which achieves x-factor speed up.” He said the Grace CPU will deliver a score of over 300 on the SPECrate2017_int_base benchmark and over 2,400 SPECrate2017_int_base CPU performance per eight-GPU DGX. In comparison, today’s eight-GPU DGX A100 achieves 450 SPECint rate.

Unlike most GPU-accelerated systems on the market today, which have a two-to-one or higher ratio of GPUs to CPUs (with four-to-one being something of a sweet spot), Grace-based systems will be architected with a one-to-one ratio of CPU to GPU. While the company is not yet announcing products, based on Huang’s SPECint performance claims, it seems an eight-GPU DGX server, with eight Grace CPUs, is in the works.
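The inference about an eight-CPU DGX follows from simple arithmetic on the quoted figures. The short Python sketch below works it through; it assumes the system-level SPECrate score scales linearly with CPU count, which is a simplification for illustration, and the function and constant names are hypothetical. The inputs are Nvidia's projections as quoted above, not measured results.

```python
# Figures quoted in the article (Nvidia projections, treated here as lower bounds).
GRACE_SPECRATE_PER_CPU = 300   # "over 300" SPECrate2017_int_base per Grace CPU
DGX_GRACE_SPECRATE = 2400      # "over 2,400" per eight-GPU DGX with Grace
DGX_A100_SPECRATE = 450        # today's eight-GPU DGX A100
GPUS_PER_DGX = 8


def implied_cpu_count(system_score: float, per_cpu_score: float) -> float:
    """CPUs implied by a system score, assuming linear scaling with CPU count."""
    return system_score / per_cpu_score


def speedup(new_score: float, old_score: float) -> float:
    """Ratio of the new score to the old score."""
    return new_score / old_score


cpus = implied_cpu_count(DGX_GRACE_SPECRATE, GRACE_SPECRATE_PER_CPU)
cpu_to_gpu_ratio = cpus / GPUS_PER_DGX
cpu_uplift = speedup(DGX_GRACE_SPECRATE, DGX_A100_SPECRATE)

print(f"Implied Grace CPUs per DGX: {cpus:.0f}")                  # 8
print(f"CPU-to-GPU ratio: {cpu_to_gpu_ratio:.0f}:1")               # 1:1
print(f"CPU throughput vs. DGX A100: about {cpu_uplift:.1f}x")     # about 5.3x
```

Under that linear-scaling assumption, the quoted numbers are consistent with eight Grace CPUs per system and roughly a fivefold gain in integer throughput over the DGX A100.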

Nvidia’s fourth-generation NVLink fabric connects CPU to CPU, CPU to GPU and GPU to GPU. The only other CPU to offer native NVLink support is the IBM Power platform (Power8+ and Power9). IBM’s NVLink’d Power9 server (AC922) forms the basis of the Summit and Sierra supercomputers (currently ranked #2 and #3 in the world), installed at Oak Ridge National Lab and Lawrence Livermore Lab, respectively.

Grace also has a new memory subsystem, leveraging LPDDR5X memory technology, which has twice the bandwidth of today’s DDR4 and is 10 times more energy efficient, according to Nvidia. “We optimize this memory subsystem to support server class reliability through mechanisms like ECC and redundancy,” said Paresh Kharya, senior director of accelerated computing at Nvidia, in a pre-briefing last week.

“This efficiency means you can divert more power towards compute rather than moving the bits around,” said Kharya.

Grace will be supported by Nvidia’s HPC software development kit and its CUDA and CUDA-X libraries.

CSCS and Los Alamos both have Grace-based supercomputers under development with expected delivery in 2023. The CSCS “Alps” system is being billed as the world’s most powerful AI-capable supercomputer, expected to deliver 20 exaflops of performance for AI, using Nvidia’s mixed-precision arithmetic and sparsity features. Based on the HPE Cray EX (formerly Shasta) architecture, Alps will advance the boundaries of whole-earth scale weather and climate simulation, quantum chemistry and quantum physics for the Large Hadron Collider.

Nvidia reports that due to its scale and tight coupling between the CPUs and GPUs, Alps will be able to train the massive GPT-3 language processing model in only two days. That is seven times faster than the Nvidia Selene supercomputer, which is currently ranked number five on the Top500 (with 63.5 Linpack petaflops and 2.8 “AI exaflops”), according to Nvidia.
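As a rough cross-check of these claims, the Python sketch below derives the Selene training time implied by the "seven times faster" figure and compares it with the ratio of the two systems' quoted "AI exaflops." The variable names are hypothetical, and the comparison assumes nothing beyond the numbers quoted above; peak mixed-precision throughput does not by itself determine training time.

```python
# Figures quoted in the article, all attributed to Nvidia.
ALPS_GPT3_TRAINING_DAYS = 2       # claimed GPT-3 training time on Alps
ALPS_SPEEDUP_OVER_SELENE = 7      # "seven times faster" than Selene
ALPS_AI_EXAFLOPS = 20.0           # expected AI performance of Alps
SELENE_AI_EXAFLOPS = 2.8          # quoted AI performance of Selene

# Training time on Selene implied by the speedup claim.
selene_training_days = ALPS_GPT3_TRAINING_DAYS * ALPS_SPEEDUP_OVER_SELENE

# Ratio of quoted peak AI throughput, for comparison with the speedup claim.
ai_exaflops_ratio = ALPS_AI_EXAFLOPS / SELENE_AI_EXAFLOPS

print(f"Implied GPT-3 training time on Selene: {selene_training_days} days")  # 14 days
print(f"Alps / Selene AI exaflops ratio: {ai_exaflops_ratio:.1f}x")           # 7.1x
```

The two figures line up closely: the implied Selene training time is about two weeks, and the ratio of quoted AI exaflops is about 7.1.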

(Read about CSCS’s software-defined strategy for Alps in this interview with the center’s director Thomas Schulthess.)

Scientists at Los Alamos report they are taking delivery of Nvidia A100 GPUs as a first step to receiving a Grace CPU-based system that will facilitate modeling, simulation, and data analysis in support of the lab’s mission. Los Alamos expects to be the first U.S. customer for the new Grace CPUs and will be part of a multi-year codesign collaboration that will inform hardware and software design choices for the benefit of scientific discovery. The lab’s Grace system is also being built by HPE, implementing its Cray EX architecture.

“We’re thrilled by the enthusiasm of the supercomputing community, welcoming us to make Arm a top-notch scientific computing platform,” said Huang today.

“Arm is the most popular CPU in the world, for good reason. It’s super energy-efficient and its open licensing model inspires a world of innovators to create product around it,” the CEO said.

Nvidia’s roadmap now includes three chips: the GPU, CPU and DPU. “Each chip architecture has a two-year rhythm, and likely a kicker in between,” Huang said. “One year we’ll focus on x86 platforms, one year we’ll focus on Arm platforms. The Nvidia architecture and platforms will support x86 and Arm, whatever customers and markets prefer.”

The arrival of Grace has in some sense been a decade in the making, stretching back to Nvidia’s 2011 “Project Denver,” the company’s plan to build an integrated CPU+GPU processor (with Arm CPU and Nvidia GPU cores) capable of powering personal computers, workstations, servers and supercomputers. The full scope of that project wasn’t realized, but Nvidia did end up making Arm+GPU chips (Tegra/Xavier and Jetson) for the embedded worlds of mobile, robotics, portable gaming and autonomous vehicles.

In addition to revealing its very own Arm CPU today, Nvidia continues to strengthen its support of Arm-based technologies with partners. Huang announced that, together with Amazon Web Services, Nvidia is bringing Graviton2 Arm CPUs and Nvidia GPUs together in an EC2 instance, expected later this year. The new instances target demanding cloud workloads, AI, and cloud gaming, said Huang.

Nvidia also announced a partnership with Ampere Computing to create a scientific and cloud computing SDK and reference system. Ampere Computing’s Altra CPU has 80 Neoverse-N1 cores and delivers 285 SPECint rate, “right up there with the highest performance x86,” said Huang.

In addition, Nvidia said it’s entered into a partnership with chip company Marvell to create an edge and enterprise computing SDK and reference system. Marvell’s Octeon chip targets IO storage and 5G processing, and the system is ideal for hyperconverged edge servers, noted Huang.

Absent from today’s news raft was the Cambridge AI research center announced last September, the centerpiece of which is to be an Arm-based supercomputer. Nvidia told HPCwire that the project is still on track, but did not disclose any further details. In October 2020, a company representative told HPCwire “plans are still evolving for the [Cambridge] Arm-based supercomputer,” and said the project was not tied to the closing of the Arm acquisition.

Nvidia also shared that the Cambridge-1 AI SuperPod computer is approaching readiness, with updates likely to be made during the GTC21 conference proceedings.

With robust support for Arm across its entire ecosystem and the debut of a homegrown Arm CPU, all of the pieces are falling into place for Nvidia’s full-stack datacenter solution. While the pending deal to acquire Arm is still under review, Nvidia is showing it has a strong Arm play with or without actually owning Arm — and that after all is the beauty of the IP licensing model.

HPCwire