Top500: Frontier Gains 92 Petaflops; Henri Gets a Little Greener

By Tiffany Trader

May 22, 2023

It’s not quite homeostasis, but it’s close. There was little movement in the latest Top500, released today from the International Supercomputing Conference (ISC) in Hamburg, Germany. The 61st list saw few changes: no new systems in the top 10 and only three new systems within the top 50 cohort (including one from Microsoft Azure and one from Nvidia at numbers 11 and 14, respectively). We’ll look more closely at those, detail two interesting system speedups (Frontier and LUMI) and discuss other list trends, including the Green500 rankings.

At the top of the list – for the third time – is Frontier, the first “official” exascale system, hosted by the DOE’s Oak Ridge National Laboratory. The HPE-AMD system is 8.35% percent flopsier, going from 1.102 Linpack exaflops on the November 2022 list to 1.194 Linpack exaflops on the new list. That is a gain of 92 petaflops. Interestingly, the Rpeak for the machine’s Top500 configuration actually dropped marginally, from 1.686 exaflops to 1.680 exaflops, a 0.6% decrease. This makes the Linpack improvement all the more impressive.

ORNL’s optimizations to Frontier resulted in a not-insignificant boost to Linpack efficiency: now 71.1%, up from 65.3% previously. The new, tuned-up configuration only uses slightly more power, too (22.7 MW versus 21.1 MW on the previous list), and it improved its Green500 metric (52.59 gigaflops-per-watt versus 52.23 gigaflops-per-watt) – more on the Green500 below. Frontier also aced the HPL-MxP benchmark (formally known as HPL-AI) with a score of 9.95 exaflops, topping its previous result of 7.9 exaflops.

Not on the list (as many had already surmised) is the Aurora supercomputer, a collaboration of Intel, HPE and Argonne National Laboratory. We have speculated that even if the Argonne Lab system had achieved a “decent” Linpack run for the new machine in time to submit results (and, to be clear, we have no inside knowledge as to whether they did or not), Argonne would not want to submit the system to the list if it didn’t surpass Frontier’s run, given that Aurora – which promises “more than 2 exaflops peak” – is intended to be 25-50% larger than Frontier. (Further, Aurora has had a long runway and was at one point slated to be the first U.S. exascale system.) Some clues as to the relative size difference of the two machines came via a recent presentation from Argonne’s Rick Stevens, delivered as part of the Dell HPC Community webinar series. Stevens said Frontier would provide about ~78 million quad-GPU node hours per year while Aurora would provide ~118 million quad-GPU node hours per year. Ceteris paribus, that suggests that Aurora will be about 50% bigger than its Tennessee cousin. We may have more to report on Aurora after the keynote from Intel HPC chief Jeff McVeigh later today.

The other lingering question about Frontier is whether it’s passed its formal acceptance. There were rumors that acceptance was somewhat delayed on account of the complexity of an exascale shake-out and break-in; now we’re hearing that it might have passed acceptance. We’ve reached out to Oak Ridge and hope to have confirmation soon. (And here it is.)

May 2023 Top500 list. Source: Top500.

Moving on to the other system amplification: we turn our attention to the EuroHPC-backed Leonardo system, hosted by CINECA in Bologna, Italy. This Atos (Eviden) BullSequana XH2000 system, powered by 32-core Intel CPUs and Nvidia A100 GPUs, now delivers 238.7 Linpack petaflops, up from 174.7 Linpack petaflops six months ago. Despite the growth spurt that added an additional 36.6% computing mass and brought Leonardo to its full configuration, the system maintains its fourth-place ranking. It did not eclipse CSC’s LUMI, which is still in third place with 309.1 Linpack petaflops.

In the number two spot for Linpack rankings, Fugaku, Riken’s Arm-based machine, is still the HPCG (High Performance Conjugate Gradients) champ, an honor it has held since it joined the list three years ago in June 2020. In November, Frontier turned in an impressive (for the notably “challenging” HPCG) 14.05 petaflops result. But Fugaku maintains the lead, achieving 16.00 petaflops on this benchmark. The fraction-of-Rpeak efficiencies here are 0.08% (for Frontier) and 2.98% (for Fugaku). The next highest HPCG score, turned in by LUMI, is 3.4 petaflops, which is 0.08% percent of peak. (Kudos to all who submit to this benchmark despite its low-scoring nature. According to its backers, as a companion benchmark to Linpack, the HPCG is “designed to exercise computational and data access patterns that more closely match a different and broad set of important applications.”)

Other than these two noteworthy enhancements, the top 10 lineup itself hasn’t changed (see inset above, right). Represented countries include the United States, Japan, Finland, Italy and China. Manufacturers include HPE, Fujitsu, HPE, Atos (Eviden), Nvidia and IBM, as well as Chinese government organizations NRCPC and NUDT. Speaking of China, there have been no new large system announcements from the nation since the its two exascale machines were made public via the Gordon Bell Prize proceedings but withheld from the Top500 list.

New systems

There are 43 new systems on the list. Only three appear in the top 50 grouping. Explorer-WUS3 from Microsoft Azure debuts at number 11. And Nvidia’s Eos – or rather the “Pre-Eos 128 Node DGX SuperPOD” – comes in at number 14. The next two highest-ranked new systems are a pair of unnamed Fujitsu machines in 50th and 51st positions. Out of this inaugural cohort, some of the notable systems we’ve been watching are NCAR’s Derecho CPU partition (#59, 10.3 Linpack petaflops), Petrobras’ Gaia (#93, 7.0 Linpack petaflops), NCAR’s Derecho GPU partition (#130, 4.7 petaflops), Atos’ Berzelius2 (#147, 4.2 petaflops), and the University of Tsukuba’s Pegasus (#190, 3.5 petaflops).

Microsoft Azure’s new system – Explorer-WUS3 – runs on the cloud provider’s ND96_amsr_MI200_v4 virtual machines (the MI200 variant of Azure’s NDv4-series virtual machine), powered by AMD second-generation (Rome) Epyc 7V12 48-core CPUs and AMD Instinct MI250X GPUs. The hardware is Frontier-esque in its design, except Azure’s servers leverage the OAM (OCP Accelerator Module) form factor, while Frontier employs a custom board. The cluster was spun up in Azure’s West US3 datacenter, and Microsoft has confirmed it is a permanent system that has or will run real HPC and AI workloads. Explorer-WUS3 delivers 53.96 Linpack petaflops out of a potential theoretical peak of 86.99 petaflops, giving it a Linpack efficiency of 62%. This is the second time that Microsoft has provided the highest new entry on a Top500 list; the last time was in November 2021 with a #10 ranked system.

Nvidia’s new “Pre-Eos 128 Node DGX SuperPOD” is another one we’ve been waiting to hear word on, as Nvidia was notably quiet about Eos during their latest GTC proceedings. The apparent precursor to the full Eos system, Pre-Eos aggregates 40.66 Linpack petaflops out of a potential Rpeak of 58.05 (for a Linpack efficiency of 70.0%), using SuperPod-integrated Nvidia DGX H100 systems, equipped with Intel CPUs and Nvidia H100 GPUs, networked with Nvidia’s ConnectX-7 NDR 400G InfiniBand fabric.

Eos system rendering. Courtesy of Nvidia. Click here for the full slide.

Nvidia announced the Eos system, named for the Greek goddess of the dawn, 14 months ago at its spring GTC 2022, noting at the time the system would be “online in a few months.” Eos is based on the fourth-generation DGX system – the DGX H100 – which hooks together eight GPUs using Nvidia’s proprietary high-speed NVLink interconnects. In its full configuration, Eos will span 18 32-DGX H100 Pods, for a total of 576 DGX H100 systems, 4,608 H100 GPUs, 500 Quantum-2 InfiniBand switches, and 360 NVLink switches. According to Nvidia, the full system will provide 18 exaflops of FP8 performance, 9 exaflops of FP16 performance or 275 petaflops of FP64 performance (all peak numbers). We did the math and came up with 156 petaflops FP64 peak when we plugged in 34 teraflops of traditional FP64 per H100 SXM (reference) and multiplied it by the machine’s planned 4,608x GPUs.

Currently, between 31 and 127 DGX H100 systems can be brought together into a SuperPod (in Nvidia’s parlance) using NVLink switches. (Nvidia addresses the seeming discrepancy between the 127-node number and the 128-node Top500 system in an update note below.) Nvidia’s product specifications doc, dated May 2023, notes that “a pair of NVIDIA Unified Fabric Manager (UFM) appliances displaces one DGX H100 system in the DGX SuperPOD deployment pattern, resulting in a maximum of 127 DGX H100 systems per full DGX SuperPOD.” Fully configured and populated, a 127-node SuperPod provides 3.2 exaflops of peak FP8 computing, 34.5 peak traditional FP64 petaflops, or 68.1 peak FP64 tensor core petaflops.

Eos is the planned successor to Selene, which was named for the Greek goddess of the moon and sister of Eos. Selene debuted in seventh place on the Top500 in June 2020 and is currently ranked ninth. The system, which comprises 280 DGX A100 systems, delivers 63.46 petaflops out of an Rpeak of 79.22 petaflops, which translates to a Linpack efficiency of 80.1%. [Note: Nvidia’s Rpeak is configured differently than its marketing peak.]

Update (May 22): An Nvidia spokesperson followed up in response to our question about the 128-node Pre-Eos versus the standard 127-node SuperPod configuration, stating that the “Pre-Eos Top500 submission is an early preview of our full Eos DGX H100 supercomputer currently being built out. It represents a slice of the final machine, so it has a larger fabric to support the rest of the build out to follow.

“The UFM details are correct for a typical DGX SuperPOD built with DGX H100 systems,” the statement read. “But we don’t have the same port limit in the Eos design. Our Pre-Eos system contains 128 DGX H100 supercomputers, 4 SUs (scalable units), out of the final 576 DGX H100s, 18 SUs.”

Green500 highlights

The Green500 stayed similarly static to the Top500. The Flatiron Institute’s Henri system, built by Lenovo and powered by Nvidia’s H100 GPUs (the system marked the chips’ list debut last November) retained the top spot. Henri’s efficiency grew slightly – 65.40 gigaflops-per-watt, up from 65.09 gigaflops-per-watt – but interestingly, its peak petaflops actually dropped (5.42 peak to 3.58 peak) while its Linpack petaflops rose (2.04 Linpack to 2.89 Linpack). This corresponds to more than a doubling in Linpack efficiency – 37.62% to 80.52%. We expected to see something like this increase in efficiency when Henri first hit the lists.

As mentioned, Frontier also got a tiny bit of a glow-up, going from 52.59 gigaflops-per-watt from 52.23 gigaflops-per-watt.

Outside of those efficiency improvements, the top ten of the Green500 actually did see a couple of new entries. In 8th place is the GPU partition of the amplitUDE system at the University of Duisburg-Essen (UDE – get it?) with 51.34 gigaflops-per-watt. At 1.95 Linpack petaflops, amplitUDE – which is also based on Intel Xeon CPUs and Nvidia H100 GPUs – squeaked onto the Top500 list in 483rd. Just after amplitUDE, in 9th: Goethe-NHR at the University of Frankfurt. Based on AMD Epyc CPUs and AMD Instinct MI210 GPUs, Goethe-NHR delivered 46.52 gigaflops-per-watt and ranked 70th on the Top500 list with 9.09 Linpack petaflops. Previous list-topper system MN-3 has left the top ten, ranking 11th this go-around.

All of the top ten are accelerated, as has been the case for five iterations of the list now. This list sees seven accelerated by AMD and three by Nvidia. Eight of the top ten deliver over 50 gigaflops-per-watt, which extrapolates to an exaflop in 20 megawatts or less – a threshold previously thought unrealistic.

Additional Top500 trends

The entry point to the list is now 1.87 petaflops (up from 1.73 petaflops six months ago), and the current 500th-place system on the list held a 456th place ranking on the previous (November 2022) list, which was detailed at SC in Dallas. The aggregate Linpack performance provided by all 500 systems on the 61st edition of the Top500 list is 5.24 exaflops up from 4.86 exaflops six months ago and 4.40 exaflops 12 months ago.

The entry point for the Top100 segment increased to 6.31 petaflops up from 5.78 petaflops six months ago and 5.39 petaflops a year ago. The aggregate Linpack performance of the top 100 systems is 4.07 exaflops out of 5.66 theoretical exaflops, which comes out to a Linpack efficiency of 71.88%. The Linpack efficiency of the top 10 systems is slightly higher at 73.88%, while the overall Linpack efficiency of the entire list is 66.94%.

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industry updates delivered to you every week!

Intel’s Silicon Brain System a Blueprint for Future AI Computing Architectures

April 24, 2024

Intel is releasing a whole arsenal of AI chips and systems hoping something will stick in the market. Its latest entry is a neuromorphic system called Hala Point. The system includes Intel's research chip called Loihi 2, Read more…

Anders Dam Jensen on HPC Sovereignty, Sustainability, and JU Progress

April 23, 2024

The recent 2024 EuroHPC Summit meeting took place in Antwerp, with attendance substantially up since 2023 to 750 participants. HPCwire asked Intersect360 Research senior analyst Steve Conway, who closely tracks HPC, AI, Read more…

AI Saves the Planet this Earth Day

April 22, 2024

Earth Day was originally conceived as a day of reflection. Our planet’s life-sustaining properties are unlike any other celestial body that we’ve observed, and this day of contemplation is meant to provide all of us Read more…

Intel Announces Hala Point – World’s Largest Neuromorphic System for Sustainable AI

April 22, 2024

As we find ourselves on the brink of a technological revolution, the need for efficient and sustainable computing solutions has never been more critical.  A computer system that can mimic the way humans process and s Read more…

Empowering High-Performance Computing for Artificial Intelligence

April 19, 2024

Artificial intelligence (AI) presents some of the most challenging demands in information technology, especially concerning computing power and data movement. As a result of these challenges, high-performance computing Read more…

Kathy Yelick on Post-Exascale Challenges

April 18, 2024

With the exascale era underway, the HPC community is already turning its attention to zettascale computing, the next of the 1,000-fold performance leaps that have occurred about once a decade. With this in mind, the ISC Read more…

Intel’s Silicon Brain System a Blueprint for Future AI Computing Architectures

April 24, 2024

Intel is releasing a whole arsenal of AI chips and systems hoping something will stick in the market. Its latest entry is a neuromorphic system called Hala Poin Read more…

Anders Dam Jensen on HPC Sovereignty, Sustainability, and JU Progress

April 23, 2024

The recent 2024 EuroHPC Summit meeting took place in Antwerp, with attendance substantially up since 2023 to 750 participants. HPCwire asked Intersect360 Resear Read more…

AI Saves the Planet this Earth Day

April 22, 2024

Earth Day was originally conceived as a day of reflection. Our planet’s life-sustaining properties are unlike any other celestial body that we’ve observed, Read more…

Kathy Yelick on Post-Exascale Challenges

April 18, 2024

With the exascale era underway, the HPC community is already turning its attention to zettascale computing, the next of the 1,000-fold performance leaps that ha Read more…

Software Specialist Horizon Quantum to Build First-of-a-Kind Hardware Testbed

April 18, 2024

Horizon Quantum Computing, a Singapore-based quantum software start-up, announced today it would build its own testbed of quantum computers, starting with use o Read more…

MLCommons Launches New AI Safety Benchmark Initiative

April 16, 2024

MLCommons, organizer of the popular MLPerf benchmarking exercises (training and inference), is starting a new effort to benchmark AI Safety, one of the most pre Read more…

Exciting Updates From Stanford HAI’s Seventh Annual AI Index Report

April 15, 2024

As the AI revolution marches on, it is vital to continually reassess how this technology is reshaping our world. To that end, researchers at Stanford’s Instit Read more…

Intel’s Vision Advantage: Chips Are Available Off-the-Shelf

April 11, 2024

The chip market is facing a crisis: chip development is now concentrated in the hands of the few. A confluence of events this week reminded us how few chips Read more…

Nvidia H100: Are 550,000 GPUs Enough for This Year?

August 17, 2023

The GPU Squeeze continues to place a premium on Nvidia H100 GPUs. In a recent Financial Times article, Nvidia reports that it expects to ship 550,000 of its lat Read more…

Synopsys Eats Ansys: Does HPC Get Indigestion?

February 8, 2024

Recently, it was announced that Synopsys is buying HPC tool developer Ansys. Started in Pittsburgh, Pa., in 1970 as Swanson Analysis Systems, Inc. (SASI) by John Swanson (and eventually renamed), Ansys serves the CAE (Computer Aided Engineering)/multiphysics engineering simulation market. Read more…

Intel’s Server and PC Chip Development Will Blur After 2025

January 15, 2024

Intel's dealing with much more than chip rivals breathing down its neck; it is simultaneously integrating a bevy of new technologies such as chiplets, artificia Read more…

Choosing the Right GPU for LLM Inference and Training

December 11, 2023

Accelerating the training and inference processes of deep learning models is crucial for unleashing their true potential and NVIDIA GPUs have emerged as a game- Read more…

Comparing NVIDIA A100 and NVIDIA L40S: Which GPU is Ideal for AI and Graphics-Intensive Workloads?

October 30, 2023

With long lead times for the NVIDIA H100 and A100 GPUs, many organizations are looking at the new NVIDIA L40S GPU, which it’s a new GPU optimized for AI and g Read more…

Baidu Exits Quantum, Closely Following Alibaba’s Earlier Move

January 5, 2024

Reuters reported this week that Baidu, China’s giant e-commerce and services provider, is exiting the quantum computing development arena. Reuters reported � Read more…

Shutterstock 1179408610

Google Addresses the Mysteries of Its Hypercomputer 

December 28, 2023

When Google launched its Hypercomputer earlier this month (December 2023), the first reaction was, "Say what?" It turns out that the Hypercomputer is Google's t Read more…

AMD MI3000A

How AMD May Get Across the CUDA Moat

October 5, 2023

When discussing GenAI, the term "GPU" almost always enters the conversation and the topic often moves toward performance and access. Interestingly, the word "GPU" is assumed to mean "Nvidia" products. (As an aside, the popular Nvidia hardware used in GenAI are not technically... Read more…

Leading Solution Providers

Contributors

Shutterstock 1606064203

Meta’s Zuckerberg Puts Its AI Future in the Hands of 600,000 GPUs

January 25, 2024

In under two minutes, Meta's CEO, Mark Zuckerberg, laid out the company's AI plans, which included a plan to build an artificial intelligence system with the eq Read more…

China Is All In on a RISC-V Future

January 8, 2024

The state of RISC-V in China was discussed in a recent report released by the Jamestown Foundation, a Washington, D.C.-based think tank. The report, entitled "E Read more…

Shutterstock 1285747942

AMD’s Horsepower-packed MI300X GPU Beats Nvidia’s Upcoming H200

December 7, 2023

AMD and Nvidia are locked in an AI performance battle – much like the gaming GPU performance clash the companies have waged for decades. AMD has claimed it Read more…

Nvidia’s New Blackwell GPU Can Train AI Models with Trillions of Parameters

March 18, 2024

Nvidia's latest and fastest GPU, codenamed Blackwell, is here and will underpin the company's AI plans this year. The chip offers performance improvements from Read more…

Eyes on the Quantum Prize – D-Wave Says its Time is Now

January 30, 2024

Early quantum computing pioneer D-Wave again asserted – that at least for D-Wave – the commercial quantum era has begun. Speaking at its first in-person Ana Read more…

GenAI Having Major Impact on Data Culture, Survey Says

February 21, 2024

While 2023 was the year of GenAI, the adoption rates for GenAI did not match expectations. Most organizations are continuing to invest in GenAI but are yet to Read more…

The GenAI Datacenter Squeeze Is Here

February 1, 2024

The immediate effect of the GenAI GPU Squeeze was to reduce availability, either direct purchase or cloud access, increase cost, and push demand through the roof. A secondary issue has been developing over the last several years. Even though your organization secured several racks... Read more…

Intel’s Xeon General Manager Talks about Server Chips 

January 2, 2024

Intel is talking data-center growth and is done digging graves for its dead enterprise products, including GPUs, storage, and networking products, which fell to Read more…

  • arrow
  • Click Here for More Headlines
  • arrow
HPCwire