At ISC, the Green500 Witnesses a New Frontier in Efficient Computing

By Oliver Peckham

June 8, 2022

Back in 2008, the U.S. Defense Advanced Research Projects Agency (DARPA) set an ambitious target: an exascale supercomputer in a 20-megawatt envelope. That target, once viewed by many with skepticism—research at the time predicted that exascale systems would require hundreds of megawatts—has now officially been met by the exascale Frontier supercomputer at Oak Ridge National Laboratory (ORNL). At ISC 2022, the organizers of the Green500 list—which ranks supercomputers based on their flops-per-watt efficiency—discussed this development and more.

A new Frontier for efficiency

The “June” (late May, actually) Green500 list was led by ORNL. In first place: Frontier’s test and development system, Frontier TDS—though we prefer its less official name, “Borg.” Borg (which is effectively just a single cabinet of the same design as Frontier’s 74 main cabinets) delivered 62.68 gigaflops per watt at a total of 19.20 Linpack petaflops. “If you were to naively extrapolate this to an exaflop, it comes in at about 16 megawatts,” said Wu Feng, custodian of the Green500 list and an associate professor at Virginia Tech, during his virtual appearance at ISC 2022. This is a staggering accomplishment of computing efficiency, eclipsing the previous Green500 champion—Preferred Networks’ MN-3—by nearly 60 percent.
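Feng's "naive" extrapolation is a one-line calculation, sketched below for readers who want to check it. The function name `megawatts_at_exaflop` is ours, not the Green500's, and the sketch assumes efficiency stays constant as a system scales, which real machines do not guarantee.

```python
def megawatts_at_exaflop(gigaflops_per_watt: float) -> float:
    """Power (in MW) needed to sustain one exaflop at a fixed efficiency.

    Illustrative assumption: efficiency does not change with system size.
    """
    exaflop_in_gigaflops = 1e9  # 1 exaflop = 10^18 flops = 10^9 gigaflops
    watts = exaflop_in_gigaflops / gigaflops_per_watt
    return watts / 1e6  # watts -> megawatts


# Frontier TDS ("Borg") efficiency as reported on the Green500 list.
borg_mw = megawatts_at_exaflop(62.68)
print(f"Borg extrapolated to one exaflop: {borg_mw:.2f} MW")
```

The result lands just under 16 megawatts, in line with Feng's figure.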

Representatives accept the certificate for Frontier TDS/Borg’s Green500 win at ISC 2022. On the left, Rafael Ferreira da Silva, senior research scientist for data lifecycle and scalable workflows at ORNL; on the right, Nicolas Dube, fellow, chief technologist for HPC and chief strategist for HPC at HPE.

Perhaps more impressive, however, is that Frontier itself placed second with 52.23 gigaflops per watt. “Frontier on the Green500 is the highest-placed number-one Top500 supercomputer on the Green500 list in its existence,” Feng said. According to the Green500 list, Frontier delivered 1.102 Linpack exaflops in a 21.1-megawatt envelope, which interpolates to one exaflop at 19.15 megawatts. However, Al Geist—CTO of the Oak Ridge Leadership Computing Facility (OLCF)—revealed during the session that this was a “very conservative number” and that the average power use that Oak Ridge submitted to the Green500 was actually 20.2 megawatts. That works out to 54.5 gigaflops per watt and interpolates to an exaflop in 18.33 megawatts. By this measurement, Frontier is more than 3.5× more efficient than the previous Top500 topper, Riken’s Fugaku system (15.42 gigaflops per watt).
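The arithmetic behind those figures can be reproduced in a few lines. This is a minimal sketch using only the numbers quoted above; the helper name `gigaflops_per_watt` is hypothetical, and the Green500's own methodology involves measurement rules that this division does not capture.

```python
def gigaflops_per_watt(rmax_exaflops: float, power_megawatts: float) -> float:
    """Efficiency from a Linpack result (exaflops) and average power (MW)."""
    gigaflops = rmax_exaflops * 1e9  # exaflops -> gigaflops
    watts = power_megawatts * 1e6    # megawatts -> watts
    return gigaflops / watts


RMAX_EXAFLOPS = 1.102   # Frontier's Linpack result
FUGAKU_GF_PER_W = 15.42  # Fugaku's efficiency, as quoted in the article

# Two power figures: the one on the list and the one Geist cited.
for label, power_mw in [("listed", 21.1), ("cited by Geist", 20.2)]:
    efficiency = gigaflops_per_watt(RMAX_EXAFLOPS, power_mw)
    mw_per_exaflop = power_mw / RMAX_EXAFLOPS  # scale power to exactly 1 exaflop
    print(f"{label}: {efficiency:.2f} GF/W, {mw_per_exaflop:.2f} MW per exaflop")

# Ratio against Fugaku, using the 20.2 MW figure.
ratio = gigaflops_per_watt(RMAX_EXAFLOPS, 20.2) / FUGAKU_GF_PER_W
print(f"Frontier vs. Fugaku: {ratio:.2f}x")
```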

The Frontier supercomputer. Image courtesy of HPE/ORNL.

This efficiency, Geist explained, speaks to a long legacy at ORNL. “Oak Ridge has really been working on energy efficient computing for about a decade,” he said, charting out how this ten-year effort had paid off from the use of GPUs in the Titan system back in 2012 through Frontier today. “Exascale has really been made possible by this sort of 200× improvement in energy-efficient computing.” Geist further credited AMD’s work on making its CPUs and GPUs more efficient, such as by allowing the chips to turn off unused resources at a very granular level. He also credited the list itself: “I think the Green500 has done a remarkable job of making the entire community much more aware of power efficiency and the importance of it.”

Frontier’s shadow even extends beyond the top two systems on the Green500. Frontier—and, by extension, Borg—are HPE Cray EX systems with AMD Milan “Trento” Epyc CPUs, AMD Instinct MI250X GPUs and HPE Slingshot-11 networking. That exact same architecture also appears in the third-place system, the 151.90 Linpack petaflops LUMI supercomputer in Finland (51.63 gigaflops per watt, third place on the Top500). It also appears in the fourth-place system, the 46.10 Linpack petaflops Adastra system in France (50.03 gigaflops per watt, tenth place on the Top500). “All four of these systems all use the same technology that was actually developed for Frontier,” Geist said. Both LUMI and Adastra also extrapolate to an exaflop under 20 megawatts.

The new Green500 top ten, with new entries and measurements in yellow. Image courtesy of Wu Feng.

Green500 trends

All of the top ten systems are accelerated: four with the aforementioned AMD MI250X GPUs, five with Nvidia’s A100 GPUs and one between them in fifth place using the Preferred Networks MN-Core accelerator. Further, Feng said, it was the first time that all of the top ten machines from the previous list stayed on the list—and not just on the list, but in the top 20. However, those four Frontier-type systems shot past the rest of the pack on the list: the average power efficiency of the top ten systems extrapolates to exascale at around 40 megawatts, showcasing the gap between the Frontier architecture and the competition. As shown in the box-and-whisker plot below, the remaining systems on the Green500 list showed modest improvements in efficiency compared to the November list.

Gigaflops per watt over time on the Green500 list. Image courtesy of Wu Feng.

There was another encouraging trend on the new list. The Green500 uses three tiers of efficiency reporting, with a level three measurement representing the whole system across a full run, a level one measurement representing a smaller fraction of the system across the core phase of a run, and a level two measurement somewhere in between. “The total number of level 2 and level 3 entries continues to grow relative to level 1, so that’s really great,” said Natalie Bates, chair of the Energy Efficient HPC Working Group (EEHPCWG), during the Green500 session. This Green500 list included 102 measured submissions: 57 at level one, 31 at level two and 14 at level three.

Higher stakes, new strategies

Founded 16 years ago, the Green500 list aims to “raise awareness (and encourage reporting) of the energy efficiency of supercomputers” and to “drive energy efficiency as a first-order design constraint (on par with performance).” But when the Green500 list was being conceived, supercomputers rated in single-digit megawatts; now, systems like Frontier are pulling down double-digit megawatts. ORNL Director Thomas Zacharia said in a press briefing that “when you start the [Linpack] run [on Frontier], the machine, in less than ten seconds, begins to draw an additional 15 megawatts of power … that’s a small city in the U.S., that’s roughly about how much power the city of Oak Ridge consumes.”

The sheer scale of systems like Frontier has put increased urgency on not only how much power the systems themselves consume, but also the efficiency of their supporting infrastructure and the sourcing of the power itself. Indeed, DARPA’s 20-megawatt target for exascale was predicated on costs, as Geist recounted during ORNL’s Advanced Technologies Section webinar last year: “The number that came back from the head of [the] Office of Science at the time was that they weren’t willing to pay over $100 million over the five years, so it’s simple math [based on an average cost of $1 million per megawatt per year]. The 20 megawatts had nothing to do with what might be possible, it was just that stake that we drove in the ground.”
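Geist's "simple math" can be written out directly. The sketch below uses only the figures in his account ($100 million, five years, $1 million per megawatt per year); the function and parameter names are ours, chosen for illustration.

```python
def power_budget_megawatts(total_budget_usd: float,
                           years: float,
                           usd_per_megawatt_year: float) -> float:
    """Largest sustained power draw (MW) that a fixed energy budget covers."""
    budget_per_year = total_budget_usd / years
    return budget_per_year / usd_per_megawatt_year


# $100 million over five years at $1 million per megawatt per year.
budget_mw = power_budget_megawatts(100e6, 5, 1e6)
print(f"Affordable power envelope: {budget_mw:.0f} MW")
```

Run with those inputs, the budget works out to the 20-megawatt stake in the ground.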

In the Green500 session last week, Geist elaborated that Oak Ridge was dedicated to “not only reducing the amount of energy it takes to run the computer, but reducing the amount of energy it takes to cool the datacenter back down.” As a result, the Frontier datacenter achieves a power usage effectiveness (PUE) of just 1.03. “A lot of work has gone into trying to make this machine as well as the datacenter itself just as efficiently as possible,” Geist said.
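PUE is the ratio of total facility power to the power delivered to the computing equipment, so a PUE of 1.03 means roughly 3 percent overhead for cooling and other infrastructure. The sketch below shows what that implies; the 20.2-megawatt load is borrowed from Frontier's Linpack run purely as an illustrative assumption, not as a reported facility measurement.

```python
def facility_power_megawatts(it_power_mw: float, pue: float) -> float:
    """Total facility power implied by an IT load and a PUE value."""
    return it_power_mw * pue


def overhead_megawatts(it_power_mw: float, pue: float) -> float:
    """Power spent on cooling and other non-IT infrastructure."""
    return it_power_mw * (pue - 1.0)


IT_LOAD_MW = 20.2  # assumption: Frontier's average Linpack power as the IT load
PUE = 1.03         # Frontier datacenter PUE, as quoted by Geist

total_mw = facility_power_megawatts(IT_LOAD_MW, PUE)
overhead_mw = overhead_megawatts(IT_LOAD_MW, PUE)
print(f"Total facility power: {total_mw:.2f} MW")
print(f"Infrastructure overhead: {overhead_mw:.2f} MW")
```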

The new LUMI datacenter. Image courtesy of CSC.

EuroHPC’s aforementioned LUMI system, meanwhile, is housed in a new datacenter designed with power efficiency and sustainability in mind (pictured above). Sited in an old paper mill in Kajaani, Finland, LUMI—which currently requires less than 10 megawatts to operate—is powered by 100 percent renewable energy (local hydropower) and is designed to sell its waste heat back to the town of Kajaani, further reducing energy costs and resulting in a net-negative carbon footprint. The location in northern Finland also, of course, reduces the need for artificial cooling. During a session on EuroHPC at ISC 2022, Anders Jensen—executive director of the EuroHPC JU—stressed the importance of these holistic energy “stories” for European supercomputers. “[The] Green500 is great,” he said, “but it doesn’t take into account where the energy came from.”
