At ISC, the Green500 Witnesses a New Frontier in Efficient Computing

By Oliver Peckham

June 8, 2022

Back in 2008, the U.S. Defense Advanced Research Projects Agency (DARPA) set an ambitious target: an exascale supercomputer in a 20-megawatt envelope. That target, once viewed by many with skepticism—research at the time predicted that exascale systems would require hundreds of megawatts—has now officially been met by the exascale Frontier supercomputer at Oak Ridge National Laboratory (ORNL). At ISC 2022, the organizers of the Green500 list—which ranks supercomputers based on their flops-per-watt efficiency—discussed this development and more.

A new Frontier for efficiency

The “June” (late May, actually) Green500 list was led by ORNL. In first place: Frontier’s test and development system, Frontier TDS—though we prefer its less official name, “Borg.” Borg (which is effectively just a single cabinet of the same design as Frontier’s 74 main cabinets) delivered 62.68 gigaflops per watt at a total of 19.20 Linpack petaflops. “If you were to naively extrapolate this to an exaflop, it comes in at about 16 megawatts,” said Wu Feng, custodian of the Green500 list and a professor at Virginia Tech, during his virtual appearance at ISC 2022. This is a staggering accomplishment of computing efficiency, eclipsing the previous Green500 champion—Preferred Networks’ MN-3—by nearly 60 percent.
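
For readers who want to check the arithmetic, here is a minimal Python sketch (ours, not part of the Green500 methodology) showing how a flops-per-watt figure translates into the power a hypothetical one-exaflop system would draw at the same efficiency:

    # Naive extrapolation: assume the measured efficiency (gigaflops per watt)
    # holds constant when the system is scaled to one exaflop (10^18 flops).
    def megawatts_at_exaflop(gigaflops_per_watt: float) -> float:
        exaflop_in_gigaflops = 1e9                     # 1 exaflop = 10^9 gigaflops
        watts = exaflop_in_gigaflops / gigaflops_per_watt
        return watts / 1e6                             # watts -> megawatts

    print(megawatts_at_exaflop(62.68))                 # Frontier TDS ("Borg"): ~15.95 MW
    print(megawatts_at_exaflop(52.23))                 # Frontier's list entry: ~19.15 MW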

Representatives accept the certificate for Frontier TDS/Borg’s Green500 win at ISC 2022. On the left, Rafael Ferreira da Silva, senior research scientist for data lifecycle and scalable workflows at ORNL; on the right, Nicolas Dube, HPE fellow and chief technologist for HPC.

Perhaps more impressive, however, is that Frontier itself placed second with 52.23 gigaflops per watt. “Frontier on the Green500 is the highest-placed number-one Top500 supercomputer on the Green500 list in its existence,” Feng said. According to the Green500 list, Frontier delivered 1.102 Linpack exaflops in a 21.1-megawatt envelope, which interpolates to one exaflop at 19.15 megawatts. However, Al Geist—CTO of the Oak Ridge Leadership Computing Facility (OLCF)—revealed during the session that this was a “very conservative number” and that the average power use that Oak Ridge submitted to the Green500 was actually 20.2 megawatts. That works out to 54.5 gigaflops per watt and interpolates to an exaflop in 18.33 megawatts. By this measurement, Frontier is more than 3.5× more efficient than the previous Top500 topper, Riken’s Fugaku system (15.42 gigaflops per watt).
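
The figures in the preceding paragraph can be reproduced with the same kind of back-of-the-envelope math. The sketch below (ours, not the official Green500 calculation) derives the efficiency, the power per exaflop and the comparison to Fugaku from both the published 21.1-megawatt value and the 20.2-megawatt figure Geist cited:

    # Back-of-the-envelope check of the Frontier numbers quoted above
    # (not the official Green500 calculation).
    linpack_exaflops = 1.102            # Frontier's Linpack score
    fugaku_gf_per_watt = 15.42          # previous Top500 leader, for comparison

    for power_mw in (21.1, 20.2):       # published list value vs. Geist's figure
        gf_per_watt = (linpack_exaflops * 1e9) / (power_mw * 1e6)
        mw_per_exaflop = power_mw / linpack_exaflops
        print(f"{power_mw} MW: {gf_per_watt:.2f} GF/W, "
              f"{mw_per_exaflop:.2f} MW per exaflop, "
              f"{gf_per_watt / fugaku_gf_per_watt:.2f}x Fugaku")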

The Frontier supercomputer. Image courtesy of HPE/ORNL.

This efficiency, Geist explained, speaks to a long legacy at ORNL. “Oak Ridge has really been working on energy efficient computing for about a decade,” he said, charting out how this ten-year effort had paid off from the use of GPUs in the Titan system back in 2012 through Frontier today. “Exascale has really been made possible by this sort of 200× improvement in energy-efficient computing.” Geist further credited AMD’s work into making its CPUs and GPUs more efficient, such as by allowing the chips to turn off unused resources at a very granular level. He also credited the list itself: “I think the Green500 has done a remarkable job of making the entire community much more aware of power efficiency and the importance of it.”

Frontier’s shadow even extends beyond the top two systems on the Green500. Frontier—and, by extension, Borg—are HPE Cray EX systems with AMD Milan “Trento” Epyc CPUs, AMD Instinct MI250X GPUs and HPE Slingshot-11 networking. That exact same architecture also appears in the third-place system, the 151.90 Linpack petaflops LUMI supercomputer in Finland (51.63 gigaflops per watt, third place on the Top500). It also appears in the fourth-place system, the 46.10 Linpack petaflops Adastra system in France (50.03 gigaflops per watt, tenth place on the Top500). “All four of these systems all use the same technology that was actually developed for Frontier,” Geist said. Both LUMI and Adastra also extrapolate to an exaflop under 20 megawatts.

The new Green500 top ten, with new entries and measurements in yellow. Image courtesy of Wu Feng.

Green500 trends

All of the top ten systems are accelerated: four with the aforementioned AMD MI250X GPUs, five with Nvidia’s A100 GPUs and one between them in fifth place using the Preferred Networks MN-Core accelerator. Further, Feng said, it was the first time that all of the top ten machines from the previous list stayed on the list—and not just on the list, but in the top 20. However, those four Frontier-type systems shot past the rest of the pack on the list: the average power efficiency of the top ten systems extrapolates to exascale at around 40 megawatts, showcasing the gap between the Frontier architecture and the competition. As shown in the box-and-whisker plot below, the remaining systems on the Green500 list showed modest improvements in efficiency compared to the November list.
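
Running the extrapolation in reverse puts that gap in perspective; a quick sketch (again ours, not the list’s own methodology):

    # Inverse of the earlier extrapolation: what average efficiency does
    # "one exaflop at roughly 40 MW" imply?
    def gigaflops_per_watt_at(megawatts_per_exaflop: float) -> float:
        return 1e9 / (megawatts_per_exaflop * 1e6)   # 1 exaflop = 10^9 gigaflops

    print(gigaflops_per_watt_at(40))   # ~25 GF/W: the top-ten average
    print(gigaflops_per_watt_at(20))   # 50 GF/W: what DARPA's 20 MW target requires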

Gigaflops per watt over time on the Green500 list. Image courtesy of Wu Feng.

There was another encouraging trend on the new list. The Green500 uses three tiers of efficiency reporting, with a level three measurement representing the whole system across a full run, a level one measurement representing a smaller fraction of the system across the core phase of a run, and a level two measurement somewhere in between. “The total number of level 2 and level 3 entries continues to grow relative to level 1, so that’s really great,” said Natalie Bates, chair of the Energy Efficient HPC Working Group (EEHPCWG), during the Green500 session. This Green500 list included 102 measured submissions: 57 at level one, 31 at level two and 14 at level three.

Higher stakes, new strategies

Founded 16 years ago, the Green500 list aims to “raise awareness (and encourage reporting) of the energy efficiency of supercomputers” and to “drive energy efficiency as a first-order design constraint (on par with performance).” But when the list was being conceived, the largest supercomputers drew power in the single-digit megawatts; now, systems like Frontier are pulling down double-digit megawatts. ORNL Director Thomas Zacharia said in a press briefing that “when you start the [Linpack] run [on Frontier], the machine, in less than ten seconds, begins to draw an additional 15 megawatts of power … that’s a small city in the U.S., that’s roughly about how much power the city of Oak Ridge consumes.”

The sheer scale of systems like Frontier has put increased urgency on not only how much power the systems themselves consume, but also the efficiency of their supporting infrastructure and the sourcing of the power itself. Indeed, DARPA’s 20-megawatt target for exascale was predicated on costs, as Geist recounted during ORNL’s Advanced Technologies Section webinar last year: “The number that came back from the head of [the] Office of Science at the time was that they weren’t willing to pay over $100 million over the five years, so it’s simple math [based on an average cost of $1 million per megawatt per year]. The 20 megawatts had nothing to do with what might be possible, it was just that stake that we drove in the ground.”
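
The “simple math” Geist describes is easy to lay out explicitly; a sketch of his reasoning, using only the figures from the quote:

    # DARPA's 20 MW target as a budget constraint, per Geist's account.
    budget_usd = 100e6            # $100 million the Office of Science would pay for power
    years = 5                     # over five years of operation
    cost_per_mw_year = 1e6        # ~$1 million per megawatt per year

    max_megawatts = budget_usd / (years * cost_per_mw_year)
    print(max_megawatts)          # 20.0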

In the Green500 session last week, Geist elaborated that Oak Ridge was dedicated to “not only reducing the amount of energy it takes to run the computer, but reducing the amount of energy it takes to cool the datacenter back down.” As a result, the Frontier datacenter achieves a power usage effectiveness (PUE) of just 1.03. “A lot of work has gone into trying to make this machine as well as the datacenter itself just as efficient as possible,” Geist said.
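
PUE is simply the ratio of total facility power to the power consumed by the computing equipment itself, so a value of 1.03 leaves very little overhead. The sketch below illustrates the scale, with the caveat that pairing the PUE with the 20.2-megawatt Linpack figure quoted earlier is our assumption, not an ORNL-reported breakdown:

    # PUE (power usage effectiveness) = total facility power / IT equipment power.
    # Illustrative pairing of figures quoted in this article, not ORNL's own numbers.
    pue = 1.03
    it_power_mw = 20.2                            # Frontier's average draw during Linpack

    facility_power_mw = pue * it_power_mw         # total draw, including cooling and overhead
    overhead_mw = facility_power_mw - it_power_mw
    print(f"{facility_power_mw:.2f} MW total, {overhead_mw:.2f} MW of overhead")  # ~20.81, ~0.61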

The new LUMI datacenter. Image courtesy of CSC.

EuroHPC’s aforementioned LUMI system, meanwhile, is housed in a new datacenter designed with power efficiency and sustainability in mind (pictured above). Sited in an old paper mill in Kajaani, Finland, LUMI—which currently requires less than 10 megawatts to operate—is powered by 100 percent renewable energy (local hydropower) and is designed to sell its waste heat back to the town of Kajaani, further reducing energy costs and resulting in a net-negative carbon footprint. The location in northern Finland also, of course, reduces the need for artificial cooling. During a session on EuroHPC at ISC 2022, Anders Jensen—executive director of the EuroHPC JU—stressed the importance of these holistic energy “stories” for European supercomputers. “[The] Green500 is great,” he said, “but it doesn’t take into account where the energy came from.”
