HPC Power Efficiency and the Green500

By Kirk W. Cameron

November 20, 2013

The first Green500 List was launched in November 2007, ranking the energy efficiency of supercomputers. Co-founder Kirk W. Cameron discusses the events that led to the creation of the Green500 List, its maturation, and future directions.

An Early Supercomputer Efficiency “List”

In 2001 the notions of Green HPC and energy-proportional computing were unknown. There was no tangible evidence that power was an issue in supercomputers. Vendors simply built large systems to customer specifications. Performance kept increasing exponentially, and while performance efficiency was of interest, power efficiency was not.

My early work in Green HPC was inspired by the tradeoffs inherent to power and performance. I imagined how varying power modes might make supercomputers more efficient, and I speculated about how such technologies would change the way we compute in HPC. But in the beginning, this seemed like a solution looking for a problem. No one at the time believed power was, or ever would be, an issue in HPC.

I needed data, and lots of it, if I was to convince the community that power was important. The Top500 List provided a plethora of performance data, but nothing related to power. Many of the larger supercomputer systems posted their specifications online, but the information was spotty at best, and it quickly became obvious that no one was measuring power. If I wanted to improve power efficiency in supercomputers, not only would I have to prove conclusively that a power problem existed, I would have to start measuring systems myself! As a software guy, I found this daunting.


Figure 1. Source: NSF Career Proposal Submission, K.W. Cameron, July 2003.

It seems almost comical now, but I spent 4 months obtaining the data for Figure 1. This “list” of power consumption for the top supercomputers from 6 different Top500 lists over ten years was the first of its kind. Perhaps its most striking feature is the exponential increase in the raw power consumption of the top systems from 1993 to 2003. Moreover, despite a decade of technological advances separating them, the efficiency of the TMC CM-5 (~12 MFLOPS/watt) was more than double that of the Japanese Earth Simulator (5.6 MFLOPS/watt).

The trends are clear and irrefutable. Supercomputer power was a liability and would soon limit scalability. Of course, it would be almost 4 years before the community at large began to acknowledge that supercomputer power was a fundamental constraint. Let’s just say I’ve learned to be patient.

Origins of the Green500 List

Particularly in those early years, I spent a lot of time considering data collection and power measurement. My team built infrastructures and designed tools and methodologies to accurately track power usage in HPC systems. We ported our framework again and again to learn as much as we could about the tradeoffs between power and performance on emerging systems. We also built the first power-scalable HPC system prototype.

Wu Feng approached me in 2006 with the notion of creating a list of power-efficient supercomputers akin to the Top500. I was already a firm believer in the need for such data, having spent 4 months creating a small list of power consumption for 6 supercomputers. Furthermore, I had spent the previous three years designing several generations of power measurement toolkits. My group had arguably compiled the largest, most detailed repository of HPC power data, and we had a vast amount of experience measuring HPC system power.

My primary role was to design the power measurement run rules for the first list. We knew that other benchmarking methodologies had suffered when they could be gamed easily. Based on my experience measuring power, we wrote a set of run rules describing how to easily measure a single node and extrapolate the power for a supercomputer running Linpack. The rules were designed to encourage participation by enabling non-experts to report their own power data with minimal investment of time and money. For those not reporting, we would use the UL ratings (see Figure 1) to fully populate the list.

Ease of participation was paramount. The Linpack benchmark was not ideal, but it was the only benchmark most supercomputer users reported regularly. MFLOPS per watt was not an ideal metric either, but it was easy to report and would encourage energy-efficient, high-performance solutions.
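The single-node approach and the MFLOPS-per-watt metric can be illustrated with a short calculation. This is a minimal sketch, not the official Green500 run rules: it assumes identical nodes, that average node power is measured while Linpack runs, and that whole-system power is estimated by multiplying that figure by the node count. The function names and every input value are hypothetical.

```python
def extrapolate_system_power_w(avg_node_power_w: float, num_nodes: int) -> float:
    """Estimate whole-system power by scaling one measured node (assumes identical nodes)."""
    if avg_node_power_w <= 0 or num_nodes <= 0:
        raise ValueError("node power and node count must be positive")
    return avg_node_power_w * num_nodes


def mflops_per_watt(rmax_gflops: float, system_power_w: float) -> float:
    """Efficiency metric: sustained Linpack performance (Rmax) divided by power."""
    if system_power_w <= 0:
        raise ValueError("system power must be positive")
    return rmax_gflops * 1_000.0 / system_power_w  # 1 GFLOPS = 1,000 MFLOPS


# Hypothetical inputs: 1,000 nodes averaging 350 W under Linpack, Rmax of 100 TFLOPS.
power_w = extrapolate_system_power_w(avg_node_power_w=350.0, num_nodes=1_000)
efficiency = mflops_per_watt(rmax_gflops=100_000.0, system_power_w=power_w)
print(f"{power_w / 1e3:.0f} kW, {efficiency:.1f} MFLOPS/W")  # 350 kW, 285.7 MFLOPS/W
```

Note that this sketch only counts power drawn inside the measured node type; anything outside it is left out of the estimate, which is the price of a rule that non-experts can follow cheaply.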

After 6 months of discussion, we solicited participation from the broader community. About a year later, in November 2007, we released the first list. The launch of the first Green500 List was an event. As if scripted, just prior to launch, the power problem in data centers had become front-page news, and rather suddenly many agreed that supercomputers needed to become more energy efficient.

Some embraced the list and touted high-ranked systems while deriding low-ranked systems. Some complained of being disenfranchised. Some ridiculed our methodology and metrics. Some took issue with the lack of community involvement or coordination with other lists, benchmarks, and government agencies.

The Green500 List Matures

While most of the early dialogue and press affirmed the need for the Green500 List, some valid criticisms led to significant improvements. For example, we released an updated list in early 2008 to include measured numbers from those who did not report to the first list. In succeeding lists, we limited the information we tracked to focus exclusively on energy efficiency. Later, we obtained research funding to explore the potential use of other benchmarks and metrics.

We’ve actively sought feedback from users as the list has matured. This has resulted in additional lists such as the Little Green500. While entry to the Green500 requires placing among the 500 fastest systems in the world, the Little Green500 broadens eligibility to include systems at least as fast as the slowest supercomputer on the three previous Top500 lists. The goal of this list is to provide efficiency information to those who would deploy smaller systems.
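The Little Green500 entry rule can be expressed as a simple cutoff test. This sketch rests on an interpretation, not on the published rules: it assumes the cutoff is the lowest Rmax found on any of the three previous Top500 lists, and that a system qualifies if its Rmax meets or exceeds it. The function names and the list values are hypothetical.

```python
from typing import Sequence


def little_green500_cutoff(previous_lists_rmax: Sequence[Sequence[float]]) -> float:
    """Rmax cutoff (GFLOPS): the slowest system across the three previous Top500 lists.

    Assumption: each inner sequence holds the Rmax values of one earlier Top500 list.
    """
    if len(previous_lists_rmax) != 3:
        raise ValueError("expected exactly three previous Top500 lists")
    return min(min(rmax_values) for rmax_values in previous_lists_rmax)


def is_little_green500_eligible(rmax_gflops: float, cutoff_gflops: float) -> bool:
    """A system qualifies if it is at least as fast as the cutoff."""
    return rmax_gflops >= cutoff_gflops


# Hypothetical, truncated lists (GFLOPS); only each list's slowest entry affects the cutoff.
previous_lists = [
    [150_000.0, 100_000.0],
    [120_000.0, 80_000.0],
    [90_000.0, 60_000.0],
]
cutoff = little_green500_cutoff(previous_lists)
print(cutoff)                                          # 60000.0
print(is_little_green500_eligible(65_000.0, cutoff))   # True
print(is_little_green500_eligible(50_000.0, cutoff))   # False
```

Because the cutoff reaches back several lists, a system that has already dropped off the current Top500 can still be evaluated for efficiency.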

While the Green500 was a bit isolated initially, it is now part of a thriving community of activists promoting energy efficiency. The Climate Savers Computing Initiative, The Green Grid, and the Energy Efficient HPC Working Group are just a few of the proactive groups that ensure energy efficiency is now a first-class constraint in HPC design, procurement, and management. For example, the Energy Efficient HPC Working Group has been instrumental in identifying limitations in the Green500 measurement methodology. They have invested significant time and effort to isolate these limitations and suggest improvements to our methodologies that will likely be adopted in the future. They have also provided a conduit for opening discussions between the Department of Energy and vendors to establish standard practices for evaluating energy efficiency during the procurement process.

Legacy and Future of the Green500

The legacy of the Green500 is the establishment of a consistent, easy-to-follow set of power measurement run rules and the resulting data. Before the Green500 there was no widely accepted methodology for measuring supercomputer power, no way to track energy efficiency from year to year, and thus no way to encourage efficient design. The Green500 power measurement methodology has persisted nearly unchanged for almost 7 years, laying the foundation for a standardized methodology for collecting supercomputer power data. The methodology can always be improved. For example, the Top500 has tweaked its run rules over the years to prevent gaming. However, the early establishment of a consistent, easy-to-follow set of run rules provided fairness and stability in the Green500 List’s critical infancy.

The stability of the run rules enables us to consistently analyze trends in efficiency data from year to year. These trends lead to a number of interesting observations.

I agree with Horst. Assuming its efficiency could be maintained, the TMC CM-5 system from 1993 would have landed in position #493 on the inaugural November 2007 Green500 List. This position is ahead of both the Earth Simulator (#497) and ASCI Q (#500). From 1993 to 2007 the MFLOPS/watt of the fastest systems went from 12 to 357. From 2007 to 2013 the MFLOPS/watt of the fastest systems went from 357 to 3208.
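The efficiency figures in the paragraph above can be turned into implied yearly improvement rates, and the CM-5 comparison amounts to inserting one value into a list sorted by MFLOPS/W. The sketch below is illustrative: the growth figures come from the text, while the five-entry ranking list and the tie-breaking choice are hypothetical assumptions.

```python
from typing import Sequence


def annual_growth_factor(start: float, end: float, years: int) -> float:
    """Compound annual growth factor implied by moving from `start` to `end` in `years`."""
    return (end / start) ** (1.0 / years)


def efficiency_rank(candidate: float, ranked_list: Sequence[float]) -> int:
    """1-based position a system would take in a list ordered by MFLOPS/W.

    Assumption: ties are placed after existing entries.
    """
    return 1 + sum(1 for value in ranked_list if value >= candidate)


# MFLOPS/W figures quoted in the text.
early = annual_growth_factor(12.0, 357.0, 2007 - 1993)
recent = annual_growth_factor(357.0, 3208.0, 2013 - 2007)
print(f"1993-2007: about {100 * (early - 1):.0f}% per year")   # about 27% per year
print(f"2007-2013: about {100 * (recent - 1):.0f}% per year")  # about 44% per year

# Hypothetical five-entry list: a 12 MFLOPS/W system would land fifth.
print(efficiency_rank(12.0, [357.0, 250.0, 100.0, 20.0, 5.6]))  # 5
```

The jump from roughly 27% to roughly 44% per year is the accelerator-era acceleration the next paragraph builds on.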

An exascale system in 20 MW will require 50,000 MFLOPS/watt. If efficiency trends continue as they did from 1993 to 2007, a 20 MW exascale system is achievable in about 22 years (2035). The last 6 years saw tremendous efficiency improvements from accelerators. Assuming another efficiency boost from new technology equivalent to the gain from accelerators, an exascale system within 20 MW is achievable in about 9 years (2022). Most likely, we will see moderate gains, placing us at exascale in 20 MW by about 2025. This is well beyond the goal of exascale in 20 MW by 2020.
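The 50,000 MFLOPS/watt target follows directly from dividing one exaflop per second by a 20 MW budget. A minimal check of that arithmetic, plus the remaining gap relative to the 3,208 MFLOPS/watt figure for 2013 quoted in the text, might look like this (constant names are illustrative):

```python
EXAFLOPS = 1e18        # 10^18 floating-point operations per second
POWER_BUDGET_W = 20e6  # 20 MW expressed in watts

# FLOPS per watt, converted to MFLOPS per watt.
required_mflops_per_watt = EXAFLOPS / POWER_BUDGET_W / 1e6
print(f"{required_mflops_per_watt:,.0f} MFLOPS/W")  # 50,000 MFLOPS/W

# Improvement factor still needed relative to the 2013 figure of 3,208 MFLOPS/W.
gap = required_mflops_per_watt / 3208.0
print(f"{gap:.1f}x")  # 15.6x
```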

The shell game. While the Green500 gives us loads of information we never had before, there is little information about the power budget of the components of a system. While knowing total power is helpful, knowing how the power is spent across the system is critical to acquisition decisions. Is the majority of the power budget used on the GPUs, the memory, the CPUs, the disks, the network? Most systems in the Green500 are designed from commodity parts assembled at scale. If we truly want to promote efficiency and enable people to make informed design decisions, we need more insight into the details of where power is spent in these larger systems. Is a system with lots of disk arrays more or less of a power hog than a system with lots of GPUs? I really have no idea. And I’ve been studying power for more than a decade.

Will HPC ever embrace power management? The benefit of power management is clear. Save energy. Work abounds showing energy savings can be achieved with little to no performance loss. Nonetheless, most supercomputers disable all power management. On the flip side, power management technologies such as Intel Turbo Boost can maximize performance within thermal limits. In fact, the SuperMUC supercomputer in Munich, Germany, was chastised by some in the community for enabling Turbo Boost during its early benchmarking, thus potentially skewing its Linpack results.

Trying to adapt benchmarking methodologies to guard against gaming is welcome. Trying to adapt them to neutralize the effects of technologies that improve efficiency is counterproductive and, I believe, ultimately futile. Systems are gaining in complexity every day. They are larger, have more parts and parallelism, and gain more autonomy with every generation. Processors throttle themselves, and memories and GPUs will soon do the same. Power and performance will not be fixed between two successive runs in these types of dynamic, complex systems. We must develop evaluation methodologies that embrace complexity and non-determinism, since these will eventually outpace our ability to adapt. Furthermore, in the long run, the complexity and non-determinism we are attempting to ignore will be essential to maximizing performance. Only when we accept complexity and non-determinism as constants can we adopt power management in production systems.
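One possible reading of an evaluation methodology that embraces non-determinism is to report efficiency as a distribution over repeated runs rather than a single number. The sketch below is a hypothetical illustration, not a Green500 rule or the author's proposal: it assumes each run yields an (Rmax in GFLOPS, average power in watts) pair, and the function name and run values are invented for the example.

```python
import statistics
from typing import Dict, Sequence, Tuple


def efficiency_summary(runs: Sequence[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize MFLOPS/W across repeated runs given (rmax_gflops, avg_power_w) pairs."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to describe run-to-run variation")
    # Convert each run to MFLOPS/W (1 GFLOPS = 1,000 MFLOPS).
    effs = [rmax_gflops * 1_000.0 / power_w for rmax_gflops, power_w in runs]
    return {
        "median": statistics.median(effs),
        "min": min(effs),
        "max": max(effs),
        "stdev": statistics.stdev(effs),
    }


# Hypothetical runs of one system with dynamic power management (e.g., turbo) left enabled.
runs = [(100_000.0, 400_000.0), (104_000.0, 420_000.0), (98_000.0, 380_000.0)]
print(efficiency_summary(runs))
```

Reporting the spread alongside the median treats run-to-run variation as a property of the system to be measured rather than noise to be suppressed by disabling power management.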

The Future. Accelerators are here to stay, but most computational scientists I know refuse to use them. I’m not sure which group will blink first, the hardware designers or the users. Perhaps the middleware folks will come to the rescue and make accelerators more programmable. In any case, I think we’ll see accelerators dominate the Green500 List until they are replaced by a new technology or abandoned by all.

In every talk I’ve seen by Intel and NVIDIA, the consensus seems to be that we are still in the first generation of accelerators, with several significantly more advanced generations to come. These next generations will be faster and will have more parallelism, more on-board memory, more power management, and tighter integration with the board. Above all, this means more complexity. These systems will be even harder to program and evaluate. They will likely show modest efficiency gains in the Green500, but they will not match the percentage gains of the first generation, placing exascale beyond the 2020 goal.


While we co-founders have provided a consistent vision, biannual installments of the Green500 List are the work of an army of dedicated students, researchers, and passionate crusaders for energy efficiency. Without selfless adoption by a much broader community, the Green500 List would have been a fleeting anecdote.

It’s been more than twelve years since I started down the Green HPC path. I honestly thought that after four or five years we would have exhausted all the interesting problems in HPC efficiency. The Green500 List’s impact has greatly exceeded my expectations. The introduction of a stable and fair methodology to track efficiency has withstood nearly 7 years of scrutiny and highlighted the insatiable need for ongoing research. What I failed to appreciate in the beginning was that power efficiency as a problem would transform and perpetuate itself with every new generation of supercomputer. Like the challenges of performance, reliability, and security, power efficiency is here to stay.

About the Author

Kirk W. Cameron is a Professor of Computer Science and a Faculty Fellow in the College of Engineering at Virginia Tech. Prof. Cameron is a pioneer and leading expert in Green Computing. Cameron is the Green IT columnist for IEEE Computer, Green500 co-founder, founding member of SPECPower, EPA consultant, Uptime Institute Fellow, and co-founder of power management software startup company MiserWare. His power measurement and management software tools are used by nearly half a million people in more than 160 countries. Accolades for his work include NSF and DOE Career Awards, the IBM Faculty Award, and being named Innovator of the Week by Bloomberg Businessweek Magazine. Prof. Cameron received the Ph.D. in Computer Science from Louisiana State University (2000) and B.S. in Mathematics from the University of Florida (1994).
