HPC Power Efficiency and the Green500

By Kirk W. Cameron

November 20, 2013

The first Green500 List was launched in November 2007 ranking the energy efficiency of supercomputers. Co-founder Kirk W. Cameron discusses the events that led to creation of the Green500 List, its maturation, and future directions.

An Early Supercomputer Efficiency “List”

In 2001 the notions of Green HPC and energy proportional computing were unknown. There was no tangible evidence that power was an issue in supercomputers. Vendors simply built large systems to customer specifications. Performance kept increasing exponentially and while performance efficiency was of interest, power efficiency was not.

My early work in Green HPC was inspired by the tradeoffs inherent to power and performance. I imagined how varying power modes might make supercomputers more efficient. I speculated as to how such technologies would change the way we compute in HPC. But in the beginning, this seemed like a solution looking for a problem. No one at the time believed power was or ever would be an issue in HPC.

I needed data, and lots of it, if I was to convince a community that power was important. The Top500 List provided a plethora of performance data, but nothing related to power. Many of the larger supercomputer systems posted their specifications online, but the information was spotty at best, and it quickly became obvious that no one was measuring power. If I wanted to improve power efficiency in supercomputers, not only would I have to prove conclusively that a power problem existed, I would have to start measuring systems myself! As a software guy, this was daunting.


Figure 1. Source: NSF Career Proposal Submission, K.W. Cameron, July 2003.

It seems almost comical now, but I spent 4 months obtaining the data for Figure 1. This “list” of power consumption for the top supercomputers from 6 different Top500 lists over ten years was the first of its kind. Perhaps the most striking feature is the exponential increase in raw power consumption of the top systems from 1993 to 2003. Moreover, despite being separated by a decade of technological advances, the efficiency of the TMC CM-5 (~12 MFLOPS/watt) was more than double that of the Japanese Earth Simulator (5.6 MFLOPS/watt).

The trends are clear and irrefutable. Supercomputer power was a liability and would soon limit scalability. Of course, it would be almost 4 years before the community at-large began to acknowledge supercomputer power was a fundamental constraint. Let’s just say I’ve learned to be patient.

Origins of the Green500 List

Particularly in those early years, I spent a lot of time considering data collection and power measurement. My team built infrastructures and designed tools and methodologies to accurately track power usage in HPC systems. We ported our framework again and again to learn as much as we could about the tradeoffs between power and performance on emergent systems. We also built the first power-scalable HPC system prototype.

Wu Feng approached me in 2006 with the notion of creating a list of power-efficient supercomputers akin to the Top500. I was already a firm believer in the need for such data, having spent 4 months creating a small list of power consumption for 6 supercomputers. Furthermore, I had spent the previous three years designing several generations of power measurement toolkits. My group had arguably compiled the largest, most detailed repository of HPC power data, and we had a vast amount of experience measuring HPC system power.

My primary role was to design the power measurement run rules for the first list. We knew that other benchmarking methodologies had suffered when the system could be gamed easily. Based on my experience measuring power, we wrote a set of run rules describing how to easily measure a single node and extrapolate the power for a supercomputer running Linpack. The rules were designed to encourage participation by enabling non-experts to report their own power data with minimal investment in time and money. For those not reporting, we would use the UL ratings (see Figure 1) to fully populate the list.

Ease of participation was paramount. The Linpack benchmark was not ideal, but it was the only benchmark most supercomputer users reported regularly. MFLOPS per watt was not an ideal metric either, but it was easy to report and would encourage energy-efficient, high-performance solutions.
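The single-node approach described above can be illustrated with a short sketch. It assumes every node draws the same power as the one measured while running Linpack and ignores shared infrastructure such as switches and storage; the function names and the example numbers are hypothetical, chosen only to show the arithmetic behind an MFLOPS/watt figure, not the official Green500 run rules.

```python
from dataclasses import dataclass


@dataclass
class NodeMeasurement:
    """Average power (watts) of one compute node sampled while it runs Linpack."""
    avg_watts: float


def estimate_system_power(node: NodeMeasurement, node_count: int) -> float:
    """Extrapolate full-system power, assuming all nodes draw the same power
    as the measured node. Shared infrastructure (switches, storage) is ignored."""
    if node_count <= 0:
        raise ValueError("node_count must be positive")
    return node.avg_watts * node_count


def mflops_per_watt(rmax_gflops: float, system_watts: float) -> float:
    """Efficiency metric: sustained Linpack performance (MFLOPS) per watt."""
    return (rmax_gflops * 1_000.0) / system_watts


# Hypothetical system: 1,000 identical nodes, one measured at 400 W,
# with a Linpack Rmax of 100 TFLOPS (100,000 GFLOPS).
power = estimate_system_power(NodeMeasurement(avg_watts=400.0), node_count=1_000)
print(power)                               # 400000.0
print(mflops_per_watt(100_000.0, power))   # 250.0
```

The appeal of this kind of rule is that a site needs only one power meter on one node, which matches the goal of letting non-experts report data cheaply; the tradeoff is that any node-to-node variation or unmetered shared equipment is invisible to the estimate.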

After 6 months of discussion we solicited participation from the broader community. About a year later, in November 2007, we released the first list. The launch of the first Green500 List was an event. As if scripted, just prior to launch, the power problem in data centers had become front-page news and rather suddenly many agreed that supercomputers needed to become more energy efficient.

Some embraced the list and touted high-ranked systems while deriding low-ranked systems. Some complained of being disenfranchised. Some ridiculed our methodology and metrics. Some took issue with the lack of community involvement or coordination with other lists, benchmarks, and government agencies.

The Green500 List Matures

While most of the early dialogue and press affirmed the need for the Green500 List, some valid criticisms led to significant improvements. For example, we released an updated list in early 2008 to include measured numbers from those that did not report to the first list. In succeeding lists, we limited the amount of information we track to focus exclusively on energy efficiency. Later, we obtained research funding to explore the potential use of other benchmarks and metrics.

We’ve actively sought feedback from users as the list has matured. This has resulted in additional lists such as the Little Green500. While entry to the Green500 requires placing among the 500 fastest systems in the world, the Little Green500 broadens this definition to include systems as fast as the slowest supercomputer from the three previous Top500 lists. The goal of this list is to provide efficiency information to those that would deploy smaller systems.
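One literal reading of the Little Green500 entry rule above is sketched below: take the slowest system (the #500 entry) from each of the three previous Top500 lists, use the smallest of those Rmax values as the threshold, and admit any system at least that fast. This interpretation and the helper names are assumptions for illustration; the list's published criteria are authoritative.

```python
from typing import Sequence


def little_green500_threshold(last_place_rmax: Sequence[float]) -> float:
    """Slowest Rmax among the #500 entries of the three previous Top500 lists.
    (Assumed reading of the rule; values may be in any consistent unit.)"""
    if len(last_place_rmax) != 3:
        raise ValueError("expected the #500 Rmax from exactly three lists")
    return min(last_place_rmax)


def eligible_for_little_green500(rmax: float, last_place_rmax: Sequence[float]) -> bool:
    """A system qualifies if it is at least as fast as the threshold."""
    return rmax >= little_green500_threshold(last_place_rmax)


# Hypothetical #500 Rmax values (TFLOPS) from three consecutive lists.
previous_entries = [50.0, 60.0, 75.0]
print(eligible_for_little_green500(55.0, previous_entries))  # True
print(eligible_for_little_green500(45.0, previous_entries))  # False
```

Because the Top500 entry bar tends to rise with every list, the minimum usually comes from the oldest of the three lists, which is what lets smaller, newer systems into the Little Green500.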

While the Green500 was a bit isolated initially, it is now part of a thriving community of activists promoting energy efficiency. The Climate Savers Computing Initiative, The Green Grid, and the Energy Efficiency HPC Working Group are just a few of the proactive groups that ensure energy efficiency is now a first-class constraint in HPC design, procurement and management. For example, the Energy Efficiency HPC Working Group has been instrumental in identifying limitations in the Green500 measurement methodology. They have invested significant time and effort to isolate these limitations and suggest improvements to our methodologies that will likely be adopted in the future. They have also provided a conduit for opening discussions between the Department of Energy and vendors to establish standard practices for evaluating energy efficiency during the procurement process.

Legacy and Future of the Green500

The legacy of the Green500 is the establishment of a consistent, easy-to-follow set of power measurement run rules and the resulting data. Before the Green500 there was no widely accepted methodology for measuring supercomputer power, no way to track energy efficiency from year to year, and thus no way to encourage efficient design. The Green500 power measurement methodology has persisted nearly unchanged for almost 7 years laying the foundation for a standardized methodology for collecting supercomputer power data. The methodology can always be improved. For example, the Top500 has tweaked its run rules over the years to prevent gaming. However, the early establishment of a set of consistent, easy to follow run rules provided fairness and stability in the Green500 List’s critical infancy.

The stability of the run rules enables us to consistently analyze trends in efficiency data from year to year. These trends lead to a number of interesting observations.

I agree with Horst. Assuming its efficiency could be maintained, the TMC CM-5 system from 1993 would have landed in position #493 on the inaugural November 2007 Green500 List. This position is ahead of both the Earth Simulator (#497) and ASCI Q (#500). From 1993 to 2007 the MFLOPS/watt of the fastest systems went from 12 to 357. From 2007 to 2013 the MFLOPS/watt of the fastest systems went from 357 to 3208.

An exascale system in 20 MW will require 50,000 MFLOPS/watt. If efficiency trends continue as they did from 1993 to 2007, a 20 MW exascale system is achievable in about 22 years (2035). The last 6 years saw tremendous efficiency improvements using accelerators. Assuming another efficiency boost from new technology equivalent to the gain from accelerators, an exascale system is achievable in 20 MW in about 9 years (2022). Most likely, we will see moderate gains placing us at exascale in 20 MW by about 2025. This is well beyond the goal of exascale in 20 MW by 2020.
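The 50,000 MFLOPS/watt target follows directly from dividing 10^18 FLOPS by 20 MW. The sketch below checks that figure and computes the compound annual efficiency growth implied by the numbers quoted above (12 to 357 MFLOPS/watt over 1993 to 2007, and 357 to 3208 over 2007 to 2013). It assumes simple constant exponential growth within each period and deliberately stops short of projecting dates, since the estimates above may rest on additional assumptions.

```python
def required_mflops_per_watt(target_flops: float, power_watts: float) -> float:
    """Efficiency needed to deliver target_flops within power_watts, in MFLOPS/W."""
    return target_flops / power_watts / 1e6


def annual_growth_rate(start: float, end: float, years: float) -> float:
    """Compound annual growth rate between two efficiency figures."""
    return (end / start) ** (1.0 / years) - 1.0


# Exascale (1e18 FLOPS) in a 20 MW envelope.
print(round(required_mflops_per_watt(1e18, 20e6)))   # 50000

# Growth rates implied by the figures quoted in the text.
print(f"{annual_growth_rate(12, 357, 14):.1%}")      # 27.4%  (1993-2007)
print(f"{annual_growth_rate(357, 3208, 6):.1%}")     # 44.2%  (2007-2013)
```

The jump from roughly 27% to roughly 44% per year is the accelerator effect described above, expressed as an annual rate.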

The shell game. While the Green500 gives us loads of information we never had before, there is little information about the power budget of the components of a system. While knowing total power is helpful, knowing how the power is spent across the system is critical to acquisition decisions. Is the majority of the power budget used on the GPUs, the memory, the CPUs, the disks, or the network? Most systems in the Green500 are designed from commodity parts assembled at scale. If we truly want to promote efficiency and enable people to make informed design decisions, we need more insight into the details of where power is spent in these larger systems. Is a system with lots of disk arrays more or less of a power hog than a system with lots of GPUs? I really have no idea. And I’ve been studying power for more than a decade.

Will HPC ever embrace power management? The benefit of power management is clear: save energy. Work abounds showing energy savings can be achieved with little to no performance loss. Nonetheless, most supercomputers disable all power management. On the flip side, power management technologies such as Intel Turbo Boost can increase performance maximally within thermal limits. In fact, the SuperMUC supercomputer in Munich, Germany was chastised by some in the community for enabling Turbo Boost during its early benchmarking and thus potentially skewing its Linpack results.

Trying to adapt benchmarking methodologies to mitigate gaming is welcome. Trying to adapt them to neutralize the effects of technologies that improve efficiency is counterproductive and, I believe, ultimately futile. Systems are gaining in complexity every day. They are larger, have more parts and parallelism, and gain more autonomy in every generation. Processors throttle themselves, and memories and GPUs will soon do the same. Power and performance will not be fixed between two successive runs in these types of dynamic, complex systems. We must develop evaluation methodologies that embrace complexity and non-determinism, since these will eventually outpace our ability to adapt. Furthermore, in the long run, the complexity and non-determinism we are attempting to ignore will be essential to maximize performance. Only when we accept complexity and non-determinism as constants can we adopt power management in production systems.
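One simple way an evaluation could acknowledge run-to-run variation, rather than assume it away, is to report the spread of efficiency across repeated runs instead of a single number. The sketch below is an illustrative assumption, not a Green500 rule: it summarizes hypothetical per-run MFLOPS/watt values by their mean and sample standard deviation using Python's standard `statistics` module.

```python
import statistics
from typing import Sequence


def summarize_runs(mflops_per_watt_runs: Sequence[float]) -> tuple[float, float]:
    """Return (mean, sample standard deviation) of MFLOPS/W across repeated runs.

    Hypothetical reporting scheme: at least two runs are needed to say
    anything about variability.
    """
    if len(mflops_per_watt_runs) < 2:
        raise ValueError("need at least two runs to estimate variability")
    return (statistics.mean(mflops_per_watt_runs),
            statistics.stdev(mflops_per_watt_runs))


# Hypothetical results from three runs on a system with dynamic power management.
mean, spread = summarize_runs([100.0, 110.0, 90.0])
print(mean, spread)   # 100.0 10.0
```

Reporting a distribution lets a system that throttles or boosts itself be compared fairly without disabling the very features that make it efficient.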

The Future. Accelerators are here to stay, but most computational scientists I know refuse to use them. I’m not sure which group will blink first, the hardware designers or the users. Perhaps the middleware folks will come to the rescue and make accelerators more programmable. In any case, I think we’ll see accelerators dominate the Green500 List until they are replaced by a new technology or abandoned by all.

In every talk I’ve seen by Intel and NVIDIA, the consensus seems to be that we are really still in the first generation of accelerators, with several significantly more advanced generations to come. These next generations will be faster and will have more parallelism, more on-board memory, more power management, and tighter integration with the board. This means, above all, more complexity. These systems will be even harder to program and evaluate. They will likely show modest efficiency gains in the Green500, but they will not match the percentage gains of the first generation, placing exascale beyond the 2020 goal.


While we co-founders have provided a consistent vision, biannual installments of the Green500 List are the work of an army of dedicated students, researchers, and passionate crusaders for energy efficiency. Without selfless adoption by a much broader community, the Green500 List would have been a fleeting anecdote.

It’s been more than twelve years since I started down the Green HPC path. I honestly thought after four to five years we would have exhausted all the interesting problems in HPC efficiency. The Green500 List’s impact has greatly exceeded my expectations. The introduction of a stable and fair methodology to track efficiency has withstood nearly 7 years of scrutiny and highlighted the insatiable need for ongoing research. What I failed to appreciate in the beginning was that power efficiency as a problem would transform and perpetuate with every new generation of supercomputer. Like the challenges of performance, reliability, and security, power efficiency is here to stay.

About the Author

Kirk W. Cameron is a Professor of Computer Science and a Faculty Fellow in the College of Engineering at Virginia Tech. Prof. Cameron is a pioneer and leading expert in Green Computing. Cameron is the Green IT columnist for IEEE Computer, Green500 co-founder, founding member of SPECPower, EPA consultant, Uptime Institute Fellow, and co-founder of power management software startup company MiserWare. His power measurement and management software tools are used by nearly half a million people in more than 160 countries. Accolades for his work include NSF and DOE Career Awards, the IBM Faculty Award, and being named Innovator of the Week by Bloomberg Businessweek Magazine. Prof. Cameron received the Ph.D. in Computer Science from Louisiana State University (2000) and B.S. in Mathematics from the University of Florida (1994).
