At ISC, Sustainable Computing Leaders Discuss HPC’s Energy Crossroads

By Oliver Peckham

May 30, 2023

In the wake of SC22 last year, HPCwire wrote that “the conference’s eyes had shifted to carbon emissions and energy intensity” rather than the historical emphasis on flops-per-watt and power usage effectiveness (PUE). At ISC 2023 in Hamburg, Germany, this week, that trend continued: nearly every mention of flops-per-watt or the Green500 list was quickly followed by an acknowledgment of the other dimensions of sustainability in HPC.

As part of the official ISC program, HPCwire had the opportunity to host a special event – “HPC’s Energy Crossroads: The Roles of Hardware, Software and Location in Low-Carbon HPC” – that brought together three HPC leaders coming at the question of low-carbon HPC from different angles.

On the panel:

  • Andrew Grimshaw, president of Lancium Compute, which colocates datacenters near plentiful, congested renewable energy in West Texas in order to provide clean, ultra-low-cost computing services. (See previous HPCwire coverage here.)
  • Jen Huffstetler, chief product sustainability officer and VP & GM for future platform strategy and sustainability at Intel, which has been placing an increasing emphasis on the energy and carbon benefits of its hardware and software offerings. (See previous HPCwire coverage here.)
  • Vincent Thibault, co-founder of QScale, which is building a massive, renewably-powered campus in Quebec that will leverage large-scale heat reuse to warm industrial greenhouses. (See previous HPCwire coverage here.)

Over the course of the hour, we posed three questions to the participants, all centered around how popular ideas of sustainable HPC have shifted – and how they may need to shift even more. We’ll cover some of the discussion below, but the full session is available to stream exclusively through ISC’s digital platform.

Andrew Grimshaw (top right); Jen Huffstetler (bottom left); Vincent Thibault (bottom right); and yours truly (top left).

What sustainable HPC needs (and what it doesn’t)

“Over the last few decades, our community has focused on flops per watt and PUE … under the assumption that reducing watts is the way to reduce CO2 emissions,” Grimshaw said. At the same time, he explained, there was an inflection point in the energy world, with incredibly cheap wind and solar energy in some places. But, of course, those renewables have issues: namely, variability and – less visibly – congestion. Grimshaw’s company, Lancium, exploits those issues by building “clean campuses” where renewables are plentiful and congested, allowing HPC customers to run their workloads on a fully renewable, ultra-low-cost grid. Customers can even opt into allowing their workloads to be paused when the resources are less available, further saving on costs and carbon.

“If you think about it, there are many applications in compute – particularly in HPC and HTC – where there are really no humans in the loop, so if the application is paused for an hour or maybe half a day, it’s not the end of the world,” Grimshaw said.

Huffstetler agreed, pointing to an adjacent issue: a high-power server might not be fully utilized by its available workloads – and those workloads might themselves be pausable. “If you’re powering a server up and it’s not being utilized,” Huffstetler said, “you can turn the server off and wake it up when necessary. So [we’re] really starting to think about what workloads can handle that, versus this ‘always on, all the time’ mentality that we’ve had.”

“I think we can all agree that not all HPC workloads are the same,” Grimshaw said.

Thibault said that QScale, too, had noticed an increase in workloads that “are not sensitive to latency,” like large-scale HPC models and AI training.

“If the model runs for 24 hours, if the latency is 60ms, that’s going to be 24 hours and 60 milliseconds instead of 24 hours and 2 milliseconds,” Thibault said. “That means you should locate those workloads where energy is 100% renewable and the climate is as cold as possible.”

“The things that hardware vendors can do for us most effectively are: give us the ability to rapidly boot and unboot the machines,” Grimshaw added, “because we want to transition them essentially between a running state to a non-running state.” Grimshaw pointed out that an idling server can draw 65W, “which doesn’t sound like a lot of power until you multiply it by 10,000,” and that GPUs in particular boot very slowly.
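(For scale: 65W of idle draw across 10,000 servers works out to roughly 650kW of continuous load, on the order of 5–6 GWh per year before any cooling overhead is counted.)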

Thibault took a different perspective. “If you have a $40,000 GPU or a $10,000 CPU, our clients want to run them pedal-to-the-metal, 24/7, 365 days a year,” he said, pointing out that for a chip like that, the cost of powering it for its entire lifetime might be just a small fraction of its capital cost.

For his part, Thibault said that the thing QScale needed most from hardware vendors was advances in liquid cooling – an item Huffstetler had mentioned earlier in the talk. Thibault explained that as transistors shrank, leakage increased, “so the power consumption of the chips is increasing kind of exponentially.” And with GPUs moving past the 1kW envelope, the problem was becoming increasingly urgent. “What I believe will be a big, big change is if we can move to warm-water cooling,” Thibault said.

Huffstetler agreed that it was crucial to “look at the holistic energy consumption of the datacenter overall” – something she said Intel did in partnership with QScale and others. “This isn’t only looking at the energy efficiency or the performance-per-watt for the processor or the GPU, it’s also looking at the system-level power – the power required for cooling in the datacenter, which can at times represent up to 40% of the datacenter energy consumption.”
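(For context: if cooling alone accounts for 40% of a facility’s energy, at most 60% of that energy can be reaching the IT equipment, which implies a PUE of at least roughly 1.7.)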

Huffstetler also said that software could be an enormous help to “leave no transistor behind,” adding that Intel had seen energy efficiency improvements of up to 100× through co-optimization.

“Furthermore, hardware players can think about how they provide more granularity into what is actually happening on the platform itself,” Huffstetler said. “So: management software that enables monitoring, analysis and even emissions control by forecasting carbon emissions and future power and space needs, monitoring the device and the datacenter energy consumption.” Huffstetler also mentioned advanced telemetry being added to Intel chips that would allow monitoring and management of system-level processes, enabling things like carbon-aware workloads.
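To make the “carbon-aware workloads” idea concrete, here is a minimal sketch of what a pause-and-resume control loop driven by a grid-carbon signal might look like. It is purely illustrative: the 250 gCO2/kWh threshold, the stubbed grid_carbon_intensity() feed and the toy PausableJob class are placeholders of our own, not part of Intel’s telemetry or any panelist’s tooling.

```python
import random
import time

CARBON_THRESHOLD_G_PER_KWH = 250  # illustrative cut-off, not a published figure


def grid_carbon_intensity() -> float:
    """Stand-in for a real grid-carbon signal (e.g. a regional API or on-site
    telemetry); here it just returns a random value so the sketch runs."""
    return random.uniform(50, 600)


class PausableJob:
    """Toy stand-in for a checkpointable HPC/HTC job with no human in the loop."""

    def __init__(self, steps: int):
        self.remaining = steps

    def step(self) -> None:
        self.remaining -= 1  # do one unit of work

    def is_done(self) -> bool:
        return self.remaining <= 0


def run_carbon_aware(job: PausableJob, poll_seconds: float = 0.0) -> None:
    """Advance the job only while the grid is 'clean'; otherwise sit idle.
    A real scheduler would checkpoint and power nodes down instead of sleeping."""
    while not job.is_done():
        if grid_carbon_intensity() <= CARBON_THRESHOLD_G_PER_KWH:
            job.step()
        time.sleep(poll_seconds)  # in practice: minutes between polls


if __name__ == "__main__":
    run_carbon_aware(PausableJob(steps=100))
    print("job finished under carbon-aware pacing")
```

A production scheduler would checkpoint the job and power nodes down rather than simply skipping work, but the control loop – run when the grid is clean, wait when it is not – is the same idea the panelists described.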

Increasing the appeal of renewable colocation

We typically hear trepidation when HPC users talk about moving computing offsite and colocating with renewable energy through providers like Lancium and QScale: many HPC users are used to having direct access to their systems, and for sensitive research (e.g. medicine, security, corporate secrets) there can be serious worries about data sovereignty and security.

Thibault said addressing these concerns was a question they were “facing daily at QScale,” but brought the question back to its fundamentals by talking about how New York banks slowly had their HPC operations pushed out of Manhattan due to ballooning tech needs, then ballooning energy needs. “When you move from a system that consumes 1MW of energy to something that’s consuming 15MW of energy, let me tell you: upgrading the headquarters in Manhattan’s going to be impossible,” Thibault said.

Some organizations, Thibault said, like the Department of Defense, genuinely couldn’t move workloads – but most organizations weren’t the DOD, and he said that the full cost of hosting at QScale was often less than the cost of the energy for those workloads in places like Germany.

“How many of us have friends in Europe who run supercomputing centers who, even if they wanted to build, can’t get the power to do so?” Grimshaw asked in agreement, adding that remote management was increasingly the norm rather than the exception.

Huffstetler added that “the state of security and trust has been evolving” and that colocation was “not only becoming more cost-effective,” but that companies like Intel were also providing new tools (like Project Amber) to help build user trust.

Heat reuse was also a hot topic (get it?) during the panel. As mentioned above, QScale is planning massive heat reuse in partnership with industrial-scale greenhouses – which, in Quebec, need to be substantially heated during the long, harsh winters. “We believe that the cloud is turning into smog, and our objective is to turn the smog into tomatoes,” Thibault quipped.

As seen from the roof of QScale’s first datacenter, industrial-scale greenhouses glow in the distance.

Huffstetler said she was “fully aligned with QScale” on heat reuse, adding that it was “how we’re going to be giving back to local communities wherever these datacenters are built”; Thibault replied that heat reuse was a “key point to get community buy-in.”

“My hope is that we’ll switch from using PUE as a factor of efficiency to ‘ERE’ [energy reuse effectiveness] – so the amount of the heat that is produced by the computer that we can effectively reuse,” Thibault added.
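For reference, ERE – as defined by The Green Grid – is the total facility energy minus the energy reused elsewhere, divided by the IT equipment energy. Reuse nothing and ERE collapses back to the facility’s PUE; find a use for essentially all of the waste heat and it trends toward zero.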

A cohesive path forward?

Near the end of the session, Thibault took the opportunity to present his vision for a more sustainable HPC pipeline.

“The way I hope the world will move forward is that we have the latest and greatest equipment that’s running in facilities like QScale for a period of two, three, four years maybe,” he said. “And after that, the hardware is getting replaced and instead of moving to a dump site, it could be repurposed in a site like Lancium’s where the power cost is going to be basically nil to run it.”

“We know that there are users, principally in the research community, where they have more time, [but] they have less resources,” he continued. “How do we actually give those people those resources?”

The quotes in this feature are excerpts from the ISC special event “HPC’s Energy Crossroads: The Roles of Hardware, Software and Location in Low-Carbon HPC.” The full session is available exclusively through ISC’s digital platform.
