Cooling for Maximal High-bandwidth Processor Performance and a Bellwether Cluster Deployment

By Rob Farber

April 17, 2020

Heat and high processor memory bandwidth are key factors to consider when procuring a cluster that can realize the full potential of the latest generation of high-memory-bandwidth processors. System designers must address this dilemma as CPU vendors now compete on memory bandwidth to achieve leadership application performance. High memory bandwidth is an extraordinary boon for all users because it means higher application performance – so long as more efficient use of the vector floating-point units does not cause the processor to overheat and downclock.

Thermal issues affect everyone

Everyone is affected by the heat versus memory bandwidth dilemma, as even small-scale workloads (by current standards) can experience downclocking. This means that every HPC and enterprise deployment is caught in this dilemma, regardless of whether the system is a small organizational cluster, a large commercial enterprise datacenter, a dedicated AI workhorse cluster, or an academic group or campus-wide datacenter.

HPE notes, “A system’s ‘high performance’ claims may look impressive on paper. However, their real-world performance results can lag very far behind. For instance, as HPC clusters tune groups of cores to their own unique frequencies, temperature, and power regulation; competing groups can overthrow the system’s actual performance.” [i]

AI and tightly coupled HPC applications running at scale are particularly susceptible to performance degradation from heat-related issues such as thermal downclocking. When processors run at different rates, tightly coupled applications, including those that use reduction-type operations (essential to AI training and common in most HPC applications), become rate limited by the slowest node(s).

Good system design and system management are key to eliminating heat-related issues, which will otherwise affect the performance of every application on the system. Solving them lets applications run faster by exploiting the greater parallelism, floating-point capability, and memory bandwidth of the latest processors.

Preserving high-bandwidth CPU performance

To understand the impact of heat on high memory-bandwidth system performance, we look at the Magma installation at bellwether LLNL (Lawrence Livermore National Laboratory). Magma employs liquid-cooled Intel Xeon Platinum 9200 SKUs. We focus on these SKUs because they currently offer the highest number of memory channels per CPU along with very high floating-point performance from dual per-core AVX-512 vector units, which generate substantial heat when fully utilized. Thus, they provide a glimpse into our high-bandwidth CPU future.

High bandwidth processors are the future

Higher memory bandwidth is critical to performance for many HPC and AI workloads; processor cores that are starved for data simply don’t deliver performance.
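
To make the point concrete, here is a minimal sketch (not from the original article) of a bandwidth-bound kernel in the spirit of the STREAM triad benchmark: the arithmetic per byte moved is so small that runtime is set almost entirely by the memory system, so extra cores or wider vector units help little unless memory bandwidth grows with them. The array size and OpenMP pragmas are illustrative assumptions.

```c
/* Sketch of a memory-bandwidth-bound kernel (STREAM-triad-style loop).
 * Compile with e.g. gcc -O3 -fopenmp; sizes are illustrative. */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1L << 27)   /* ~128M doubles per array (~1 GiB each): far beyond cache */

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);
    double *c = malloc(N * sizeof *c);
    if (!a || !b || !c) return 1;

    #pragma omp parallel for
    for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        a[i] = b[i] + 3.0 * c[i];          /* triad: 2 flops per 24 bytes moved */
    double t1 = omp_get_wtime();

    /* Achieved bandwidth: three arrays of 8-byte elements per iteration. */
    printf("triad bandwidth: %.1f GB/s\n", 3.0 * N * 8 / (t1 - t0) / 1e9);

    free(a); free(b); free(c);
    return 0;
}
```

On a kernel like this, the reported bandwidth, not the core count, is what tracks application performance.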

Figure 1 (below) shows that the new 12-memory-channel Intel Xeon Platinum 9200 processors deliver better performance than Intel processors with six memory channels per socket.

Figure 1: Comparison of the 6-channel and 12-channel Intel processors. The Intel Xeon Platinum 8260 has 24 cores while the Intel Xeon E5-2697 v4 has 18 cores, so any performance gain above the core-count ratio of roughly 1.3 can be attributed to the memory system rather than just the greater core count. Meanwhile, the Intel Xeon Platinum 9242, a 48-core chip, generally shows a 2x performance increase over the Intel Xeon Platinum 8260, which indicates the additional cores are not starved for data. (Image courtesy Intel)

Guideline: Air-cooling vs. Liquid-cooling

While increased memory bandwidth translates to faster application performance, it also creates a dilemma for system designers: the heat generated when running all the cores and dual floating-point units of a high core-count processor at full speed can cause the chip to slow down (downclock) to stay within its thermal design limits.

Look closely at the TDP (Thermal Design Power) ratings to understand when it becomes necessary to consider liquid cooling. As a guideline: the more cores, the higher the TDP and the greater the importance of the cooling solution. Also consider that most compute nodes are dual-socket, so these TDP numbers must be doubled for 2S computational nodes.
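
As a back-of-the-envelope illustration of the doubling rule, the sketch below works through the CPU thermal budget of a dual-socket node and a Magma-sized cluster. The 350 W figure is Intel's published TDP for the Xeon Platinum 9242; the node count and the restriction to CPU TDP only (ignoring memory, fabric, and power-conversion losses) are simplifying assumptions.

```c
/* Sketch: CPU-only thermal budget for a dual-socket (2S) node and a cluster. */
#include <stdio.h>

int main(void)
{
    const double tdp_per_socket_w = 350.0;   /* Xeon Platinum 9242 TDP */
    const int sockets_per_node = 2;          /* typical 2S compute node */
    const int nodes = 760;                   /* e.g. a Magma-sized cluster */

    double cpu_w_per_node = tdp_per_socket_w * sockets_per_node;
    double cpu_kw_total = cpu_w_per_node * nodes / 1000.0;

    printf("CPU TDP per 2S node: %.0f W\n", cpu_w_per_node);    /* 700 W  */
    printf("CPU TDP, full cluster: %.0f kW\n", cpu_kw_total);   /* ~532 kW */
    return 0;
}
```

Seven hundred watts of CPU heat per 1U node is the kind of density that pushes air cooling to its limits.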

Air-cooling is fine for many HPC and data-intensive HPDA (high-performance data analytics) workloads that perform many floating-point operations, so long as there is sufficient airflow to keep the processor(s) cool.

Liquid-cooling solves many thermal issues

In contrast, look to liquid cooling when running highly parallel, floating-point-intensive vector codes that are cache intensive. DGEMM (double-precision general matrix multiplication) operations are the textbook example because such dense matrix operations scale to all the processor cores on a chip and keep all the floating-point units active.
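
For readers who want to reproduce this behavior, the following minimal sketch drives repeated DGEMM calls through a threaded CBLAS library (Intel MKL or OpenBLAS is assumed; the matrix size and iteration count are illustrative). A tuned BLAS spreads the work across every core and keeps the AVX-512 fused multiply-add units saturated, which is exactly the power-hungry pattern discussed above.

```c
/* Sketch: a DGEMM stress loop that keeps every core's vector FMA units busy.
 * Link against a threaded CBLAS (e.g. -lopenblas or the MKL link line). */
#include <stdio.h>
#include <stdlib.h>
#include <cblas.h>

int main(void)
{
    const int n = 8192;                      /* large enough to engage all cores */
    double *A = malloc((size_t)n * n * sizeof *A);
    double *B = malloc((size_t)n * n * sizeof *B);
    double *C = malloc((size_t)n * n * sizeof *C);
    if (!A || !B || !C) return 1;

    for (size_t i = 0; i < (size_t)n * n; i++) { A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; }

    /* Repeated C = A*B + C; the threaded BLAS fans the work out across all
     * cores, driving the dual per-core AVX-512 pipes and near-peak power draw. */
    for (int iter = 0; iter < 10; iter++)
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    n, n, n, 1.0, A, n, B, n, 1.0, C, n);

    printf("C[0] = %f\n", C[0]);
    free(A); free(B); free(C);
    return 0;
}
```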

Figure 2: A 1U 9200WK liquid-cooled node (Image courtesy Intel)

As always, look to your workloads. If they reflect LINPACK benchmark behavior, then liquid cooling is the best way to keep all parts of the chip within thermal limits to achieve full performance. Otherwise, the processor may have to downclock to stay within its thermal envelope, thus decreasing performance.

Don’t forget to consider the impact of thermal issues when running at scale!

In particular, look at how vector-intensive dense matrix operations are interleaved with tightly coupled distributed operations such as reductions. Hot nodes in air-cooled systems are known to slow tightly coupled computations significantly, by a factor of two or more. [ii] Essentially, the distributed computation becomes rate limited by the slowest node(s). The impact of hot nodes can be observed at scale, even when running small-scale jobs on only a few hundred nodes. [iii] Liquid cooling eliminates the problem of hot nodes.
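
The rate-limiting effect is easy to see in a tiny MPI sketch (an illustration, not code from the cited report): because no rank can leave MPI_Allreduce until every rank has entered it, a single slow node sets the pace for the whole job. The artificial workload imbalance below stands in for a thermally downclocked processor.

```c
/* Sketch: why one hot (downclocked) node gates a tightly coupled job.
 * Build with mpicc; the compute kernel and iteration counts are illustrative. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Pretend rank 0 sits on a throttled node: give it twice the work. */
    long iters = (rank == 0) ? 200000000L : 100000000L;
    double local = 0.0;
    for (long i = 0; i < iters; i++)
        local += 1e-9 * (double)(i % 7);

    double t0 = MPI_Wtime();
    double global = 0.0;
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    double t1 = MPI_Wtime();

    /* Fast ranks stall inside the collective until the slow rank arrives,
     * so the overall step time is set by the slowest node. */
    printf("rank %d of %d: sum %.3f, allreduce wait %.3f s\n",
           rank, size, global, t1 - t0);

    MPI_Finalize();
    return 0;
}
```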

LLNL Magma system

Funded through NNSA’s Advanced Simulation & Computing (ASC) program, the Magma supercomputer is a liquid-cooled supercomputer designed to support mission simulations critical to ensuring the safety, security and reliability of the nation’s nuclear weapons in the absence of underground testing. As of November 2019, Magma is ranked as the 69th fastest system in the world according to the Top500 list.

Magma consists of 760 compute nodes, each configured with dual 12-memory-channel, 48-core Xeon Platinum 9242 processors, for a total of 72,960 cores. Its total memory capacity is 293 terabytes, with a total memory bandwidth of 430 terabytes per second. The cluster utilizes Penguin Computing's Relion XE2142eAP compute servers connected by an Intel Omni-Path interconnect. The system is supported by CoolIT Systems' complete direct liquid cooling solution. [iv]

The physical reality of floating-point arithmetic

It’s unavoidable: floating-point arithmetic operations generate heat. This is exacerbated by wider vector units (meaning more operations can be performed per second) and by the multiple vector units now built into each modern CPU core.

Much software analysis has been performed to reduce the impact of downclocking when running floating-point-intensive codes, [v] but the easiest solution is to exploit the greater thermal conductivity of liquid to remove heat.

Of course, cost, complexity and the practicality of installing plumbing in the datacenter may become issues when considering liquid cooling. However, full-service liquid-cooling providers, along with broad support from a multitude of OEMs and various hyperscaler partners, make it easier to implement standardized liquid cooling solutions.

Here is the dilemma in a nutshell: feeding the processor more data lets more of its vector units stay busy, which in turn generates more heat. Liquid has better thermal conductivity than air, so if your system workload tends to be dominated by floating-point calculations (easily determined by application profiling), then liquid cooling may be required.

Summary

The key takeaway is that higher memory bandwidth processors are a very good thing. Don’t starve your computing hardware for data. However, higher work efficiency in the processor does create a cooling dilemma.

Check your workloads to see if air cooling is still an option, or if your users would be better served by a liquid-cooled solution. Liquid cooling might take less room, operate more efficiently, and deliver higher performance both on each node and when running tightly coupled applications at scale.

Rob Farber is a global technology consultant and author with an extensive background in HPC, AI, and teaching. Rob can be reached at [email protected]

[i] https://assets.ext.hpe.com/is/content/hpedam/documents/a00042000-2999/a00042027/a00042027enw.pdf

[ii] https://permalink.lanl.gov/object/tr?what=info:lanl-repo/lareport/LA-UR-03-3116

[iii] ibid

[iv] https://www.llnl.gov/news/penguinintel-magma-computing-cluster-coming-llnl

[v] https://arxiv.org/pdf/1901.04982.pdf
