For organizations operating high-performance data centers, cooling is a significant challenge. With demand for compute ever-increasing, the performance required to meet these needs must also rise. Consequently, high-end processor TDPs are climbing with each generation, and we are rapidly approaching the point where cooling these processors with traditional air will become infeasible. Liquid cooling is an obvious answer but brings new challenges, including cost, maintenance issues, and concerns about leakage and safety.
In this paper, we introduce the latest server solutions from Hewlett Packard Enterprise and Intel® and explain how innovative new designs can help organizations deploy dense, high-performance servers while extending the viability of air cooling. We also present a series of benchmarks conducted by HPE that explore the trade-offs between air and liquid cooling related to performance and power efficiency. With this information, data center managers can make informed decisions about how best to evolve their infrastructure based on their unique requirements.
Driven by data, compute requirements are on the rise
According to the latest IDC Global DataSphere forecast, the new data created, captured, replicated, and consumed yearly is expected to double between 2022 and 2026. Fueled by this growth in data, new predictive and analytic techniques, and competitive imperatives such as machine learning and AI, compute requirements are growing at a similar pace.
Today, high-performance systems are critical in applications ranging from manufacturing to financial services, life sciences, and data analytics. Manufacturers rely on HPC for structural design, computational fluid dynamics (CFD), and machine condition monitoring. Life sciences firms require large amounts of computing power for genomic analysis, surveillance, computational chemistry, and image analysis. Organizations are continually looking for ways to deliver additional computing capacity to keep pace with growing demands and stay a step ahead of the competition.
Cooling — a looming data center challenge
Meeting the high levels of performance required by today’s data and compute-intensive applications is driving increased power and cooling requirements. Power, density, cooling, and concerns about sustainability are issues for almost all data center operators. Per-socket TDPs for top-bin CPUs range from 270 watts to 350 watts or more, and next-gen CPUs are likely to be even more power-hungry, reaching 400 to 500 watts per socket.
The challenges are even more significant for GPUs critical to AI and machine learning workloads, where next-gen TDPs are expected to reach 700 watts.
A key issue is that silicon designs in modern processors are increasingly going 3D, with components layered on top of one another. This presents new thermal challenges and requires that case temperatures (TCASE) be kept at ever-lower levels to avoid component overheating and damage. These conflicting trends are illustrated in Figure 1.
Compounding this challenge, customers increasingly demand high-density racks that pack more computing power into a smaller data center footprint. Today, two-thirds of US data centers already have peak power demands of 16 to 20 kW per rack. Per-rack power consumption is rising quickly, with dense HPC racks already consuming 40 to 60 kW or more.
Organizations will need to make trade-offs, either investing in new cooling technologies to accommodate next-generation processors and GPUs or settling for less powerful processors and more sparsely populated data center racks. It is no secret that liquid cooling provides better thermal transfer efficiencies than air cooling, so for many, liquid cooling is a logical path forward.
Liquid cooling spans a range of technologies, from rear-door chillers and heat exchangers to directly attached liquid cooling plates to immersion cooling. Liquid cooling can bring clear benefits:
- Improved efficiency — In an analysis conducted by HPE, liquid cooling has been shown to reduce data center power usage effectiveness (PUE) and cooling-related power costs by up to 87%.
- Reduced environmental impact — Reducing power consumption with more efficient cooling can help organizations meet environmental, social, and governance (ESG) goals and reduce their data center’s CO2 equivalent (CO2e) footprint.
- Deferred expensive data center upgrades — In space-constrained data centers, liquid cooling can enable denser rack configurations, helping maximize available space.
- Improved reliability and predictability — Liquid cooling can prolong component life by providing stable operating temperatures, avoiding overheating conditions, and improving overall availability.
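As a rough illustration of how cooling power feeds into PUE (total facility power divided by IT equipment power, with 1.0 as the ideal), the sketch below applies the up-to-87% cooling-power reduction cited above to purely hypothetical load figures; none of these numbers are HPE measurements.

```python
# Sketch: how power usage effectiveness (PUE) responds to cooling power.
# All load figures below are illustrative assumptions, not HPE measurements.

def pue(it_power_kw: float, cooling_power_kw: float, other_overhead_kw: float) -> float:
    """PUE = total facility power / IT equipment power (ideal = 1.0)."""
    return (it_power_kw + cooling_power_kw + other_overhead_kw) / it_power_kw

# Hypothetical 500 kW IT load with 50 kW of non-cooling overhead
air    = pue(500, cooling_power_kw=200, other_overhead_kw=50)  # air-cooled baseline
liquid = pue(500, cooling_power_kw=26,  other_overhead_kw=50)  # ~87% less cooling power

print(f"air-cooled PUE:    {air:.2f}")     # air-cooled PUE:    1.50
print(f"liquid-cooled PUE: {liquid:.2f}")  # liquid-cooled PUE: 1.15
```

Even with these made-up loads, the mechanism is clear: cooling power is overhead in the PUE numerator, so cutting it moves PUE toward 1.0 without touching IT capacity.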
Despite these benefits, transitioning to liquid cooling is often easier said than done.
A challenging transition
Presently, air cooling is the predominant way of cooling high-performance servers. In a survey of 268 HPC sites from 252 organizations, Intersect360 Research found that 58% of respondents use air cooling exclusively, while 42% use liquid cooling in some systems, with the largest share using rear-door chillers. Only 23% of commercial organizations operate fully plumbed racks with facility heat exchangers; for most, plumbing extends to only a subset of their data center racks. In other words, there is still a long way to go before most facilities can fully embrace liquid cooling.
When deciding on a liquid cooling solution, customers must consider several factors: cost, sustainability, maintenance, and ease of management. Among commercial and industrial HPC users, the majority operate multiple clusters. According to the same Intersect360 Research study, 37% of organizations operate ten or more clusters, ranging from entry-level HPC systems with 16 nodes or fewer to supercomputers consisting of more than 512 nodes. Upgrading these systems to liquid cooling poses technical, logistical, and financial challenges. These include:
- The added expense of operating two cooling systems instead of one
- A lack of standardization in cooling systems complicating adoption in multivendor environments
- Concerns about corrosion and safety hazards, such as risks of electrocution and arcing
- Increased operational complexity and the risk of cooling system failures
Organizations must consider multiple factors when introducing liquid cooling, including existing data center space, rack composition, power constraints, cooling capacity, utility costs, and projected growth requirements.
Extending the life of air cooling
Fortunately, new technologies from HPE and Intel provide data center managers with the flexibility to deploy the latest server hardware in air-cooled environments. Organizations can significantly extend the viability of air cooling by taking advantage of the latest HPE Cray XD2000 Systems powered by 4th Gen Intel® Xeon® Scalable processors. By deploying these Intel-based systems, organizations can:
- Help maximize performance while minimizing data center impact
- Avoid costly capital upgrades to data center facilities
- Protect existing investments in software and hardware
As illustrated in Figure 2, organizations can extend the life of air cooling without sacrificing performance, enabling them to gradually manage the transition to liquid cooling based on their own schedule.
HPE Cray XD2000 Systems
With the HPE Cray family, HPE and Intel bring innovations from the world’s most powerful supercomputers to commercial data center settings. The HPE Cray XD2000 System is a dense, multiserver platform that packs exceptional performance and workload flexibility into a small data center space while delivering the efficiencies of a shared infrastructure.
Each HPE Cray XD2000 2U Chassis supports up to four HPE Cray XD220v 1U Servers powered by the latest 4th Gen Intel Xeon CPUs. Each server can be serviced without impacting the operation of other servers in the same chassis for maximum server availability. The HPE Cray XD2000 delivers up to 4 times the density of a traditional rackmount 2U server in standard racks and provides rear-aisle serviceability access. Up to 20 HPE Cray XD2000 Chassis can be installed in either 42U or 48U HPE standard racks, delivering up to 80 2P servers and 160 4th Gen Intel Xeon Scalable processors per rack, subject to power and cooling considerations.
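The rack-level arithmetic above can be sketched as follows. The chassis and node counts come from the text; the per-node power draw is an illustrative assumption, not an HPE specification, which is why real deployments remain subject to power and cooling limits.

```python
# Sketch of the rack-level math described above. Per-node power is an
# illustrative assumption, not an HPE specification.

CHASSIS_PER_RACK  = 20  # 2U chassis in a 42U/48U HPE standard rack
NODES_PER_CHASSIS = 4   # HPE Cray XD220v 1U servers per XD2000 chassis
SOCKETS_PER_NODE  = 2   # 2P servers

nodes   = CHASSIS_PER_RACK * NODES_PER_CHASSIS  # servers per rack
sockets = nodes * SOCKETS_PER_NODE              # CPUs per rack

# Assumed ~1 kW per node under load (2 x 350 W CPUs plus memory, fans, etc.)
ASSUMED_NODE_POWER_KW = 1.0
rack_power_kw = nodes * ASSUMED_NODE_POWER_KW

print(nodes, sockets, rack_power_kw)  # 80 160 80.0
```

Under this assumed per-node draw, a fully populated rack would approach 80 kW, well beyond the 16 to 20 kW peak demands typical of most US data centers today, which is exactly why power and cooling budgets constrain how densely these racks can be populated.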
These systems offer a complete, scalable solution for customers requiring high-performance solutions. They feature flexible power and cooling options, including air cooling and direct liquid cooling (DLC), delivering superior performance while reducing TCO.
Innovative engineering enables air cooling up and down the stack
Thanks to the design of the 1U HPE Cray XD220v Server residing in the HPE Cray XD2000 Chassis, customers can benefit from the latest high-performance Intel Xeon Scalable processors in air-cooled environments. Customers can deploy fully populated HPE Cray XD2000 Racks using the latest processor technology without worrying about liquid cooling.
What makes this possible is the unique design of the Intel-powered HPE Cray XD220v Server illustrated in Figure 5. The HPE Cray XD220v is 20% wider than the previous generation HPE ProLiant XD200n designed for the HPE Apollo 2000 Chassis. However, this updated design still fits in industry-standard racks.
With larger heatsinks and additional cooling fans, this redesigned server enables the full range of 4th Gen Intel Xeon Scalable processors, from 12 to 56 cores per socket, to be efficiently air-cooled — including the most powerful 350-watt 56-core Intel® Xeon® Platinum 8480+ and Intel® Xeon® CPU Max 9480 processors.
The redesigned system features specialized baffling to optimize airflow and 16 fans (40 mm each) per HPE Cray XD2000 Chassis for reliable cooling of dense server configurations in the most demanding HPC environments. Better still, these servers are designed to support air cooling of future Intel Xeon processors. This translates into exceptional investment protection and flexibility. Organizations can deploy HPE Cray XD2000 Systems with air cooling today and easily add liquid cooling in the future.
This advantage is unique to the Intel-powered HPE Cray XD220v Server. Servers powered by competing processor technologies with similar TDPs require liquid cooling, adding cost and complexity to server deployments.
Direct liquid cooling (DLC)
While Intel-powered HPE Cray XD2000 Systems are at home in air-cooled environments, for customers with suitably equipped data centers, these systems also offer plug-and-play support for DLC. Engineered and supported by HPE, DLC offers clear advantages compared to third-party or immersion-based cooling solutions. These include:
- Efficient thermal transfer for more efficient cooling
- No need for expensive hazardous chemicals or specialized fluids
- Server equipment remains easily accessible for serviceability
- HPE server racks connect directly to facility water supplies without secondary plumbing
By taking advantage of DLC, organizations can substantially reduce data center PUEs and cooling-related costs and improve energy efficiency. Options are available for CPU only or CPU plus memory cooling.
Measuring the impact of air vs. liquid cooling
In March 2023, HPE ran a series of six internal benchmarks to evaluate the latest Intel Xeon processors’ performance and power efficiency in air- and liquid-cooled environments. The tests involved an HPE Cray XD2000 Chassis fully populated with 4 x HPE Cray XD220v compute nodes, each with two Intel Xeon 8480+ processors. The benchmarks included SPEC CPU 2017 (SPECrate 2017_int_base and SPECrate 2017_fp_base), three separate SPEChpc™ 2021 benchmarks, and a High-Performance Linpack (HPL) benchmark.
For each benchmark, results were obtained in both air- and liquid-cooled configurations. Details were tabulated, including performance, power consumed by the HPE Cray XD2000 Chassis while the benchmarks ran, and performance per kW. The results of these tests are summarized in Table 2.
As shown, the latest top-bin Xeon processors in air-cooled configurations delivered performance on par with that achievable with liquid cooling. The variance between the air- and liquid-cooled results across all six benchmarks was less than 3%.
Liquid-cooled configurations delivered slightly better performance because the higher temperatures in the air-cooled configurations led to higher leakage current in silicon. This resulted in a higher power draw, leaving less power available for boosting clock frequencies within the processor’s fixed TDP budget.
Table 2. Comparing performance and power requirements in air- and liquid-cooled HPE Cray XD2000 Systems
The average impact of air vs. liquid cooling across all six benchmarks in Table 2 is shown in Figure 7. On average, the liquid-cooled configurations delivered 1.8% better performance and consumed 14.6% less power. The liquid-cooled HPE Cray XD2000 delivered a 19.2% boost in power efficiency, measured in terms of throughput per kW.
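As a quick arithmetic cross-check, the reported averages are mutually consistent: a 1.8% performance gain combined with 14.6% lower power implies the stated throughput-per-kW improvement.

```python
# Cross-check of the averages reported above: a 1.8% performance gain
# combined with 14.6% lower power implies the stated perf-per-kW gain.

perf_gain = 0.018  # liquid vs. air performance (average, Table 2)
power_cut = 0.146  # liquid vs. air power draw (average, Table 2)

# Efficiency = throughput / power, so the relative gain is the ratio of ratios.
efficiency_gain = (1 + perf_gain) / (1 - power_cut) - 1
print(f"{efficiency_gain:.1%}")  # 19.2%
```

Note that most of the efficiency gain comes from the power reduction rather than the small performance delta, which matches the observation below that air-cooled performance is essentially on par.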
The results in Figure 7 show that while liquid cooling delivers superior power efficiency, air-cooled servers running the latest Intel Xeon 8480+ processors deliver excellent performance. Air-cooled HPE Cray XD2000 Systems are an excellent solution for customers that are either unable or not yet ready to make the transition to liquid cooling.
Help maximize performance, flexibility, and value
With power requirements for high-end CPUs rising, many organizations are considering liquid cooling to increase density and improve cooling efficiency. However, this transition can be expensive and disruptive, and not all organizations are ready to take this step.
Fortunately, the latest HPE Cray XD2000 Systems powered by 4th Gen Intel Xeon Scalable processors provide customers the flexibility to navigate this transition at their own pace. With Intel-powered HPE Cray XD2000 Systems, customers can:
- Avoid or delay expensive data center upgrades and refits by extending the viability of air cooling
- Experience up to twice the throughput of previous-generation servers
- Deploy dense, energy-efficient servers to help maximize data center space
- Protect existing investments in software and tools
- Gradually adopt energy-efficient direct liquid cooling at their own pace
Learn more at
 Thermal Design Power (TDP) is defined as the theoretical maximum amount of heat generated by a CPU or GPU, usually expressed in watts, that a computer’s cooling system must be designed to dissipate.
 “Worldwide IDC Global DataSphere Forecast, 2022–2026: Enterprise Organizations Driving Most of the Data Growth,” IDC, 2022
 HPE internal estimates.
 TCASE refers to the temperature at the interface between a CPU package and its heatsink.
 How Power Density is Changing in Data Centers and What It Means for Liquid Cooling, JETCOOL Technologies Inc., March 2022
 HPE internal estimates.
 See the HPE whitepaper Addressing sustainability in the financial services industry. PUE refers to power usage effectiveness, a measure of how much power is used to power data center servers vs. ancillary requirements such as lighting and air conditioning.
 “HPC Technology Survey 2021: Server Technologies and Configurations,” Intersect360 Research, August 2021
 2022 Data Center Trends: Liquid Cooling Adoption Survey
 Aurora Supercomputer and Argonne National Laboratory | HPE
 HPE Cray XD2000 QuickSpecs
 On average, the liquid-cooled configurations ran 1.8% faster across the six benchmarks. The largest impact was seen with the estimated SPECrate 2017_int_base result, where the liquid-cooled configuration ran 2.9% faster.
 Intel Turbo boost algorithm takes operating temperature into consideration, explaining the slight difference in performance.
 2P Intel Xeon Platinum 8480+ (112C) scored 932 SPECrate 2017_fp_base – http://spec.org/cpu2017/results/res2023q1/cpu2017-20221204-32903.html. 2P Intel Xeon Platinum 8380 (80C) scoring 467 SPECrate 2017_fp_base – http://spec.org/cpu2017/results/res2021q2/cpu2017-20210524-26430.html. 932/467 represents an ~2x performance improvement. SPEC, SPEC CPU, SPECfp, and SPECrate are trademarks of the Standard Performance Evaluation Corporation. All rights reserved. All stated results are as of April 15, 2023. See spec.org for more information.