Performance of Azure HBv4 and HX VMs for HPC

By Rachel Pruitt, Product Marketing Manager, Azure Marketing, HPC + AI

May 29, 2022

Article contributed by Amirreza Rastegari, Jon Shelley, Jithin Jose, Anshul Jain, Jyothi Venkatesh, Joe Greenseid, Fanny Ou, and Evan Burness

Azure has announced new HBv4-series and HX-series virtual machines (VMs) for high performance computing (HPC). This blog provides in-depth technical and performance information about these new VMs.

These VMs are powered by the latest technologies, including:

  • 4th Gen AMD EPYC CPUs (Genoa while in Preview, Genoa-X at General Availability in 1H2023)
  • 800 GB/s of DDR5 memory bandwidth (STREAM TRIAD)
  • 400 Gb/s NVIDIA Quantum-2 CX7 InfiniBand, the first on the public cloud
  • 80 Gb/s Azure Accelerated Networking
  • 3.6 TB local NVMe SSD providing 12 GB/s (read) and 7 GB/s (write) of storage bandwidth

HBv4 and HX – VM Size Details & Technical Specifications Overview

HBv4 and HX VMs are available in the following sizes, with specifications shown in Tables 1 and 2, respectively. Like existing H-series VMs, the HBv4 and HX series also include constrained-core VM sizes, enabling customers to choose a size along a spectrum from maximum performance per VM to maximum performance per core.

HBv4-series VMs

Table 1: Technical specifications of HBv4-series VMs

HX-series VMs 

Table 2: Technical specifications of HX-series VMs

*Clock frequencies are based on non-AVX workload scenarios and on frequency delivery measured by the Azure HPC team with AMD EPYC 7004-series processors and corresponding system firmware. The clock frequency a customer experiences depends on a variety of factors, including the coding and usage of a given application. Frequencies indicated above are not necessarily indicative of final clock frequencies for EPYC 7004-series processors.

For more information see the official documentation for HBv4-series and HX-series VMs.

Microbenchmark Performance

This section focuses on microbenchmarks that characterize the performance of the memory subsystem and the InfiniBand network of HBv4-series and HX-series VMs.

STREAM – Memory Performance

Figure 1 below shows the results of running the industry-standard STREAM benchmark on HBv4/HX VMs. The benchmark was run using the following command:

sudo ./run_stream_dynamic.py -nt 30 -t 176 -oca 0-175 -m 20000 -thp madvis

This returned a result of ~770 GB/s bandwidth for STREAM-TRIAD, which is over 2x greater than that provided from DRAM on HBv3 VMs (~350 GB/s STREAM-TRIAD) as documented here.
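The comparison above can be checked with a few lines of arithmetic. The sketch below uses only the approximate figures quoted in the text (~770 GB/s and ~350 GB/s) and the thread count from the STREAM command; the function and variable names are illustrative and not part of any benchmark tooling.

```python
# Approximate STREAM-TRIAD figures quoted in the text (GB/s).
HBV4_TRIAD_GBS = 770.0  # HBv4/HX, as reported above
HBV3_TRIAD_GBS = 350.0  # HBv3 from DRAM, as reported above
THREADS = 176           # thread count passed via "-t 176" in the command above


def speedup(new_gbs: float, old_gbs: float) -> float:
    """Return how many times larger new_gbs is than old_gbs."""
    return new_gbs / old_gbs


def per_thread_bandwidth(total_gbs: float, threads: int) -> float:
    """Return the average bandwidth available to each thread, in GB/s."""
    return total_gbs / threads


triad_speedup = speedup(HBV4_TRIAD_GBS, HBV3_TRIAD_GBS)
gbs_per_thread = per_thread_bandwidth(HBV4_TRIAD_GBS, THREADS)

print(f"HBv4/HX vs. HBv3 STREAM-TRIAD: {triad_speedup:.2f}x")
print(f"Average bandwidth per thread: {gbs_per_thread:.3f} GB/s")
```

With these rounded inputs the ratio comes out at about 2.2x, consistent with the "over 2x" statement; the per-thread figure is a simple average and says nothing about how bandwidth is actually distributed across cores.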

Figure 1: STREAM-TRIAD measures 765.52 GB/s memory bandwidth for HBv4/HX-series VMs

InfiniBand Perftests – Network Performance

HBv4 and HX VMs are equipped with the latest NVIDIA Quantum-2 CX7 InfiniBand (NDR) interconnect. We ran the industry-standard IB perftests across two (2) HBv4-series VMs featuring 400 Gb/s (NDR) InfiniBand links. The IB bandwidth tests were run using the following commands:

Unidirectional bandwidth:

numactl -c 0 ib_send_bw -aF -q 2

Bi-directional bandwidth:

numactl -c 0 ib_send_bw -aF -q 2 -b

Results of these tests are depicted in Figures 2 and 3, below.

Figure 2: Unidirectional InfiniBand bandwidth measuring up to the expected peak bandwidth of 400 Gb/s

Figure 3: Bi-directional InfiniBand bandwidth measuring up to the expected peak bandwidth of 800 Gb/s

As depicted above, HBv4/HX-series VMs achieve line-rate bandwidth performance (99% of peak) for both unidirectional and bi-directional tests.
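To put the line-rate statement in more familiar units, the sketch below converts the quoted peak rates and the 99% figure into achieved gigabits and gigabytes per second. It assumes only the numbers stated in the text (400 Gb/s and 800 Gb/s peaks, 99% of peak) and the usual 8 bits per byte; the helper names are illustrative.

```python
def achieved_gbps(peak_gbps: int, efficiency_pct: int) -> float:
    """Return the achieved rate in Gb/s for a peak rate and a percentage of peak."""
    return peak_gbps * efficiency_pct / 100


def gbits_to_gbytes(gbps: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbps / 8


EFFICIENCY_PCT = 99  # "99% of peak", as stated in the text

for label, peak in (("unidirectional", 400), ("bi-directional", 800)):
    rate = achieved_gbps(peak, EFFICIENCY_PCT)
    print(f"{label}: {rate:.0f} Gb/s, about {gbits_to_gbytes(rate):.1f} GB/s")
```

Under these assumptions, 99% of peak corresponds to roughly 396 Gb/s in one direction and 792 Gb/s in both directions combined.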

Application Performance

This section characterizes the performance of HBv4 and HX VMs on commonly run HPC applications. Performance comparisons are also provided against other HPC VMs offered on Azure, including the HBv3-series and HC-series VMs referenced in the results below.

Note: HC-series represents a highly customer-relevant comparison because the majority of HPC workloads, market-wide, still run largely or exclusively in on-premises datacenters, on infrastructure that is operated for 4 to 5 years on average. Thus, it is important to include performance information for HPC technology that spans the full age range customers may be accustomed to using on-premises. Azure HC-series VMs represent the older end of that spectrum well and feature highly performant technologies like EDR InfiniBand, 1DPC DDR4 2666 MT/s memory, and 1st Gen Xeon Platinum (“Skylake”) processors that dominated HPC customer investments and configuration choices during that period. As such, the application performance comparisons below commonly use HC-series as a representative proxy for an approximately 4-year-old HPC-optimized server.

In summary, the performance improvements of HBv4 and HX VMs over our most recent HPC VM offering, HBv3-series VMs, are as follows:

  • Up to 2.24x higher performance for CFD workloads
  • Up to 5.3x higher performance for FEA workloads
  • Up to 2.51x higher performance for weather simulation workloads
  • Up to 2x higher performance for molecular dynamics workloads
  • Up to 1.87x higher performance for rendering workloads
  • Up to 2.45x higher performance for chemistry workloads

Computational Fluid Dynamics (CFD)

Ansys Fluent – version 2022 R2

Figure 4: On Ansys Fluent (Aircraft Wing 14M), HBv4/HX VMs provide a greater than 4x performance uplift compared to a 4-year-old HPC server (represented by HC-series VMs) and 1.84x higher performance compared to the most recent Azure HPC VM, HBv3-series.

The absolute values for the benchmark represented in Figure 4 are shared below:

Table 3: Ansys Fluent (aircraft wing 14M) absolute performance (average solver rating, higher = better).

In addition, we share here scale-up performance within a single VM:

Figure 5: On Ansys Fluent (Aircraft Wing 14M), performance increases an additional 38% from the 96-core VM size to the 176-core VM size, illustrating the tradeoff between per-core and per-VM performance.

The absolute values for the benchmark represented in Figure 5 are shared below:

Table 4: Ansys Fluent (aircraft wing 14M) absolute performance (average solver rating, higher = better).

Siemens Simcenter STAR-CCM+ – version 17.04.008

Figure 6: On Siemens Simcenter STAR-CCM+ (Civil), HBv4/HX VMs show a greater than 5x performance uplift compared to a 4-year-old HPC server, and more than 2x compared to HBv3-series.

The absolute values for the benchmark represented in Figure 6 are shared below:

Table 5: Siemens Simcenter STAR-CCM+ (Civil) absolute performance (time elapsed, lower = better).

In addition, we share here scale-up performance within a single VM:

Figure 7: On Siemens Simcenter STAR-CCM+ (Civil), time to solution decreases by nearly 40% from the 96-core VM size to the 176-core VM size, illustrating the tradeoff between per-core and per-VM performance.

The absolute values for the benchmark represented in Figure 7 are shared below:

Table 6: STAR-CCM+ (Civil) absolute performance (time elapsed, lower = better) across HBv4/HX VM sizes.

As we can see from the scale-up performance figures for Ansys Fluent and Siemens Simcenter STAR-CCM+, respectively, Constrained Cores HBv4/HX VMs provide significant benefits for customer workloads that may require lower core count due to commercial software licensing constraints. For example, looking at Table 4 for Ansys Fluent, the 96-core HBv4/HX VM size provides 73% of the performance of the 176-core VM size while requiring only 55% as many software licensed cores.
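The licensing tradeoff described above can be worked through numerically. This is a minimal sketch that assumes only the figures quoted in the text (96 and 176 cores, 73% relative performance); the resulting per-licensed-core gain is derived arithmetic rather than a measured result, and the names are illustrative.

```python
FULL_CORES = 176             # largest HBv4/HX VM size, from the text
CONSTRAINED_CORES = 96       # constrained-core VM size, from the text
RELATIVE_PERFORMANCE = 0.73  # 96-core performance relative to 176-core, from the text


def license_fraction(constrained: int, full: int) -> float:
    """Return the share of licensed cores needed by the constrained size."""
    return constrained / full


def per_core_gain(relative_performance: float, core_fraction: float) -> float:
    """Return performance per licensed core relative to the full-size VM."""
    return relative_performance / core_fraction


core_fraction = license_fraction(CONSTRAINED_CORES, FULL_CORES)
gain = per_core_gain(RELATIVE_PERFORMANCE, core_fraction)

print(f"Licensed cores required: {core_fraction:.0%} of the 176-core size")
print(f"Performance per licensed core: {gain:.2f}x the 176-core size")
```

Under these assumptions, the 96-core size needs about 55% of the licensed cores and delivers roughly one third more performance per licensed core than the 176-core size, which is the benefit the constrained-core sizes are meant to offer for license-bound workloads.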

OpenFOAM – version 2012

Figure 8: On OpenFOAM (Motorbike 28M), HBv4/HX VMs provide more than a 4x performance uplift compared to a 4-year-old HPC server, and more than 2x compared to the most recent Azure HPC VM, HBv3-series.

The absolute values for the benchmark represented in Figure 8 are shared below:

Table 7: OpenFOAM (Motorbike 28M cells) absolute performance (execution time, lower = better).

Finite Element Analysis (FEA)

Altair RADIOSS – version 2022.1

Figure 9: On Altair RADIOSS (T10M), HBv4/HX VMs provide more than a 4x performance uplift compared to a 4-year-old HPC server, and more than 2x compared to the most recent Azure HPC VM, HBv3-series.

The absolute values for the benchmark represented in Figure 9 are shared below:

Table 8: Altair RADIOSS (T10M) absolute performance (execution time, lower = better).

MSC Nastran – version 2022.3

Note: For MSC Nastran, the SOL108 medium benchmark was tested only on an HX-series VM because this VM type was created to support such large-memory workloads. The larger memory footprint of HX-series (2x that of HBv4-series) allows the benchmark to run completely out of DRAM, which in turn provides additional speedup on top of that provided by the newer 4th Gen EPYC CPUs and faster memory subsystem. As such, it would not be accurate to characterize the performance depicted below as “HBv4/HX,” and we have instead marked it simply as “HX.”

Figure 10: On MSC Nastran (SOL108 Medium), HX-series VMs provide more than an 8x performance uplift compared to a 4-year-old HPC server, and more than 5x compared to the most recent Azure HPC VM, HBv3-series.

The absolute values for the benchmark represented in Figure 10 are shared below:

Table 9: MSC Nastran absolute performance (execution time, lower = better).

Weather Simulation

WRF – version 4.2.2

Figure 11: On WRF (Conus 2.5km), HBv4/HX VMs provide more than an 8x performance uplift compared to a 4-year-old HPC server, and more than 2x compared to the most recent Azure HPC VM, HBv3-series.

The absolute values for the benchmark represented in Figure 11 are shared below:

Table 10: WRF (Conus 2.5km) absolute performance (time/time-step, lower = better).

Molecular Dynamics

NAMD – version 2.15

Figure 12: On NAMD (Apoa1 100K atoms), HBv4/HX VMs provide more than a 5x performance uplift compared to a 4-year-old HPC server, and more than 2x compared to the most recent Azure HPC VM, HBv3-series.

The absolute values for the benchmark represented in Figure 12 are shared below:

Table 11: NAMD (Apoa1 100K atoms) absolute performance (nanoseconds/day, higher = better).

Rendering

V-Ray – version 5.02.00

Figure 13: On V-Ray 5, HBv4/HX VMs provide more than a 4x performance uplift compared to a 4-year-old HPC server, and 1.86x compared to the most recent Azure HPC VM, HBv3-series.

The absolute values for the benchmark represented in Figure 13 are shared below:

Table 12: Chaos V-Ray 5 absolute performance (frames rendered, higher = better).

Chemistry

CP2K – version 9.1

Figure 14: On CP2K (H2O-DFT-LS), HBv4/HX VMs provide nearly a 5x performance uplift compared to a 4-year-old HPC server, and nearly 2.5x compared to the most recent Azure HPC VM, HBv3-series.

The absolute values for the benchmark represented in Figure 14 are shared below:

Table 13: CP2K (H2O-DFT-LS) absolute performance (execution time, lower = better).
