MLPerf Issues HPC 1.0 Benchmark Results Featuring Impressive Systems (Think Fugaku)

By John Russell

November 19, 2021

Earlier this week MLCommons issued results from its latest MLPerf HPC training benchmarking exercise. Unlike other MLPerf benchmarks, which mostly measure the training and inference performance of systems that are available for purchase or use in the cloud, MLPerf HPC has showcased performances of large, complicated, research-oriented systems – the top of the food chain, if you will. Fugaku – the reigning Top500 champ – was a top performer.

This is just the second running of the MLPerf HPC training benchmark, which debuted last year at SC20. While the number of participants remains small (eight this year versus six last year), the systems are impressive, including Piz Daint (CSCS), Theta (ANL), Perlmutter (NERSC), JUWELS Booster (Jülich Supercomputing Centre), the HAL cluster (NCSA), Selene (Nvidia), Longhorn (TACC) and, of course, Fugaku (RIKEN/Fujitsu).

MLCommons has continued improving the HPC benchmark. The latest version (v1.0) adds a third HPC application – OpenCatalyst – and separates out strong-scaling and weak-scaling results. Here’s an excerpt from the MLPerf website on the changes:

  • “MLPerf HPC v1.0 is a significant update and includes a new benchmark as well as a new performance metric. The OpenCatalyst benchmark predicts the quantum mechanical properties of catalyst systems to discover and evaluate new catalyst materials for energy storage applications. This benchmark uses the OC20 dataset from the Open Catalyst Project, the largest and most diverse publicly available dataset of its kind, with the task of predicting energy and the per-atom forces. The reference model for OpenCatalyst is DimeNet++, a graph neural network (GNN) designed for atomic systems that can model the interactions between pairs of atoms as well as angular relations between triplets of atoms.
  • “MLPerf HPC v1.0 also features a novel weak-scaling performance metric that is designed to measure the aggregate machine learning capabilities for leading supercomputers. Most large supercomputers run multiple jobs in parallel, for example training multiple ML models. The new benchmark trains multiple instances of a model across a supercomputer to capture the impact on shared resources such as the storage system and interconnect. The benchmark reports both the time-to-train for all the model instances and the aggregate throughput of an HPC system, i.e., number of models trained per minute. Using the new weak-scaling metric, the MLPerf HPC benchmarks can measure the ML capabilities for supercomputers of any size, from just a handful of nodes to the world’s largest systems.”
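The weak-scaling arithmetic is simple to sketch. The toy function below is not MLPerf reference code; it assumes time-to-train for the whole run is set by the slowest instance, and computes the two reported quantities from per-instance training times:

```python
# Toy illustration of the weak-scaling metric (the exact aggregation is
# defined by the MLPerf HPC rules; this sketch assumes the run finishes
# when the slowest model instance finishes).

def weak_scaling_metrics(instance_times_min):
    """instance_times_min: training time (minutes) of each concurrent instance."""
    time_to_train_all = max(instance_times_min)            # all instances must finish
    models_per_minute = len(instance_times_min) / time_to_train_all
    return time_to_train_all, models_per_minute

ttt, throughput = weak_scaling_metrics([30.0, 31.5, 29.8, 32.0])
print(ttt)         # 32.0 -- minutes until the slowest instance finishes
print(throughput)  # 0.125 -- 4 models / 32.0 minutes
```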

(List of participating organizations: Argonne National Laboratory, the Swiss National Supercomputing Centre, Fujitsu and Japan’s Institute of Physical and Chemical Research (RIKEN), Helmholtz AI (a collaboration of the Jülich Supercomputing Centre at Forschungszentrum Jülich and the Steinbuch Centre for Computing at the Karlsruhe Institute of Technology), Lawrence Berkeley National Laboratory, the National Center for Supercomputing Applications, NVIDIA, and the Texas Advanced Computing Center.)

In reporting on other MLPerf benchmarks, much of the emphasis falls on accelerator/CPU combinations and comparing their performances. In that regard, MLPerf has largely been a showcase for Nvidia GPU advances (software and hardware), which, frankly, are impressive. NVIDIA GPUs again showed strong performances, and the company touted them in a blog (MLPerf HPC Benchmarks Show the Power of HPC+AI). For commercially available, GPU-accelerated systems, Nvidia has enjoyed steady dominance.

The MLPerf HPC benchmark is in many ways more interesting if perhaps less useful as a purchase-guiding (and marketing) tool. The systems featured are complicated and powerful and each possesses distinct advantages. Fugaku, for example, doesn’t rely on separate GPU accelerators.

Fujitsu issued a press release saying Fugaku took “first place amongst all the systems for the CosmoFlow training application benchmark category, demonstrating performance at rates approximately 1.77 times faster than other systems. This result revealed that Fugaku has the world’s highest level of performance in the field of large-scale scientific and technological calculations using machine learning.” It is a wonderful machine.

It is best to dig into the full results for a fuller picture. That said, included in the results report were statements from the participating organizations on their approaches to running the benchmark. These are, on balance, quite substantive and informative. Here are small portions of two of the statements; all of the submitted statements are included at the end of the article and are well worth reading:

  • ANL – “These benchmarks were run on 16 NVIDIA DGX A100 nodes (128 A100 GPUs) of Theta. We made minor modifications to the DeepCam and OpenCatalyst submissions in order to correctly initialize MPI communication for distributed training. After confirming that all of the models were working as expected, we ran preliminary tests to verify that our workflows would be compliant with the MLPerf HPC requirements (logging, system information, etc.). The available documentation helped us understand the impact of the various hyperparameters on the model training performance. We started with the default parameters and tuned the hyperparameters to reduce the overall training cost. We employed data staging on node-local NVMe storage to accelerate the I/O.”
  • Fugaku – “For weak scaling, since the job scheduler cannot launch a large number of instances immediately, inter-instance synchronization across jobs was added to align start times among instances. Moreover, to avoid excessive access to the FEFS from all instances, the dataset is staged to node-local memory using an MPI program in which only the first instance reads the dataset from FEFS and broadcasts it to the other instances. We actually ran 648 instances (82,944 nodes) but submitted results for 637 of them. The pruned instances consist of 1 instance that hung during training, 6 instances that unintentionally used the same seed value as others, and 4 instances that took a particularly long time.”

The latest MLPerf benchmark results provide an interesting look at side-by-side performances on these impressive systems.

SUBMITTED STATEMENTS

Argonne National Laboratory (ANL)

The Argonne Leadership Computing Facility (ALCF) [1], a U.S. Department of Energy (DOE) Office of Science User Facility located at Argonne National Laboratory, enables breakthroughs in science and engineering by providing supercomputing resources and expertise to the research community. The Theta supercomputer [2] is operated and maintained by the ALCF. ThetaGPU, a 3.9-petaflops system, has 24 Nvidia DGX A100 nodes, each with eight (8) NVIDIA A100 Tensor Core GPUs and two (2) AMD Rome CPUs, providing 320 gigabytes of GPU memory per node (7,680 GB in aggregate) for training artificial intelligence (AI) datasets, while also enabling GPU-specific and enhanced high-performance computing (HPC) applications for modeling and simulation.

For the 2021 MLPerf HPC v1.0, we submitted strong scaling results for the DeepCam and OpenCatalyst training benchmarks in the closed division. These benchmarks were run on 16 NVIDIA DGX A100 nodes (128 A100 GPUs) of Theta. We made minor modifications to the DeepCam and OpenCatalyst submissions in order to correctly initialize MPI communication for distributed training. After confirming that all of the models were working as expected, we ran preliminary tests to verify that our workflows would be compliant with the MLPerf HPC requirements (logging, system information, etc.). The available documentation helped us understand the impact of the various hyperparameters on the model training performance. We started with the default parameters and tuned the hyperparameters to reduce the overall training cost. We employed data staging on node-local NVMe storage to accelerate the I/O.
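The node-local staging pattern ALCF describes can be sketched in general terms. This is the generic pattern, not ALCF's actual scripts: copy the dataset from the shared filesystem to fast node-local NVMe once, then read locally for the rest of the run.

```python
# Generic sketch of node-local data staging (not ALCF's actual tooling):
# pay the shared-filesystem copy once per node, then train from local NVMe.
import shutil
from pathlib import Path

def stage_to_local(shared_dir, local_dir):
    """Copy the dataset from the shared FS to node-local storage, once."""
    local = Path(local_dir)
    if not local.exists():              # only the first run on this node pays the copy
        shutil.copytree(shared_dir, local)
    return local                        # training then reads from this path
```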

The insights gained from these runs will help us improve our efforts to optimize large scientific machine learning applications on the upcoming supercomputers, Polaris and Aurora, and thereby glean insights faster.

[1] https://www.alcf.anl.gov/

[2] https://www.alcf.anl.gov/alcf-resources/theta

Swiss National Supercomputing Centre

The Swiss National Supercomputing Centre (CSCS) participated in MLPerf HPC v1.0 with the Open Catalyst and DeepCAM benchmarks on our flagship system, Piz Daint.

Our focus in this round was on recent trends in scientific deep learning within the atmospheric modelling and atomistic simulation communities, and these two benchmarks represent well the growing usage of data from physical simulations for large scale deep learning in these domains.

Managing the data processing requirements of large scale climate simulations is a challenge of the EXCLAIM program. Segmentation tasks such as the one solved by DeepCAM arise naturally when compressing the output of global weather simulations with regional detail resolution for storage.

In our submissions to DeepCAM, we improved the code for higher performance on our distributed file system. In particular, on 128 GPUs, where the dataset does not fit in RAM, prefetching the data before using it on the GPU allowed us to guarantee 98% GPU utilization on average. To sustain this performance up to 1,024 GPUs, we added a caching mechanism in PyTorch that makes effective use of the much larger RAM capacity. Furthermore, we found that performance at this scale is highly sensitive to tuning communication – in particular a tree-based algorithm and sufficient GPU resources in NCCL – which is consistent with last year’s finding on fine-grained communication in CosmoFlow.
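The two host-side ideas here – an in-RAM cache so each sample touches the filesystem at most once, and a background prefetch thread that stays ahead of the consumer – can be sketched in framework-agnostic Python. CSCS's actual implementation lives inside PyTorch's data pipeline; the `CachingPrefetcher` class and its interface are invented for illustration.

```python
# Sketch of RAM caching plus background prefetch (generic, not CSCS's code).
import threading, queue

class CachingPrefetcher:
    def __init__(self, load_fn, keys, depth=2):
        self.load_fn, self.cache = load_fn, {}
        self.q = queue.Queue(maxsize=depth)       # stays `depth` items ahead
        self.t = threading.Thread(target=self._worker, args=(keys,), daemon=True)
        self.t.start()

    def _get(self, key):
        if key not in self.cache:                 # first pass: hit the filesystem
            self.cache[key] = self.load_fn(key)   # later passes: served from RAM
        return self.cache[key]

    def _worker(self, keys):
        for k in keys:
            self.q.put(self._get(k))              # prefetch while consumer works
        self.q.put(None)                          # sentinel: end of data

    def __iter__(self):
        while (item := self.q.get()) is not None:
            yield item
```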

The purpose of OpenCatalyst, which we ran on 256 GPUs, is highly aligned with our PASC project “Machine learning for materials and molecules: toward the exascale”, which investigates methods for high fidelity molecular dynamics simulations with potentials that accurately reproduce expensive electronic structure calculations using ML techniques.

Together with last year’s results on CosmoFlow, these submissions complete the coverage of the full MLPerf HPC benchmark suite on Piz Daint and will serve as a baseline for the newly upcoming system, Alps.

Fujitsu + RIKEN

RIKEN and Fujitsu jointly developed the world’s top-level supercomputer – the supercomputer Fugaku – capable of realizing high effective performance for a broad range of application software, and started its official operation on March 9, 2021 [1]. RIKEN and Fujitsu submitted CosmoFlow results to the closed division using 512 nodes for strong scaling and 81,536 nodes (128 nodes × 637 model instances) for weak scaling.

For both weak and strong scaling, LLIO (Lightweight Layered IO Accelerator) was used to cache library and program files from FEFS (Fujitsu Exabyte File System) storage. We developed a customized TensorFlow and an optimized oneAPI Deep Neural Network Library (oneDNN) as the backend [2]. oneDNN uses the JIT assembler Xbyak_aarch64 to exploit the performance of the A64FX.

For weak scaling, since the job scheduler cannot launch a large number of instances immediately, inter-instance synchronization across jobs was added to align start times among instances. Moreover, to avoid excessive access to the FEFS from all instances, the dataset is staged to node-local memory using an MPI program in which only the first instance reads the dataset from FEFS and broadcasts it to the other instances. We actually ran 648 instances (82,944 nodes) but submitted results for 637 of them. The pruned instances consist of 1 instance that hung during training, 6 instances that unintentionally used the same seed value as others, and 4 instances that took a particularly long time.
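The staging scheme reads naturally as a broadcast: rank 0 reads from FEFS, everyone else receives a copy in memory. A minimal sketch, with the MPI broadcast injected as a callable – in practice this would be something like mpi4py's `MPI.COMM_WORLD.bcast`; the `stage_dataset` function is invented for illustration:

```python
# Sketch of the staging pattern: only rank 0 touches the shared filesystem;
# everyone else gets the data via broadcast. `bcast` stands in for an MPI
# broadcast (e.g. mpi4py's comm.bcast), injected so the logic can be shown
# without an MPI launch.

def stage_dataset(rank, read_fn, bcast):
    data = read_fn() if rank == 0 else None   # one reader, not N readers
    return bcast(data, root=0)                # every rank returns a copy
```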

For strong scaling, we used a reformatted, uncompressed TFRecord dataset to improve training throughput. The reference dataset is compressed with gzip and needs decompression at each training step. Since the number of nodes increases relative to weak scaling and the amount of staging data per node decreases, the uncompressed dataset could be used.
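The compressed-versus-uncompressed tradeoff can be illustrated with Python's gzip module (TFRecord-specific details omitted; this just shows where the per-step decompression cost goes):

```python
# Toy illustration of the tradeoff: reading gzip-compressed data pays a
# decompression cost at every training step, while a one-time reformat to
# an uncompressed copy makes each step a plain read.
import gzip

record = b"sample-bytes" * 1000
compressed = gzip.compress(record)           # smaller on disk / during staging

# Per-step read from the compressed copy: decompress every time.
step_input_slow = gzip.decompress(compressed)

# One-time reformat before training, then steps read the bytes directly.
uncompressed = gzip.decompress(compressed)   # done once
step_input_fast = uncompressed               # no per-step decompression

assert step_input_slow == step_input_fast == record
```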

In this round, the performance of the Fugaku half-system with more than 80,000 nodes can be evaluated using the new weak scaling metric.

[1] https://www.fujitsu.com/global/about/innovation/fugaku/

[2] https://github.com/fujitsu

Helmholtz AI (JSC – FZJ, SCC – KIT)

In the Helmholtz AI platform, Germany’s largest research centers have teamed up to bring cutting-edge AI methods to scientists from other fields. With this in mind, researchers and Helmholtz AI members from the Jülich Supercomputing Centre (JSC) at Forschungszentrum Jülich and the Steinbuch Centre for Computing (SCC) at Karlsruhe Institute of Technology have jointly submitted their results for the MLPerf™ HPC benchmarking suite. We successfully executed large-scale training runs of the CosmoFlow and DeepCAM applications with up to 3072 NVIDIA A100 GPUs on the JUWELS supercomputer at JSC and the HoreKa supercomputer at SCC.

While striving for performance, it is vital to balance the environmental costs of such large-scale measurements. With JUWELS and HoreKa ranking among the top 15 on the worldwide Green500 list of energy-efficient supercomputers, the high performance computing resources in Helmholtz AI are both computationally and energy efficient. Not only have we used these benchmarks to better understand our current systems in preparation for improved future systems but also for testing tools to inform users of the carbon footprint of each individual computing job.

An important step to maximizing the performance was using an optimized HDF5 file format for the dataset. With this, it was possible to get the maximum data loading performance. This was a result of the Helmholtz AI team jointly analyzing the execution performance and implementing a solution that works optimally on both supercomputers. The joint effort to submit competitive results for the MLPerf™ HPC benchmarking suite has been another important step towards democratizing AI for all Helmholtz researchers.
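While the Helmholtz AI team used an optimized HDF5 layout, the underlying idea – replacing many small per-sample reads with one seek plus one contiguous read from a single packed file – can be shown with a stdlib-only analogue. The `pack`/`read_sample` helpers below are invented for illustration, not part of their submission.

```python
# Stdlib analogue of a packed, index-plus-blob file layout: each sample
# becomes one contiguous slice instead of a per-file open on the
# parallel filesystem.

def pack(samples):
    """Return (index, blob): per-sample (offset, length) plus concatenated payload."""
    blob, index, off = b"", [], 0
    for s in samples:
        index.append((off, len(s)))
        blob += s
        off += len(s)
    return index, blob

def read_sample(index, blob, i):
    off, n = index[i]
    return blob[off:off + n]       # one seek + one contiguous read per sample
```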

Lawrence Berkeley National Lab (LBNL)

The MLPerf HPC v1.0 benchmarks represent the growing scientific AI computational workload at DOE HPC facilities like NERSC. The applications push on HPC system capabilities for compute, storage, and network, making the benchmark suite a valuable tool for assessing and optimizing system performance.

For LBNL, this round featured the debut of Perlmutter Phase 1 at NERSC. Perlmutter Phase 1 has demonstrated itself to be a world-class AI supercomputer, with leading strong-scaling performance on OpenCatalyst, DeepCAM and CosmoFlow. Additionally, we demonstrated excellent scalability, taking advantage of 5,120 GPUs for the weak-scaling benchmark and metric.

Perlmutter, an HPE Cray EX supercomputer, is designed to meet the emerging simulation, data analytics, and AI requirements of the scientific community. The Phase 1 system, which debuted at the #5 Top500 spot in June 2021, features more than 6,000 NVIDIA A100 GPUs, an all-flash Lustre filesystem, and a Cray Slingshot network.

LBNL submitted results for all three benchmarks on Perlmutter Phase 1 in the closed division:

  • CosmoFlow and DeepCAM strong-scaling results on 2,048 GPUs
  • CosmoFlow and DeepCAM weak-scaling results on 5,120 GPUs, both run with 10 concurrent model-training instances of 512 GPUs each
  • An OpenCatalyst strong-scaling result on 512 GPUs.

The submissions utilized various features and optimizations, including:

  • DALI for accelerating the data pipelines in CosmoFlow and DeepCAM
  • Fast data staging from all-flash shared filesystem into on-node DRAM
  • PyTorch JIT compilation for DeepCAM and OpenCatalyst
  • CUDA graphs for CosmoFlow and DeepCAM
  • Load-balancing variable-sized samples in OpenCatalyst
  • Shifter containers for all benchmarks based on NGC PyTorch and MXNet releases.
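The last optimization in the list, load-balancing variable-sized samples, is the most algorithmic. A common approach is a greedy longest-processing-time assignment; the sketch below is generic (e.g. OpenCatalyst graphs with different atom counts), not LBNL's actual implementation.

```python
# Greedy longest-processing-time load balancing: assign the biggest
# samples first, always to the currently least-loaded worker.
import heapq

def balance(sample_sizes, n_workers):
    """Assign sample indices to workers so per-worker total size stays even."""
    heap = [(0, w, []) for w in range(n_workers)]    # (load, worker_id, samples)
    heapq.heapify(heap)
    order = sorted(range(len(sample_sizes)), key=lambda i: -sample_sizes[i])
    for i in order:                                  # biggest samples first
        load, w, assigned = heapq.heappop(heap)      # least-loaded worker
        assigned.append(i)
        heapq.heappush(heap, (load + sample_sizes[i], w, assigned))
    return {w: s for _, w, s in heap}
```

With `sample_sizes = [10, 1, 1, 1, 1, 10]` and two workers, each worker ends up with a total load of 12.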

National Center for Supercomputing Applications (NCSA)

The National Center for Supercomputing Applications (NCSA) is a hub of interdisciplinary research and digital scholarship where University of Illinois faculty, staff, students, and collaborators from around the world work together to address research grand challenges for the benefit of science and society [1].

This year, the NCSA team participated in MLPerf HPC v1.0 with the DeepCAM and Open Catalyst benchmarks carried out on the Hardware-Accelerated Learning (HAL) system [2]. This system is composed of 16 IBM AC922 8335-GTH compute nodes, each containing two 20-core IBM POWER9 CPUs, 256 GB of memory, four NVIDIA V100 GPUs with NVLink 2.0, and EDR InfiniBand adapters for high-performance communication. Two storage nodes provide 224 TB of usable NVMe SSD-based storage capable of a peak cluster-aggregate bandwidth of over 90 GB/s.

The experience we obtained from this year’s submission has already benefited multiple research projects, especially for their software environment configuration and optimization. Moreover, the insights we learned from this year will also contribute to the design of our future ML/DL systems.

[1] http://www.ncsa.illinois.edu/

[2] https://wiki.ncsa.illinois.edu/display/ISL20/HAL+cluster

NVIDIA

Cutting-edge HPC is blending simulation with AI to reach new levels of performance and accuracy. Recent advances in molecular dynamics, astronomy and climate simulation have all taken this approach to making scientific breakthroughs, a trend driving the adoption of exascale AI.

The new MLPerf HPC benchmarks help users compare HPC systems using this style of computing. NVIDIA-powered systems led on four of five benchmarks in the rankings.

Compared to the best v0.7 results, NVIDIA’s supercomputer Selene achieved a 5x better result for CosmoFlow at 2x the scale and nearly 7x for DeepCAM at 4x the scale. LBNL’s Perlmutter led the new OpenCatalyst benchmark using 2,048 NVIDIA A100s. In the weak-scaling category, Selene led DeepCAM at 16 nodes per instance and 256 simultaneous instances.

The MLPerf HPC benchmarks are meant to model the types of workloads HPC centers may perform:

  • Cosmoflow – physical quantity estimation from cosmological image data
  • Deepcam – identification of hurricanes and atmospheric rivers in climate simulation data
  • Opencatalyst (new) – predict energies of molecular configurations based on graph connectivity

Optimizations used to achieve MLPerf HPC v1.0 results:

  • DALI accelerates data processing
  • Use of CUDA graphs reduces small-batch latency
  • SHARP accelerates communication
  • Async DRAM prefetching removes IO from critical path
  • New fused kernels developed

The NVIDIA ecosystem submitted with commercially available platforms using three generations of NVIDIA GPUs (P100, V100, and A100). Supercomputing centers Jülich, Argonne National Lab, Lawrence Berkeley National Lab, the Swiss National Supercomputing Centre, NCSA, and the Texas Advanced Computing Center made direct submissions, accounting for seven of the eight participants.

The NVIDIA platform excels in both performance and usability, offering a single leadership platform from data center to edge to cloud.  NVIDIA HPC and AI accelerates 2400+ applications today.

All software used for NVIDIA submissions is available from the MLPerf repository, though node- and cluster-specific tuning is required to get the most from the benchmarks. We constantly add these cutting-edge MLPerf improvements to our deep learning framework containers available on NGC, our software hub for GPU applications.

Texas Advanced Computing Center (TACC)

Texas Advanced Computing Center (TACC) aims to facilitate novel discoveries that advance science and society through advanced computing technologies. TACC designs and operates some of the world’s most powerful supercomputers, including Frontera, Longhorn, and Stampede2. The Longhorn system consists of 108 hybrid CPU/GPU compute nodes powered by IBM POWER9 processors and NVIDIA Tesla V100 GPUs. Each node provides 40 cores across two sockets, four GPUs, 256 GB of RAM, 900 GB of local storage, and Mellanox EDR InfiniBand connectivity to the other nodes. Longhorn’s multiple GPUs per node make it a powerful tool for research in astronomy and cosmology, fluid particulates, materials research, biophysics, and deep learning. In 2020, COVID-19 research performed on the Longhorn system won the Association for Computing Machinery (ACM) Gordon Bell Special Prize.

MLCommons HPC applications such as CosmoFlow and DeepCAM provide an invaluable opportunity to understand the infrastructure requirements of next-generation machine learning and deep learning applications. This year, TACC participated in MLCommons HPC v1.0 benchmarking by submitting the performance of the CosmoFlow and DeepCAM applications on 32 nodes (128 Tesla V100 GPUs) of its Longhorn system [1]. The lessons learned from these submissions will help envision the architecture of forthcoming TACC systems that will assist its rapidly growing AI user base in solving intractable problems.

[1] https://www.tacc.utexas.edu/systems/longhorn
