MLPerf Issues HPC 1.0 Benchmark Results Featuring Impressive Systems (Think Fugaku)

By John Russell

November 19, 2021

Earlier this week MLCommons issued results from its latest MLPerf HPC training benchmarking exercise. Unlike other MLPerf benchmarks, which mostly measure the training and inference performance of systems available for purchase or use in the cloud, MLPerf HPC showcases the performance of large, complicated, research-oriented systems – the top of the food chain, if you will. Fugaku – the reigning Top500 champ – was a top performer.

This is just the second running of the MLPerf HPC training benchmark, which debuted last year at SC20. While the number of participants remains small (8 this year versus 6 last year), the lineup is impressive, including systems such as Piz Daint (CSCS), Theta (ANL), Perlmutter (NERSC), JUWELS Booster (Jülich Supercomputing Centre), the HAL cluster (NCSA), Selene (Nvidia) and Frontera (TACC).

MLCommons has continued improving the HPC benchmark. The latest version (v1.0) adds a third HPC application – OpenCatalyst – and separates out strong-scaling and weak-scaling results. Here’s an excerpt from the MLPerf website on the changes:

  • “MLPerf HPC v1.0 is a significant update and includes a new benchmark as well as a new performance metric. The OpenCatalyst benchmark predicts the quantum mechanical properties of catalyst systems to discover and evaluate new catalyst materials for energy storage applications. This benchmark uses the OC20 dataset from the Open Catalyst Project, the largest and most diverse publicly available dataset of its kind, with the task of predicting energy and the per-atom forces. The reference model for OpenCatalyst is DimeNet++, a graph neural network (GNN) designed for atomic systems that can model the interactions between pairs of atoms as well as angular relations between triplets of atoms.
  • “MLPerf HPC v1.0 also features a novel weak-scaling performance metric that is designed to measure the aggregate machine learning capabilities for leading supercomputers. Most large supercomputers run multiple jobs in parallel, for example training multiple ML models. The new benchmark trains multiple instances of a model across a supercomputer to capture the impact on shared resources such as the storage system and interconnect. The benchmark reports both the time-to-train for all the model instances and the aggregate throughput of an HPC system, i.e., number of models trained per minute. Using the new weak-scaling metric, the MLPerf HPC benchmarks can measure the ML capabilities for supercomputers of any size, from just a handful of nodes to the world’s largest systems.”
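
To make the new weak-scaling metric concrete, here is a minimal Python sketch of the aggregate-throughput idea described above; the helper function and the timing numbers are illustrative assumptions, not MLPerf’s reference code.

    # Illustrative sketch of the weak-scaling aggregate-throughput metric:
    # several model instances train concurrently, and the system reports
    # models trained per minute. The numbers below are made up.

    def aggregate_throughput(instance_times_s):
        """Models trained per minute for instances that ran concurrently."""
        n_models = len(instance_times_s)
        wall_clock_s = max(instance_times_s)  # run ends with slowest instance
        return n_models / (wall_clock_s / 60.0)

    # Example: 10 concurrent instances, each taking roughly eight minutes.
    times = [480.0, 492.5, 487.1, 503.9, 495.0,
             488.8, 490.2, 499.7, 485.3, 501.4]
    print(f"time-to-train (slowest instance): {max(times):.1f} s")
    print(f"aggregate throughput: {aggregate_throughput(times):.2f} models/min")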

(List of participating organizations: Argonne National Laboratory, the Swiss National Supercomputing Centre, Fujitsu and Japan’s Institute of Physical and Chemical Research (RIKEN), Helmholtz AI (a collaboration of the Jülich Supercomputing Centre at Forschungszentrum Jülich and the Steinbuch Centre for Computing at the Karlsruhe Institute of Technology), Lawrence Berkeley National Laboratory, the National Center for Supercomputing Applications, NVIDIA, and the Texas Advanced Computing Center.)

In reporting on other MLPerf benchmarks, much of the emphasis is on accelerator/CPU combinations and comparing their performance. In that regard, MLPerf has largely been a showcase for Nvidia GPU advances (software and hardware) which, frankly, are impressive. Nvidia GPUs again showed strong performance, and the company touted that in a blog (MLPerf HPC Benchmarks Show the Power of HPC+AI). For commercially available, GPU-accelerated systems, Nvidia has enjoyed steady dominance.

The MLPerf HPC benchmark is in many ways more interesting if perhaps less useful as a purchase-guiding (and marketing) tool. The systems featured are complicated and powerful and each possesses distinct advantages. Fugaku, for example, doesn’t rely on separate GPU accelerators.

Fujitsu issued a press release saying Fugaku took “first place amongst all the systems for the CosmoFlow training application benchmark category, demonstrating performance at rates approximately 1.77 times faster than other systems. This result revealed that Fugaku has the world’s highest level of performance in the field of large-scale scientific and technological calculations using machine learning.” It is a wonderful machine.

Best to dig into the full results to get a fuller picture. That said, included in the results report were statements from participating organizations on their approaches to running the benchmark. These are, on balance, quite substantive and informative. Here are small portions of two of the statements. All of the submitted statements are included at the end of the article, and they are well worth reading:

  • ANL – “These benchmarks were run on 16 NVIDIA DGX A100 nodes (128 A100 GPUs) of Theta. We made minor modifications to the DeepCam and OpenCatalyst submissions in order to correctly initialize MPI communication for distributed training. After confirming that all of the models were working as expected, we ran preliminary tests to verify that our workflows would be compliant with the MLPerf HPC requirements (logging, system information, etc.). The available documentation helped us understand the impact of the various hyperparameters on the model training performance. We started with the default parameters and tuned the hyperparameters to reduce the overall training cost. We employed data staging on node-local NVMe storage to accelerate the I/O.”
  • Fugaku – “For weak scaling, since the job scheduler cannot launch a large number of instances immediately, inter-instance synchronization across jobs was added to align start times among instances. Moreover, to avoid excessive access to the FEFS from all instances, the dataset is staged to node-local memory using an MPI program in which only the first instance reads the dataset from FEFS and broadcasts it to the other instances. We actually ran 648 instances (82,944 nodes) but submitted results from 637 of them. The pruned instances consist of 1 instance that hung during training, 6 instances that unintentionally used the same seed value as others, and 4 instances that took a particularly long time.”

The latest MLPerf benchmark results provide an interesting look at side-by-side performances on these impressive systems.

SUBMITTED STATEMENTS

Argonne National Laboratory (ANL)

The Argonne Leadership Computing Facility (ALCF) [1], a U.S. Department of Energy (DOE) Office of Science User Facility located at Argonne National Laboratory, enables breakthroughs in science and engineering by providing supercomputing resources and expertise to the research community. The Theta supercomputer [2] is operated and maintained by the ALCF. ThetaGPU, a 3.9-petaflops system, has 24 NVIDIA DGX A100 nodes, each with eight (8) NVIDIA A100 Tensor Core GPUs and two (2) AMD Rome CPUs, providing 320 GB of GPU memory per node (7,680 GB in aggregate) for training artificial intelligence (AI) datasets, while also enabling GPU-specific and enhanced high-performance computing (HPC) applications for modeling and simulation.

For the 2021 MLPerf HPC v1.0, we submitted strong scaling results for DeepCam and OpenCatalyst training benchmarks in the closed division. These benchmarks were run on 16 NVIDIA DGX A100 nodes (128 A100 GPUs) of Theta. We made minor modifications to the DeepCam and OpenCatalyst submissions in order to correctly initialize MPI communication for distributed training. After confirming that all of the models were working as expected, we ran preliminary tests to verify that our workflows would be compliant with the MLPerf HPC requirements (logging, system information, etc.). The available documentation helped us understand the impact of the various hyperparameters on the model training performance. We started with the default parameters and tuned the hyperparameters to reduce the overall training cost. We employed data staging on node-local NVMe storage to accelerate the I/O.
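
As a rough illustration of the staging approach described above, here is a minimal, hypothetical Python sketch of per-node staging from a shared filesystem to node-local NVMe before training; the paths and the helper function are assumptions, not ANL’s code.

    # Hypothetical sketch: stage a (flat) dataset directory from a shared
    # parallel filesystem to node-local NVMe once per node, so training
    # reads hit fast local storage instead of the shared filesystem.
    import os
    import shutil
    import time

    SHARED_DATA = "/lus/theta-fs0/projects/example/dataset"  # assumed path
    LOCAL_DATA = "/local/scratch/dataset"                    # assumed NVMe mount

    def stage_to_local_nvme(local_rank: int) -> str:
        """One rank per node copies; the others wait on a marker file."""
        done_marker = os.path.join(LOCAL_DATA, ".staging_done")
        if local_rank == 0:
            os.makedirs(LOCAL_DATA, exist_ok=True)
            for name in os.listdir(SHARED_DATA):
                shutil.copy(os.path.join(SHARED_DATA, name), LOCAL_DATA)
            open(done_marker, "w").close()
        else:
            while not os.path.exists(done_marker):
                time.sleep(1)
        return LOCAL_DATA  # point the data loader here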

The insights gained from these runs will help us improve our efforts to optimize large scientific machine learning applications on the upcoming supercomputers, Polaris and Aurora, and thereby glean insights faster.

[1] https://www.alcf.anl.gov/

[2] https://www.alcf.anl.gov/alcf-resources/theta

Swiss National Supercomputing Centre

The Swiss National Supercomputing Centre (CSCS) participated in MLPerf HPC v1.0 with the Open Catalyst and DeepCAM benchmarks on our flagship system, Piz Daint.

Our focus in this round was on recent trends in scientific deep learning within the atmospheric modelling and atomistic simulation communities, and these two benchmarks represent well the growing usage of data from physical simulations for large scale deep learning in these domains.

Managing the data processing requirements of large scale climate simulations is a challenge of the EXCLAIM program. Segmentation tasks such as the one solved by DeepCAM arise naturally when compressing the output of global weather simulations with regional detail resolution for storage.

In our submissions to DeepCAM, we improved the code for higher performance on our distributed file system. In particular, on 128 GPUs, where the dataset does not fit in RAM, prefetching the data before using it on the GPU allowed us to guarantee 98% GPU utilization on average. To sustain this performance up to 1,024 GPUs, we added a caching mechanism in PyTorch that makes effective use of the much larger RAM capacity. Furthermore, we found that performance at this scale is highly sensitive to tuning communication – in particular a tree-based algorithm and sufficient GPU resources in NCCL – which is consistent with last year’s finding on fine-grained communication in CosmoFlow.
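
A caching layer of the kind described above can be sketched in a few lines of PyTorch. This is a minimal, hypothetical illustration of keeping decoded samples in host RAM after first use, not the actual Piz Daint implementation.

    # Minimal sketch of an in-RAM caching wrapper for a PyTorch Dataset.
    from torch.utils.data import Dataset

    class RamCachedDataset(Dataset):
        """Keeps every sample it has served in host RAM, so later epochs
        skip the distributed filesystem entirely."""
        def __init__(self, inner: Dataset):
            self.inner = inner
            self.cache = {}

        def __len__(self):
            return len(self.inner)

        def __getitem__(self, idx):
            if idx not in self.cache:
                self.cache[idx] = self.inner[idx]  # first epoch: disk read
            return self.cache[idx]                 # later epochs: RAM only

    # Usage: DataLoader(RamCachedDataset(base_ds), num_workers=0)
    # Caveat: with num_workers > 0, each worker process holds its own cache.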

The purpose of OpenCatalyst, which we ran on 256 GPUs, is highly aligned with our PASC project “Machine learning for materials and molecules: toward the exascale”, which investigates methods for high fidelity molecular dynamics simulations with potentials that accurately reproduce expensive electronic structure calculations using ML techniques.

Together with last year’s results on CosmoFlow, these submissions complete the coverage of the full MLPerf HPC benchmark suite on Piz Daint and will serve as a baseline for the upcoming system, Alps.

Fujitsu + RIKEN

RIKEN and Fujitsu jointly developed the supercomputer Fugaku – a world top-level system capable of realizing high effective performance for a broad range of application software – and started its official operation on March 9, 2021 [1]. RIKEN and Fujitsu submitted CosmoFlow results to the closed division using 512 nodes for strong scaling and 81,536 nodes (128 nodes × 637 model instances) for weak scaling.

For both weak and strong scaling, LLIO (Lightweight Layered IO Accelerator) was used to cache library and program files from FEFS (Fujitsu Exabyte File System) storage. We developed a customized TensorFlow and an optimized oneAPI Deep Neural Network Library (oneDNN) as the backend [2]. oneDNN uses the JIT assembler Xbyak_aarch64 to exploit the performance of the A64FX.

For weak scaling, since the job scheduler cannot launch a large number of instances immediately, inter-instance synchronization across jobs was added to align start times among instances. Moreover, to avoid excessive access to the FEFS from all instances, the dataset is staged to node-local memory using an MPI program in which only the first instance reads the dataset from FEFS and broadcasts it to the other instances. We actually ran 648 instances (82,944 nodes) but submitted results from 637 of them. The pruned instances consist of 1 instance that hung during training, 6 instances that unintentionally used the same seed value as others, and 4 instances that took a particularly long time.
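
The broadcast-staging pattern described above can be illustrated with a short mpi4py sketch; this is a hypothetical reconstruction of the idea, not the actual Fugaku staging program.

    # Hypothetical mpi4py sketch of broadcast staging: only one rank reads
    # the dataset from the shared filesystem (FEFS) and broadcasts it, so
    # the other ranks never touch shared storage.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    SHARED_PATH = "/fefs/datasets/cosmoflow.bin"  # assumed path
    LOCAL_PATH = "/dev/shm/cosmoflow.bin"         # node-local memory filesystem

    if rank == 0:
        with open(SHARED_PATH, "rb") as f:        # only rank 0 hits FEFS
            blob = f.read()
    else:
        blob = None

    # For very large data, comm.Bcast on a pre-sized bytearray avoids the
    # pickling overhead of the lowercase bcast used here.
    blob = comm.bcast(blob, root=0)

    with open(LOCAL_PATH, "wb") as f:             # stage into local memory
        f.write(blob)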

For strong scaling, we used a reformatted, uncompressed TFRecord dataset to improve training throughput. The reference dataset is compressed with gzip and needs decompression at each training step. Since strong scaling uses more nodes than the weak-scaling instances, the amount of staged data per node decreases, and the uncompressed dataset could be used.
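
Rewriting gzip-compressed TFRecords as uncompressed files, as described above, might look like the following TensorFlow sketch; the file names are assumptions.

    # Hypothetical sketch: rewrite a gzip-compressed TFRecord file as an
    # uncompressed one, trading larger staged files for the removal of
    # per-step decompression work during training.
    import tensorflow as tf

    SRC = "cosmoflow_train.tfrecord.gz"  # assumed gzip-compressed source
    DST = "cosmoflow_train.tfrecord"     # uncompressed copy for training

    records = tf.data.TFRecordDataset(SRC, compression_type="GZIP")
    with tf.io.TFRecordWriter(DST) as writer:  # default: no compression
        for rec in records:
            writer.write(rec.numpy())          # raw serialized example

    # Training then reads DST directly, skipping the gunzip work that
    # would otherwise run at every training step.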

In this round, the performance of the Fugaku half-system with more than 80,000 nodes can be evaluated using the new weak scaling metric.

[1] https://www.fujitsu.com/global/about/innovation/fugaku/

[2] https://github.com/fujitsu

Helmholtz AI (JSC – FZJ, SCC – KIT)

In the Helmholtz AI platform, Germany’s largest research centers have teamed up to bring cutting-edge AI methods to scientists from other fields. With this in mind, researchers and Helmholtz AI members from the Jülich Supercomputing Centre (JSC) at Forschungszentrum Jülich and the Steinbuch Centre for Computing (SCC) at Karlsruhe Institute of Technology have jointly submitted their results for the MLPerf™ HPC benchmarking suite. We successfully executed large-scale training runs of the CosmoFlow and DeepCAM applications with up to 3,072 NVIDIA A100 GPUs on the JUWELS supercomputer at JSC and the HoreKa supercomputer at SCC.

While striving for performance, it is vital to balance the environmental costs of such large-scale measurements. With JUWELS and HoreKa ranking among the top 15 on the worldwide Green500 list of energy-efficient supercomputers, the high performance computing resources in Helmholtz AI are both computationally and energy efficient. We have used these benchmarks not only to better understand our current systems in preparation for improved future systems, but also to test tools that inform users of the carbon footprint of each individual computing job.

An important step in maximizing performance was using an optimized HDF5 file format for the dataset, which made it possible to reach maximum data-loading performance. This was the result of the Helmholtz AI team jointly analyzing the execution performance and implementing a solution that works optimally on both supercomputers. The joint effort to submit competitive results for the MLPerf™ HPC benchmarking suite has been another important step towards democratizing AI for all Helmholtz researchers.
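
As a rough illustration of what an optimized HDF5 layout can involve, here is a hypothetical h5py sketch that packs per-sample arrays into one chunked file; the dataset names and shapes are assumptions, not the Helmholtz AI format.

    # Hypothetical sketch: pack many small per-sample arrays into a single
    # chunked HDF5 file, so the data loader issues large sequential reads
    # instead of many small ones. Shapes are illustrative only.
    import h5py
    import numpy as np

    N_SAMPLES, H, W, C = 64, 128, 128, 4  # made-up, small shapes

    with h5py.File("train_packed.h5", "w") as f:
        data = f.create_dataset("data", shape=(N_SAMPLES, H, W, C),
                                dtype="f4", chunks=(1, H, W, C))
        labels = f.create_dataset("labels", shape=(N_SAMPLES, H, W),
                                  dtype="i1", chunks=(1, H, W))
        for i in range(N_SAMPLES):
            data[i] = np.random.rand(H, W, C).astype("f4")  # placeholder load
            labels[i] = np.zeros((H, W), dtype="i1")

    # Readers slice data[i] per sample; chunk-aligned access keeps the
    # filesystem traffic large and sequential.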

Lawrence Berkeley National Lab (LBNL)

The MLPerf HPC v1.0 benchmarks represent the growing scientific AI computational workload at DOE HPC facilities like NERSC. The applications push on HPC system capabilities for compute, storage, and network, making the benchmark suite a valuable tool for assessing and optimizing system performance.

For LBNL, this round featured the debut of Perlmutter Phase 1 at NERSC. Perlmutter Phase 1 has proven itself a world-class AI supercomputer, with leading strong-scaling performance on OpenCatalyst, DeepCAM and CosmoFlow. Additionally, we demonstrated excellent scalability, taking advantage of 5,120 GPUs for the weak-scaling benchmark and metric.

Perlmutter, an HPE Cray EX supercomputer, is designed to meet the emerging simulation, data analytics, and AI requirements of the scientific community. The Phase I system, which debuted at the #5 Top500 spot in June 2021, features more than 6,000 NVIDIA A100 GPUs, an all-flash Lustre filesystem, and a Cray Slingshot network.

LBNL submitted results for all three benchmarks on Perlmutter Phase 1 in the closed division:

  • CosmoFlow and DeepCAM strong-scaling results on 2,048 GPUs
  • CosmoFlow and DeepCAM weak-scaling results on 5,120 GPUs, both run with 10 concurrent model-training instances of 512 GPUs each
  • An OpenCatalyst strong-scaling result on 512 GPUs.

The submissions utilized various features and optimizations, including:

  • DALI for accelerating the data pipelines in CosmoFlow and DeepCAM
  • Fast data staging from all-flash shared filesystem into on-node DRAM
  • PyTorch JIT compilation for DeepCAM and OpenCatalyst
  • CUDA graphs for CosmoFlow and DeepCAM (see the sketch after this list)
  • Load-balancing variable-sized samples in OpenCatalyst
  • Shifter containers for all benchmarks based on NGC PyTorch and MXNet releases.
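
For readers unfamiliar with CUDA graphs, the capture-and-replay pattern in PyTorch (1.10+) looks roughly like the following minimal sketch; the tiny model and sizes are placeholders, not LBNL’s benchmark code.

    # Minimal sketch of CUDA graph capture/replay in PyTorch 1.10+.
    # Capturing a whole training step removes per-kernel launch overhead,
    # which matters most at small per-GPU batch sizes.
    import torch

    model = torch.nn.Linear(1024, 1024).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.01)
    static_x = torch.randn(8, 1024, device="cuda")
    static_y = torch.randn(8, 1024, device="cuda")

    # Warm up on a side stream before capture, as the PyTorch docs advise.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(3):
            opt.zero_grad(set_to_none=True)
            (model(static_x) - static_y).pow(2).mean().backward()
            opt.step()
    torch.cuda.current_stream().wait_stream(s)

    g = torch.cuda.CUDAGraph()
    opt.zero_grad(set_to_none=True)
    with torch.cuda.graph(g):  # capture one full training step
        static_loss = (model(static_x) - static_y).pow(2).mean()
        static_loss.backward()
        opt.step()

    # Per step: copy the new batch into the static tensors, then replay.
    static_x.copy_(torch.randn(8, 1024, device="cuda"))
    static_y.copy_(torch.randn(8, 1024, device="cuda"))
    g.replay()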

National Center for Supercomputing Applications (NCSA)

The National Center for Supercomputing Applications (NCSA) is a hub of interdisciplinary research and digital scholarship where University of Illinois faculty, staff, students, and collaborators from around the world work together to address research grand challenges for the benefit of science and society [1].

This year, the NCSA team participated in MLPerf HPC v1.0 with the DeepCAM and Open Catalyst benchmarks carried out on the Hardware-Accelerated Learning (HAL) system [2]. This system is composed of 16 IBM AC922 8335-GTH compute nodes, each containing two 20-core IBM POWER9 CPUs, 256 GB of memory, four NVIDIA V100 GPUs with NVLink 2.0, and EDR InfiniBand adapters to provide high-performance communication. The two storage nodes provide 224 TB of usable NVMe SSD-based storage capable of a peak cluster-aggregate bandwidth of over 90 GB/s.

The experience we obtained from this year’s submission has already benefited multiple research projects, especially for their software environment configuration and optimization. Moreover, the insights we learned from this year will also contribute to the design of our future ML/DL systems.

[1] http://www.ncsa.illinois.edu/

[2] https://wiki.ncsa.illinois.edu/display/ISL20/HAL+cluster

NVIDIA

Cutting-edge HPC is blending simulation with AI to reach new levels of performance and accuracy. Recent advances in molecular dynamics, astronomy, and climate simulation all took this approach to making scientific breakthroughs, a trend driving the adoption of exascale AI.

The new MLPerf HPC benchmarks help users compare HPC systems using this style of computing. NVIDIA-powered systems led on four of five benchmarks in the rankings.

Compared to the best v0.7 results, NVIDIA’s supercomputer Selene achieved a 5x better result for CosmoFlow at 2x the scale and nearly 7x for DeepCAM at 4x the scale. LBNL’s Perlmutter led the new OpenCatalyst benchmark using 2,048 NVIDIA A100s. In the weak-scaling category, Selene led DeepCAM at 16 nodes per instance and 256 simultaneous instances.

The MLPerf HPC benchmarks are meant to model the types of workloads HPC centers may perform:

  • CosmoFlow – physical quantity estimation from cosmological image data
  • DeepCAM – identification of hurricanes and atmospheric rivers in climate simulation data
  • OpenCatalyst (new) – prediction of the energies of molecular configurations based on graph connectivity

Optimizations used to achieve MLPerf HPC v1.0 results:

  • DALI accelerates data processing
  • Use of CUDA graphs reduces small-batch latency
  • SHARP accelerates communication
  • Async DRAM prefetching removes I/O from the critical path (see the sketch after this list)
  • New fused kernels developed

The NVIDIA ecosystem submitted with commercially available platforms using three generations of NVIDIA GPUs (P100, V100, and A100). Supercomputing centers Jülich, Argonne National Lab, Lawrence Berkeley National Lab, Swiss National Supercomputing Centre, NCSA, and the Texas Advanced Computing Center made direct submissions, accounting for seven of the eight participants.

The NVIDIA platform excels in both performance and usability, offering a single leadership platform from data center to edge to cloud. NVIDIA HPC and AI accelerates 2,400+ applications today.

All software used for NVIDIA submissions is available from the MLPerf repository, though node- and cluster-specific tuning is required to get the most from the benchmarks. We constantly add these cutting-edge MLPerf improvements into our deep learning framework containers available on NGC, our software hub for GPU applications.

Texas Advanced Computing Center (TACC)

Texas Advanced Computing Center (TACC) aims to facilitate novel discoveries that advance science and society through advanced computing technologies. TACC designs and operates some of the world’s most powerful supercomputers, including Frontera, Longhorn, and Stampede2. The Longhorn system consists of 108 hybrid CPU/GPU compute nodes powered by IBM POWER9 processors and NVIDIA Tesla V100 GPUs. Each node provides 40 cores across two sockets, four GPUs, 256 GB of RAM, and 900 GB of local storage, and interconnects with other nodes through Mellanox EDR InfiniBand. Longhorn’s multiple GPUs per node make it a powerful tool for research in astronomy and cosmology, fluid particulates, materials research, biophysics, and deep learning. In 2020, COVID-19 research performed on the Longhorn system won the Association for Computing Machinery Gordon Bell Special Prize in High Performance Computing.

MLCommons HPC applications, e.g., CosmoFlow and DeepCAM, provide an invaluable opportunity to understand the infrastructure requirements of next-generation machine learning and deep learning applications. This year, TACC participated in MLCommons HPC v1.0 benchmarking by submitting the performance of the CosmoFlow and DeepCAM applications at 32 nodes (128 Tesla V100 GPUs) of its Longhorn system [1]. The lessons learned from these submissions will help envision the architecture of forthcoming TACC systems that will assist its rapidly growing AI user community in solving intractable problems deterministically.

[1] https://www.tacc.utexas.edu/systems/longhorn
