What’s New in HPC Research: HipBone, GPU-Aware Asynchronous Tasks, Autotuning & More

By Mariana Iriarte

March 10, 2022

In this regular feature, HPCwire highlights newly published research in the high-performance computing community and related domains. From parallel programming to exascale to quantum computing, the details are here.


A two MPI process mesh arrangement of third-order 2D spectral elements. Credit: Chalmers et al.

HipBone: A performance-portable GPU-accelerated C++ version of the NekBone benchmark

Using three HPC systems at Oak Ridge National Laboratory – the Summit supercomputer and the Frontier early-access clusters Spock and Crusher – an academic-industry research team (which includes two authors from AMD) demonstrated the performance of hipBone, an open-source proxy application for the Nek5000 computational fluid dynamics code. HipBone “is a fully GPU-accelerated C++ implementation of the original NekBone CPU proxy application with several novel algorithmic and implementation improvements, which optimize its performance on modern fine-grain parallel GPU accelerators.” The tests demonstrate hipBone’s “portability across different clusters and very good scaling efficiency, especially on large problems.”
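
NekBone’s core workload is a conjugate-gradient solve over high-order spectral elements, and that CG loop is what hipBone ports to GPUs. As a rough, hedged illustration only – not hipBone’s actual C++/HIP kernels – the sketch below runs a plain conjugate gradient on GPU arrays with CuPy, with a simple sparse matrix standing in for the matrix-free spectral-element operator; the operator, sizes, and tolerances are all hypothetical.

```python
# Minimal sketch (not hipBone's code): conjugate gradient on the GPU with CuPy,
# standing in for the spectral-element Poisson solve that NekBone/hipBone time.
# The operator A here is an ordinary sparse matrix; in hipBone the equivalent
# step is a matrix-free, high-order spectral-element operator applied per element.
import cupy as cp
import cupyx.scipy.sparse as sp

def cg(A, b, tol=1e-8, max_iter=500):
    """Unpreconditioned conjugate gradient: the loop such benchmarks measure."""
    x = cp.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs_old = cp.dot(r, r)
    for _ in range(max_iter):
        Ap = A @ p                       # in hipBone this is the fused GPU operator kernel
        alpha = rs_old / cp.dot(p, Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = cp.dot(r, r)
        if cp.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Toy usage: a 1D Laplacian as a stand-in operator (hypothetical problem).
n = 1 << 16
main = 2.0 * cp.ones(n)
off = -1.0 * cp.ones(n - 1)
A = sp.diags([off, main, off], offsets=[-1, 0, 1], format="csr")
b = cp.ones(n)
x = cg(A, b)
```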

Authors: Noel Chalmers, Abhishek Mishra, Damon McDougall, and Tim Warburton

A Case for intra-rack resource disaggregation in HPC

A multi-institution research team utilized Cori, a high-performance computing system at the National Energy Research Scientific Computing Center, to analyze “resource disaggregation to enable finer-grain allocation of hardware resources to applications.” In their paper, the authors also profile a “variety of deep learning applications to represent an emerging workload.” The researchers demonstrated that “for a rack configuration and applications similar to Cori, a central processing unit with intra-rack disaggregation has a 99.5 percent probability to find all resources it requires inside its rack.”

Authors: George Michelogiannakis, Benjamin Klenk, Brandon Cook, Min Yee Teh, Madeleine Glick, Larry Dennison, Keren Bergman, and John Shalf

MPI 3D Jacobi example (Jacobi3D) with a manual overlap option. Credit: Choi et al.

Improving Scalability with GPU-Aware Asynchronous Tasks

Computer scientists from the University of Illinois at Urbana-Champaign and Lawrence Livermore National Laboratory demonstrated improved scalability by hiding communication behind computation with GPU-aware asynchronous tasks. According to the authors, “while the ability to hide communication behind computation can be highly effective in weak scaling scenarios, performance begins to suffer with smaller problem sizes or in strong scaling due to fine-grained overheads and reduced room for overlap.” The authors integrated “GPU-aware communication into asynchronous tasks in addition to computation-communication overlap, with the goal of reducing time spent in communication and further increasing GPU utilization.” They demonstrated the performance impact of their approach using “a proxy application that performs the Jacobi iterative method on GPUs, Jacobi3D.” In their paper, the authors also dive into “techniques such as kernel fusion and CUDA Graphs to combat fine-grained overheads at scale.”
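
The general pattern behind this work – posting halo exchanges directly from GPU buffers and overlapping them with interior computation – can be sketched outside the authors’ asynchronous tasking runtime. The toy below is a minimal sketch, not the Jacobi3D proxy itself: it uses mpi4py (assuming a CUDA-aware MPI build) with CuPy device arrays for a slab-decomposed 3D Jacobi sweep, computing the interior while the halo messages are in flight; grid sizes and boundary handling are simplified placeholders.

```python
# Hedged sketch of the overlap pattern (not the authors' Jacobi3D code):
# GPU-aware halo exchange overlapped with interior computation, using mpi4py
# with a CUDA-aware MPI build and CuPy device buffers. 3D Jacobi with a 1D
# slab decomposition along z and periodic neighbors, for brevity.
from mpi4py import MPI
import cupy as cp

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
up, down = (rank - 1) % size, (rank + 1) % size

n = 128                                                  # local grid size (hypothetical)
u = cp.zeros((n + 2, n + 2, n + 2), dtype=cp.float64)    # local block plus ghost layers
u[1:-1, 1:-1, 1:-1] = cp.random.random((n, n, n))        # arbitrary initial data
u_new = cp.zeros_like(u)

def jacobi(dst, src, zs, ze):
    """Six-point average on the z-slab [zs, ze) of the local block."""
    dst[zs:ze, 1:-1, 1:-1] = (
        src[zs-1:ze-1, 1:-1, 1:-1] + src[zs+1:ze+1, 1:-1, 1:-1] +
        src[zs:ze, :-2, 1:-1] + src[zs:ze, 2:, 1:-1] +
        src[zs:ze, 1:-1, :-2] + src[zs:ze, 1:-1, 2:]) / 6.0

for _ in range(100):
    # Make sure the kernels that produced the halo faces have finished, then
    # post nonblocking sends/receives directly on device buffers
    # (CUDA-aware MPI takes the GPU pointers without host staging).
    cp.cuda.get_current_stream().synchronize()
    send_lo, send_hi = cp.ascontiguousarray(u[1]), cp.ascontiguousarray(u[-2])
    recv_lo, recv_hi = cp.empty_like(send_lo), cp.empty_like(send_hi)
    reqs = [comm.Irecv(recv_lo, source=up, tag=1), comm.Irecv(recv_hi, source=down, tag=0),
            comm.Isend(send_lo, dest=up, tag=0), comm.Isend(send_hi, dest=down, tag=1)]

    jacobi(u_new, u, 2, n)            # interior slabs: overlaps with communication
    MPI.Request.Waitall(reqs)
    u[0], u[-1] = recv_lo, recv_hi    # install the received ghost layers
    jacobi(u_new, u, 1, 2)            # boundary slabs that needed the halos
    jacobi(u_new, u, n, n + 1)
    u, u_new = u_new, u
```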

Authors: Jaemin Choi, David F. Richards, Laxmikant V. Kale

A convolutional neural network based approach for computational fluid dynamics

To overcome the cost, time, and memory disadvantages of computational fluid dynamics (CFD) simulation, this Indian research team proposed using “a model based on convolutional neural networks, to predict non-uniform flow in 2D.” They define CFD as “the visualization of how a fluid moves and interacts with things as it passes by using applied mathematics, physics, and computational software.” The authors’ approach “aims to aid the behavior of fluid particles on a certain system and to assist in the development of the system based on the fluid particles that travel through it. At the early stages of design, this technique can give quick feedback for real-time design revisions.”
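
As a concrete illustration of the surrogate idea only – the authors’ actual network architecture and training data are not reproduced here – a minimal convolutional encoder-decoder in PyTorch that maps a 2D geometry mask to a predicted two-component velocity field could look like the hypothetical sketch below.

```python
# Minimal sketch (assumed architecture, not the authors' model): a small
# convolutional encoder-decoder that maps a 2D geometry mask to a predicted
# 2D velocity field (u, v), the kind of CFD surrogate the paper describes.
import torch
import torch.nn as nn

class FlowCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 4, stride=2, padding=1), nn.ReLU(),            # 128 -> 64
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),           # 64 -> 32
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 64
            nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1),              # 64 -> 128
        )

    def forward(self, geometry_mask):
        return self.decoder(self.encoder(geometry_mask))

# Hypothetical training step against precomputed CFD ground-truth fields.
model = FlowCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
geometry = torch.rand(8, 1, 128, 128)      # batch of obstacle masks (placeholder data)
target_flow = torch.rand(8, 2, 128, 128)   # CFD velocity fields (placeholder data)
loss = nn.functional.mse_loss(model(geometry), target_flow)
optimizer.zero_grad(); loss.backward(); optimizer.step()
```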

Authors: Satyadhyan Chickerur and P Ashish

A single block of the variational wave function in terms of parameterized quantum circuits. Credit: Rinaldi et al.

Matrix-model simulations using quantum computing, deep learning, and lattice Monte Carlo

This international research team conducted “the first systematic survey for quantum computing and deep-learning approaches to matrix quantum mechanics.” While “Euclidean lattice Monte Carlo simulations are the de facto numerical tool for understanding the spectrum of large matrix models and have been used to test the holographic duality,” the authors write, “they are not tailored to extract dynamical properties or even the quantum wave function of the ground state of matrix models.” The authors compare the quantum computing and deep-learning approaches to lattice Monte Carlo simulations and provide baseline benchmarks. The research leveraged RIKEN’s HOKUSAI “BigWaterfall” supercomputer.
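
To make the lattice Monte Carlo baseline concrete, the toy below runs a Metropolis sampler for a Gaussian one-matrix model with action S(X) = (N/2) Tr X², which is far simpler than the matrix quantum mechanics studied in the paper but shares the same propose/accept sampling structure; matrix size, step size, and sweep counts are illustrative only.

```python
# Toy illustration only (much simpler than the models in the paper): Metropolis
# Monte Carlo for a Gaussian one-matrix model with action S(X) = (N/2) Tr X^2,
# sampling a single Hermitian matrix X and measuring <Tr X^2>/N, which should
# come out near 1 for this Gaussian weight.
import numpy as np

rng = np.random.default_rng(0)
N, n_sweeps, burn_in, step = 8, 20000, 5000, 0.05

def action(X):
    return 0.5 * N * np.trace(X @ X).real

def random_hermitian(scale):
    A = rng.normal(scale=scale, size=(N, N)) + 1j * rng.normal(scale=scale, size=(N, N))
    return (A + A.conj().T) / 2

X = np.zeros((N, N), dtype=complex)
S = action(X)
samples, accepted = [], 0
for sweep in range(n_sweeps):
    proposal = X + random_hermitian(step)            # propose a small Hermitian shift
    S_new = action(proposal)
    if rng.random() < np.exp(min(0.0, S - S_new)):   # Metropolis accept/reject
        X, S, accepted = proposal, S_new, accepted + 1
    if sweep >= burn_in:                             # crude thermalization cut
        samples.append(np.trace(X @ X).real / N)

print("acceptance:", accepted / n_sweeps, " <Tr X^2>/N ~", np.mean(samples))
```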

Authors: Enrico Rinaldi, Xizhi Han, Mohammad Hassan, Yuan Feng, Franco Nori, Michael McGuigan, and Masanori Hanada

GPTuneBand: Multi-task and multi-fidelity autotuning for large-scale high performance computing applications

A group of researchers from Cornell University and Lawrence Berkeley National Laboratory propose a multi-task and multi-fidelity autotuning framework, called GPTuneBand, to tune high-performance computing applications. “GPTuneBand combines a multi-task Bayesian optimization algorithm with a multi-armed bandit strategy, well-suited for tuning expensive HPC applications such as numerical libraries, scientific simulation codes and machine learning models, particularly with a very limited tuning budget,” the authors write. Compared to its predecessor GPTune, GPTuneBand demonstrated “a maximum speedup of 1.2x, and wins over a single-task, multi-fidelity tuner BOHB on 72.5 percent of tasks.”
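
The multi-armed bandit side of such tuners allocates cheap, low-fidelity runs broadly and promotes only promising configurations to expensive, high-fidelity runs. As a generic, hedged illustration of that idea only – not GPTuneBand’s API or its Bayesian sampler – a bare-bones successive-halving loop over a hypothetical tuning objective might look like this:

```python
# Hedged sketch of the multi-fidelity idea (not GPTuneBand's actual algorithm or
# API): successive halving, the bandit strategy behind Hyperband-style tuners,
# applied to a made-up tuning objective. GPTuneBand additionally drives sampling
# with multi-task Bayesian optimization; here candidate configurations are random.
import random

def run_app(config, fidelity):
    """Placeholder objective: a noisy runtime estimate whose noise shrinks as
    the fidelity (e.g., iteration count or problem size) grows."""
    block, unroll = config
    return 1.0 / block + 0.01 * unroll + random.gauss(0, 0.05 / fidelity)

def successive_halving(candidates, min_fidelity=1, eta=3, rounds=3):
    fidelity = min_fidelity
    for _ in range(rounds):
        scored = sorted(candidates, key=lambda c: run_app(c, fidelity))
        candidates = scored[: max(1, len(scored) // eta)]   # keep the best 1/eta
        fidelity *= eta                                      # survivors get a bigger budget
    return candidates[0]

# Hypothetical search space: a (block size, unroll factor) pair.
configs = [(random.choice([16, 32, 64, 128]), random.randint(1, 8)) for _ in range(27)]
print("best config found:", successive_halving(configs))
```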

Authors: Xinran Zhu, Yang Liu, Pieter Ghysels, David Bindel, Xiaoye S. Li

High performance computing architecture for sample value processing in the smart grid

In this Open Access article, a group of researchers from the University of the Basque Country, Spain, present a high-level interface solution for application designers that addresses the challenges current technologies face in the smart grid. Making the case that FPGAs provide superior performance and reliability over CPUs, the authors present a “solution to accelerate the computation of hundreds of streams, combining a custom-designed silicon Intellectual Property and a new generation field programmable gate array-based accelerator card.” The researchers leverage Xilinx’s FPGAs and adaptive computing framework.

Authors: Le Sun, Leire Muguira, Jaime Jiménez, Armando Astarloa, Jesús Lázaro


Do you know about research that should be included in next month’s list? If so, send us an email at [email protected]. We look forward to hearing from you.
