Inside the Gordon Bell Prize Finalist Projects

By Oliver Peckham

September 7, 2022

The ACM Gordon Bell Prize, which comes with a $10,000 award courtesy of HPC luminary Gordon Bell, is widely considered the highest prize in high-performance computing. Each year, six finalists are selected who represent the pinnacle of outstanding research achievements in HPC. Last month, listings on the SC22 schedule revealed those finalists. Over the last few weeks, HPCwire got in touch with members of the six finalist teams to learn more about their projects.

Last year, for the first time, the Gordon Bell Prize nominees included two projects powered by exascale computing — specifically, China’s “new Sunway supercomputer,” also known as OceanLight. These research papers, at the time, constituted the most substantively “official” reveal of the system (which remains unranked). One of those OceanLight-powered papers — a challenge to Google’s quantum supremacy claim — won that year’s Gordon Bell Prize.

In 2022, OceanLight has exascale-caliber competition: not one but two of the other five finalist projects used the new American exascale supercomputer, Frontier, which launched earlier this year at Oak Ridge National Lab (ORNL). And, beyond OceanLight and Frontier, previous Top500-toppers Fugaku (RIKEN) and Summit (ORNL) both return to the list under multiple finalist teams, along with Perlmutter (at NERSC, the National Energy Research Scientific Computing Center) and Shaheen-2 (at KAUST, the King Abdullah University of Science and Technology).

And now: the finalist projects.

Using OceanLight to simulate millions of atoms

This year sees OceanLight return to the stage as the sole supercomputer behind a paper titled 2.5 Million-Atom Ab Initio Electronic-Structure Simulation of Complex Metallic Heterostructures with DGDFT — a project involving simulations of millions of atoms that made use of tens of millions of cores on OceanLight.


Abstract: Over the past three decades, ab initio electronic structure calculations of large, complex and metallic systems have been limited to tens of thousands of atoms in both numerical accuracy and computational efficiency on leadership supercomputers. We present a massively parallel discontinuous Galerkin density functional theory (DGDFT) implementation, which adopts adaptive local basis functions to discretize the Kohn-Sham equation, resulting in a block-sparse Hamiltonian matrix. A highly efficient pole expansion and selected inversion (PEXSI) sparse direct solver is implemented in DGDFT to achieve O(N^1.5) scaling for quasi two-dimensional systems. DGDFT allows us to compute the electronic structures of complex metallic heterostructures with 2.5 million atoms (17.2 million electrons) using 35.9 million cores on the new Sunway supercomputer. In particular, the peak performance of PEXSI can achieve 64 PFLOPS (∼5 percent of theoretical peak), which is unprecedented for sparse direct solvers. This accomplishment paves the way for quantum mechanical simulations into the mesoscopic scale for designing next-generation energy materials and electronic devices.
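The abstract's central structural point, a block-sparse Hamiltonian that a sparse direct solver can exploit, can be illustrated with a toy example. The matrix below is hypothetical and bears no relation to the actual DGDFT discretization; it only shows why sparse direct factorization of a block-sparse operator is so much cheaper than a dense approach:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Toy block-tridiagonal "Hamiltonian": quasi-1D/2D discretizations such as
# DGDFT's adaptive-local-basis Kohn-Sham matrix are block sparse, so a
# sparse direct factorization touches far fewer entries than a dense one.
nblocks, bs = 50, 8          # 50 blocks of size 8 -> a 400x400 matrix
rng = np.random.default_rng(0)
diag = [sp.csr_matrix(rng.standard_normal((bs, bs)) + 5 * np.eye(bs))
        for _ in range(nblocks)]
off = [sp.csr_matrix(0.1 * rng.standard_normal((bs, bs)))
       for _ in range(nblocks - 1)]
H = sp.bmat([[diag[i] if i == j else
              off[min(i, j)] if abs(i - j) == 1 else None
              for j in range(nblocks)] for i in range(nblocks)])
H = ((H + H.T) / 2).tocsc()  # symmetrize, convert for the sparse solver

lu = spla.splu(H)            # sparse direct factorization (SuperLU)
fill = lu.L.nnz + lu.U.nnz
print(f"matrix nnz: {H.nnz}, factor nnz: {fill}, dense would store {400*400}")
```

The factors stay far sparser than a dense matrix of the same size, which is the property PEXSI-style selected inversion pushes to extreme scale.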

Per the SC22 schedule, this team includes researchers from the Chinese Academy of Sciences, Peking University, the Pilot National Laboratory for Marine Science and Technology, the National Research Center of Parallel Computer Engineering and Technology, the Qilu University of Technology and the University of Science and Technology of China.


“Our team is highly excited [to be] nominated for the Gordon Bell Prize finalists as we started preparation for this work since last year,” said Qingcai Jiang, a researcher at the University of Science and Technology of China (USTC), in an email to HPCwire. “Our work for the first time achieves plane-wave precision electronic structure calculation for large-scale complex metallic heterostructures containing 2.5 million atoms (17.2 million electrons), and our optimization techniques make our work able to achieve peak performance of 64 PFLOPS (∼5 percent of theoretical peak), which is unprecedented for sparse direct solvers.”

Frontier powers biomedical literature analytics

The first of the projects powered by Frontier, titled ExaFlops Biomedical Knowledge Graph Analytics, also made use of ORNL’s previous chart-topper, Summit, and focuses on large-scale mining of biomedical research literature.

The Frontier supercomputer, one of six represented in the Gordon Bell Prize finalists.

Abstract: We are motivated by newly proposed methods for mining large-scale corpora of scholarly publications (e.g., the full biomedical literature), which consist of tens of millions of papers spanning decades of research. In this setting, analysts seek to discover relationships among concepts. They construct graph representations from annotated text databases, formulate the relationship-mining problem as an all-pairs shortest paths (APSP) problem, and validate connective paths against curated biomedical knowledge graphs (e.g., SPOKE). In this context, we present COAST (Exascale Communication-Optimized All-Pairs Shortest Path) and demonstrate 1.004 EF/s on 9,200 Frontier nodes (73,600 GCDs). We develop hyperbolic performance models (HYPERMOD), which guide optimizations and parametric tuning. The proposed COAST algorithm achieved memory-constant parallel efficiency of 99 percent in the single-precision tropical semiring. Looking forward, COAST will enable the integration of scholarly corpora like PubMed into the SPOKE biomedical knowledge graph.
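The tropical (min-plus) semiring mentioned in the abstract replaces ordinary (+, ×) matrix multiplication with (min, +), which turns repeated matrix "squaring" into a shortest-path computation. A minimal serial sketch, purely illustrative and unrelated to the COAST implementation:

```python
import numpy as np

def minplus_matmul(A, B):
    """Tropical-semiring 'product': C[i,j] = min over k of A[i,k] + B[k,j]."""
    return np.min(A[:, :, None] + B[None, :, :], axis=1)

def apsp(W):
    """All-pairs shortest paths by repeated min-plus squaring."""
    n = W.shape[0]
    D = W.copy()
    steps = 1
    while steps < n:                 # after ceil(log2 n) squarings,
        D = minplus_matmul(D, D)     # D[i,j] is the true shortest distance
        steps *= 2
    return D

INF = np.inf
W = np.array([[0, 3, INF, 7],        # weighted adjacency matrix of a
              [8, 0, 2, INF],        # small directed graph (made up)
              [5, INF, 0, 1],
              [2, INF, INF, 0]], dtype=float)
print(apsp(W))
```

COAST's contribution is doing exactly this kind of semiring computation, communication-optimally, across tens of thousands of GPUs.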

Per the SC22 schedule, this team includes researchers from AMD, the Georgia Institute of Technology, ORNL and the University of California, San Francisco.


“The ability to establish paths between any pair of biomedical concepts with the richness of PubMed in a reasonable time has the potential to revolutionize biomedical research and apply national research funds more effectively,” said Ramakrishnan Kannan, group leader for discrete algorithms at ORNL, in an email to HPCwire. “The comparison of knowledge encoded within SPOKE, which is largely human-curated, against concept relationships that might be mined automatically from a scholarly database like PubMed will result in faster and automated integration of biomedical information at scale.”

According to the team, this project is “the first exascale graph AI demonstration” to run at over one exaflops. “This first demonstration of exascale computation speed will transform the way we currently conduct search in complex heterogeneous knowledge graphs like SPOKE,” the research team told HPCwire. “Specifically, it will enable a new class of algorithms to be implemented in graphs of unprecedented size and complexity. This will greatly improve the quality of biomedical research inquiry, and accelerate the time to patient diagnosis and care like never before.”

Four top-ten supercomputers enable plasma simulations

The second project to use Frontier: Pushing the Frontier in the Design of Laser-Based Electron Accelerators with Groundbreaking Mesh-Refined Particle-In-Cell Simulations on Exascale-Class Supercomputers. Though the title of the paper — which revolved around kinetic plasma simulations — winks at its use of Frontier, the team actually used four supercomputers: Frontier, Fugaku (RIKEN), Summit and Perlmutter (NERSC), meaning that this one paper used four of the top seven supercomputers on the most recent Top500 list. In an email to HPCwire, Jean-Luc Vay — a senior scientist at Lawrence Berkeley National Lab — outlined the science runs of the research, which were conducted on Frontier (up to 8,192 nodes), Fugaku (up to ~93,000 nodes) and Summit (up to 4,096 nodes).

The Perlmutter supercomputer.

Abstract: We present a first-of-its-kind mesh-refined (MR) massively parallel Particle-In-Cell (PIC) code for kinetic plasma simulations optimized on the Frontier, Fugaku, Summit, and Perlmutter supercomputers. Major innovations, implemented in the WarpX PIC code, include: (i) a three-level parallelization strategy that demonstrated performance portability and scaling on millions of A64FX cores and tens of thousands of AMD and Nvidia GPUs, (ii) a groundbreaking mesh refinement capability that provides between 1.5x and 4x savings in computing requirements on the science case reported in this paper, and (iii) an efficient load balancing strategy between multiple MR levels. The MR PIC code enabled 3D simulations of laser-matter interactions on Frontier, Fugaku, and Summit, which have so far been out of the reach of standard codes. These simulations helped remove a major limitation of compact laser-based electron accelerators, which are promising candidates for next generation high-energy physics experiments and ultra-high dose rate FLASH radiotherapy.
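For readers unfamiliar with particle-in-cell methods, the basic cycle (deposit charge, solve for the fields, push the particles) can be sketched in one dimension. This is a schematic electrostatic toy with made-up parameters, not WarpX, which is a 3D electromagnetic code with mesh refinement:

```python
import numpy as np

# Schematic 1D electrostatic PIC cycle (illustration only).
ng, L, np_ = 64, 1.0, 10_000          # grid cells, domain length, particles
dx, dt = L / ng, 0.1
rng = np.random.default_rng(1)
x = rng.uniform(0, L, np_)            # particle positions
v = rng.standard_normal(np_) * 0.01   # particle velocities

def pic_step(x, v):
    # 1) deposit: cloud-in-cell charge deposition onto the grid
    g = x / dx
    i = np.floor(g).astype(int) % ng
    w = g - np.floor(g)
    rho = np.bincount(i, 1 - w, ng) + np.bincount((i + 1) % ng, w, ng)
    rho = rho * ng / np_ - 1.0        # subtract neutralizing background
    # 2) field solve: periodic Poisson equation via FFT, E = -d(phi)/dx
    k = 2 * np.pi * np.fft.fftfreq(ng, d=dx)
    k[0] = 1.0                        # avoid divide-by-zero for the mean mode
    phi_hat = np.fft.fft(rho) / k**2
    phi_hat[0] = 0.0
    E = np.real(np.fft.ifft(-1j * k * phi_hat))
    # 3) gather + push: interpolate E back to particles, leapfrog update
    Ep = (1 - w) * E[i] + w * E[(i + 1) % ng]
    v2 = v - Ep * dt                  # unit charge -1, unit mass
    x2 = (x + v2 * dt) % L            # periodic boundaries
    return x2, v2

x, v = pic_step(x, v)
```

Mesh refinement, the paper's headline capability, amounts to running steps like these on nested grids of different resolution and coupling them consistently, which is where most of the algorithmic difficulty lies.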

Per the SC22 schedule, this team includes researchers from Arm, Atos, CEA-Université Paris-Saclay, ENSTA Paris, GENCI, Lawrence Berkeley National Lab and RIKEN.


“Plasma accelerator technologies have the potential to provide particle accelerators that are much more compact than existing ones, opening the door to exciting novel applications in science, industry, security and health,” Vay explained. “Exploiting the most powerful supercomputers in the world to boost the research to make these complex machines a reality is so stimulating to all of us.”

“It is thrilling for the entire team to be selected as finalist of the Gordon Bell Prize, even for the one of us (Axel Huebl), for whom it is ‘déjà vu’ as he was already a finalist in 2013 with another (PIConGPU) team,” Vay added. “It is the vindication of years of hard work from the U.S. DOE Exascale Computing Project participants and longstanding collaborators from CEA Saclay in France, coupled to the more recent hard work with colleagues from various labs and private companies in France (Genci, Arm, Atos) and RIKEN in Japan.”

Geostatistics get a boost from Shaheen-2 and Fugaku

The exascale-enabled research only constitutes half the list. Another finalist paper — Reshaping Geostatistical Modeling and Prediction for Extreme-Scale Environmental Applications — used Shaheen-2 as well as Fugaku.

The Shaheen-2 supercomputer.

Abstract: We extend the capability of space-time geostatistical modeling using algebraic approximations, illustrating application-expected accuracy worthy of double precision from majority low-precision computations and low-rank matrix approximations. We exploit the mathematical structure of the dense covariance matrix whose inverse action and determinant are repeatedly required in Gaussian log-likelihood optimization. Geostatistics augments first-principles modeling approaches for the prediction of environmental phenomena given the availability of measurements at a large number of locations; however, traditional Cholesky-based approaches grow cubically in complexity, gating practical extension to the continental and global datasets now available. We combine the linear algebraic contributions of mixed-precision and low-rank computations within a tile-based Cholesky solver with on-demand casting of precisions and dynamic runtime support from PaRSEC to orchestrate tasks and data movement. Our adaptive approach scales on various systems and leverages the Fujitsu A64FX nodes of Fugaku to achieve up to 12X performance speedup against the highly optimized dense Cholesky implementation.
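The core algorithmic idea, performing most of a tile-based Cholesky factorization in low precision while retaining acceptable accuracy, can be sketched with numpy. This is a toy with a hypothetical matrix and block size, not the team's PaRSEC-based code, and it casts every off-diagonal tile to single precision rather than adapting precision on demand:

```python
import numpy as np

def tile_cholesky(A, bs, low=np.float32):
    """Right-looking tile Cholesky; off-diagonal tiles stored in low precision.
    Toy illustration of the mixed-precision idea (not the paper's solver)."""
    n = A.shape[0]
    L = np.zeros_like(A)
    A = A.copy()
    for k in range(0, n, bs):
        kk = slice(k, k + bs)
        L[kk, kk] = np.linalg.cholesky(A[kk, kk])     # diagonal tile in double
        for i in range(k + bs, n, bs):
            ii = slice(i, i + bs)
            # triangular solve for the panel tile, then demote its precision
            L[ii, kk] = np.linalg.solve(
                L[kk, kk], A[ii, kk].T).T.astype(low).astype(A.dtype)
        for i in range(k + bs, n, bs):                # trailing-matrix update
            ii = slice(i, i + bs)
            for j in range(k + bs, i + bs, bs):
                jj = slice(j, j + bs)
                A[ii, jj] -= L[ii, kk] @ L[jj, kk].T
    return L

rng = np.random.default_rng(2)
G = rng.standard_normal((64, 64))
A = G @ G.T + 64 * np.eye(64)        # SPD covariance-like matrix
L = tile_cholesky(A, bs=16)
err = np.linalg.norm(L @ L.T - A) / np.linalg.norm(A)
print(f"relative factorization error: {err:.2e}")
```

The factorization error lands near single-precision accuracy even though the expensive panel tiles were demoted, which is the trade the paper exploits at vastly larger scale, with low-rank compression layered on top.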

Per the SC22 schedule, this team includes researchers from KAUST, ORNL and the University of Tennessee. Perhaps notably, the team also includes Jack Dongarra, one of SC22’s keynote speakers.


“For our exploratory science runs, and to demonstrate the acceptable accuracy of our algorithmic variations on Cholesky factorization and further manipulation of massive covariance matrices, we used Shaheen-2 at KAUST,” explained David Keyes, director of the Extreme Computing Research Center at KAUST, in an email to HPCwire. “Shaheen-2 has only 6,192 nodes, so we applied to use Fugaku at RIKEN to scale further and were generously considered by RIKEN. Fugaku has 158,976 nodes, about 25 times more than Shaheen-2, and each node has 48 cores, 1.5 times more than a Shaheen-2 node. However, each Fugaku node is equipped with only 32GB of memory, one-quarter as much as Shaheen-2’s 128GB per node, thus only one-sixth as much per core, which required us to make software adaptations.”

“Entering the Gordon Bell competition was exciting for all of the team members, especially the students and postdocs,” Keyes said. “It provided an opportunity to run on the world’s second ranked computer. The required algorithmic adaptations to architecture led to improvements in our tools that will be useful at all scales. More importantly, the nomination created excitement with the statistics community since 2022 appears to be the first time after 35 years of the prize that any significant spatial statistics computation, environmental or otherwise, has thus advanced.”

Simulating earthquakes with Fugaku

The last of Fugaku’s three appearances on the finalist list comes courtesy of Extreme Scale Earthquake Simulation with Uncertainty Quantification, which used the second-ranked system to advance scientific understanding of earthquakes and fields with similar dynamics.

The Fugaku supercomputer.

Abstract: We develop a stochastic finite element method with ultra-large degrees of freedom that discretizes probabilistic and physical spaces using unstructured second-order tetrahedral elements in double precision, paired with a mixed-precision implicit iterative solver that scales to the full Fugaku system and enables fast Uncertainty Quantification (UQ). The developed solver, designed to attain high performance on a variety of CPU/GPU-based supercomputers, enabled solving a 37-trillion degrees-of-freedom problem with 19.8 percent peak FP64 performance on full Fugaku (89.8 PFLOPS) with 87.7 percent weak scaling efficiency, corresponding to a 224-fold speedup over the state-of-the-art solver running on full Summit. This method, which has shown its effectiveness via solving huge (32-trillion degrees-of-freedom) practical problems, is expected to be a breakthrough in damage mitigation, and is expected to facilitate the scientific understanding of earthquake phenomena and have a ripple effect on other fields that similarly require UQ.

Per the SC22 schedule, this team includes researchers from Fujitsu, the Japan Agency for Marine-Earth Science and Technology, RIKEN and the University of Tokyo.


“We are very happy to be selected as finalists,” wrote Tsuyoshi Ichimura, a professor with the Earthquake Research Institute at the University of Tokyo, in an email to HPCwire. “We believe that this has a great impact in showing that capability computing can contribute to an unprecedented Uncertainty Quantification (UQ).”

Leveraging Summit to search proteins

Last, but certainly not least: Extreme-Scale Many-against-Many Protein Similarity Search, which used the Summit supercomputer to perform protein similarity calculations across hundreds of millions of proteins in just a few hours.

The Summit supercomputer.

Abstract: Similarity search is one of the most fundamental computations that are regularly performed on ever-increasing protein datasets. Scalability is of paramount importance for uncovering novel phenomena that occur at very large scales. We unleash the power of over 20,000 GPUs on the Summit system to perform all-vs-all protein similarity search on one of the largest publicly available datasets with 405 million proteins, in less than 3.5 hours, cutting the time-to-solution for many use cases from weeks to hours. The variability of protein sequence lengths, as well as the sparsity of the space of pairwise comparisons, make this a challenging problem in distributed memory. Due to the need to construct and maintain a data structure holding indices to all other sequences, this application has a huge memory footprint that makes it hard to scale the problem sizes. We overcome this memory limitation by innovative matrix-based blocking techniques, without introducing additional load imbalance.
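The sparse-matrix formulation at the heart of this approach can be sketched as follows: encode sequences as a sparse sequence-by-k-mer matrix A, so that A·Aᵀ counts shared k-mers per pair, and only pairs above a threshold proceed to expensive alignment. The sequences below are made up for illustration; this is not the PASTIS code:

```python
import numpy as np
from scipy.sparse import csr_matrix

def kmer_matrix(seqs, k=3):
    """Binary sparse sequences-by-k-mers incidence matrix."""
    vocab, rows, cols = {}, [], []
    for r, s in enumerate(seqs):
        for i in range(len(s) - k + 1):
            c = vocab.setdefault(s[i:i + k], len(vocab))
            rows.append(r)
            cols.append(c)
    data = np.ones(len(rows), dtype=np.int32)
    A = csr_matrix((data, (rows, cols)), shape=(len(seqs), len(vocab)))
    A.data[:] = 1                      # binarize: k-mer present or not
    return A

seqs = ["MKVLAAGIT", "MKVLAAGIS", "GGGPLNARK", "PLNARKQQW"]  # toy proteins
A = kmer_matrix(seqs)
S = (A @ A.T).toarray()                # S[i,j] = number of shared 3-mers
np.fill_diagonal(S, 0)
cand = np.argwhere(S >= 3)             # candidate pairs for full alignment
print(S)
print(cand)
```

The sparse product prunes the quadratic space of comparisons down to the pairs that plausibly match, which is the step the team distributed across Summit's GPUs.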

Per the SC22 schedule, this team includes researchers from Indiana University, the Institute for Fundamental Biomedical Research, the Department of Energy’s Joint Genome Institute, Lawrence Berkeley National Lab, Microsoft, NERSC and the University of California, Berkeley.


In an email to HPCwire, the team stressed the importance of this research area to critical fields. “Many-against-many sequence search is the backbone of biological sequence analysis used in drug discovery, healthcare, bioenergy, and environmental studies,” they wrote. “Our work is perhaps the first [Gordon Bell] finalist for a biological sequence analysis problem, which is surprising because sequence analysis is a perfect supercomputing application due to its data and compute intensive nature.”

“Our pipeline, PASTIS, performs a novel application of sparse matrices to narrow down the search space and to avoid quadratic number of sequence comparisons. Sparse matrix computations are much harder to map efficiently to modern supercomputing hardware, especially to GPU-equipped supercomputers such as the Summit system we have used in this work. Our approach cuts back the turnaround time from days to minutes in discovering similar sequences in huge protein datasets to complete the subsequent analytical steps in bioinformatics and allow for exploratory analysis of data sets under different parameter settings.”

What’s next

That’s all of them. For those keeping score at home: three finalist teams used Fugaku; three used Summit; two used Frontier; and OceanLight, Perlmutter and Shaheen-2 were each used by one finalist team. We’re still watching for the reveal of the finalists for the Gordon Bell Special Prize for High Performance Computing-Based Covid-19 Research, which will be awarded for the third time at SC22. At SC22 itself — set to be held in Dallas from November 13-18 — the finalists for both Gordon Bell Prizes will present their research ahead of the award ceremony.
