AMReX: A Performance-Portable Framework for Block-Structured Adaptive Mesh Refinement Applications

By Rob Farber

August 21, 2023

Performance, portability, and broad functionality are all key features of the AMReX software framework, which was developed by researchers at Lawrence Berkeley National Laboratory (Berkeley Lab), the National Renewable Energy Laboratory, and Argonne National Laboratory as part of the US Department of Energy’s (DOE’s) Exascale Computing Project (ECP) AMReX Co-Design Center.

Figure 1. AMReX utilizes a hierarchical representation of the solution at multiple levels of resolution.

The ECP AMReX Co-Design Center ensures that this popular and heavily utilized software framework for massively parallel, block-structured adaptive mesh refinement (AMR) applications can run efficiently on DOE supercomputers. Numerous ECP applications use AMReX to model problems in a broad range of areas, including accelerator design, astrophysics, combustion, cosmology, multiphase flow, and wind energy.

To address such a broad range of physical phenomena, the software must support a wide range of algorithmic requirements. As noted in “A Survey of Software Implementations Used by Application Codes in the Exascale Computing Project,” AMReX achieves platform portability through its APIs. Other ECP co-design centers, such as the Center for Efficient Exascale Discretizations (CEED) and the Co-design Center for Particle Applications, use the same approach to deliver performance portability.

From CPUs to Heterogeneous Architectures

AMReX was originally based on the earlier BoxLib framework, which was used to develop AMR applications. John Bell (Figure 2), principal investigator of the AMReX Co-Design Center and a senior scientist in the Applied Mathematics and Computational Research division at Berkeley Lab, explained, “ECP funding allowed us to completely redesign BoxLib, which was designed for CPU-only systems, to create AMReX, which provides a performance-portable framework that supports both multicore CPUs and a number of different GPU accelerators. AMReX is currently used by a diverse set of applications on many different systems.”

Figure 2. John Bell

For application codes that were already based on BoxLib, the AMReX team documented how to migrate from BoxLib to AMReX. (This documentation is available in the AMReX repository at Docs/Migration.)

Along with GPU acceleration, Bell noted, “One of the key design features of AMReX is that it separates the basic data structures and core operations on those data structures from the algorithms used by a particular application, providing developers a lot more flexibility in how to solve their problems.”

Technology Introduction

Scientists use block-structured AMR as a “numerical microscope” for solving systems of partial differential equations (PDEs). AMReX provides a framework for developing algorithms to solve these systems, targeting machines ranging from laptops to exascale architectures both with and without GPU acceleration.

Scientists describe a wide range of physical phenomena using PDEs, which are relationships between derivatives of different quantities describing the system. The wind flowing over a mountain range, the vibration of a bridge during an earthquake, and the burning inside a supernova are all described by PDEs. Solving PDEs allows scientists to gain insight into the behavior of complex systems. However, in most cases, no easy mathematical solution to a system of PDEs exists.

Instead, they must be solved using a computer. Central to solving PDEs on a computer is how the scientist represents the system. One common approach is to define the state of the system in terms of its values on a finite mesh of points. In this type of mesh-based approach, the finer the mesh (i.e., the more points it contains), the better the representation of the solution. AMR algorithms dynamically control the number and location of mesh points to minimize the computational cost while solving the problem with sufficient accuracy.
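The claim that a finer mesh gives a better representation can be checked with a small numerical experiment. The sketch below is illustrative only and is not taken from AMReX: it assumes a one-dimensional test function and piecewise-linear interpolation between mesh points, and the helper name `max_interp_error` is hypothetical.

```python
import math


def max_interp_error(f, n, samples_per_cell=10):
    """Largest sampled error of piecewise-linear interpolation of f on [0, 1].

    The interval is divided into n equal cells; f is known only at cell edges.
    """
    h = 1.0 / n
    worst = 0.0
    for c in range(n):
        x0 = c * h
        f0, f1 = f(x0), f(x0 + h)
        for s in range(samples_per_cell + 1):
            t = s / samples_per_cell
            approx = (1 - t) * f0 + t * f1
            worst = max(worst, abs(f(x0 + t * h) - approx))
    return worst


def wave(x):
    """Smooth test function standing in for a PDE solution."""
    return math.sin(2 * math.pi * x)


# Doubling the number of cells should shrink the error by roughly 4x.
for n in (8, 16, 32):
    print(n, max_interp_error(wave, n))
```

For a smooth function like this one, each doubling of the mesh cuts the error by about a factor of four, but it also doubles the number of points (or multiplies it by eight in 3D). That cost is what motivates refining only where the solution needs it.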

As noted in “AMReX: Block-Structured Adaptive Mesh Refinement for Multiphysics Applications,” block-structured AMR algorithms utilize a hierarchical representation of the solution at multiple levels of resolution (Figure 1). At each level, the solution is defined on the union of data containers at that resolution, each of which represents the solution over a logically rectangular subregion of the domain.

This representation can be data defined on a mesh, particle data, or combinations of both. For mesh-based algorithms, AMReX supports a wide range of spatial and temporal discretizations. Support for solving linear systems includes native geometric multigrid solvers and the ability to link to external linear algebra software. For problems that include stiff systems of ordinary differential equations that represent single-point processes such as chemical kinetics or nucleosynthesis, AMReX provides an interface to ordinary differential equation solvers provided by SUNDIALS. AMReX also supports embedded boundary (cut cell) representations of complex geometries.
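The hierarchical, box-based representation described above can be sketched in a few lines. This is a simplified illustration, not AMReX's actual data structures: the `Box` and `Level` classes, the 16x16 domain, and the refinement ratio of 2 are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Box:
    """A logically rectangular region of cells with inclusive index bounds."""
    lo: tuple
    hi: tuple

    def num_cells(self):
        n = 1
        for a, b in zip(self.lo, self.hi):
            n *= b - a + 1
        return n

    def refine(self, ratio):
        """Return the same physical region indexed on a mesh `ratio` times finer."""
        return Box(tuple(a * ratio for a in self.lo),
                   tuple((b + 1) * ratio - 1 for b in self.hi))


@dataclass
class Level:
    """One resolution level: a cell size plus the union of boxes at that size."""
    dx: float
    boxes: list = field(default_factory=list)

    def num_cells(self):
        return sum(b.num_cells() for b in self.boxes)


# Level 0 covers the whole 16x16 domain with a single box.
level0 = Level(dx=1.0, boxes=[Box((0, 0), (15, 15))])

# Level 1 refines only a 4x4 coarse region of interest by a factor of 2.
ratio = 2
region = Box((6, 6), (9, 9))
level1 = Level(dx=level0.dx / ratio, boxes=[region.refine(ratio)])

hierarchy = [level0, level1]
adaptive_cells = sum(level.num_cells() for level in hierarchy)
uniform_fine_cells = level0.boxes[0].refine(ratio).num_cells()
print(adaptive_cells, uniform_fine_cells)  # prints: 320 1024
```

In this toy case, the two-level hierarchy stores 320 cells, whereas refining the entire domain to the fine resolution would need 1,024.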

The Rationale for Block-Structured AMR Codes

Figure 3. Detailed adaptive simulation of a burning dimethyl ether jet. Credit: AMReX.

Bell observed, “AMR differs from approaches such as the multigrid method that use a hierarchy of coarser grids to develop efficient algorithms to solve a problem at a fixed fine resolution. AMR represents the solution using a different resolution in different parts of the domain to focus computational effort where it is needed, such as at a shock wave or flame front.” The adaptive simulation of a burning dimethyl ether flame illustrates the ability to use finer resolution only where it is most needed (Figure 3).

The code determines when and where more resolution is required based on user-provided criteria, which means that the mesh can be dynamically adapted during the simulation. The AMReX compressible gas dynamics tutorial illustrates how an embedded boundary representation can dynamically adapt the mesh resolution over both space and time (Figure 4). This and more complex 2D and 3D examples can be viewed here, along with several animations shared with the AMReX team by researchers using AMReX-based codes to study a variety of phenomena.

Figure 4. Compressible gas dynamics shock reflection using an embedded boundary representation of the ramp. The colors represent the density field. There are three total levels of refinement. Credit: AMReX.
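The idea of refining according to user-provided criteria can be illustrated with a one-dimensional sketch. It assumes a simple criterion (the jump between neighboring cells exceeds a threshold) and a moving tanh-shaped front. The function names and parameter values are hypothetical and do not reflect AMReX's tagging interface.

```python
import math


def tag_cells(values, threshold):
    """Return indices of cells whose jump to the next cell exceeds threshold."""
    tagged = set()
    for i in range(len(values) - 1):
        if abs(values[i + 1] - values[i]) > threshold:
            tagged.update((i, i + 1))
    return sorted(tagged)


def group_into_patches(tagged):
    """Merge sorted tagged indices into inclusive (lo, hi) intervals."""
    patches = []
    for i in tagged:
        if patches and i == patches[-1][1] + 1:
            patches[-1] = (patches[-1][0], i)
        else:
            patches.append((i, i))
    return patches


def front_profile(n, position, width=0.02):
    """A smooth step on [0, 1] centered at `position`, sampled at cell centers."""
    return [math.tanh(((i + 0.5) / n - position) / width) for i in range(n)]


# As the front moves, the region flagged for refinement moves with it.
n = 64
for position in (0.25, 0.75):
    values = front_profile(n, position)
    patches = group_into_patches(tag_cells(values, threshold=0.1))
    print(position, patches)
```

Only a handful of cells around the front are flagged at each position, so the fine mesh follows the feature rather than covering the whole domain.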

GPU Acceleration

DOE exascale supercomputers are large, distributed architectures that rely on GPUs to achieve high performance. Bell observed, “Block-structured AMR algorithms have a natural hierarchical parallelism that makes them ideally suited to GPU-accelerated supercomputers. In an AMR algorithm, the representation of the solution is broken into large patches that can be distributed across the nodes of the machine. The operations on each patch occur on a large block of data that can be performed efficiently on a GPU.”

Operations on patches form the numerical core of AMR algorithms; however, efficient execution on supercomputers involves other considerations. Operating independently on patches does not lead to a full solution to the problem. The results on patches must be stitched together to form a complete solution. When advancing the solution for a single time step, each patch needs data from neighboring patches to move forward. After the solution on all the patches is advanced, the new solution on all the patches must be synchronized. Bell noted, “Synchronization operations reflect the underlying physical processes. For applications solving a relatively simple set of equations, the synchronization is fairly simple, but for large complex multiphysics applications, synchronization can be a complex multistage process.”
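The need for neighbor data can be seen in a minimal single-level sketch. It assumes a periodic 1D domain, a three-point smoothing stencil, and one layer of ghost cells per patch. These choices are illustrative only, and the function names are hypothetical rather than AMReX calls.

```python
def advance(u_with_ghosts):
    """Apply a 3-point smoothing stencil to the interior of a patch.

    The first and last entries are ghost cells; only interior values are returned.
    """
    u = u_with_ghosts
    return [0.25 * u[i - 1] + 0.5 * u[i] + 0.25 * u[i + 1]
            for i in range(1, len(u) - 1)]


def fill_ghosts(patches):
    """Give each patch one ghost cell per side, copied from its neighbors.

    Patches are ordered left to right and the domain is periodic.
    """
    n = len(patches)
    padded = []
    for p, interior in enumerate(patches):
        left = patches[(p - 1) % n][-1]
        right = patches[(p + 1) % n][0]
        padded.append([left] + interior + [right])
    return padded


def step_patches(patches):
    """One time step: exchange ghost data, then advance each patch independently."""
    return [advance(padded) for padded in fill_ghosts(patches)]


# Reference: the same step on the undivided periodic domain.
domain = [0.0, 0.0, 1.0, 4.0, 1.0, 0.0, 0.0, 2.0]
reference = advance([domain[-1]] + domain + [domain[0]])

# Split into two patches of four cells and advance patch by patch.
patches = [domain[:4], domain[4:]]
stitched = [value for patch in step_patches(patches) for value in patch]
print(stitched == reference)
```

Once the ghost cells are filled, each patch can be advanced without further communication, and the stitched result matches the undivided calculation. A multilevel code would additionally have to reconcile coarse and fine data, which is where the synchronization Bell describes comes in.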

AMReX does a number of things “under the hood” to orchestrate the execution of AMR algorithms. At the start of a simulation and each time the adaptive algorithm changes the grid layout to reflect changing conditions, the data must be distributed across the nodes of a distributed computing architecture. The distribution must balance the computational work across the nodes while minimizing the cost of communication between nodes. Exchanging data between patches and performing synchronization operations both require efficient communication algorithms to minimize communication costs.
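The work-balancing half of this problem can be sketched with a simple greedy heuristic: hand out patches from largest to smallest, always to the rank with the least work so far. This is a generic illustration under the assumption that a patch's cost is proportional to its cell count. It is not a description of AMReX's distribution strategies, and it ignores communication cost entirely.

```python
import heapq


def distribute(patch_sizes, num_ranks):
    """Assign patches to ranks, largest first, always to the least-loaded rank.

    Returns (assignment, loads), where assignment[i] is the rank of patch i.
    """
    heap = [(0, rank) for rank in range(num_ranks)]
    heapq.heapify(heap)
    assignment = [None] * len(patch_sizes)
    order = sorted(range(len(patch_sizes)), key=lambda i: -patch_sizes[i])
    for i in order:
        load, rank = heapq.heappop(heap)
        assignment[i] = rank
        heapq.heappush(heap, (load + patch_sizes[i], rank))
    loads = [0] * num_ranks
    for i, rank in enumerate(assignment):
        loads[rank] += patch_sizes[i]
    return assignment, loads


# Hypothetical patch sizes (cells per patch) after a regrid.
patch_sizes = [4096, 512, 2048, 2048, 1024, 512, 1024, 1024]
assignment, loads = distribute(patch_sizes, num_ranks=3)
print(assignment, loads)
```

A production scheme would also try to place neighboring patches on the same or nearby nodes, because a balanced assignment that scatters neighbors can still be slow if it forces heavy ghost-cell traffic.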

Using GPUs raises several other issues. Data migration between the GPU and its host CPU can be very expensive in terms of run time. AMReX provides tools that let applications manage memory in ways that avoid unnecessary data movement. Additionally, no single programming model for GPUs exists. Each type of GPU has its own hardware characteristics and software environment. To address this issue, AMReX provides a lightweight abstraction layer that hides the details of a particular architecture from application code. This layer provides constructs that allow the user to specify the operations they want to perform on a block of data without specifying how those operations are carried out. AMReX then maps those operations onto the specific hardware at compile time so that the hardware is utilized effectively.
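The separation between what to compute on a block and how to run it can be mimicked in a short sketch. Everything here is hypothetical: the `parallel_for` function, the `BACKENDS` table, and the two toy backends are stand-ins for illustration, and the backend is chosen at run time, whereas AMReX, as described above, maps operations onto the hardware at compile time.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def _cells(lo, hi):
    """Iterate over every (i, j) index in the inclusive box [lo, hi]."""
    return product(range(lo[0], hi[0] + 1), range(lo[1], hi[1] + 1))


def _run_serial(lo, hi, body):
    for i, j in _cells(lo, hi):
        body(i, j)


def _run_threads(lo, hi, body):
    with ThreadPoolExecutor(max_workers=4) as pool:
        # list() forces completion and re-raises any exception from body.
        list(pool.map(lambda ij: body(*ij), _cells(lo, hi)))


BACKENDS = {"serial": _run_serial, "threads": _run_threads}


def parallel_for(lo, hi, body, backend="serial"):
    """Apply body(i, j) to every cell of the box; the backend decides how."""
    BACKENDS[backend](lo, hi, body)


def scale_field(backend):
    """Application code: states the per-cell operation, not the execution."""
    nx, ny = 4, 3
    src = [[i + 10 * j for j in range(ny)] for i in range(nx)]
    dst = [[0] * ny for _ in range(nx)]

    def body(i, j):
        dst[i][j] = 2 * src[i][j]

    parallel_for((0, 0), (nx - 1, ny - 1), body, backend=backend)
    return dst


print(scale_field("serial") == scale_field("threads"))
```

The application function `scale_field` never mentions threads or loops over cells, so a new backend can be added without touching it. That is the property the abstraction layer is meant to provide.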

From a mathematical and computational perspective, Andrew Myers, computer systems engineer at Berkeley Lab and one of the AMReX lead developers, pointed out that “AMReX preserves many of the nice regular data access patterns as with a regular grid. This improves performance. It also makes the reasoning about the numerical method ‘easier’ because the algorithm locally computes on a structured grid rather than a completely unstructured grid.”

For a more in-depth explanation, see “AMReX: Block-Structured Adaptive Mesh Refinement for Multiphysics Applications.”

To continue reading this ECP technical highlight, click here.
