Preparing for Aurora: Abhishek Bagusetty Helps Develop and Test Software

May 11, 2022 — In this series, ALCF examines the range of activities and collaborations that ALCF staff undertake to guide the facility and its users into the next era of scientific computing.

The deployment of a leadership-scale computing system is an ambitious undertaking. At the Argonne Leadership Computing Facility (ALCF), a U.S. Department of Energy user facility at DOE’s Argonne National Laboratory, staff members and collaborators throughout the high-performance computing (HPC) research community are working to develop not just the hardware but also the software tools, codes, and methods necessary to fully exploit next-generation systems at launch.

In the following Q&A, Abhishek Bagusetty, a computational scientist at the ALCF, discusses his software development work in support of the launch of Aurora, Argonne’s forthcoming exascale system.

How long have you worked in HPC?

I’ve been working on HPC projects that cut across various domains, especially computational fluid dynamics (CFD), domain-specific languages (DSLs), and molecular simulations of materials, since 2012, when I was a master’s student at the University of Utah. In particular, my work focuses on applications of general-purpose computing on graphics processing units (GPGPU).

What most interests you in your work?

With the evolution of GPGPU programming models, helping domain scientists to focus more on accelerated scientific discovery has become a greater priority. Keeping up with the emergence of programming models and programming languages, and with their integration into domain science projects, all within the context of performance and portability, is an enormous challenge and a compelling research thrust.

What does your Aurora development work consist of?

Readying new computing systems involves lots of porting, compiling, testing, and evaluating—not just applications, but libraries, modules, and frameworks as well.

My current research supports Exascale Computing Project (ECP) work in two domains: application development, including the NWChemEx and Energy Exascale Earth System Model (E3SM) codes, and software technology, in relation to mathematical libraries such as HYPRE and SuperLU.

Development on NWChemEx, a chemistry code for modeling molecular systems, includes enabling support across frameworks and libraries for the DPC++ programming language.

An important component of E3SM, a climatological application, is a model for incorporating cloud physics while also obtaining the throughput necessary for multidecade, coupled high-resolution simulations—which are otherwise so computationally expensive as to overtax even exascale systems. While this tool will improve the scientific community’s ability to assess the regional impacts of climate change on the water cycle, making it functional means implementing lots of models for microphysics and turbulence.

I’ve also been involved in supporting a multi-scale, multi-physics science application called Uintah, developed at the University of Utah. Uintah is primarily an asynchronous many-task runtime system for next-generation architectures and exascale supercomputers.

All of these projects utilize the Data Parallel C++ (DPC++) programming model, which has the benefit of providing modern, portable, single-source C++ design patterns that can be related to existing GPGPU programming models. DPC++ itself will rely on Intel oneAPI Level Zero as the runtime engine for Aurora’s exascale architecture.

The Level Zero API aims to provide direct-to-metal interfaces for offloading to accelerator devices. Its programming interface can be tailored to fit the needs of any device and can be adapted to support a broader set of language features, such as function pointers, virtual functions, unified memory, and I/O capabilities.
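
As a rough illustration of the single-source style described above, the sketch below writes a compute kernel as a C++ lambda placed directly beside the host code that launches it. It is plain standard C++ and is not taken from any of the projects named in this article. The `host_parallel_for` helper and the `saxpy` function are hypothetical names invented for this example; the helper simply runs the kernel on the CPU. In actual DPC++ code, the same lambda would instead be submitted to a device through a `sycl::queue` and its `parallel_for` member.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-in for a device queue's parallel_for.
// It calls the kernel once per index, in order, on the CPU.
template <typename Kernel>
void host_parallel_for(std::size_t n, Kernel kernel) {
    for (std::size_t i = 0; i < n; ++i) {
        kernel(i);
    }
}

// Computes y[i] = a * x[i] + y[i] for every element.
// The kernel body is a lambda written inline, in the same source file
// and the same language as the host code that launches it.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const float* xp = x.data();
    float* yp = y.data();
    // Capture the raw pointers and the scalar by value, as a device
    // kernel typically would.
    host_parallel_for(y.size(), [=](std::size_t i) {
        yp[i] = a * xp[i] + yp[i];
    });
}
```

Calling `saxpy(2.0f, x, y)` updates `y` in place. The point of the pattern is that the host logic and the kernel share one file and one type system, so only the launch mechanism would need to change when targeting an accelerator.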

Who do you collaborate with for this work?

The teams I collaborate with are based predominantly at DOE facilities, including colleagues at Argonne, Pacific Northwest, Oak Ridge, Lawrence Livermore, Ames, Brookhaven, and Berkeley national laboratories. Much of my work, especially efforts related to porting, testing, and evaluating performance characteristics for the Aurora computing architecture, also involves collaborating with several members of Intel’s Center of Excellence, located at Argonne.

The Argonne Leadership Computing Facility provides supercomputing capabilities to the scientific and engineering community to advance fundamental discovery and understanding in a broad range of disciplines. Supported by the U.S. Department of Energy’s (DOE’s) Office of Science, Advanced Scientific Computing Research (ASCR) program, the ALCF is one of two DOE Leadership Computing Facilities in the nation dedicated to open science.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America’s scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.

The U.S. Department of Energy’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States and is working to address some of the most pressing challenges of our time. For more information, visit https://energy.gov/science.

Source: Nils Heinonen, Argonne Leadership Computing Facility