NERSC Collaborates to Advance Fusion Energy Research with New ‘Superfacility’ Model

May 31, 2024

As the global population continues to grow, demand for energy will also increase. One potentially transformative way of meeting that demand is by generating energy using nuclear fusion, which does not emit carbon or consume scarce natural resources and could, therefore, present a solution that is both equitable and ecologically sound.

Credit: Shutterstock

Although major scientific, technological, and workforce development hurdles remain before energy produced by nuclear fusion is widely available, U.S. Department of Energy (DOE)-funded researchers at the DIII-D National Fusion Facility, the National Energy Research Scientific Computing Center (NERSC), and the Energy Sciences Network (ESnet) are teaming up to bring that vision closer to reality.

This new collaboration leverages high performance computing (HPC) at NERSC, DOE’s high-speed data network (ESnet), and DIII-D’s rich diagnostic suite to make important data from fusion experiments more useful and available to the global fusion research community, helping to accelerate the realization of fusion energy production.

Data + High-Performance Networking + High-Speed Computing = Breakthroughs

At DIII-D, researchers perform experiments to understand how to harness nuclear fusion energy as a viable energy source. Making the most of these experimental resources requires rapid processing of massive amounts of experimental data – a challenge well-suited to HPC facilities.

To meet this challenge, a multi-institutional team from DIII-D, NERSC, and ESnet collaborated to develop a “Superfacility”: a multi-institution scientific environment composed of experimental resources at DIII-D and HPC resources at NERSC, interconnected via ESnet6, the latest iteration of ESnet’s dedicated high-speed network for science.

The NERSC and ESnet staff who worked on the DIII-D Superfacility project. Credit: NERSC.

“To achieve the goal of fusion, all available resources, including HPC, need to be applied to the analysis of present fusion experiments to be able to extrapolate to future fusion power plants,” said Sterling Smith, the project lead for the DIII-D team involved in the Superfacility effort. “The use of HPC accelerates advanced analyses so that they can feed into the scientific understanding of fusion and point us toward the solutions needed to realize fusion energy production.”

The Superfacility model harmonizes with the larger DOE vision of an Integrated Research Infrastructure, known as IRI (see the full DOE IRI report). It enables near-real-time analysis of massive quantities of data during experiments, allowing researchers to tailor the experimental process and accelerating the pace of experimental science.

“We have long recognized that experimental science teams need a better way to connect experiment facilities with high-speed networks and HPC,” said NERSC Data Department Head Debbie Bard, who leads Superfacility work at Lawrence Berkeley National Laboratory (Berkeley Lab). “We started the Superfacility project at Berkeley Lab as a broad initiative to develop the tools, infrastructure, and policies to enable these connections. DIII-D and ESnet have been key partners in this work.”

Said Raffi Nazikian, head of ITER research and Senior Director at General Atomics (host of the DIII-D National Fusion Facility), “We see the Superfacility concept, and the emerging Integrated Research Infrastructure under the DOE Office of Advanced Scientific Computing Research, as a transformative capability for fusion research and look forward to exploring its full potential, beginning with the DIII-D/NERSC Superfacility.”

Making Every Shot Count

The device at the heart of the DIII-D National Fusion Facility, the DIII-D tokamak, creates plasmas. In the tokamak, gas atoms are heated to temperatures hotter than the Sun, causing them to ionize into their component electrons and nuclei. The free nuclei may collide and fuse, releasing energy. During experimental sessions, the research team studies the behavior of short plasma discharges called shots, typically performed at 10- to 15-minute intervals, with nearly 100 diagnostic and instrumentation systems capturing gigabytes of data during each shot. Between shots, scientists have only a brief window to address any issues or evaluate how specific parameter settings affect plasma behavior.

Previously, making adjustments between shots required extensive manual calculations by subject-matter experts, a time- and labor-intensive process that still yielded limited information. Automating some of this work was an obvious solution, but the computational needs of DIII-D experiments exceeded what standard computing systems could handle, and gaining access to a separate supercomputing center scientist by scientist was a prohibitively difficult and time-consuming process.

The DIII-D Superfacility team

“While DIII-D has automated rapid data processing performed on local computing systems to provide near-real-time feedback to scientists to inform experimental decision-making, over the years, the fidelity of models and understanding of the physics has increased dramatically,” said David Schissel, DIII-D Computer Systems and Science Coordinator. “Experiments now require higher-resolution, higher-fidelity analyses that cannot be completed on our local systems. The ability to perform this much more detailed analysis in near-real time to inform control room decision-making is possible only through the Superfacility model, which will allow researchers to make better adjustments and learn more from their experiments.”

The technical process of establishing the Superfacility model connecting DIII-D and NERSC via ESnet began with coordinating code: first establishing that EFIT, the code used at DIII-D to calculate the device’s equilibrium magnetic field profile, would run well on NERSC’s Perlmutter supercomputer. With that established, the combined team adopted the Consistent Automated Kinetic Equilibria (CAKE) workflow, developed for DIII-D by Prof. Egemen Kolemen’s team at Princeton University, to bring together all the “ingredients” provided by separate analysis and modeling codes and produce full descriptions of plasma behavior in the DIII-D tokamak. By monitoring and tuning the CAKE workflow for optimal performance at NERSC, the Superfacility team decreased the time-to-solution by more than 80%, from 60 minutes to 11 minutes for a benchmark case.
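As a back-of-the-envelope illustration of why that tuned turnaround matters, the timings above can be sketched in a few lines. The numbers come from the article; the function and variable names are purely illustrative and are not part of any DIII-D or NERSC software:

```python
# Back-of-the-envelope sketch of the between-shot analysis budget.
# Timings are from the article; everything else is illustrative.

LOCAL_RUNTIME_MIN = 60.0  # benchmark CAKE time-to-solution before tuning
HPC_RUNTIME_MIN = 11.0    # tuned time-to-solution on Perlmutter
SHOT_INTERVAL_MIN = 15.0  # upper end of the 10- to 15-minute shot cadence

def fits_between_shots(runtime_min: float, interval_min: float) -> bool:
    """An analysis can inform the next shot only if it finishes first."""
    return runtime_min < interval_min

# Fractional reduction in time-to-solution: 1 - 11/60 ≈ 0.82, i.e. >80%.
speedup = 1.0 - HPC_RUNTIME_MIN / LOCAL_RUNTIME_MIN

print(fits_between_shots(LOCAL_RUNTIME_MIN, SHOT_INTERVAL_MIN))  # False
print(fits_between_shots(HPC_RUNTIME_MIN, SHOT_INTERVAL_MIN))    # True
print(f"{speedup:.0%}")  # 82%
```

The point of the sketch: a 60-minute analysis can never inform the next shot, while an 11-minute one fits inside the shot cadence, which is what makes the results actionable in the control room.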

“DOE has invested in establishing DIII-D as the best-diagnosed fusion facility in the world. However, large-scale automated analysis, using tools like CAKE, is the only way to convert all the diagnostic data into useful information; the Superfacility team is taking a key step toward the future of fusion,” said Prof. Egemen Kolemen.

Data collected by DIII-D diagnostics (left) and the associated magnetic field reconstruction created using DIII-D/NERSC Superfacility resources.

The increase in speed made it possible to complete these analyses, plus additional follow-on analyses, between DIII-D experimental shots. Between 2008 and 2022, before DIII-D had real-time access to HPC, only 4,000 manually produced reconstructions were generated. In the first six months of DIII-D/NERSC Superfacility operation, DIII-D achieved more than 20,000 automated high-resolution magnetic field profile reconstructions for 555 DIII-D shots.

The results are now part of a database of high-fidelity results that all DIII-D users have access to and can use to inform experimental planning and interpretation – a cache of information that will benefit the push for fusion energy worldwide.

“DIII-D (and fusion more generally) presents a use-case where returning results quickly really matters,” said Laurie Stephey, a member of the NERSC team. “Many types of fusion simulation and data analysis are too computationally demanding to be run on local resources between shots, so this often means that these analyses either never get done, or if they do get done, they are often finished too late to be actionable. This Superfacility project combines DIII-D and HPC resources to produce something greater than the sum of their parts – just-in-time scientific results that would otherwise not be possible.”

Democratizing Access to HPC Resources

The success of the DIII-D/NERSC Superfacility model is a victory for fusion energy research today, and it may also be a template for expanding the use of Superfacility and other IRI collaborations across the DOE lab complex. The DIII-D team is also working with staff at the Argonne Leadership Computing Facility to analyze plasma pulses quickly, again using ESnet but with a different approach to the workflow.

“ESnet is actively exploring ways to improve network performance for the DIII-D/NERSC Superfacility and other collaborations. As the workflow matures, we anticipate being able to deploy advanced ESnet services that will enable the project to easily expand into multiple computing environments at multiple HPC facilities, in a performant and scientist-friendly way,” said ESnet Science Engagement Team network engineer Eli Dart.

In addition to improved science outcomes, the Superfacility model can help make the experimental process more equitable and inclusive for a broader range of researchers. Providing HPC access to all team researchers, without requiring each to acquire their own allocation of compute time, allows newer groups and researchers to participate in experimental sessions and keeps all members of an experimental team connected.

Additionally, the physically distributed nature of the Superfacility lowers barriers to entry for subject-matter experts who may be needed for data analysis during experiments. This may make it easier for early-career researchers with smaller professional networks to collaborate with these experts in their research projects. Overall, these changes contribute to a more inclusive environment for all researchers and build a fusion workforce from a wider cross-section of experts.

This Superfacility collaboration between DIII-D, NERSC, and ESnet demonstrates the value of coupling experimental facilities with high-performance computing resources, pointing the way toward the solutions needed to realize fusion energy production.

About NERSC and Berkeley Lab

The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, NERSC serves almost 10,000 scientists at national laboratories and universities researching a wide range of problems in climate, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy.


Source: Elizabeth Ball (NERSC), Bonnie Powell (ESnet), and Lindsay Ward-Kavanagh (DIII-D)
