SC13 Research Highlight: Extreme Scale Plasma Turbulence Simulation

By Bei Wang, Stephane Ethier & William Tang

November 16, 2013

As the global energy economy makes the transition from fossil fuels toward cleaner alternatives, fusion becomes an attractive potential solution for meeting growing energy demand. Fusion energy, which is the power source for the sun, can be generated on earth, for example, in magnetically confined laboratory plasma experiments (called “tokamaks”) when the isotopes of hydrogen (e.g., deuterium and tritium) combine to produce an energetic helium “alpha” particle and a fast neutron, with an overall energy multiplication factor of 450:1.

Building the scientific foundations needed to develop fusion power demands high-physics-fidelity predictive simulation capability for magnetically confined fusion energy (MFE) plasmas. Doing so in a timely way requires harnessing the power of modern supercomputers to simulate the complex dynamics governing MFE systems, including ITER, a multibillion-dollar international burning plasma experiment supported by seven governments representing over half of the world’s population.

Unavoidable spatial variations in such systems produce microturbulence, which can significantly increase the transport rate of heat, particles, and momentum across the confining magnetic field in tokamak devices. Since the balance between these energy losses and the self-heating rates of the fusion reactions will ultimately determine the size and cost of an actual fusion reactor, understanding and possibly controlling the underlying physical processes is key to achieving the efficiency needed to help ensure the practicality of future fusion reactors.

The goal here is to gain new physics insights into MFE confinement scaling by making effective use of world-class supercomputing systems such as the IBM Blue Gene/Q “Mira” at the Argonne Leadership Computing Facility (ALCF). The knowledge gained addresses the key question of how turbulent transport and the associated confinement characteristics scale from present-generation devices to the much larger ITER-scale plasmas. This involves developing modern software capable of using leadership-class supercomputers to carry out reliable first-principles-based simulations of multi-scale tokamak plasmas. The fusion physics challenge is that the key decade-long MFE estimates of confinement scaling with device size (the so-called “Bohm to Gyro-Bohm” “rollover” trend caused by the ion temperature gradient instability) demand much higher resolution to be realistic and reliable. Our important new fusion physics finding is that this “rollover” is much more gradual than established earlier in far lower-resolution, shorter-duration studies, with the magnitude of transport now reduced by a factor of two.

The basic particle method is a long-established approach that simulates the behavior of charged particles interacting with each other through pair-wise electromagnetic forces. At each time step, the particle properties are updated according to these calculated forces. On powerful modern supercomputers with deep cache hierarchies, a pure particle method is very efficient with respect to locality and arithmetic intensity (it is compute bound). Unfortunately, its O(N²) complexity makes it impractical for plasma simulations using millions of particles per process. Rather than calculating O(N²) forces, the particle-in-cell (PIC) method, which was introduced by J. Dawson and N. Birdsall in 1968, employs a grid as the medium for calculating the long-range electromagnetic forces. This reduces the complexity from O(N²) to O(N + M log M), where M is the number of grid points and is usually much smaller than N. Specifically, PIC simulations are carried out using “macro” particles (~10³ times the radius of a real charged ion particle) with characteristic properties including position, velocity and weight. However, achieving high parallel and architectural efficiency is very challenging for a PIC method because of potential fine-grained data hazards, irregular data access, and low arithmetic intensity. These issues become more severe as the multicore and manycore revolution progresses and the HPC community confronts even more radical changes in computer architecture.
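To make the contrast between the two approaches concrete, here is a minimal one-dimensional sketch in Python. It is illustrative only and is not taken from GTC-P: the function names, the periodic domain of unit length, and the linear (cloud-in-cell) weighting are all assumptions chosen for brevity, and the grid field solve (the M log M part) is omitted. A direct method must visit every particle pair, while the PIC-style path touches each particle a constant number of times: once to deposit its charge onto the grid (scatter) and once to interpolate the grid field back to it (gather).

```python
def pairwise_interactions(n):
    """Number of particle pairs a direct method must evaluate: n(n-1)/2."""
    return n * (n - 1) // 2


def deposit_charge(positions, weights, m, length=1.0):
    """Scatter step: spread each particle's weight onto its two nearest
    grid points with linear weighting, on a periodic grid of m points."""
    dx = length / m
    rho = [0.0] * m
    for x, w in zip(positions, weights):
        s = (x % length) / dx      # position measured in grid cells
        i = int(s) % m             # index of the grid point to the left
        frac = s - int(s)          # fractional distance from that point
        rho[i] += w * (1.0 - frac)         # data-dependent write
        rho[(i + 1) % m] += w * frac       # wraps around at the boundary
    return rho


def gather_field(positions, field, length=1.0):
    """Gather step: interpolate a grid field back to each particle."""
    m = len(field)
    dx = length / m
    out = []
    for x in positions:
        s = (x % length) / dx
        i = int(s) % m
        frac = s - int(s)
        out.append(field[i] * (1.0 - frac) + field[(i + 1) % m] * frac)
    return out
```

The writes to `rho[i]` depend on particle positions, so neighboring loop iterations may update the same grid point. This is one simple picture of the irregular data access and fine-grained data hazards mentioned above: if the loop were split across threads, those updates would need atomics, private grid copies, or particle sorting to remain correct.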

Machines such as the IBM BG/Q Mira demand at least 49,152-way MPI parallelism and up to 3 million-way thread-level parallelism in order to fully utilize the system. While distributing particles to at least 49,152 processes is straightforward, distributing a 3D torus-shaped grid among those processes is non-trivial. For example, first consider the 3D torus as being decomposed into sub-domains of uniform volume. In a circular geometry, the sub-domains close to the edge of the system will contain more grid points than those in the core. This leads to potential load imbalance for the associated grid-based work.
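The two parallelism figures quoted above are consistent with each other. As a quick check, the sketch below assumes one MPI process per node and the publicly documented Blue Gene/Q layout of 16 compute cores per node with 4 hardware threads per core; those per-node numbers are not stated in this article and are treated here as assumptions.

```python
NODES = 49_152            # from the text: minimum MPI process count
CORES_PER_NODE = 16       # assumption: BG/Q compute cores per node
THREADS_PER_CORE = 4      # assumption: BG/Q hardware threads per core


def total_hardware_threads(nodes, cores_per_node, threads_per_core):
    """Total hardware threads when every core on every node is used."""
    return nodes * cores_per_node * threads_per_core


total = total_hardware_threads(NODES, CORES_PER_NODE, THREADS_PER_CORE)
print(total)  # 3145728, i.e. roughly 3 million
```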

Through a close collaboration with the Future Technologies Group at the Lawrence Berkeley National Laboratory, we have developed and optimized a new version of the Gyrokinetic Toroidal Code (“GTC-Princeton” or “GTC-P”) to address the challenges of the PIC method on leadership-class systems in the multicore/manycore regime. GTC-P includes multiple levels of parallelism: a 2D domain decomposition, a particle decomposition, and loop-level parallelism implemented with OpenMP, all of which help enable this state-of-the-art PIC code to scale efficiently to the full capability of the largest extreme-scale HPC systems currently available. Special attention has been paid to the load imbalance issue associated with domain decomposition. To improve single-node performance, we select a “structure-of-arrays” (SOA) data layout for particle data, align memory allocations to facilitate SIMD intrinsics, bin particles to improve locality, and use loop fusion to improve arithmetic intensity. We also manually flatten irregular nested loops to expose more parallelism to OpenMP threads. GTC-P features a two-dimensional topology for point-to-point communication. On the IBM BG/Q system with its 5D torus network, we have optimized communication with customized process mapping. Data parallelism is also being continuously exploited through SIMD intrinsics (e.g., QPX intrinsics on IBM BG/Q) and by improving data movement through software pre-fetching.
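Two of the single-node ideas named above, the structure-of-arrays layout and particle binning, can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than GTC-P code: the attribute names (`x`, `v`, `w`), the one-dimensional cell index, and the use of a counting sort for binning are hypothetical choices made for illustration.

```python
from array import array


def make_soa(particles):
    """Convert a list of (x, v, w) tuples (array-of-structures) into a
    structure-of-arrays: one contiguous array per particle attribute."""
    return {
        "x": array("d", (p[0] for p in particles)),
        "v": array("d", (p[1] for p in particles)),
        "w": array("d", (p[2] for p in particles)),
    }


def bin_particles(soa, m, length=1.0):
    """Reorder particles so that those in the same grid cell sit next to
    each other, using a stable counting sort on the cell index."""
    dx = length / m
    cells = [min(int((x % length) / dx), m - 1) for x in soa["x"]]

    # Count particles per cell, then compute each cell's starting offset.
    counts = [0] * m
    for c in cells:
        counts[c] += 1
    offsets = [0] * m
    for c in range(1, m):
        offsets[c] = offsets[c - 1] + counts[c - 1]

    # order[k] is the original index of the particle placed at slot k.
    order = [0] * len(cells)
    cursor = offsets[:]
    for idx, c in enumerate(cells):
        order[cursor[c]] = idx
        cursor[c] += 1

    # Apply the same permutation to every attribute array.
    return {name: array("d", (col[i] for i in order))
            for name, col in soa.items()}


particles = [(0.9, 1.0, 1.0), (0.1, 2.0, 1.0), (0.6, 3.0, 1.0), (0.15, 4.0, 1.0)]
binned = bin_particles(make_soa(particles), m=4)
```

With separate arrays per attribute, a loop that reads only positions streams through one contiguous array instead of striding over whole particle records, which is the property that makes the layout friendly to SIMD loads. After binning, consecutive particles touch the same or neighboring grid points, so the grid data they need is more likely to already be in cache.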

Simulations of confinement physics for large-scale MFE plasmas have been carried out for the first time with very high phase-space resolution and long temporal duration to deliver important new scientific insights. This was enabled by the new “GTC-P” code, which was developed to use multi-petascale capabilities on world-class systems such as the IBM BG/Q “Mira” at ALCF and also “Sequoia” at LLNL. (Accomplishments are summarized in the two figures below.)


Figure 1:  Modern GTC-Princeton (GTC-P) Code Performance on World-Class IBM BG-Q Systems


Figure 2:  Important new scientific discoveries enabled by harnessing modern supercomputing capabilities at extreme scale

The success of these projects was greatly facilitated by the fact that a true interdisciplinary collaborative effort with Computer Science and Applied Math scientists has produced modern C and CUDA versions of the key HPC code, which was originally written in Fortran-90 (as is the case for the vast majority of codes in the FES application domain). The demonstrated capability to run at scale on the largest open-science IBM BG/Q system (“Mira” at the ALCF) opened the door to access to NNSA’s “Sequoia” system at LLNL, which then produced the outstanding results shown in Figure 1. More recently, excellent performance of the GPU version of GTC-P has been demonstrated on the “Titan” system at the Oak Ridge Leadership Computing Facility (OLCF). Finally, the G8-sponsored international R&D advances have enabled this project to gain collaborative access to a number of the top international supercomputing facilities, including the Fujitsu K Computer, Japan’s #1 supercomputer. In addition, these highly visible accomplishments have very recently enabled this project to begin collaborative applications on China’s new Tianhe-2 (TH-2) Intel-MIC-based system, the #1 supercomputing system worldwide.

RESEARCH TEAM:  Bei Wang (Princeton U), Stephane Ethier (PPPL), William Tang (Princeton U/PPPL), K. Ibrahim, S. Williams, L. Oliker (LBNL), K. Madduri (Penn State U), Tim Williams (ANL)
