ExaFEL Addresses Need for Exascale Data Analysis Workflow For LCLS at SLAC

June 10, 2022 — The Exascale Computing Project’s (ECP’s) ExaFEL effort aims to help researchers who are making molecular movies using the Linac Coherent Light Source (LCLS). Their exascale data analysis workflow for serial femtosecond crystallography will assist in the observation of the dynamic movement of atoms.[1] The LCLS is located at the SLAC National Accelerator Laboratory and is operated by Stanford University for the US Department of Energy. The project is being built on a prior demonstration project and collaboration between NERSC, ESnet, and SLAC.[2]

LCLS is the world’s first hard x-ray free-electron laser facility, which makes it a superb instrument for observing the dynamics of atomic interactions in a molecular system. This is due in part to the resolving power (i.e., the ability to resolve atomic-level detail) of the instrument (x-rays have a much shorter wavelength than visible light) combined with the ultrafast pulse and brightness (also referred to as power) of the laser.[3]

Scientists use ultrafast pulses of the powerful LCLS laser energy to illuminate a carefully prepared sample of some system of interest. The sample can be chosen to elucidate a chemical reaction, how photosynthesis works, the formation of chemical bonds, the acceleration of reactions through catalysis, and more.[4] Data are captured by sensors during each laser pulse and processed by the LCLS workflow to effectively create a stop-motion snapshot of atoms and molecules in the system.[5] The concept is similar to that of a strobe light, which can be used to illuminate and create the visual appearance of a stop-motion image of moving objects. SLAC provides a short video explaining the concept. Unlike capturing a picture with a camera, the LCLS workflow must use computationally expensive x-ray diffraction algorithms to process each x-ray snapshot.

Creating a movie from these x-ray snapshots is computationally challenging because each x-ray pulse destroys the sample. This means that the x-ray snapshots cannot be simply viewed one after the other like what we see when a strobe light illuminates dancers moving on a dance floor.[6] Instead, scientists use sophisticated algorithms that examine large aggregates of x-ray snapshots, in which each snapshot presents a randomly oriented view of the sample, to organize and piece together a molecular movie that captures the dynamics of how the atoms move over time.

The complexity of the algorithms, coupled with the large number of snapshots that must be processed, makes molecular movie generation a very data-intensive and computationally expensive task. The scientific benefits are undeniable, as the resulting movies provide an invaluable and unique source of experimental observation (some transformative examples are shown here). Scientists study these movies to form hypotheses about the dynamics of atomic behavior in their system of interest and then to verify or refute them. The ability to observe and form hypotheses that are verified or refuted by data is a foundation of the scientific method.

Need for Exascale Computing

Accelerating the LCLS workflow is essential because it gives scientists results while their experiment is still running, so they can collect the best data during their time at LCLS. Real-time results give experimentalists the opportunity to make adjustments and gather better, more informative data. The result is better science and better utilization of the instrument.

Performance is vital for processing data from the LCLS-II upgrade because the laser can be programmed to operate at 1 million pulses per second, compared with the 120 pulses per second of the current LCLS laser.[7][8] The faster pulse rate will generate orders of magnitude more data that must be processed quickly. Exascale supercomputing hardware provides the necessary network and computing capability to handle the massive increase in data produced by the LCLS-II sensors. Amedeo Perazzo, ExaFEL PI and Controls and Data Systems Division director at the SLAC National Accelerator Laboratory, notes, “Both now and in the future, fast turnaround is necessary so scientists can make the best use of their time at LCLS and are not flying blind.”
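As a rough back-of-the-envelope illustration of the pulse rates quoted above, the following Python sketch computes how much faster LCLS-II can pulse than the current laser. The function name `pulse_rate_ratio` is hypothetical. The sketch compares pulse counts only. Actual data volumes depend on detector and readout details that the article does not give.

```python
import math

LCLS_PULSES_PER_SEC = 120            # current LCLS pulse rate, from the article
LCLS_II_PULSES_PER_SEC = 1_000_000   # maximum programmable LCLS-II rate, from the article


def pulse_rate_ratio(new_rate, old_rate):
    """Return how many times faster new_rate is than old_rate (hypothetical helper)."""
    return new_rate / old_rate


ratio = pulse_rate_ratio(LCLS_II_PULSES_PER_SEC, LCLS_PULSES_PER_SEC)
orders_of_magnitude = math.log10(ratio)

print(f"LCLS-II can pulse about {ratio:,.0f} times faster than LCLS")
print(f"That is roughly {orders_of_magnitude:.1f} orders of magnitude")
```

If each pulse yielded a comparable amount of data (an assumption, not a figure from the article), the increase in pulse rate alone would approach four orders of magnitude.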

Rethinking the Current Workflow

Adapting the current tools so they can run on the forthcoming exascale hardware requires innovative thinking and new approaches.

Perazzo notes that the ExaFEL team must consider new algorithms and computing frameworks to leverage GPUs and other high-performance capabilities in the forthcoming US exascale supercomputers. These new approaches mean the team must replace and/or augment existing CPU-only algorithms and computing frameworks. The expanded capability afforded by GPU-accelerated machines, along with new AI technology, enables the team to explore new approaches that can increase the resolution of the computed results and ultimately improve the quality of the movies viewed by scientists.

Creation of Snapshots

GPUs are instrumental in generating diffraction patterns of multiple conformations of a protein sample to account for beam fluctuations, parasitic beamline scattering, and detector noise. These simulated images will be leveraged for characterizing the performance of the new algorithms under realistic conditions while the team waits for large datasets to be produced by future LCLS-II experiments.

Making Molecular Movies

Chuck Yoon, Advanced Methods for Analysis Group lead at the SLAC National Accelerator Laboratory, observes, “We want to sample an ensemble set of experiments from the initial state to their final state of the system. This requires sophisticated and established algorithms to reconstruct the pathway.” He notes that making movies of molecular systems can require processing data collected from very short to very long timeframes on the order of femtoseconds (10⁻¹⁵ second or 1 quadrillionth of a second) to minutes owing to the orders-of-magnitude variation in the reactions’ timescales. Many snapshots must be taken to capture a few fleeting moments when some of the most interesting conformational changes occur. Figure 1 illustrates the order-of-magnitude variation in the timescale for a spectrum of important reactions being studied with LCLS. In addition to improving performance, Yoon notes, “the team is looking to use AI and GPU technology to create and establish new higher-resolution algorithms that can run in the desired timeframe.”
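To put the femtoseconds-to-minutes range mentioned above in numbers, the short Python sketch below estimates how many orders of magnitude it spans. The helper `timescale_span_orders` is hypothetical, and treating "minutes" as exactly one minute is an assumption that gives a lower bound on the span.

```python
import math

FEMTOSECOND_S = 1e-15   # 10^-15 second, as defined in the article
MINUTE_S = 60.0         # assumption: "minutes" taken as one minute (lower bound)


def timescale_span_orders(shortest_s, longest_s):
    """Return the number of orders of magnitude between two durations in seconds."""
    return math.log10(longest_s / shortest_s)


span = timescale_span_orders(FEMTOSECOND_S, MINUTE_S)
print(f"Femtoseconds to minutes spans about {span:.1f} orders of magnitude")
```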

To read the full version of Rob Farber’s technical highlight, visit this link.

Source: Rob Farber, contributing writer for ECP