High-Performance Computing Cuts Particle Collision Data Prep Time

Nov. 29, 2017 — For the first time, scientists have used high-performance computing (HPC) to reconstruct the data collected by a nuclear physics experiment—an advance that could dramatically reduce the time it takes to make detailed data available for scientific discoveries.

The demonstration project used the Cori supercomputer at the National Energy Research Scientific Computing Center (NERSC), a high-performance computing center at Lawrence Berkeley National Laboratory in California, to reconstruct multiple datasets collected by the STAR detector during particle collisions at the Relativistic Heavy Ion Collider (RHIC), a nuclear physics research facility at Brookhaven National Laboratory in New York. By running multiple computing jobs simultaneously on the allotted supercomputing cores, the team transformed 4.73 petabytes of raw data into 2.45 petabytes of “physics-ready” data in a fraction of the time it would have taken using in-house high-throughput computing resources, even with a two-way transcontinental data journey.

“The reason why this is really fantastic,” said Brookhaven physicist Jérôme Lauret, who manages STAR’s computing needs, “is that these high-performance computing resources are elastic. You can call to reserve a large allotment of computing power when you need it—for example, just before a big conference when physicists are in a rush to present new results.” According to Lauret, preparing raw data for analysis typically takes many months, making it nearly impossible to provide such short-term responsiveness. “But with HPC, perhaps you could condense that many months production time into a week. That would really empower the scientists!”

The accomplishment showcases the synergistic capabilities of RHIC and NERSC—U.S. Department of Energy (DOE) Office of Science User Facilities located at DOE-run national laboratories on opposite coasts—connected by one of the most extensive high-performance data-sharing networks in the world, DOE’s Energy Sciences Network (ESnet), another DOE Office of Science User Facility.

“This is a key usage model of high-performance computing for experimental data, demonstrating that researchers can get their raw data processing or simulation campaigns done in a few days or weeks at a critical time instead of spreading out over months on their own dedicated resources,” said Jeff Porter, a member of the data and analytics services team at NERSC.

Billions of data points

To make physics discoveries at RHIC, scientists must sort through hundreds of millions of collisions between ions accelerated to very high energy. STAR, a sophisticated, house-sized electronic instrument, records the subatomic debris streaming from these particle smashups. In the most energetic events, many thousands of particles strike detector components, producing firework-like displays of colorful particle tracks. But to figure out what these complex signals mean, and what they can tell us about the intriguing form of matter created in RHIC’s collisions, scientists need detailed descriptions of all the particles and the conditions under which they were produced. They must also compare huge statistical samples from many different types of collision events.

Cataloging that information requires sophisticated algorithms and pattern recognition software to combine signals from the various readout electronics, and a seamless way to match that data with records of collision conditions. All the information must then be packaged in a way that physicists can use for their analyses.

Since RHIC started running in the year 2000, this raw data processing, or reconstruction, has been carried out on dedicated computing resources at the RHIC and ATLAS Computing Facility (RACF) at Brookhaven. High-throughput computing (HTC) clusters crunch the data, event-by-event, and write out the coded details of each collision to a centralized mass storage space accessible to STAR physicists around the world.

But the challenge of keeping up with the data has grown with RHIC’s ever-improving collision rates and as new detector components have been added. In recent years, STAR’s annual raw data sets have reached billions of events with data sizes in the multi-Petabyte range. So the STAR computing team investigated the use of external resources to meet the demand for timely access to physics-ready data.

To read the full article, follow this link: https://www.bnl.gov/newsroom/news.php?a=212581

Source: Brookhaven National Laboratory