Researchers at Lawrence Berkeley National Laboratory and Cray squeezed more I/O performance out of the NERSC Cray XE6 “Hopper” supercomputer than ever before, in a plasma physics simulation involving more than a trillion particles. The work paves the way for even bigger, exascale HDF5 workloads, and it yielded a scientific discovery as a bonus.
The team ran 10 separate trillion-particle datasets, each ranging from 30 to 42 TB in size, through Hopper, an Opteron-based system that is currently 19th on the Top 500 list. The datasets were written as HDF5 files to the scratch file system at a sustained I/O rate of 27 gigabytes per second.
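To put those figures in perspective, the short Python sketch below works out roughly how long one dataset would take to write at the reported rate, and how many bytes that implies per particle. It is a back-of-the-envelope calculation, not anything taken from the researchers' code: it assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), exactly one trillion particles per dataset, and that the 27 GB/s rate holds for the whole write. The function names are illustrative.

```python
# Back-of-the-envelope arithmetic for the figures quoted in the article.
# Assumptions (not from the source): decimal units and exactly 10**12 particles.
TB = 10**12  # bytes per terabyte (decimal)
GB = 10**9   # bytes per gigabyte (decimal)


def write_time_seconds(size_tb, rate_gb_per_s=27):
    """Seconds needed to write size_tb terabytes at a sustained rate."""
    return size_tb * TB / (rate_gb_per_s * GB)


def bytes_per_particle(size_tb, particles=10**12):
    """Average bytes stored per particle in a dataset of size_tb terabytes."""
    return size_tb * TB / particles


for size_tb in (30, 42):
    minutes = write_time_seconds(size_tb) / 60
    per_particle = bytes_per_particle(size_tb)
    print(f"{size_tb} TB: about {minutes:.1f} minutes at 27 GB/s, "
          f"{per_particle:.0f} bytes per particle")
```

Under these assumptions, a single dataset takes somewhere between about 18.5 and 26 minutes to write, which is why even a small slowdown in the I/O stack becomes expensive at this scale.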
The simulation was the largest ever for a NERSC application, researchers say. It used about 80 percent of Hopper’s computing resources, 90 percent of the available memory on each node, and 50 percent of the Lustre scratch file system.
“It is quite a feat when you consider that even the smallest bottleneck in a production I/O stack can degrade performance at scale,” says Prabhat, a researcher in Berkeley Lab’s Scientific Visualization Group and the leader of the ExaHDF5 group, which is working to stretch the I/O capabilities of HDF5.
Prabhat (who goes by one name) says the simulation validates the work of the ExaHDF5 group, which is funded by the Department of Energy’s Office of Advanced Scientific Computing Research.
“If we had attempted a run like this three years ago, I would have been unsure about the level of performance that we could get from HDF5,” Prabhat says in a NERSC story. “But thanks to substantial progress made over the course of the ExaHDF5 project, we are now able to demonstrate that HDF5 can scale to petascale platforms like Hopper and achieve near peak I/O rates.”
Surendra Byna, a research scientist at the Berkeley Lab and lead author of the award-winning paper about the particle simulation job, says the project will lead to better parallel I/O auto-tuning tools and help HDF5 continue to scale. “Our goal with this project was to identify parameters that would make apps of this scale successful for a broad base of science users,” Byna says in the NERSC story.
Hopper ran VPIC, a large-scale plasma physics application that models how particles behave during magnetic reconnection, the physical mechanism involved in the aurora borealis, solar flares, and the formation of gaps in Earth’s magnetic field. Thanks to the work, physicists at the University of California, San Diego and Los Alamos National Lab were able to discover a power-law distribution in the particles’ energy spectrum. The power law was expected to exist, but hadn’t yet been validated.