Fusion energy is having a moment – an increasingly fruitful one – but as radio astronomers and particle physicists know, bigger and better experiments and simulations mean data deluges that quickly become difficult to manage. In a paper for the 22nd Smoky Mountains Computational Sciences and Engineering Conference – hosted virtually last year – researchers from General Atomics, Oak Ridge National Laboratory and the University of Virginia outlined their vision for a science gateway to help manage and share fusion data that the authors expect to “substantially balloon in the near future.”
General Atomics operates (on behalf of the Department of Energy) the DIII-D National Fusion Facility in San Diego, where researchers pursue the magnetic confinement approach to fusion energy. The researchers note that DIII-D experiments themselves have relied on the write-once PTDATA system for raw experimental data and the MDSplus data management system for analyzed data.
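For readers who want a feel for those systems: MDSplus exposes a thin-client Python API, and at DIII-D raw PTDATA points are conventionally reachable through MDSplus expressions. Here is a minimal, hedged sketch of that access pattern – the server address, shot number, tree and signal names are illustrative assumptions, not details taken from the paper:

```python
# Hedged sketch of pulling DIII-D data through the MDSplus thin-client Python API.
# The server address, shot number, tree, and signal names are illustrative
# assumptions (not from the paper); real access requires DIII-D credentials.
from MDSplus import Connection

SHOT = 123456  # hypothetical discharge number

conn = Connection("atlas.gat.com")  # assumed DIII-D MDSplus server name

# Raw experimental data: PTDATA points are commonly reached via a TDI expression.
plasma_current = conn.get(f'ptdata("ip", {SHOT})').data()

# Analyzed data lives in MDSplus trees; the tree and node names are assumptions.
conn.openTree("efit01", SHOT)
q95 = conn.get(r"\q95").data()

print(plasma_current[:5], q95)
```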
“However,” they write, “on the high-performance computing … simulation side of fusion research, the requirements have been very different. Recent petascale simulations from the community have pushed the limits of machine storage, and presented challenges for data persistence and storage.” The authors elaborate that these capability-computing simulations (single large runs that occupy much of a leadership-class machine) are now complemented by a growing number of capacity-computing fusion simulations (ensembles of many smaller runs), and conclude that “the production of HPC databases, from both capacity and larger capability simulations, will substantially balloon in the near future.”
By way of illustration, the paper points out that one of the key databases for fusion simulation data in the U.S. – NERSC’s CGYRODB database – has just 4TB available, with a typical dataset on the order of 0.1GB. “However,” they write, “for burning plasmas, these new gyrokinetic databases will need to be vastly expanded to include reactor-relevant effects that increase the computational cost as well as the data output of the first-principles simulations.” A single dataset of that type: 50-100GB. “Thus,” they write, “a complete gyrokinetic database for fusion reactor optimization is expected to require on the order of 1PB of data storage.” And, of course, as fusion experiments and HPC-driven analysis and simulation become ever more tightly coupled, the requirements for capacity, speed and reliability will only grow.
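A quick back-of-envelope calculation makes the scale concrete: dividing the projected 1PB archive by the quoted 50-100GB per-dataset size implies room for roughly 10,000-20,000 burning-plasma datasets (the dataset counts here are an inference from the paper's figures, not numbers the authors give):

```python
# Back-of-envelope check of the storage figures quoted above (decimal units).
PB_IN_GB = 1_000_000  # 1 PB = 1,000,000 GB

for dataset_gb in (50, 100):  # per-dataset size quoted for burning plasmas
    print(f"{dataset_gb} GB/dataset -> ~{PB_IN_GB // dataset_gb:,} datasets per PB")

# For comparison, today's scale: a 4TB archive of ~0.1GB datasets.
print(f"Current CGYRODB scale: ~{4_000 / 0.1:,.0f} datasets in 4 TB")
```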
This leads to the authors’ vision: a “DOE-wide long-term data storage system as a science gateway for fusion experimental and HPC databases.” They go on to list the practical requirements for such a system (a minimal sketch of a catalog record follows the list):
- Open access (publicly available to the scientific community)
- Persistent identifier (e.g. digital object identifier)
- Provenance tracking
- Cross-platform accessibility across leadership-computing facilities
- Longevity beyond project/allocation length
- Keyword-attribute filtering
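To make those requirements a bit more concrete, here is a minimal sketch of what one record in such a gateway's catalog might carry, along with a toy keyword-attribute filter. All field names, DOIs, and values are hypothetical illustrations, not a schema from the paper:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    """One hypothetical catalog entry for the proposed gateway (fields illustrative)."""
    doi: str                    # persistent identifier (e.g. a DOI)
    facility: str               # originating experiment or HPC facility
    provenance: list[str] = field(default_factory=list)       # inputs/codes that produced it
    attributes: dict[str, str] = field(default_factory=dict)  # keyword-attribute pairs

def filter_records(records: list[DatasetRecord], **wanted: str) -> list[DatasetRecord]:
    """Keyword-attribute filtering: keep records whose attributes match every keyword."""
    return [r for r in records
            if all(r.attributes.get(k) == v for k, v in wanted.items())]

# Toy usage: one experimental and one simulation record, then a filtered query.
catalog = [
    DatasetRecord(doi="10.0000/example.1", facility="DIII-D",
                  provenance=["PTDATA shot 123456"],
                  attributes={"type": "experiment"}),
    DatasetRecord(doi="10.0000/example.2", facility="NERSC",
                  provenance=["CGYRO input files"],
                  attributes={"type": "simulation", "code": "CGYRO"}),
]
print(filter_records(catalog, type="simulation"))
```

The `doi` field corresponds to the persistent-identifier requirement, `provenance` to provenance tracking, and the filter function to keyword-attribute filtering; open access, cross-platform reach and longevity are policy and infrastructure questions that no record schema can capture on its own.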
“Such a gateway may also serve the broader DOE scientific community with large projects and extreme datasets,” they add, citing climate science, high-energy physics, astrophysics and other fields.
The paper is titled “A Vision for Coupling Operation of U.S. Fusion Facilities with HPC Systems and the Implications for Workflows and Data Management” and was written by Sterling Smith, Emily Belli, Orso Meneghini, Reuben Budiardja, David Schissel, Jeff Candy, Tom Neiser and Adam Eubanks.