OAK RIDGE, Tenn., Feb. 23 — The Department of Energy’s Oak Ridge National Laboratory has announced the latest release of its Adaptable I/O System (ADIOS), a middleware that speeds up scientific simulations on parallel computing resources such as the laboratory’s Titan supercomputer by making input/output operations more efficient.
While ADIOS has long been used by researchers to streamline file reading and writing in their applications, the production of data in scientific computing is growing faster than I/O can handle. Reducing data “on the fly” is critical to keep I/O up to speed with today’s largest scientific simulations and realize the full potential of resources such as Titan to make real-world scientific breakthroughs. And it’s also a key feature in the latest ADIOS release.
“As we approach the exascale, there are many challenges for ADIOS and I/O in general,” said Scott Klasky, scientific data group leader in ORNL’s Computer Science and Mathematics Division. “We must reduce the amount of data being processed and program for new architectures. We also must make our I/O frameworks interoperable with one another, and version 1.11 is the first step in that direction.”
The upgrade brings a number of improvements aimed at meeting these challenges, including
- a simplified write application programming interface (API) that reduces complexity via introduction of a novel buffering technique;
- lossy compression with ZFP, software from Peter Lindstrom at Lawrence Livermore National Laboratory that reduces the size of data on storage;
- a query API with multiple indexing/query methods, from John Wu at Lawrence Berkeley National Laboratory and Nagiza Samatova of North Carolina State University;
- a “bprecover” utility for resilience that exploits the ADIOS file format’s multiple copies of metadata;
- in-memory time aggregation for file-based output, allowing for efficient I/O with difficult write patterns;
- novel Titan-scale-supported staging from Manish Parashar at Rutgers University; and
- a laundry list of various other performance improvements.
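To give a sense of why in-memory time aggregation helps with difficult write patterns, the sketch below buffers several time steps in memory and flushes them to storage in a single write. It is only an illustration of the general idea and makes no claim about how ADIOS implements the feature: the names `CountingSink`, `TimeAggregator`, `put_step`, and `run` are hypothetical and are not part of the ADIOS API, and an in-memory buffer stands in for a real file.

```python
import io
import struct


class CountingSink(io.BytesIO):
    """Hypothetical in-memory stand-in for a file that counts write calls."""

    def __init__(self):
        super().__init__()
        self.write_calls = 0

    def write(self, data):
        self.write_calls += 1
        return super().write(data)


class TimeAggregator:
    """Buffer several time steps in memory and flush them in one write.

    Illustrative only; this is not the ADIOS implementation or API.
    """

    def __init__(self, sink, steps_per_flush):
        self.sink = sink
        self.steps_per_flush = steps_per_flush
        self.pending = []  # encoded steps that have not been written yet

    def put_step(self, values):
        # Encode one time step as little-endian doubles and hold it in memory.
        self.pending.append(struct.pack(f"<{len(values)}d", *values))
        if len(self.pending) >= self.steps_per_flush:
            self.flush()

    def flush(self):
        # Issue a single large write for all buffered steps.
        if self.pending:
            self.sink.write(b"".join(self.pending))
            self.pending.clear()


def run(num_steps, steps_per_flush):
    """Write num_steps small steps; return (number of writes, bytes written)."""
    sink = CountingSink()
    aggregator = TimeAggregator(sink, steps_per_flush)
    for step in range(num_steps):
        aggregator.put_step([float(step), float(step) * 0.5])
    aggregator.flush()  # write whatever is still buffered
    return sink.write_calls, sink.getvalue()


if __name__ == "__main__":
    unaggregated_calls, unaggregated_data = run(10, 1)
    aggregated_calls, aggregated_data = run(10, 4)
    print(unaggregated_calls, aggregated_calls)
    print(unaggregated_data == aggregated_data)
```

In this toy setup the same bytes reach the sink either way; aggregation only changes how many write operations are issued, which is the quantity that tends to matter when an application emits many small outputs.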
These modifications represent the latest evolution in ADIOS’s journey from research to production, as version 1.11 now makes it easier to move data from one code to another. ADIOS’s user base has gone from just a single code to hundreds of parallel applications spread across dozens of domain areas.
“ADIOS has been a vital part of our large-scale XGC fusion code,” said Choong-Seock Chang, head of the Center for Edge Physics Simulation at Princeton Plasma Physics Laboratory. “With the continuous version updates, the performance of XGC keeps getting better; during one of our most recent ITER runs, we were able to further accelerate the I/O, which enabled new insights into our scientific results.”
ADIOS’s success in the scientific community has led to its adoption among several industrial applications seeking more efficient I/O. Demand for ADIOS has grown enough that the development team is now partnering with Kitware, a world leader in data visualization infrastructure, to construct a data framework for the scientific community. The framework is meant to make it more efficient to locate and reduce the data that plague parallel scientific computing, and it will likely further grow ADIOS’s user base.
Throughout its evolution, ADIOS’s development team has ensured that the middleware remains fast, concurrent, scalable, portable, and perhaps most of all, resilient (the bprecover feature in 1.11 allows for the recovery of uncorrupted data). According to Klasky, being part of the DOE national lab system was critical to ensuring the scalability of the ever-growing platform, an asset that will remain critical as ORNL moves towards the exascale.
Because exascale hardware is widely expected to be disruptive, particularly in terms of incredibly fast nodes that will make it difficult for networks and I/O to keep up, researchers are preparing now for the daunting I/O challenge to come.
ADIOS was one of four ORNL-led software development projects to receive funding from the Exascale Computing Project, a collaborative effort between the DOE’s Office of Science and the National Nuclear Security Administration to develop a capable exascale ecosystem, encompassing applications, system software, hardware technologies and architectures, and workforce to meet the scientific and national security mission needs of DOE in the mid-2020s timeframe.
The award is a testament to ADIOS’s ability to make newer technologies sustainable, usable, fast, and interoperable, so that they will all be able to read from and possibly write to other important file formats.
As the journey to exascale continues, ADIOS’s unique I/O capabilities will be necessary to ensure that the world’s most powerful computers, and the applications they host, can continue to facilitate scientific breakthroughs impossible through experimentation alone.
“With ADIOS we saw a 20-fold increase in I/O performance compared to our best previous solution,” said Michael Bussmann, a junior group leader in computational radiation physics at Helmholtz-Zentrum Dresden-Rossendorf. “This made it possible to take full snapshots of the simulation, enabling us to study our laser-driven particle accelerator from the single-particle level to the full system. It is a game changer, going from 20 minutes to below one minute for a snapshot.”
The Titan supercomputer is part of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility.
ORNL is managed by UT-Battelle for DOE’s Office of Science. DOE’s Office of Science is the single largest supporter of basic research in the physical sciences in the United States, and is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.