April 18, 2022 — Rob Farber is a global technology consultant and author with an extensive background in HPC and in developing machine learning technology that he applies at national laboratories and commercial organizations. The following is an excerpt from his article concerning the ExaIO project which is a part of the Exascale Computing Project (ECP).
As the word exascale implies, the forthcoming generation of exascale supercomputer systems will deliver 10^18 flop/s of scalable computing capability. All that computing capability will be for naught if the storage hardware and I/O software stack cannot meet the storage needs of applications running at scale, leaving applications either to drown in data when attempting to write to storage or starve while waiting to read data from storage.
Suren Byna, PI of the ExaIO project in the Exascale Computing Project (ECP) and computer staff scientist at Lawrence Berkeley National Laboratory, highlights the need for preparation to address the I/O needs of exascale supercomputers by noting that storage is typically the last subsystem available for testing on these systems. In addressing the I/O needs of many ECP software technology (ST), application development (AD), and hardware integration (HI) projects, Byna observes that the storage-focused ExaIO project must prepare now to be ready when these systems enter production. “Success for the ExaIO project means addressing three trends that are becoming a gating factor at the exascale,” Byna said. “(1) too much data being generated, (2) too much data being consumed, and (3) the fact that storage performance is becoming a gating factor for many applications. Further, exascale-capable hardware solutions involve both novel and complex storage and I/O architectures that require enhancing existing I/O libraries. We are addressing these trends and hardware needs in ExaIO via the HDF5 [Hierarchical Data Format version 5] library and UnifyFS.”
Byna emphasized the importance of adapting I/O technologies so they are exascale ready, noting that “Without the funding provided by DOE and ECP to enhance the HDF5 I/O libraries, applications using HDF5 will not be able to take advantage of the novel exascale storage architectures. The funding gives us the ability to develop novel systems (like UnifyFS) that are pushing the I/O technologies into [the] next generation.” Byna also reflected on the breadth of technical support that arises from recognition of the general need for performant storage. “We are adding new features to HDF5, a popular data model, file format, and I/O library. The ExaIO team is also developing a new file system, called UnifyFS, for taking advantage of fast storage layers that are distributed across compute nodes in a supercomputing system. The project involves members from Lawrence Berkeley Lab, The HDF Group (THG) who is the main developer and maintainer of HDF5, Argonne National Laboratory, Lawrence Livermore Laboratory, Oak Ridge National Lab, and North Carolina State University.”
Source: Rob Farber, Contributing Writer to ECP