Software implementation in high-performance computing is getting more fragmented as organizations opt for tools in their walled garden environments.
However, a new organization formed under the Linux Foundation could bring some order to the chaos.
At Supercomputing 2023, the non-profit announced its intent to create the High-Performance Software Foundation (HPSF), which will encourage the development and sharing of tools for massive computing resources.
Public-private participation should also boost software innovation through collaboration.
U.S. national labs that are part of the Department of Energy’s Exascale Computing Project are joining the project and will make contributions, said Lori Diachin, project director at the DoE.
The private sector members include Intel, Kitware, and Nvidia, all big players in the HPC market.
HPC began in the 1940s and has fragmented over time due to limited access to computing resources because of security concerns, researchers from Lawrence Livermore National Laboratory and the National Center for Supercomputing Applications said in a research paper published this year.
Government organizations working on applications related to national security developed their systems to limit access and protect sensitive information, the researchers said.
But things started changing when coders began sharing resources on social developer sites like GitHub and GitLab.
HPC is also shifting toward accelerated computing, which adds more software layers to the development process. Developers download code from repositories but have guardrails to ensure the programs are safe to use.
The national labs use a continuous integration model in which code contributions are tested before they are added to the development cycle. Labs also typically own proprietary applications that they do not want to share.
Labs are already providing many open-source tools for HPC. However, development environments are getting complicated, with various accelerators being added to HPC systems.
For example, the upcoming Jupiter exascale supercomputer in Europe may include quantum systems alongside Nvidia GPU accelerators. Tools exist to seamlessly split code for execution across the processors, but they add new layers with separate libraries and compilers.
Nvidia’s GPUs require proprietary no-cost CUDA tools and compilers to create binaries that harness the full computing power of its GPUs.
The typical HPC software development cycle starts with the application, which then moves to libraries and down to the infrastructure layer (such as Docker). It then goes to the compiler/toolchain (LLVM, GCC, or oneAPI) and the OS (Linux), and finally reaches the hardware systems and accelerators, which could include GPUs or FPGAs.
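To illustrate why each added layer multiplies the work, here is a minimal Python sketch that enumerates every combination of one choice per layer. The layer names and the options listed under each are illustrative assumptions based loosely on the examples above, not an inventory of any real system.

```python
from itertools import product

# Hypothetical options per layer. The names echo the examples in the
# article; the "none" entries and the groupings are assumptions.
STACK_LAYERS = {
    "container": ["Docker", "none"],
    "compiler": ["LLVM", "GCC", "oneAPI"],
    "os": ["Linux"],
    "accelerator": ["GPU", "FPGA", "none"],
}


def build_matrix(layers):
    """Return every combination of one choice per layer as a list of dicts."""
    names = list(layers)
    combos = product(*(layers[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


configs = build_matrix(STACK_LAYERS)
print(len(configs))  # 2 * 3 * 1 * 3 = 18 configurations
```

Even this toy stack yields 18 configurations to build and test, and adding one more compiler or accelerator multiplies the total rather than adding to it.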
The ARES multi-physics codebase at Lawrence Livermore National Laboratory (LLNL) has 31 internal proprietary packages, of which 13 are open-source packages developed at LLNL. These rely on 72 external open-source software packages.
The added layers of hardware, compilers, and other tools create a matrix of complicated software dependencies that becomes hard to audit. That could introduce many vulnerabilities, which is why national labs want to keep their systems closed, mainly to maintain code integrity and protect system access from malicious code.
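The auditing problem can be sketched in a few lines of Python: a package that declares only a couple of direct dependencies can pull in many more indirectly. The dependency graph below is entirely hypothetical and far smaller than a real HPC codebase.

```python
from collections import deque

# Hypothetical dependency graph: package -> list of direct dependencies.
# All package names are made up for illustration.
DEPS = {
    "app": ["mesh-lib", "solver-lib"],
    "mesh-lib": ["io-lib"],
    "solver-lib": ["blas", "io-lib"],
    "io-lib": ["compression"],
    "blas": [],
    "compression": [],
}


def transitive_deps(graph, root):
    """Collect every package reachable from root via a breadth-first walk."""
    seen = set()
    queue = deque(graph.get(root, []))
    while queue:
        pkg = queue.popleft()
        if pkg in seen:
            continue  # already visited through another path
        seen.add(pkg)
        queue.extend(graph.get(pkg, []))
    return seen


closure = transitive_deps(DEPS, "app")
print(len(DEPS["app"]), len(closure))  # 2 direct, 5 transitive
```

Every package in the transitive set is code that has to be trusted, which is the part of the audit that grows fastest as layers are added.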
“Technical, security, and political issues all make it extremely difficult to integrate externally developed open-source software with internal applications and machines. Even though many HPC software projects are developed in the open, they must run on closed HPC resources, and it is increasingly difficult to ensure that the vast majority of modern open-source applications will run reliably on HPC systems,” the LLNL and NCSA researchers said.
HPSF is looking to solve that problem and create a common, stable open-source computing environment that can be used reliably across HPC systems.
The HPSF will officially form in May 2024. The open software packages that will be part of the project include Spack, the popular package manager; Kokkos; AMReX; WarpX; Trilinos; Apptainer; VTK-m; HPCToolkit; and E4S, the Extreme-scale Scientific Software Stack.
HPSF aims to standardize the open software stack, providing an easier path to deploying software packages.