June 23, 2020 — The Programming Models group at Barcelona Supercomputing Center (BSC) has published a new release (version 2020.06) of the OmpSs-2 programming model. In this release, we have added several new major features such as a compiler based on LLVM, an integrated tracing tool, and support for OpenACC kernels. Moreover, we have optimized the scheduler infrastructure, the memory allocator and the discrete dependency system to improve performance and scalability of OmpSs-2 applications on many-core systems.
1. LLVM-based compiler
In this release we have included, for the very first time, a compiler based on the LLVM compiler infrastructure that will complement the venerable Mercurium source-to-source compiler. This extended LLVM compiler is in a beta stage, but it already supports most of the OmpSs-2 features when targeting the Nanos6 runtime system. Moreover, the LLVM OpenMP runtime distributed with our extended LLVM compiler has been modified to support the TAMPI library, which allows seamless use of non-blocking MPI calls inside OpenMP tasks.
2. Enhanced support for accelerators
This release is the first one to support kernels specified with OpenACC pragmas. To that end, the Mercurium source-to-source compiler and the Nanos6 runtime have been extended to support a subset of the OpenACC pragmas and the PGI runtime API, respectively. Moreover, the CUDA device has been refactored to include automatic data prefetching when CUDA Unified Memory is used. This version also includes support for cuBLAS and similar libraries.
3. General performance enhancements
In this release, we have modified the runtime to use the low-level API of jemalloc to improve the performance and scalability of small memory allocations inside the runtime. The CPU manager and scheduling infrastructure have been refactored to improve performance and scalability on many-core systems. The implementation of work-sharing tasks has been modified to better exploit data locality across task for instances. Finally, a new turbo variant of the runtime is available. This variant enables some processor floating-point optimizations as well as the discrete dependency system.
4. Integrated tracing library
Nanos6 has a new experimental lightweight tracing module that generates traces in the Common Trace Format (CTF). The module is lockless in the most common cases and emits a minimal set of Nanos6 events, with optional support for PAPI hardware counters. Future releases will support MPI and Linux kernel events. Nanos6 converts CTF traces to Paraver traces automatically, and these can be inspected using the new set of Paraver configurations provided with the release.
5. Enhanced implementation of the discrete dependency system
In this release we have extended the lock-free discrete dependency system to support weak, commutative and concurrent dependencies, so it now supports all OmpSs-2 dependency types except regions.
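As an illustration of one of these dependency types, the following is a minimal sketch of a commutative dependency in C, not code taken from the release. The function name blocked_sum and its parameters are hypothetical and chosen only for this example. Each task adds a partial result to a shared accumulator, and the commutative clause tells the runtime that these updates may run in any order but never two at the same time.

```c
#include <stddef.h>

/* Hypothetical helper for illustration: sums `n` values, creating one
 * task per block of `block` elements. `block` must be greater than 0. */
long blocked_sum(const long *data, size_t n, size_t block)
{
    long total = 0;

    for (size_t start = 0; start < n; start += block) {
        size_t len = (n - start < block) ? (n - start) : block;

        /* commutative(total): tasks that update `total` may execute in
         * any order, but are mutually exclusive with each other.
         * firstprivate(...): each task captures its own copy of the
         * block bounds and of the array pointer at creation time. */
        #pragma oss task commutative(total) firstprivate(data, start, len)
        {
            long partial = 0;
            for (size_t i = 0; i < len; ++i)
                partial += data[start + i];
            total += partial;
        }
    }

    /* Wait for every child task before reading the accumulator. */
    #pragma oss taskwait
    return total;
}
```

A compiler without OmpSs-2 support ignores the pragmas (possibly with a warning), so the function then runs serially and returns the same result. For example, calling blocked_sum on the array {1, 2, 3, 4, 5} with a block size of 2 creates three tasks and returns 15.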
Barcelona Supercomputing Center-Centro Nacional de Supercomputación (BSC-CNS) is the national supercomputing centre in Spain. The centre specialises in high performance computing (HPC) and manages MareNostrum, one of the most powerful supercomputers in Europe, located in the Torre Girona chapel. BSC is involved in a number of projects to design and develop energy-efficient, high performance chips based on open architectures like RISC-V, for use within future exascale supercomputers and other high performance domains. The centre leads the pillar of the European Processor Initiative (EPI), creating a high performance accelerator based on RISC-V. More information: www.bsc.es
Source: Barcelona Supercomputing Center