In this regular feature, HPCwire highlights newly published research in the high-performance computing community and related domains. From parallel programming to exascale to quantum computing, the details are here.
Parallel space-time likelihood optimization for air pollution prediction on large-scale systems
A team of researchers from the Extreme Computing Research Center at King Abdullah University of Science and Technology in Saudi Arabia presents “a parallel implementation of geostatistical space-time modeling that can predict air pollution using observations in a specific space-time domain, illustrating the importance of relaxing the assumption of independence of space and time.” In this conference paper for the Platform for Advanced Scientific Computing Conference, the researchers “use the proposed implementation to model two air pollution datasets from the Middle East and US regions with 550 spatial locations × 730 time slots and 945 spatial locations × 500 time slots, respectively.” They demonstrate that the “approach satisfies high prediction accuracy on both synthetic datasets and real particulate matter (PM) datasets in the context of the air pollution problem.” In addition, they report “up to 757.16 TFLOP/s using 1024 nodes (75% of the peak performance) using 490 geospatial locations on a Cray XC40 system.”
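At the heart of such geostatistical modeling is the evaluation of a Gaussian log-likelihood under a space-time covariance. The sketch below is a toy dense-linear-algebra version in Python with an illustrative Gneiting-style non-separable kernel; it is not the team's tile-based, distributed implementation, and all parameter names and values are invented for illustration. It shows the O(n³) Cholesky kernel that the paper parallelizes.

```python
import numpy as np
from scipy.spatial.distance import cdist

def nonseparable_cov(coords, times, sigma2=1.0, a=0.5, b=0.5, alpha=1.0):
    """Toy Gneiting-style non-separable space-time covariance.
    Parameter names and values are illustrative, not the paper's."""
    h = cdist(coords, coords)                    # pairwise spatial distances
    u = np.abs(times[:, None] - times[None, :])  # pairwise temporal lags
    psi = a * u**(2 * alpha) + 1.0               # temporal term that couples space and time
    return sigma2 / psi * np.exp(-b * h / np.sqrt(psi))

def gaussian_loglik(y, C):
    """Exact Gaussian log-likelihood via Cholesky: the O(n^3) kernel
    the paper evaluates in parallel on distributed-memory nodes."""
    L = np.linalg.cholesky(C + 1e-10 * np.eye(len(y)))  # jitter for stability
    z = np.linalg.solve(L, y)
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (len(y) * np.log(2.0 * np.pi) + logdet + z @ z)

# Tiny synthetic example: 50 locations x 10 time slots = 500 observations
rng = np.random.default_rng(0)
coords = np.repeat(rng.uniform(size=(50, 2)), 10, axis=0)
times = np.tile(np.arange(10.0), 50)
C = nonseparable_cov(coords, times)
y = np.linalg.cholesky(C + 1e-10 * np.eye(500)) @ rng.standard_normal(500)
print(gaussian_loglik(y, C))
```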
Authors: Mary Lai O. Salvaña, Sameh Abdulah, Hatem Ltaief, Ying Sun, Marc G. Genton, and David E. Keyes
nOS-V: co-executing HPC applications using system-wide task scheduling
Spanish researchers from the Barcelona Supercomputing Center believe the future of exascale supercomputing lies in massive parallelism, manycore processors, and heterogeneous architectures. In such systems, “it is increasingly difficult for HPC applications to fully and efficiently utilize the resources in system nodes. Moreover, the increased parallelism exacerbates the effects of existing inefficiencies in current applications,” they write. To address the problem, the researchers introduce “nOS-V, a lightweight tasking library that supports application co-execution using node-wide scheduling.” Co-execution is “a novel fine-grained technique to execute multiple HPC applications simultaneously on the same node, outperforming current state-of-the-art approaches.” The authors demonstrate “how co-execution with nOS-V significantly reduces schedule makespan for several applications on single node and distributed environments, outperforming prior node-sharing techniques.”
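The node-wide scheduling idea can be illustrated without the real interface. The Python sketch below is not the nOS-V API (all names here are invented); it simply feeds tasks from two “applications” into a single shared per-node ready queue, so cores left idle by one application pick up the other's work.

```python
import queue
import threading
import time

# Toy illustration of node-wide co-execution (NOT the nOS-V API; all names
# are invented): tasks from several applications share one per-node ready
# queue, so cores left idle by one application run the other's tasks.
ready = queue.Queue()

def submit(app, n_tasks, work_s):
    """Enqueue an application's tasks into the shared node-wide queue."""
    for i in range(n_tasks):
        ready.put((app, i, work_s))

def worker(core_id):
    """One worker per core pulls whatever task is ready, regardless of app."""
    while True:
        try:
            app, i, work_s = ready.get(timeout=0.5)
        except queue.Empty:
            return  # no work left anywhere on the node
        time.sleep(work_s)  # stand-in for real computation
        print(f"core {core_id}: finished {app} task {i}")

# Two co-executing "applications" with different task granularities
submit("appA", 8, 0.05)
submit("appB", 4, 0.10)
cores = [threading.Thread(target=worker, args=(c,)) for c in range(4)]
for t in cores:
    t.start()
for t in cores:
    t.join()
```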
Authors: David Álvarez, Kevin Sala, and Vicenç Beltran
IMEXLBM 1.0: a proxy application based on the lattice Boltzmann method for solving computational fluid dynamics problems on GPUs
A multi-institutional team of researchers from the City College of the City University of New York and Argonne National Laboratory describes IMEXLBM, a proxy application developed for the Exascale Proxy Applications Project. The Project was created within the Exascale Computing Project (ECP) to “improve the quality of proxies created by the ECP, provide small, simplified codes which share important features of large applications, and capture programming methods and styles that drive requirements for compilers and other elements of the toolchain.” IMEXLBM is “an open-source, self-contained code unit, with minimal dependencies, that is capable of running on heterogeneous platforms like those with graphic processing units for accelerating the calculation.” Using the ThetaGPU machine at the Argonne Leadership Computing Facility, the researchers demonstrate the code’s “functionality by solving a benchmark problem in computational fluid dynamics.” In addition, the authors note that “the code-unit is designed to be versatile and enable new physical models that can capture complex phenomena such as two-phase flow with interface capture.”
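For readers unfamiliar with the method, the generic lattice Boltzmann update that such a code accelerates can be sketched in a few lines of numpy. The example below is a plain explicit D2Q9 BGK step on a periodic grid; it is purely illustrative and far simpler than IMEXLBM's implicit-explicit, GPU-offloaded scheme.

```python
import numpy as np

# Plain explicit D2Q9 BGK lattice Boltzmann step on a periodic grid
# (generic textbook LBM; IMEXLBM itself uses implicit-explicit schemes
# and GPU offloading, which this sketch does not attempt to reproduce).
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])   # lattice velocities
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)         # lattice weights

def equilibrium(rho, ux, uy):
    cu = c[:, 0, None, None] * ux + c[:, 1, None, None] * uy
    usq = ux**2 + uy**2
    return rho * w[:, None, None] * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)

def lbm_step(f, tau=0.6):
    rho = f.sum(axis=0)                                   # density
    ux = (c[:, 0, None, None] * f).sum(axis=0) / rho      # x velocity
    uy = (c[:, 1, None, None] * f).sum(axis=0) / rho      # y velocity
    f = f - (f - equilibrium(rho, ux, uy)) / tau          # BGK collision
    for k in range(9):                                    # periodic streaming
        f[k] = np.roll(f[k], shift=(c[k, 0], c[k, 1]), axis=(0, 1))
    return f

# Uniform flow on a 64x64 grid, evolved for 100 steps
f = equilibrium(np.ones((64, 64)), np.full((64, 64), 0.05), np.zeros((64, 64)))
for _ in range(100):
    f = lbm_step(f)
```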
Authors: Geng Liu, Saumil Patel, Ramesh Balakrishnan, and Taehun Lee
Simulation-based optimization and sensibility analysis of MPI applications: variability matters
In this paper, Tom Cornebize and Arnaud Legrand from Université Grenoble Alpes in France argue that “finely tuning MPI applications and understanding the influence of key parameters (number of processes, granularity, collective operation algorithms, virtual topology, and process placement) is critical to obtain good performance on supercomputers.” The researchers present “an extensive validation study which covers the whole parameter space of High-Performance Linpack.” Performing all experiments on the Dahu cluster of the Grid’5000 testbed, they demonstrate “how the open-source version of HPL can be slightly modified to allow a fast emulation on a single commodity server at the scale of a supercomputer.” They also present “an extensive (in)validation study that compares simulation with real experiments and demonstrates our ability to predict the performance of HPL within a few percent consistently.” Lastly, they show that their surrogate “allows studying several subtle HPL parameter optimization problems while accounting for uncertainty on the platform.”
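A sense of why these parameters matter can be had even from a crude analytic model, far rougher than the paper's SimGrid-based emulation. The hedged sketch below (all machine constants are invented placeholders) combines HPL's well-known 2N³/3 flop count with a rough per-panel communication term to compare process-grid shapes.

```python
# Crude analytic model of HPL runtime (illustration only; the paper instead
# emulates the real HPL code on top of SimGrid to reach percent-level
# accuracy). All machine constants below are invented placeholders.
def hpl_time(N, P, Q, node_flops=5e11, bw=1e10, latency=1e-6, NB=256):
    flops = 2 * N**3 / 3 + 2 * N**2              # LU factorization flop count
    t_comp = flops / (node_flops * P * Q)        # perfectly parallel compute
    n_panels = N // NB
    # Rough per-panel cost: broadcasts along rows plus pivot swaps along
    # columns of the P x Q process grid (8 bytes per double)
    t_comm = n_panels * (latency * (P + Q) + 8 * NB * N / (bw * Q))
    return t_comp + t_comm

N = 100_000
for P, Q in [(1, 64), (4, 16), (8, 8)]:
    t = hpl_time(N, P, Q)
    print(f"P x Q = {P} x {Q}: ~{t:.0f} s, ~{2 * N**3 / 3 / t / 1e12:.1f} Tflop/s")
```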
Authors: Tom Cornebize and Arnaud Legrand
Lifetime-based method for quantum simulation on a new Sunway supercomputer
A multi-institutional team of researchers from the National Supercomputing Center in Wuxi, Tsinghua University, Zhejiang Lab, the Shanghai Research Center for Quantum Sciences, and the Information Engineering University in Zhengzhou, China, introduces “lifetime-based methods to reduce the slicing overhead and improve the computing efficiency” of the tensor-network contractions used to simulate quantum circuits. The researchers demonstrate that their “in-place slicing strategy reduces the slicing overhead to less than 1.2 and obtains 100-200 times speedups over related efforts. The resulting simulation time is reduced from 304s (2021 Gordon Bell Prize) to 149.2s on Sycamore RQC, with a sustainable mixed precision performance of 416.5 Pflops using over 41M cores to simulate 1M correlated samples.”
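Slicing itself is easy to demonstrate: fixing one shared index of a tensor contraction splits it into independent, smaller contractions whose results sum to the exact answer. The numpy sketch below shows that mechanism only; the paper's contribution, choosing which indices to slice based on tensor lifetimes so that the overhead stays small, is not captured here.

```python
import numpy as np

# Index slicing for tensor contractions: fixing a shared index splits one
# large contraction into independent smaller ones (which can run on
# different nodes) whose results sum to the exact answer. This shows the
# mechanism only; the paper's lifetime-based choice of which indices to
# slice, which minimizes the slicing overhead, is the hard part.
rng = np.random.default_rng(1)
A = rng.standard_normal((8, 16, 4))    # indices (i, k, s); s will be sliced
B = rng.standard_normal((4, 16, 8))    # indices (s, k, j)

full = np.einsum('iks,skj->ij', A, B)  # unsliced reference contraction

sliced = sum(
    np.einsum('ik,kj->ij', A[:, :, s], B[s])  # smaller memory footprint
    for s in range(A.shape[2])                # slices are embarrassingly parallel
)
assert np.allclose(full, sliced)
```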
Authors: Yaojian Chen, Yong Liu, Xinmin Shi, Jiawei Song, Xin Liu, Lin Gan, Chu Guo, Haohuan Fu, Dexun Chen, and Guangwen Yang
cuHARM: a new GPU-accelerated GR-MHD code and its application to ADAF disks
“A new GPU-accelerated general-relativistic magneto-hydrodynamic (GR-MHD) code based on HARM” is introduced by a multi-institutional team of researchers from Bar Ilan University in Israel and the School of Astronomy and Space Science and the Key Laboratory of Modern Astronomy and Astrophysics, both at Nanjing University in China. The cuHARM code “is written in CUDA-C and uses OpenMP to parallelize multi-GPU setups.” The researchers note that “a 256³ simulation is well within the reach of an Nvidia DGX-V100 server,” with the computation roughly a factor of 10 faster than if only the CPU were used. Using this code, the researchers “examine several disk structures all in the ‘Standard And Normal Evolution’ (SANE) state.” Their experiments found that “(i) increasing the magnetic field, while in the SANE state does not affect the mass accretion rate; (ii) simultaneous increase of the disk size and the magnetic field, while keeping the ratio of energies fixed, leads to the destruction of the jet once the magnetic flux through the horizon decreases below a certain limit… [and] (iii) the structure of the jet is a weak function of the adiabatic index of the gas, with relativistic gas tending to have a wider jet.”
Authors: Damien Bégué, Asaf Pe’er, Guoqiang Zhang, BinBin Zhang, and Benjamin Pevzner
Towards quantum ray tracing
A multi-institutional team of researchers from the University of Minho and the Institute for Systems and Computer Engineering, Technology and Science in Portugal, the University of the West of England in the UK, and the Texas Advanced Computing Center at the University of Texas at Austin is working toward the goal of developing “a fully quantum rendering system.” In this preprint, the researchers investigate “hybrid quantum-classical algorithms for ray tracing, a core component of most rendering techniques.” The authors “propose algorithms to significantly reduce the computation required for quantum ray tracing through exploiting image space coherence and a principled termination criteria for quantum searching.” They demonstrate “results for both Whitted style ray tracing, and for accelerating ray tracing operations when performing classical Monte Carlo integration for area lights and indirect illumination.”
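The quantum-search component can be modeled classically. The Python sketch below uses the textbook Grover success-probability formula (not the authors' algorithm) to show why the iteration count, and hence a principled termination criterion, matters when the number of matching primitives is unknown.

```python
import math

# Classical model of Grover-search success probability, the quantum-search
# building block behind quantum ray tracing (a textbook model, not the
# authors' algorithm). Searching N candidate primitives for M true hits:
def grover_success(N, M, iters):
    theta = math.asin(math.sqrt(M / N))
    return math.sin((2 * iters + 1) * theta) ** 2

N, M = 1024, 3
best = round((math.pi / 4) * math.sqrt(N / M))   # near-optimal iteration count
for r in (1, best // 2, best, 2 * best):
    print(f"{r:3d} iterations -> P(find a hit) = {grover_success(N, M, r):.3f}")
# Iterating past the optimum lowers the success probability, which is why
# a principled termination criterion matters when M is unknown in advance.
```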
Authors: Luís Paulo Santos, Thomas Bashford-Rogers, João Barbosa, and Paul Navrátil
Do you know about research that should be included in next month’s list? If so, send us an email at [email protected]. We look forward to hearing from you.