The three finalists for this year’s Gordon Bell Prize in High Performance Computing have been announced. They include two papers on projects run on China’s Sunway TaihuLight system and a third paper on 3D image reconstruction from a group of Purdue University researchers. The prize carries a $10,000 award and will be presented on Thursday, November 16, at SC17 in Denver.
“The Gordon Bell Prize recognizes the extraordinary progress made each year in the innovative application of parallel computing to challenges in science, engineering, and large-scale data analytics. Prizes may be awarded for peak performance or special achievements in scalability and time-to-solution on important science and engineering problems. Financial support of the $10,000 prize is made possible by Gordon Bell, a pioneer in high-performance and parallel computing and past winner of the IEEE Seymour Cray Award for his exceptional contributions in the design of several computer systems that changed the world of high performance computing.”
Here’s a brief description of the finalist papers:
Redesigning CAM-SE for Petascale Climate Modeling Performance on Sunway TaihuLight
Description: We refactor and optimize the entire Community Atmosphere Model (CAM) to the full system of the Sunway TaihuLight, and provide a petascale climate modeling performance. We scale the CAM to 1.5 million cores with a simulation speed of 2.81 simulated years per day using OpenACC directives at the first stage. We then apply a more aggressive and challenging finer-grained redesign of the HOMME dynamical core, to achieve finer memory control, more efficient vectorization and overlap between computation and communication. Besides, a register communication based parallelism scheme is proposed to minimize the data dependencies in the modules. By doing so, our optimized kernels running on a 260-core Sunway processor outperform the established HOMME kernels on a platform with up to 184 Intel Xeon E5-2680V3 CPU cores. And our implementation has achieved a sustainable double-precision performance around 2.5 Pflops for a 0.75 km global simulation when using 8,519,680 cores.
Authors: Haohuan Fu, Junfeng Liao, Nan Ding, Xiaohui Duan, Lin Gan, Yishuang Liang, Xinliang Wang, Jinzhe Yang, Yan Zheng, Weiguo Liu, Lanning Wang, Guangwen Yang
15-Pflops Nonlinear Earthquake Simulation on Sunway TaihuLight: Enabling Depiction of Realistic 10 Hz Scenarios
Description: This paper reports our work on building a highly efficient earthquake simulation platform on Sunway TaihuLight, with 125 Pflops computing power and over 10 million cores. With the platform originated from AWP-ODC and CG-FDM, a large part of our efforts focuses on redesigning the velocity, stress, and plasticity processing kernels for the completely different microarchitecture and significantly increased parallelism of Sunway TaihuLight. By a combined approach including (1) an optimized parallelization scheme, (2) the most suitable blocking configuration, (3) fusion of co-located arrays, (4) register communication with CPE ID remapping for halo exchanges, and (5) customized ROM-less evaluation of elementary functions, we manage to achieve an efficient utilization of over 12.2% of the theoretical peak of the entire system. Our program provides a sustained performance of over 15 Pflops and enables the simulation of the Tangshan earthquake with a spatial resolution of 25 m and a frequency of 10 Hz.
Authors: Haohuan Fu, Conghui He, Bingwei Chen, Zekun Yin, Zhenguo Zhang, Wenqiang Zhang, Tingjian Zhang, Wei Xue, Weiguo Liu, Wanwang Yin, Guangwen Yang, Xiaofei Chen
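For readers who want to see how the utilization and sustained-performance figures in the earthquake paper's description relate to each other, here is a minimal sketch of the arithmetic. It uses only the numbers quoted above (125 Pflops peak, 12.2% utilization, 15 Pflops sustained); the function and variable names are illustrative and do not come from the paper.

```python
# Illustrative arithmetic only: relates the peak, utilization, and sustained
# figures quoted in the description. All names here are hypothetical.

PEAK_PFLOPS = 125.0          # system peak quoted in the description
QUOTED_UTILIZATION = 0.122   # "over 12.2% of the theoretical peak"
QUOTED_SUSTAINED = 15.0      # "over 15 Pflops"


def sustained_from_utilization(peak_pflops: float, fraction: float) -> float:
    """Sustained rate (Pflops) implied by a peak rate and a utilization fraction."""
    return peak_pflops * fraction


def utilization_fraction(sustained_pflops: float, peak_pflops: float) -> float:
    """Fraction of peak represented by a given sustained rate."""
    return sustained_pflops / peak_pflops


implied_sustained = sustained_from_utilization(PEAK_PFLOPS, QUOTED_UTILIZATION)
implied_fraction = utilization_fraction(QUOTED_SUSTAINED, PEAK_PFLOPS)

print(f"12.2% of {PEAK_PFLOPS:.0f} Pflops is about {implied_sustained:.2f} Pflops")
print(f"15 Pflops is {implied_fraction:.1%} of peak")
```

Both quoted figures are lower bounds ("over"), so the two calculations agree with each other: 12.2% of the 125 Pflops peak works out to slightly more than 15 Pflops.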
Massively Parallel 3D Image Reconstruction
Description: Computed Tomography (CT) image reconstruction is an important technique used in a wide range of applications. Among reconstruction methods, Model-Based Iterative Reconstruction (MBIR) generally produces higher quality images. However, the irregular data access pattern, the difficulty of effective parallelization and slow algorithmic convergence have made MBIR impractical for many applications. This paper presents a new algorithm for MBIR, Non-Uniform Parallel Super-Voxel (NU-PSV), that regularizes the data access pattern, enables massive parallelism and ensures fast convergence. We compare the NU-PSV algorithm with two state-of-the-art implementations on a 69632-core distributed system. Results indicate that the NU-PSV algorithm has an average speedup of 1665 compared to the fastest state-of-the-art implementations.
Authors: Xiao Wang, Amit Sabne, Putt Sakdhnagool, Sherman J. Kisner, Charles A. Bouman, Samuel P. Midkiff
Gordon Bell Prize finalists are selected by a committee comprising past Gordon Bell winners as well as leaders in the field of high performance computing. Tackling a significant scientific or engineering problem in HPC matters, but scientific outcomes alone are not sufficient for this prize. Finalists are selected from submissions that describe the innovations of the project, detail the performance levels achieved on one or more real-world applications, and outline the implications of the approach for the broader HPC community.
Subhash Saini, chair of the Gordon Bell Award for the last two years, will chair the three sessions of ACM Gordon Bell Finalists presentations at SC17.