The past decade has seen a sharp rise in heterogeneous computing: processing or coprocessing using more than one processor type. One of the most prominent examples of heterogeneous elements in HPC is the GPU computing ecosystem that has been fostered by NVIDIA and AMD. General-purpose GPU (GPGPU) adoption has become widespread in HPC, and student supercomputing competitions are no exception.
For the last seven international supercomputing challenges – SC in the United States, ISC in Germany and ASC in China – the winning contestants have relied on “hybrid” CPU-GPU machines with NVIDIA parts. The most recent team to do so is from Shanghai Jiao Tong University (SJTU). The team took the top spot in the largest student supercomputer challenge, ASC14, held last month at Sun Yat-sen University in Guangzhou, China.
Using a cluster they built with eight NVIDIA K20 GPU accelerators, SJTU earned the highest combined score across a series of six tests, including an elastic wave modeling application, 3D-EW; a quantum chemistry application, Quantum ESPRESSO; and other real-world scientific codes.
Although SJTU performed best overall, China’s Sun Yat-sen University team set a new record using 216 processor cores and eight NVIDIA K40 GPUs. The team’s cluster achieved 9.27 teraflops as measured by the HPC industry-standard Linpack performance benchmark, besting the previous record of 8.45 teraflops, set by Huazhong University of Science and Technology at ISC13.
According to Dr. Ye Weicai, advisor to the Sun Yat-sen University team, the participants focused on deep, fine-grained optimization of their Linpack run to best exploit heterogeneous acceleration technology and improve floating-point computing capacity. Credit was also given to the HPC management software Cluster Engine for helping the contestants optimize performance and control power consumption simultaneously.
As detailed in a recent blog entry, NVIDIA’s Simon See, who is also an adjunct professor at SJTU, reached out to James Lin, team advisor and vice director of the Center for HPC, to get his thoughts on the contest and the role of GPUs. Preparation was critical, notes Lin. He relates how the team practiced running code on SJTU’s “π” supercomputer. With 100 NVIDIA Tesla K20 GPUs, graphics coprocessing accounts for half of the system’s computational power. The team also reviewed the source code used in the competition and identified the best optimization methods.
When asked about the most challenging aspect of the competition, Lin hits on one of the main issues in HPC: the separation of computer science and domain science.
“All of my students are from the computer science department, so they knew very little about the background of scientific applications, like Quantum ESPRESSO, before the contest,” Lin says. “Fortunately, some of the π users are experienced with these applications, so they were able to help. In the end, we received the top score for three of the five applications.”