At the Leibniz Supercomputing Centre (LRZ) in Munich, Germany – one of the constituent centers of the Gauss Centre for Supercomputing (GCS) – the SuperMUC-NG system has stood tall for several years, placing 15th on the most recent Top500 list. But LRZ isn’t done with the nearly 6,500-node system yet: the center has announced that it will expand SuperMUC-NG through “phase two,” which will focus on advancing the supercomputer’s AI capabilities through a partnership with Intel and Lenovo.
Currently, SuperMUC-NG (pictured in header) includes 6,336 “thin” nodes, each with dual Intel Xeon “Skylake” CPUs and 96 GB of memory, alongside 144 “fat” nodes, equipped with the same CPUs but 768 GB of memory per node (672 GB more than the thin nodes). The water-cooled system, which also includes over 70 petabytes of storage, delivers 19.5 Linpack petaflops.
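As a rough sanity check on those figures, the short Python sketch below adds up the node counts and the aggregate memory. It uses only the numbers quoted above and takes each fat node to have 96 + 672 = 768 GB; the constant and function names are illustrative and do not come from LRZ.

```python
# Back-of-the-envelope totals for SuperMUC-NG (phase one), using only
# the figures quoted in the article. All names here are illustrative.
THIN_NODES = 6_336
FAT_NODES = 144
THIN_MEM_GB = 96
FAT_MEM_GB = THIN_MEM_GB + 672  # 768 GB per fat node, i.e. 672 GB more than a thin node


def totals():
    """Return (total node count, aggregate memory in GB)."""
    nodes = THIN_NODES + FAT_NODES
    mem_gb = THIN_NODES * THIN_MEM_GB + FAT_NODES * FAT_MEM_GB
    return nodes, mem_gb


if __name__ == "__main__":
    nodes, mem_gb = totals()
    print(f"total nodes: {nodes}")
    print(f"aggregate memory: {mem_gb} GB (about {mem_gb / 1000:.0f} TB)")
```

The node total of 6,480 is what “nearly 6,500-node” rounds up from, and the aggregate memory works out to roughly 719 TB under these assumptions.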
Phase two will see SuperMUC-NG upgraded with 240 new nodes outfitted with the upcoming Intel Xeon “Sapphire Rapids” CPUs and Intel’s planned “Ponte Vecchio” GPU, integrated in Lenovo’s future SD650-I v3 server platform. Intel’s GPU, planned to power Argonne National Laboratory’s forthcoming Aurora exascale supercomputer, has been the source of much intrigue, particularly after a defect in Intel’s 7nm process delayed its deployment. SuperMUC-NG phase two marks the first announced application of Ponte Vecchio outside of Aurora.
The accompanying one petabyte of additional storage will include Intel’s Distributed Asynchronous Object Storage (DAOS), powered by Intel Xeon “Ice Lake” CPUs and Intel Optane Persistent Memory 200-series.
“We are thrilled that LRZ has chosen to partner with Intel in bringing their SuperMUC system to market based on Intel’s XPU product portfolio, advanced packaging and memory technologies and the unified oneAPI software stack to power the next generation of high performance computing,” said Raja Koduri, senior vice president, chief architect and general manager of architecture, graphics and software at Intel.
The faster CPUs – and perhaps more importantly, the GPUs – will help LRZ better serve increasing AI demands on its systems. The center explains that until recently, the bulk of the demand was coming from historically compute-intensive fields like physics and engineering, but the boom in both HPC and AI has induced new (and different) demand from fields ranging from medicine to the humanities.
“At the core of all LRZ activities is the user. It is our utmost priority to provide researchers with the resources and services they need to excel in their scientific domains,” said Dieter Kranzlmüller, director of LRZ. “Over the last years, we’ve observed our users accessing our systems not only for classical modeling and simulation, but increasingly for data analysis with artificial intelligence methods.”
Lenovo’s contribution reflects the growing preference for locally sourced technologies in European supercomputers: the company will manufacture all of its components for SuperMUC-NG phase two in Hungary, rather than at its American or Asian production facilities.
Elements of SuperMUC-NG phase two are currently in testing in BEAST (Bavarian Energy, Architecture, Software Testbed), LRZ’s test environment. The DAOS storage system is expected to arrive in fall of this year, ahead of the new compute system, which is planned for delivery in the spring of 2022. To prepare its researchers, LRZ has granted them access to GPU-based systems, and its in-house experts are helping the teams adapt their codes and algorithms ahead of the supercomputer’s new capabilities.