In the bevy of news from Nvidia’s GPU Technology Conference this week, another new system has come to light: Pegasus, which entered operations at the University of Tsukuba’s Center for Computational Sciences in January. Center director Taisuke Boku shared details of the new “big memory” system, which is among the first to use Nvidia H100 GPUs and Intel Sapphire Rapids CPUs.
Built by NEC, Pegasus comprises 120 compute nodes, each equipped with one Nvidia H100 PCIe GPU and one Intel Sapphire Rapids 48-core CPU (running at 2.1 GHz), delivering an aggregate 6.5 petaflops of theoretical double-precision performance. The system also includes Intel 300-series Optane persistent memory (2 tebibytes per node), DDR5 memory (128 gibibytes per node), NVMe SSD storage (2 x 3.2 terabytes per node), and Nvidia NDR200 InfiniBand networking. A parallel file system supplied by DDN provides 7.1 petabytes of 40 Gbps storage.
An additional three log-in nodes each house dual Sapphire Rapids CPUs, 256 gibibytes of DDR5 memory, and NVMe SSD storage.
“The new supercomputer Pegasus is one of the first systems in the world to introduce 4th Gen Intel Xeon Scalable processors (formerly codenamed Sapphire Rapids), Intel Optane persistent memory (codenamed Crow Pass), and the Nvidia H100 Tensor Core GPU with 51 teraflops of breakthrough acceleration,” reported the University of Tsukuba’s Center for Computational Sciences.
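As a rough check, the 6.5-petaflops aggregate can be approximated from the per-node figures quoted above. The sketch below is a back-of-the-envelope estimate, not the center's own calculation. It assumes each CPU core retires 32 double-precision flops per cycle (two AVX-512 fused multiply-add units), a figure not stated in the announcement, and it treats the 2.1 GHz clock as sustained. The function and constant names are illustrative.

```python
# Back-of-the-envelope FP64 peak estimate for Pegasus (illustrative only).
NODES = 120                 # compute nodes, from the article
GPU_FP64_TFLOPS = 51.0      # per-H100 figure quoted by the center
CPU_CORES = 48              # Sapphire Rapids cores per node
CPU_CLOCK_GHZ = 2.1         # clock quoted in the article

# ASSUMPTION (not from the article): 32 FP64 flops per cycle per core,
# i.e. 2 AVX-512 FMA units x 8 doubles x 2 flops per FMA.
FLOPS_PER_CYCLE = 32


def cpu_peak_tflops(cores: int, ghz: float, flops_per_cycle: int) -> float:
    """Theoretical CPU peak in teraflops: cores x GHz x flops/cycle gives gigaflops."""
    return cores * ghz * flops_per_cycle / 1000.0


def system_peak_pflops(nodes: int, gpu_tflops: float, cpu_tflops: float) -> float:
    """Aggregate theoretical peak in petaflops across all compute nodes."""
    return nodes * (gpu_tflops + cpu_tflops) / 1000.0


cpu_tflops = cpu_peak_tflops(CPU_CORES, CPU_CLOCK_GHZ, FLOPS_PER_CYCLE)
peak_pflops = system_peak_pflops(NODES, GPU_FP64_TFLOPS, cpu_tflops)

print(f"CPU peak per node: {cpu_tflops:.2f} TF")
print(f"System peak:       {peak_pflops:.2f} PF")
```

Under these assumptions, the GPUs contribute roughly 6.1 petaflops and the CPUs the remainder, which lands close to the quoted 6.5-petaflops total.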
It may well also be one of the last systems to use Optane, as Intel announced the discontinuation of that product last year. The parts are warrantied for five years, and Intel has promised support for Pegasus through that period. CXL-based memory technologies are being looked at as a future persistent memory option.
The project team is reporting a Linpack score of 3.47 petaflops for Pegasus, which should secure it a spot on the upcoming Top500 list in May. Gains in energy efficiency are expected, owing significantly to the Hopper GPU and the persistent memory. Boku said he expects Pegasus to be more energy-efficient than Henri, the H100-powered, U.S.-based system that achieved the highest green ranking in November, clocking 65.09 gigaflops per watt.
By the University of Tsukuba’s measure, Pegasus also has a higher Linpack efficiency, that is, the fraction of theoretical peak flops actually delivered on the benchmark: 54% for Pegasus versus Henri’s 37.6%. Both numbers fall short of the list’s ~65% average. Further optimizations could be in store for either system, however, so these numbers are in a sense provisional until the next Top500 list is published.
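Linpack efficiency is simply the measured Linpack result divided by the theoretical peak. The minimal sketch below applies that ratio to the rounded figures quoted in this article (3.47 petaflops measured, 6.5 petaflops peak). The function name is illustrative, and because the inputs are rounded, the result lands near 53%, slightly under the 54% the center reports; the center presumably worked from unrounded values.

```python
def linpack_efficiency(rmax_pflops: float, rpeak_pflops: float) -> float:
    """Return measured Linpack performance as a fraction of theoretical peak."""
    if rpeak_pflops <= 0:
        raise ValueError("theoretical peak must be positive")
    return rmax_pflops / rpeak_pflops


# Rounded figures from the article; the center's unrounded inputs may differ.
PEGASUS_RMAX_PFLOPS = 3.47
PEGASUS_RPEAK_PFLOPS = 6.5

pegasus_efficiency = linpack_efficiency(PEGASUS_RMAX_PFLOPS, PEGASUS_RPEAK_PFLOPS)
print(f"Pegasus Linpack efficiency: {pegasus_efficiency:.1%}")
```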
The new system joins Cygnus, which came online in 2019 and was unique in combining GPU and FPGA technology. Each of Cygnus’ 80 nodes is equipped with four Nvidia V100 GPUs, and half of those nodes are additionally equipped with two Intel Stratix 10 FPGA devices.
Asked during his GTC presentation why Pegasus doesn’t make use of FPGAs, Boku indicated the systems were designed for different purposes, while also noting the high cost of FPGAs. “On Cygnus, we are researching the very interesting combination of GPU+FPGA, but currently the programming is not easy for application users. So we focus on PMEM and the new H100 for HPC+AI on Pegasus.”
“[Further,] Cygnus pursues performance, and Pegasus has a different viewpoint of expanding HPC + AI applications. For example, PMEM’s 2 tebibytes-per-node is useful for AI solutions that say, ‘I don’t want to force MPI parallelism, but I want memory.’ Many AI applications are running on one node, and this is strongly supported.”
Pegasus, which went by the name Cygnus-BD in its planning stages, will enable much larger simulations for traditional HPC applications in fields such as astrophysics, climate and bioscience. The large memory will also be brought to bear on big data and AI workloads across a range of domains, including drug discovery. Preliminary testing shows an astrophysical simulation code called ARGOT running 1.86x faster on Pegasus’ H100 GPU than on Cygnus’ V100.
On the origin of the name Pegasus and associated cabinet art, Boku shared, “The big wings represent the space of big memory, and the flying horse represents high-speed GPU computation. It also has the implication that it is a sibling machine of Cygnus that has been operated so far. These two constellations are almost next to each other in the sky.”