Pegasus ‘Big Memory’ Supercomputer Now Deployed at the University of Tsukuba

By Tiffany Trader

March 25, 2023

In the bevy of news from Nvidia’s GPU Technology Conference this week, another new system has come to light: Pegasus, which entered operations at the University of Tsukuba’s Center for Computational Sciences in January. Center director Taisuke Boku shared details of the new “big memory” system, which is among the first to use Nvidia H100 GPUs and Intel Sapphire Rapids CPUs.

Built by NEC, Pegasus comprises 120 compute nodes, each equipped with one Nvidia H100 PCIe GPU and one Intel Sapphire Rapids 48-core CPU (running at 2.1 GHz), delivering an aggregate 6.5 petaflops of theoretical double-precision performance. The system also includes Intel 300-series Optane persistent memory (2 tebibytes per node), DDR5 memory (128 gibibytes per node), NVMe SSD storage (2 x 3.2 terabytes per node), and Nvidia NDR200 InfiniBand networking. A parallel file system supplied by DDN provides 7.1 petabytes of 40 Gbps storage. 

An additional three log-in nodes each house dual Sapphire Rapids CPUs, 256 gibibytes DDR5 memory, and NVMe SSD storage.
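As a rough sanity check on the compute-node figures, the Python sketch below totals the per-node numbers. It uses the 51-teraflop H100 figure cited by the center, and it assumes each Sapphire Rapids core retires 32 double-precision flops per cycle at the 2.1 GHz clock (two AVX-512 FMA units). That per-core rate is an assumption, not a number from the article, and the variable names are illustrative.

```python
# Back-of-envelope totals from the per-node figures reported for Pegasus.
NODES = 120
GPU_FP64_TFLOPS = 51.0         # H100 figure quoted by the center
CPU_CORES = 48
CPU_GHZ = 2.1
FLOPS_PER_CYCLE_PER_CORE = 32  # ASSUMPTION: two AVX-512 FMA units per core

# Peak double-precision throughput per node and for the whole system.
cpu_tflops = CPU_CORES * CPU_GHZ * FLOPS_PER_CYCLE_PER_CORE / 1000.0
node_tflops = GPU_FP64_TFLOPS + cpu_tflops
system_pflops = NODES * node_tflops / 1000.0

# Aggregate memory and local storage across the compute nodes.
pmem_tib = NODES * 2              # 2 TiB Optane per node
ddr5_tib = NODES * 128 / 1024     # 128 GiB DDR5 per node
nvme_tb = NODES * 2 * 3.2         # two 3.2 TB NVMe SSDs per node

print(f"peak:  {system_pflops:.2f} PF")
print(f"PMEM:  {pmem_tib} TiB")
print(f"DDR5:  {ddr5_tib:.0f} TiB")
print(f"NVMe:  {nvme_tb:.0f} TB")
```

Under those assumptions the total comes to roughly 6.5 petaflops, in line with the stated aggregate, alongside 240 tebibytes of persistent memory and 15 tebibytes of DDR5 across the compute nodes.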

Pegasus compute node diagram. Courtesy of the University of Tsukuba CCS.

“The new supercomputer Pegasus is one of the first systems in the world to introduce 4th Gen Intel Xeon Scalable processors (formerly codenamed Sapphire Rapids), Intel Optane persistent memory (codenamed Crow Pass), and the Nvidia H100 Tensor Core GPU with 51 teraflops of breakthrough acceleration,” reported the University of Tsukuba’s Center for Computational Sciences.

It may well also be one of the last systems to use Optane, as Intel announced the discontinuation of that product last year. The parts are warrantied for five years and Intel has promised support for Pegasus through that period. CXL-based memory technologies are being considered as a future persistent memory option.

The project team is reporting a Linpack score for Pegasus of 3.47 petaflops, which should secure it a spot on the upcoming Top500 list in May. Gains in energy efficiency are expected, owing significantly to the Hopper GPU and the persistent memory. Boku said he expects Pegasus to be more energy-efficient than Henri, the H100-powered, U.S.-based system that achieved the highest green ranking in November, clocking 65.09 gigaflops per watt.
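To put that target in concrete terms, the following Python sketch estimates the power draw Pegasus would need to stay under to top Henri's figure. It assumes efficiency is simply the Linpack score divided by average power during the run; the function name is illustrative, and no power measurement for Pegasus is given in the article.

```python
def max_power_kw(rmax_pflops: float, target_gflops_per_watt: float) -> float:
    """Largest average power (kW) at which a run still meets the target efficiency."""
    if target_gflops_per_watt <= 0:
        raise ValueError("target efficiency must be positive")
    rmax_gflops = rmax_pflops * 1e6          # 1 petaflop = 1,000,000 gigaflops
    watts = rmax_gflops / target_gflops_per_watt
    return watts / 1000.0

# Reported Pegasus Linpack score against Henri's 65.09 gigaflops per watt.
pegasus_budget_kw = max_power_kw(3.47, 65.09)
print(f"{pegasus_budget_kw:.1f} kW")
```

With the reported 3.47 petaflops, that works out to a little over 53 kilowatts.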

Above, Professor Boku lays out why big memory is needed. Below, the summary slide from his GTC presentation.

By the University of Tsukuba’s measure, Pegasus also has a higher Linpack efficiency, that is, the share of theoretical peak flops actually achieved on the benchmark: 54% for Pegasus versus Henri’s 37.6%. Both numbers fall short of the list’s roughly 65% average. Further optimizations could be in store for either system, however, so these numbers are in a sense provisional until the next Top500 list is published.
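The efficiency figure is simply the measured Linpack score divided by the theoretical peak. A minimal Python sketch, using the rounded numbers reported for Pegasus (3.47 petaflops measured, 6.5 petaflops peak) and an illustrative function name:

```python
def linpack_efficiency(rmax_pflops: float, rpeak_pflops: float) -> float:
    """Fraction of theoretical peak achieved on Linpack."""
    if rpeak_pflops <= 0:
        raise ValueError("peak performance must be positive")
    return rmax_pflops / rpeak_pflops

# Rounded figures reported for Pegasus.
pegasus_efficiency = linpack_efficiency(3.47, 6.5)
print(f"{pegasus_efficiency:.1%}")
```

With these rounded inputs the result is a little over 53%, close to the 54% the university cites; the small gap presumably reflects rounding in the published figures.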

Cygnus node diagram (Source: Professor Boku)

The new system joins Cygnus, which came online in 2019 and was unique in combining GPU and FPGA technology. Each of Cygnus’ 80 nodes is equipped with four Nvidia V100 GPUs, and half of those nodes additionally carry two Intel Stratix 10 FPGA devices.

Asked during his GTC presentation why Pegasus doesn’t make use of FPGAs, Boku indicated the systems were designed for different purposes, while also noting the high cost of FPGAs. “On Cygnus, we are researching the very interesting combination of GPU+FPGA, but currently the programming is not easy for application users. So we focus on PMEM and the new H100 for HPC+AI on Pegasus.” 

“[Further,] Cygnus pursues performance, and Pegasus has a different viewpoint of expanding HPC + AI applications. For example, PMEM’s 2 tebibytes-per-node is useful for AI solutions that say, ‘I don’t want to force MPI parallelism, but I want memory.’ Many AI applications are running on one node, and this is strongly supported.”

Pegasus, which in its planning stages went by the name Cygnus-BD, will enable much larger simulations on traditional HPC applications in fields such as astrophysics, climate and bioscience, and the large memory will also be brought to bear for big data and AI workloads across a range of domains, including drug discovery. Preliminary testing shows an astrophysical simulation code, called ARGOT, running 1.86x faster on Pegasus’ H100 GPU compared with Cygnus’ V100.

On the origin of the name Pegasus and associated cabinet art, Boku shared, “The big wings represent the space of big memory, and the flying horse represents high-speed GPU computation. It also has the implication that it is a sibling machine of Cygnus that has been operated so far. These two constellations are almost next to each other in the sky.”

Source: Professor Boku’s MVAPICH User Group (MUG) 2022 keynote