Argonne’s New Sunspot Testbed Provides On-Ramp for Aurora Exascale Supercomputer

April 20, 2023

April 20, 2023 — Researchers preparing scientific codes and workloads to run on the Aurora exascale supercomputer at the U.S. Department of Energy’s (DOE) Argonne National Laboratory now have a new resource at their disposal.

Sunspot is a two-rack test and development system equipped with 128 nodes of the same technologies that will power Argonne’s Aurora exascale supercomputer. Image: Argonne National Laboratory.

Named Sunspot, the new test and development system has the same architecture as Aurora, which is currently under construction at the Argonne Leadership Computing Facility (ALCF), a DOE Office of Science user facility. Aurora, an Intel-Hewlett Packard Enterprise (HPE) system, will comprise more than 10,000 nodes, each equipped with two new Intel Xeon CPU Max Series processors and six Intel Data Center GPU Max Series processors. Sunspot is a two-rack testbed with 128 nodes of the same technologies.

“Sunspot is basically a miniature version of Aurora,” said Susan Coghlan, ALCF project director for Aurora. “It gives teams a platform to optimize code performance on the actual Aurora hardware, including the system’s Intel CPUs (central processing units) and GPUs (graphics processing units), and the HPE Slingshot interconnect that connects all the components together.”

Prior to Sunspot’s arrival, development teams leveraged earlier Aurora testbeds (Iris, Arcticus, and Florentia at Argonne, and Borealis at Intel) and DOE supercomputers, including Argonne’s Polaris, to carry out exascale code development. While those systems continue to be useful tools for Aurora preparations, Sunspot’s identical architecture gives researchers an ideal environment to further optimize application performance for the exascale supercomputer.

“Test and development systems are an important on-ramp for larger production supercomputers,” said Tim Williams, co-manager of the ALCF’s Aurora Early Science Program (ESP). “With our Early Science Program for new supercomputers, the goal is to be ready for science on day one of deploying a new system. Testbeds like Sunspot allow researchers to carry out performance studies and scale up their workloads to run on much larger supercomputers while those systems are still being built.”

Sunspot is powered by the same Intel Xeon CPU Max Series processors and Intel Data Center GPU Max Series processors that are found in Aurora. Image: Argonne.

Since Sunspot’s launch in December 2022, over 180 researchers from over 20 application development teams from the ESP and DOE’s Exascale Computing Project (ECP) have begun accessing the testbed for scaling and performance optimization research. The Aurora ESP is supporting 15 research teams tasked with preparing key applications for the architecture and scale of the new supercomputer, with a strong emphasis on incorporating data-intensive computing and AI applications. In the process, the ESP teams also help solidify software libraries and infrastructure to pave the way for other researchers to run on the system. The ECP, on the other hand, is a broader effort with a similar end goal. Launched in 2016, the ECP is a massive multi-institutional initiative focused on building a capable exascale computing ecosystem. This includes developing the applications, software, and hardware technologies that will support science on the nation’s first exascale systems.

Williams noted that the ESP and ECP teams’ early runs on the Intel Max Series GPUs have been promising. At the recent HPC Asia 2023 conference, Williams and colleagues (Venkat Vishwanath, ALCF data science team lead and ESP co-manager, and Scott Parker, ALCF performance engineering team lead) presented some initial performance results compared to leading alternative GPUs.

  • As part of the ECP ExaSMR (Exascale Small Modular Reactor) project, researchers achieved 30-70% performance improvements with NekRS, a GPU-oriented thermal-fluids simulation code, across a set of benchmark problems.
  • Another ExaSMR code, OpenMC, which is used for neutron and photon transport simulations, showed a 205% performance advantage on the Intel GPUs.
  • Supported by ESP and ECP projects, the Argonne-developed Hardware/Hybrid Accelerated Cosmology Code (HACC) has seen 2.6x speedups in early runs on the hardware.
  • QMCPACK, a quantum Monte Carlo code used for electronic structure calculations, has shown a 50% improvement in runs thus far. QMCPACK’s exascale development is supported by both ESP and ECP.
  • XGC, a fusion plasma simulation code that is also supported by ESP and ECP, has performed 60% faster using an initial test problem.

The ALCF team expects the codes to see further performance improvements as the teams continue to do multi-node scaling and optimization work on Sunspot and other available computing resources. The ALCF is also using the testbed for various Aurora training events, including ESP hackathons and a tutorial at the ECP’s recent 2023 annual meeting.

In addition to helping researchers prepare applications for Aurora, Sunspot is extremely valuable to the ALCF and Intel as they continue work to stand up the exascale system. For example, the team is using Sunspot’s Intel DAOS (Distributed Asynchronous Object Storage) system to test and enhance I/O performance.

“Sunspot is the first time we’re seeing how everything is working together,” Coghlan said. “We learn a lot from these runs. It gives us a chance to iron out some of the kinks before Aurora is ready for users.”

“Some bugs don’t show up until you start running real applications on the hardware; that’s the whole idea behind the Early Science Program,” Williams added. “These early runs help with uncovering and, in some cases, actually diagnosing issues.”

Sunspot is expected to serve a role even after Aurora is powered on. Like the ALCF’s previous test and development systems, Sunspot can be a proving ground for new users to test and optimize code performance before moving to Aurora. ALCF staff can also use it to validate and benchmark new software that is targeted for Aurora.

About Argonne

The Argonne Leadership Computing Facility provides supercomputing capabilities to the scientific and engineering community to advance fundamental discovery and understanding in a broad range of disciplines. Supported by the U.S. Department of Energy’s (DOE’s) Office of Science, Advanced Scientific Computing Research (ASCR) program, the ALCF is one of two DOE Leadership Computing Facilities in the nation dedicated to open science.

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation’s first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America’s scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy’s Office of Science.


Source: Jim Collins, Argonne Lab

HPCwire