TACC’s New Stampede3 Enhances NSF Supercomputing Ecosystem: Interview

By Doug Eadline

July 24, 2023

The Texas Advanced Computing Center (TACC) today announced Stampede3, a powerful new Dell Technologies and Intel-based supercomputer that will enable groundbreaking open science research projects in the U.S. while leveraging the nation’s previous investments in high performance computing. 

For over a decade, the Stampede systems — Stampede (2012) and Stampede2 (2017) — have been flagships in the U.S. National Science Foundation’s (NSF) ACCESS scientific supercomputing ecosystem. The Stampede systems continue to provide a vital capability for researchers in every field of science. 

Made possible by a $10 million award for new computer hardware from the NSF, Stampede3 will be the newest strategic resource for the nation’s open science community when it enters full production in early 2024. It will enable thousands of researchers nationwide to investigate questions that require advanced computing power. 

Stampede3 will deliver: 

  • A new 4 petaflop capability for high-end simulation: 560 new Intel® Xeon® CPU Max Series processor nodes with high bandwidth memory (HBM), adding nearly 63,000 cores for the largest, most performance-intensive compute jobs. 
  • A new graphics processing unit/AI subsystem including 10 Dell PowerEdge XE9640 servers adding 40 new Intel® Data Center GPU Max Series processors (Ponte Vecchio) for AI/ML and other GPU-friendly applications. 
  • Reintegration of 224 3rd Gen Intel Xeon Scalable processor nodes for higher memory applications (added to Stampede2 in 2021). 
  • Legacy hardware to support throughput computing — more than 1,000 existing Stampede2 2nd Gen Intel Xeon Scalable processor nodes will be incorporated into the new system to support high-throughput computing, interactive workloads, and other smaller workloads. 
  • The new Omni-Path Fabric 400 Gb/s technology offers highly scalable performance through a network interconnect with 24 TB/s backplane bandwidth to enable low latency, excellent scalability for applications, and high connectivity to the I/O subsystem. 
  • 1,858 compute nodes with more than 140,000 cores, more than 330 terabytes of RAM, 13 petabytes of new storage, and almost 10 petaflops of peak capability.  

In addition, Stampede3 will be the only system in the NSF ACCESS environment to integrate the new Intel Max Series GPUs.  
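
As a rough, back-of-the-envelope check (not TACC’s official accounting), those headline figures hang together: 560 Xeon Max nodes at 112 cores each gives the quoted “nearly 63,000 cores,” and spreading 4 petaflops across those nodes implies on the order of 7 teraflops of peak per node. A minimal Python sketch of that arithmetic, with the per-node peak clearly an inferred illustration rather than a published figure:

    # Back-of-the-envelope check of the published Stampede3 figures.
    # The per-node peak is inferred for illustration, not an official TACC/Intel number.
    XEON_MAX_NODES = 560        # new Intel Xeon CPU Max Series (Sapphire Rapids HBM) nodes
    CORES_PER_NODE = 112        # 2 sockets x 56 cores, per the interview

    hbm_cores = XEON_MAX_NODES * CORES_PER_NODE
    print(f"HBM partition cores: {hbm_cores:,}")          # 62,720 -> "nearly 63,000"

    HBM_PEAK_PFLOPS = 4.0       # quoted capability for the new simulation partition
    per_node_tflops = HBM_PEAK_PFLOPS * 1000 / XEON_MAX_NODES
    print(f"Implied peak per node: ~{per_node_tflops:.1f} TF")  # ~7.1 TF (illustrative)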

Recently, HPCwire had an opportunity to talk with Dan Stanzione, executive director of TACC, about the new Stampede3 system.  

HPCwire: The Stampede system has been around for quite a while. Is the plan a complete replacement, or will some existing hardware be reused for Stampede3? 

Dan Stanzione: Historically, we have had 11 straight years of Stampede1 and Stampede2, and Stampede2 is now in its seventh year; we hit six years about two months ago, and it’s getting pretty long in the tooth. NSF had a program for new ACCESS production resources, and we pitched a plan: given that Stampede3 is smaller in scale than many new systems, it will pair new hardware with the best parts of the current Stampede2 cluster. We ended up doing Stampede2 in phases as well. Phase 1 started in 2016, ran into 2017, and consisted of the Knights Landing Xeon Phi systems, which made up two-thirds of that machine.  

And then a little later we added a bunch of Skylake nodes for the other third, which was always the plan. The two phases were supposed to be a year apart, but they ended up only being a few months apart; one was a little late and the other a little early with Intel hardware, and it ran great for years. But when NSF asked us to do one of the extensions (originally it was a four-year machine, with mid-June 2021 as the planned shutdown date), I said, look, if we are going to extend it, we need something new. 

HPCwire: What was the general plan? 

Stanzione: We took out the original set of 500 or so Knights Landing systems that we had installed at the end of the Stampede1 project (they were even older than the other hardware in that system) and replaced them with Ice Lake around the beginning of 2022, or late 2021. We now have a few hundred Ice Lake nodes in there, and we still have a bunch of Skylake nodes as well. We have about 3,800 nodes going offline this weekend (mid-July 2023), which we had already extended more than two years past their original lifespan. The Ice Lakes are new, and there is no point in tossing them out. Given that we have them in-house already and that we have a smaller budget for Stampede3, what could we really leverage? Stampede has always been an Intel-based system for us, and for the ACCESS audience we address with Stampede, we wanted to keep the software environment and the user model consistent. 

HPCwire: The new systems will have HBM. How do you plan on using it? 

Stanzione: You know, what we’re always starved for on these CPU systems is memory bandwidth, and we are putting in 560 new Sapphire Rapids nodes. It’s about four petaflops’ worth of Sapphire Rapids, and we are doing HBM only.  

HPCwire: Why HBM only? 

Stanzione: These are the top SKU, the 56-core-per-socket parts, so 112 cores per node. To really leverage that part, and in our opinion for HPC, you don’t put DIMMs in because that slows the HBM down. The systems have 128GB of pure HBM, and the thinking is, if you need more memory per core and you can’t scale out to more cores to reduce your memory footprint, then that’s what we still have those Ice Lake nodes for. They have 256GB of memory and fewer cores, so they offer about 3GB per core.  

In pure HBM mode there is still a little over 1GB per core, but if we put the DIMMs in, you pay for the memory twice: you pay for all the HBM and then you pay for the added memory. Adding another 24 DIMMs to those nodes increases the cost by about four grand. And then, when you hit standard memory by accident, things run slower. 
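
The memory-per-core trade-off Stanzione describes is simple arithmetic; a minimal sketch using the capacities and core counts quoted above:

    # Memory per core for the two CPU node types discussed above.
    def gb_per_core(mem_gb: float, cores: int) -> float:
        """Memory available per core, in GB."""
        return mem_gb / cores

    # Xeon Max nodes run HBM-only: 128 GB of HBM shared by 112 cores.
    print(f"Xeon Max (HBM only): {gb_per_core(128, 112):.2f} GB/core")  # ~1.14, "a little over 1GB"

    # Ice Lake nodes kept for larger-memory jobs: 256 GB across 80 cores.
    print(f"Ice Lake:            {gb_per_core(256, 80):.2f} GB/core")   # ~3.2, the "3GB per core"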

With this arrangement, we guarantee that every application runs only out of HBM, and the best way to enforce that is to not put in any other RAM. We did some side-by-side comparisons, and in some cases we are seeing as high as a factor of two or better versus a regular Sapphire Rapids with just standard DIMMs. When you look back at the Cascade Lakes in Frontera or the Skylakes in Stampede2, we are seeing about 5X per socket from the combination of the Sapphire Rapids improvements and the HBM bandwidth. There are a few codes where performance is not that much different because they are not memory-bandwidth bound, but on average I think we are seeing a 60 to 70% improvement for our most common codes. Keep in mind, one of the big advantages GPUs have had is a lot more memory bandwidth per flop than a CPU has. So the core of what we’re doing is a big Sapphire Rapids system with very fast memory. 
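
The bandwidth-per-flop point can be illustrated with a simple roofline-style estimate. The machine numbers below are round, hypothetical values chosen only to show the shape of the argument (they are not TACC or Intel figures): for a kernel whose arithmetic intensity sits below the machine balance, attainable performance scales almost directly with memory bandwidth.

    # Illustrative roofline estimate: attainable performance of a bandwidth-bound
    # kernel on a DDR-only node versus an HBM node.
    # All machine numbers here are hypothetical, for illustration only.
    def attainable_gflops(peak_gflops: float, bw_gb_s: float, intensity: float) -> float:
        """Roofline model: min(compute peak, bandwidth x arithmetic intensity)."""
        return min(peak_gflops, bw_gb_s * intensity)

    KERNEL_INTENSITY = 0.25  # flops/byte, typical of stream/stencil-like kernels (assumed)

    nodes = {
        "DDR-only node (hypothetical)": {"peak": 7000.0, "bw": 600.0},
        "HBM node (hypothetical)":      {"peak": 7000.0, "bw": 1600.0},
    }
    for name, n in nodes.items():
        perf = attainable_gflops(n["peak"], n["bw"], KERNEL_INTENSITY)
        print(f"{name}: ~{perf:.0f} GF attainable")
    # The HBM node's advantage tracks its bandwidth advantage; compute-bound codes
    # (intensity above the machine balance) see little difference, as noted above.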

HPCwire: Are you including any other hardware? 

Stanzione: Again, we are adding 560 nodes of Sapphire Rapids with HBM, and along with that we are going to put in a small exploratory system using Intel Ponte Vecchio. We are still negotiating exactly how much of that we will have, but I would say a minimum of 40 nodes and a maximum of a hundred or so. We have our Lonestar systems with a few hundred Nvidia GPUs, and that’s where we’re doing most of our AI work, but we want to see if we can move some of that workload onto Intel. We haven’t really exposed that to the user community, and we don’t know what the uptake is going to be, so we’re just putting a couple of racks of Ponte Vecchio out there to see how people work with it. It is coming up on Aurora right now, and if we get good adoption, we hope to add more. But at this point, we need the user software base to sort of kick the tires on it and figure out the software frameworks and all that kind of stuff.  

On the smaller Ponte Vecchio system, there will be four-way nodes, that is, four GPUs per node, again with Sapphire Rapids hosts. And then what we’re going to do is repurpose a fair amount of what’s left over from Stampede2, starting with the Ice Lake nodes: we’ll have about 224 nodes with 256GB and 80 cores per node. 

Those nodes will handle the larger-memory workloads. To continue the broad mission of Stampede2 with Stampede3, we also see a ton of single-node Python and Matlab throughput jobs that don’t really care much about turnaround time. We don’t have maintenance on our Skylake nodes anymore, so we’re going to keep a reserve and let some fail, but we promised to keep 1,000 Skylake nodes going as a throughput system, providing another 48,000 Skylake cores. So altogether, Stampede3 is about 1,858 nodes with 140,000 cores.  
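
A quick tally of the CPU partitions itemized here (a sketch; the Skylake per-node core count is implied by the 1,000-node/48,000-core figure, and the published system totals also include the GPU subsystem and other nodes not itemized in the interview):

    # Rough tally of the CPU partitions Stanzione lists above.
    partitions = {
        "Sapphire Rapids (HBM)":   (560, 112),  # nodes, cores per node (from the interview)
        "Ice Lake (large memory)": (224, 80),
        "Skylake (throughput)":    (1000, 48),  # 48 cores/node implied by "48,000 cores"
    }

    nodes = sum(n for n, _ in partitions.values())
    cores = sum(n * c for n, c in partitions.values())
    print(f"Itemized CPU partitions: {nodes:,} nodes, {cores:,} cores")
    # ~1,784 nodes and ~128,640 cores; the published totals (about 1,858 nodes and
    # more than 140,000 cores) also count the GPU nodes and other hardware.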

HPCwire: What about storage? 

Stanzione: We are going to completely replace the storage system and share it with another new system we’re putting in this year. We have decided to go with VAST for storage and do all-solid-state scratch and home volumes. We have had successful exploratory work with VAST on Frontera, and we’re actually going to tie it into the Frontera fabric because we want to try it at a bigger scale and see how it does. 

We have been really impressed by the scalability of VAST so far. And of course, the disk on Stampede2 is the thing showing its age the most: the scratch disks get beat up, and it’s been more than six years of day-in, day-out use. Lustre has been a great file system, but the disk drive failures are going way up as the hardware gets older. So, as you might imagine, a new file system is in order. 

HPCwire: And what about the interconnect? 

Stanzione: When we did Stampede2, it was all Intel Omni-Path, so it’s an all-Omni-Path system at the moment. For Stampede3, we’re going to reuse a lot of that fabric because we have a ton of 100G Omni-Path. All those director-class switches are getting old, however, so we’re getting new top-of-rack switches, but it is now a smaller fabric than the 6,000-node Stampede2. For the Ice Lake and Skylake legacy nodes, and initially for the Sapphire Rapids, we will do 100G Omni-Path. But when it comes out next year, in 2024, we’ll add 400G Omni-Path to the Sapphire Rapids nodes and build a non-blocking network for that side of Stampede3. 
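
For scale, the raw link-rate arithmetic behind the 100G-to-400G upgrade is straightforward; the sketch below uses theoretical rates, ignores protocol overhead, and assumes one Omni-Path port per node (a detail the interview does not specify):

    # Link-rate arithmetic for the fabric upgrade described above (theoretical, no overhead).
    def gbps_to_gbyte_per_s(gbps: float) -> float:
        """Convert a link rate in Gb/s to GB/s (8 bits per byte)."""
        return gbps / 8.0

    legacy_link = gbps_to_gbyte_per_s(100)  # 12.5 GB/s per node on the reused 100G fabric
    new_link = gbps_to_gbyte_per_s(400)     # 50 GB/s per node once 400G parts arrive

    SPR_NODES = 560                         # nodes slated for the non-blocking 400G network
    print(f"Per-node injection: {legacy_link:.1f} -> {new_link:.1f} GB/s")
    print(f"Aggregate injection (assumed one port/node): ~{SPR_NODES * new_link / 1000:.0f} TB/s")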

HPCwire: How will you do the transition to this hardware? 

Stanzione: That will come as a second phase because we really want to do this without a real break in service from Stampede2. We are hoping to get the nodes in by Supercomputing (SC23), but we certainly want to be in production by the first quarter of next year (2024). To make that first quarter, we didn’t want to wait on the 400G Omni-Path, so we decided to bring the system up at 100G and then update it when the 400G parts come out. And again, it’s just those 560 nodes and the GPU nodes that will get the 400G network; the other 1,300 nodes we don’t have to re-cable.  

So, in general: an updated core fabric, a new 400G fabric for the Sapphire Rapids, a new VAST storage system, new Ponte Vecchio systems, and then reuse of about half of the Skylakes, with the rest kept for spare parts so we can keep them running longer. We’ll have a few hundred spares. We’ll keep the Ice Lakes for high-memory applications since they are only two years old to begin with. 

It should be an almost transparent migration for the users because anything you have already built ought to still work. I sometimes forget how old the system is. We are using CentOS-7 on Stampede2 now, though we might have been on something even older when it started in 2017. The plan is to update to Rocky-9.  

HPCwire: That was my next question. What distribution do you plan to use given the recent changes with Red Hat? 

Stanzione: There are some things I’m not willing to say yet on that, but we went with Rocky-8 on Lonestar, our AMD-based system, last year, and we are going to do Rocky-9 for now. There are some conversations happening, but the plan of record for today is Rocky-9. 

HPCwire: So, the transition/shutdown is basically in progress? 

Stanzione: So, again, we are hoping for the first quarter of 2024, a little sooner if we get lucky with hardware deliveries. We are shutting down parts of Stampede2 on the 15th of July; actually, we’re shutting down job submission and letting everything drain, so the last days are really more like the 17th or 18th of July. We are going to keep the remaining Skylake and Ice Lake partitions running at full speed until mid-October, when we start the switchover to the new file system. We’re going to bring up the file system first and let users migrate the data they want from scratch on the old one to scratch on the new one; we’ll take care of home directories. And then we’ll slowly reduce the number of Skylakes and Ice Lakes available as we bring them up on the new machine. The idea is that there is always a Stampede running at some scale for the users, and the plan is to never cut users’ access. To help with this plan, we stopped taking allocations in the spring so that we have fewer people actually allocated on the machine for this fall/winter when the big transition happens. 

 
