Aurora the Survivor: Exascale Supercomputer Arrives After Eight Years of Doom

By Agam Shah

November 13, 2023

Many products were sacrificed in the eight years it took to bring the Aurora supercomputer to life. Nonetheless, anticipation for the second U.S. exascale system reached a fever pitch over the last two years.

The supercomputer, built on Intel technology, finally crossed the finish line and made it to the Top500 list, but it has not reached the promised two exaflops of performance.

The supercomputer at Argonne National Laboratory placed second in the November Top500 list released this week, behind Frontier, which retained the top spot. The system delivered an HPL score of 585.34 PFlop/s, still short of exascale territory.

Aurora has more than 60,000 GPUs, making it the largest GPU installation in the world. It has over 10,000 computing nodes, over 166 racks, and over 80,000 networking nodes.

Argonne submitted HPL runs for a portion of Aurora, so the benchmarking is incomplete.

The system could ultimately pass two exaflops when all the testing and fine-tuning is complete, Top500 wrote in a statement.

“Aurora is currently being commissioned and will reportedly exceed Frontier with a peak performance of 2 EFlop/s when finished,” Top500 wrote.

Intel will not have a new major GPU until 2025, so this may be the company’s last major Top 10 entry for a few years. The company has canceled its next-generation Rialto Bridge GPU, and its next major GPU upgrade is scheduled for release in 2025.

Meanwhile, Nvidia has three new GPUs coming in the next three years, and AMD’s Epyc CPUs and MI300A will be in a two-exaflop system called El Capitan, which is being installed at Lawrence Livermore National Laboratory.

Aurora was first announced in 2015 as a 200-petaflop system and has survived eight years of configuration changes, hardware cancellations, and budget delays.

The initial Aurora system was scheduled to come online in 2018. It was due to have Intel’s now-canned Xeon Phi code-named Knights Hill, Xeon CPUs, and silicon photonics. At the time, Xeon Phi was Intel’s response to supercomputing GPUs.

Aurora’s plans changed after Intel axed the Xeon Phi chips in 2017 and replaced them with “a new platform and new microarchitecture specifically designed for exascale,” the company said.

Xeon Phi mixed vector processors with low-power CPUs, and it wasn’t a complete failure. It was in four top 10 systems, including China’s Tianhe-2A, in the November 2017 Top500 list.

But Nvidia’s GPUs broke Phi in the June 2018 Top500 list, when they took the first and third spots with Summit and Sierra, which were based on IBM’s Power9 chips.

Intel’s discontinuation of Phi in 2017 set off a series of changes. In September of that year, the Advanced Scientific Computing Advisory Committee announced a change of plans that made Aurora the U.S.’s first exascale system. Intel and Cray were retained as vendors, and the system’s delivery date was moved from 2018 to 2021.

In late 2017, Intel hired graphics guru Raja Koduri away from AMD, which signaled the chipmaker’s intent to create a GPU. Intel wanted to replicate the success of Nvidia’s GPUs in supercomputers.

In March 2019, the U.S. Department of Energy announced Intel and Cray would deliver Aurora by 2021. Intel announced Ponte Vecchio on the sidelines of Supercomputing 2019 and said it would use the GPU in Aurora.

Intel Aurora Blade

But then more problems beset Intel. In July 2020, Intel delayed its move to the 7-nm process technology on which Ponte Vecchio would be made.

Ultimately, Intel had to turn to manufacturing rival TSMC for some Ponte Vecchio parts. The GPU has a chiplet design with more than 100 billion transistors, including sixteen compute tiles made on TSMC’s 5-nm process and eight tiles made on Intel’s 7-nm process.

The supercomputer’s main CPU, the 4th Gen Xeon chip code-named Sapphire Rapids, was also delayed by over a year, which delayed the installation.

Over the years, the U.S. Department of Energy has remained patient with Aurora. In a 2024 budget request, the DoE mentioned that Covid-related supply chain issues had delayed Aurora and that shortages had slowed the technical implementation.

In June 2023, Intel finally announced that it had completed the supercomputer installation. But the system still has not reached its peak of 2 exaflops, and further software fine-tuning is expected to push its speed higher.

Aurora is being used for many A.I. and scientific computing applications, said Ogi Brkic, Intel’s vice president and general manager for data center and HPC solutions, in a press briefing.

Aurora will be used to train a 1-trillion-parameter large-language model for scientific research. The supercomputer is also being used to reconstruct the mouse brain, which could take three years.

“This gives you a complexity of the problems being solved here. If you want to map the human brain, that’s not even close,” Brkic said.

The mouse brain reconstruction project, called Connectome, was running on 512 Aurora nodes and was showing better performance than on Polaris, a top-20 supercomputer deployed recently by Argonne.

“Applications today are continuously being optimized, and they are not just functional, but they’re scalable, which is very important when trying to get to the solution fast,” Brkic said.

The A.I. capabilities on Aurora are also being used to understand the interaction between particles. Some data sets are polluted by other particles in the cosmos and noise, which is where A.I. fits in.

“Understanding these interactions requires a training algorithm that allows you to train and understand and reference these interactions most effectively,” Brkic said.

Intel also shared additional details about its next-generation GPU and A.I. chips. Next year, the company will start shipping the Gaudi 3 chip, competing with Nvidia’s GPUs.

The 2025 enterprise GPU, Falcon Shores, will mix Gaudi 3 AI accelerator technology and general-purpose GPU cores. It will also have HBM memory and standard Ethernet switching and will support a wide range of large-language models, Brkic said.

Brkic also talked about the next Xeon chip, Emerald Rapids, which will be officially announced on December 14 and is considered an incremental upgrade to Sapphire Rapids. Its successor, Granite Rapids, has many excited: it can handle both A.I. and conventional workloads and has breakthrough memory and bandwidth technology.

Intel is also trying to move Nvidia customers off the proprietary CUDA parallel programming framework and onto its oneAPI through tools such as SYCLomatic, which can strip proprietary code so A.I. models can work on industry-standard hardware.

HPCwire