Many products were sacrificed in the eight years it took to bring the Aurora supercomputer to life. Nonetheless, anticipation for the second U.S. exascale system reached a fever pitch over the last two years.
The supercomputer, built on Intel technology, finally crossed the finish line and made it to the Top500 list, but it has not reached the promised two exaflops of performance.
The supercomputer at Argonne National Laboratory placed second on the November Top500 list released this week, behind Frontier, which retained the top spot. Aurora delivered 585.34 petaflops on the HPL benchmark, good for second place but still short of exascale territory.
Aurora has more than 60,000 GPUs, making it the largest GPU installation in the world. It spans more than 10,000 compute nodes in 166 racks and has over 80,000 network endpoints.
Argonne submitted an HPL run for only a portion of Aurora, so the benchmarking is incomplete.
The system could ultimately pass two exaflops once all the testing and fine-tuning is complete, Top500 said in a statement.
“Aurora is currently being commissioned and will reportedly exceed Frontier with a peak performance of 2 EFlop/s when finished,” Top500 wrote.
Intel will not have another major GPU until 2025, so this may be the company’s last major top-10 entry for a few years: it has canceled its next-generation Rialto Bridge GPU and pushed its next major GPU upgrade to 2025.
Meanwhile, Nvidia has three new GPUs coming in the next three years, and AMD’s Epyc CPUs and MI300A accelerators will power a two-exaflop system called El Capitan, which is being installed at Lawrence Livermore National Laboratory.
Aurora was first announced in 2015 as a 200-petaflop system and has survived eight years of configuration changes, hardware cancellations, and budget delays.
The initial Aurora system was scheduled to come online in 2018 with Intel’s since-canceled Xeon Phi processor code-named Knights Hill, Xeon CPUs, and silicon photonics. At the time, Xeon Phi was Intel’s answer to GPUs in supercomputing.
Aurora’s plans changed after Intel axed the Xeon Phi line in 2017 and replaced it with “a new platform and new microarchitecture specifically designed for exascale,” the company said.
Xeon Phi paired wide vector units with many low-power CPU cores, and it wasn’t a complete failure: it powered four top-10 systems, including China’s Tianhe-2, on the November 2017 Top500 list.
But Nvidia’s GPUs overtook Phi on the June 2018 list, taking the first and third spots with Summit and Sierra, both based on IBM’s Power9 chips paired with Nvidia V100 GPUs.
Intel’s discontinuation of Phi in 2017 set off a flurry of changes. In September of that year, the Advanced Scientific Computing Advisory Committee announced a change of plans that recast Aurora as the planned first U.S. exascale system. Intel and Cray were retained as vendors, and the system’s delivery date was pushed from 2018 to 2021.
In late 2017, Intel hired graphics guru Raja Koduri away from AMD, signaling the chipmaker’s intent to build a GPU and replicate the success of Nvidia’s GPUs in supercomputers.
In March 2019, the U.S. Department of Energy announced Intel and Cray would deliver Aurora by 2021. Intel announced Ponte Vecchio on the sidelines of Supercomputing 2019 and said it would use the GPU in Aurora.
But more problems beset Intel. In July 2020, the company delayed its move to the 7-nm process technology on which Ponte Vecchio was to be made.
Ultimately, Intel had to turn to manufacturing rival TSMC for some Ponte Vecchio parts. The GPU has a chiplet design with more than 100 billion transistors, including sixteen compute tiles made on TSMC’s 5-nm process and eight tiles made on Intel’s 7-nm process.
The supercomputer’s main CPU, the 4th Gen Xeon chip code-named Sapphire Rapids, was also delayed by more than a year, further pushing back the installation.
Over the years, the U.S. Department of Energy has remained patient with Aurora. In its fiscal 2024 budget request, the DoE noted that COVID-related supply chain issues had delayed Aurora and that shortages had slowed the technical implementation.
In June 2023, Intel finally announced that it had completed the supercomputer’s installation. But the system has still not reached its peak of two exaflops, and further software fine-tuning is expected to push its speed higher.
Aurora is being used for many A.I. and scientific computing applications, said Ogi Brkic, Intel’s vice president and general manager for data center and HPC solutions, in a press briefing.
Aurora will be used to train a 1-trillion-parameter large language model for scientific research. The supercomputer is also being used to reconstruct the mouse brain, an effort that could take three years.
“This gives you a complexity of the problems being solved here. If you want to map the human brain, that’s not even close,” Brkic said.
The mouse brain reconstruction project, called Connectome, was running on 512 Aurora nodes and showing better performance than Polaris, a top-20 supercomputer recently deployed by Argonne.
“Applications today are continuously being optimized, and they are not just functional, but they’re scalable, which is very important when trying to get to the solution fast,” Brkic said.
The A.I. capabilities of Aurora are also being used to understand interactions between particles. Some data sets are polluted by noise and by other particles in the cosmos, which is where A.I. fits in.
“Understanding these interactions requires a training algorithm that allows you to train and understand and reference these interactions most effectively,” Brkic said.
Intel also shared additional details about its next-generation GPU and A.I. chips. Next year, the company will start shipping the Gaudi 3 chip, which will compete with Nvidia’s GPUs.
The 2025 enterprise GPU, Falcon Shores, will mix Gaudi 3 A.I. accelerator technology with general-purpose GPU cores. It will also have HBM memory and standard Ethernet switching, and it will support a wide range of large language models, Brkic said.
Brkic also discussed the next Xeon chip, Emerald Rapids, which will be officially announced on December 14 and is considered an incremental upgrade over Sapphire Rapids. Its successor, Granite Rapids, has generated more excitement: it can handle both A.I. and conventional workloads and features breakthrough memory and bandwidth technology.
Intel is also trying to move Nvidia customers off the proprietary CUDA parallel programming framework and onto its oneAPI through tools such as SYCLomatic, which can strip out proprietary code so A.I. models can run on industry-standard hardware.