Intel’s 7nm node delay has raised questions about the status of the Aurora supercomputer that was scheduled to be stood up at Argonne National Laboratory next year. Aurora was in the running to be the United States’ first exascale supercomputer, although it was on a contemporaneous timeline with Oak Ridge National Lab’s Frontier supercomputer (with both systems scheduled for delivery within 2021).
With a one-year delay of Intel’s 7nm node that is integral to Aurora’s GPU engine (the Intel Xe-based Ponte Vecchio), would Intel contract an outside foundry to fab the GPU die? And what would the impact be on the speeds and feeds and delivery schedule for Aurora?
We don’t have all those answers yet, but we did get broad confirmation of the disruption from the DOE’s Office of Science.
There are indications that Aurora will in fact be delayed, but Frontier at Oak Ridge National Laboratory is on track as is the Exascale Computing Project, reported Barb Helland, associate director of the Office of Science for Advanced Scientific Computing Research (ASCR) during an Advanced Scientific Computing Advisory Committee (ASCAC) meeting, held last week (Sept. 24-25).
“It’s not unexpected that when we’re entering into contracts for the most advanced supercomputers in the world, four to five years before they’re deployed, that there will be some schedule delays,” said Helland. “For that reason, we build both cost and schedule contingencies into our project budgets.”
The DOE Office of Science was not ready to provide further details at this time but did state they are working closely with Intel.
“Yes, we have indications that the Aurora system will be delayed. But Argonne is currently working with Intel to mitigate the consequences not only to Argonne, but to the Exascale Computing Project and to the nation’s high-performance computing users.”
While seeming to downplay the setback, Helland reiterated that Oak Ridge’s Frontier machine is on track to be delivered in calendar year 2021, and that the ECP is also on track to complete on time (by Q4 of FY24 at the outside).
“I’m confident we’re going to get through this in a way that gets this problem resolved to the benefit of the country and the program,” said Chris Fall, director of the Office of Science. “We’re still having conversations and figuring out the details, but I’m very comfortable. I think we’ll get where we need to get on that.”
It is reasonable for systems pushing the boundaries of scope and scale to encounter unforeseen circumstances that affect target dates, but Aurora has already been significantly redefined after previous delays and cancellations in Intel’s roadmaps. Originally conceived as a pre-exascale supercomputer to be stood up at Argonne in 2018, Aurora was recast in 2017 as the nation’s first exascale machine with a 2021 target.
It would appear that Oak Ridge National Lab’s Frontier supercomputer is now lined up to be the nation’s first exascale system. The DOE is working with Oak Ridge, HPE and AMD to stand up the 1.5 exaflops (minimum peak) Frontier in late 2021. Lawrence Livermore Lab’s El Capitan system (slated to deliver 2 exaflops peak using HPE and AMD technology) is scheduled for approximately one year later (delivery in early 2023). The question is: where will Aurora fit in the timeline?
HPE Cray EX supercomputing is the foundation of all three planned exascale systems. HPE is the prime contractor on Frontier and El Capitan, while Intel is the prime on Aurora.
In a statement provided to HPCwire, Intel said it “remains committed to delivering the Aurora supercomputer to Argonne National Laboratory and enabling exascale leadership at the U.S. Department of Energy.”