At the Advanced Scientific Computing Advisory Committee (ASCAC) meeting in Arlington, Va., yesterday (Sept. 26), it was revealed that the “Aurora” supercomputer is on track to be the United States’ first exascale system. Aurora, originally named as the third pillar of the CORAL “pre-exascale” project, will still be built by Intel and Cray for Argonne National Laboratory, but the delivery date has shifted from 2018 to 2021 and the target capability has been expanded from 180 petaflops to 1,000 petaflops (1 exaflop).
The fate of the Argonne Aurora “CORAL” supercomputer has been in limbo since the system failed to make it into the U.S. DOE budget request, while the same budget proposal called for an exascale machine “of novel architecture” to be deployed at Argonne in 2021. Until now, the only official word from the U.S. Exascale Computing Project was that Aurora was being “reviewed for changes and would go forward under a different timeline.”
Officially, the contract has been “extended,” and not cancelled, but the fact remains that the goal of the Collaboration of Oak Ridge, Argonne, and Lawrence Livermore (CORAL) initiative to stand up two distinct pre-exascale architectures was not met.
According to sources we spoke with, a number of people at the DOE are not pleased with the Intel/Cray partnership, in which Intel is the prime contractor and Cray the subcontractor. It’s understood that the two companies could not deliver the 180-200 petaflops system by next year, as the original contract called for. Now Intel/Cray will push forward with an exascale system that is some 50x larger than any they have stood up.
It’s our understanding that the cancellation of the original Aurora system is not a DOE budgetary measure, as has been speculated, and that the DOE and Argonne wanted Aurora. Although it was referred to as an “interim” or “pre-exascale” machine, the scientific and research community was counting on that system, was eager to begin using it, and regarded it as valuable in its own right. The non-delivery is regarded as disruptive to the scientific and research communities.
Another question: since Intel and Cray failed to deliver Aurora and have moved on to a larger exascale system contract, why hasn’t their original CORAL contract been cancelled and put out to bid again? With global competition increasing, it seems that the DOE stakeholders did not want to further delay the non-IBM/Nvidia side of the exascale track. Conceivably, they could have rebid the Aurora system, but that would have left them with an even bigger gap if they had to spin up a new vendor to replace Intel and Cray. Starting the bidding process over would have delayed progress toward exascale, and it might even have been the death knell for exascale by 2021. Even so, Intel and Cray now have a giant performance leap to make and three years to do it. There is also an open question on the processor front, as the retooled Aurora will not be powered by Phi/Knights Hill as originally proposed.
These events raise the question of whether the IBM-led effort, with IBM/Nvidia/Mellanox, is looking very good by comparison. The other CORAL thrusts, Summit at Oak Ridge and Sierra at Lawrence Livermore, are on track, with Summit several weeks ahead of Sierra, although it looks as though neither will make the cut-off for entry onto the November Top500 list as many had speculated.
We reached out to representatives from Cray, Intel and the Exascale Computing Project (ECP) seeking official comment on the revised Aurora contract. Cray and Intel declined to comment, and we did not hear back from ECP by press time. We will update the story as we learn more.