While not a golden HPC spike, the final blade has been loaded into Aurora. As mentioned previously, final preparation of Aurora, the "almost ready" exascale machine, is underway at the U.S. Department of Energy's (DOE) Argonne National Laboratory. One of the first three U.S. exascale machines, it is expected to deliver more than two peak exaflops of computing power, an increase over its initial target from when the system was slated for an earlier delivery date. Powered by Intel Data Center GPU Max series accelerators (Ponte Vecchio) and dual Intel Xeon Max series CPUs with HBM (High Bandwidth Memory), the exascale machine comprises 10,624 blades (compute nodes) connected by the Slingshot Ethernet-based interconnect from Hewlett Packard Enterprise.
After years of hard work and adaptive planning, the system now contains all the hardware that will make it one of the most powerful supercomputers in the world. Built by Intel and Hewlett Packard Enterprise (HPE), Aurora will be theoretically capable of delivering more than two ExaFLOPS of computing performance.
Aurora, one of the nation’s first exascale supercomputers, will enable science that is impossible today.
“We have been living and breathing the Aurora installation since the first pieces were delivered in November of 2021,” said Susan Coghlan, ALCF project director for Aurora. “While we still have a lot of work to do before we can roll the system out to scientists worldwide, it is incredibly exciting to have the final hardware in place.”
These are not your common system blades: each weighs in at around 70 pounds (32 kilograms) and requires a team of technicians and a specialized lifting machine to delicately place it vertically into Aurora's refrigerator-sized racks. Each of the system's 166 racks holds 64 blades, and the racks are spread across eight rows, occupying the space of two professional basketball courts in the ALCF data center.
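As a quick sanity check, the rack and blade counts quoted above are mutually consistent. A minimal back-of-the-envelope sketch (the per-blade weight comes from the article; the total-weight figure is our own rough estimate, not a published specification):

```python
# Sanity check of the figures in the article.
# Rack count, blades per rack, and per-blade weight are from the story;
# the derived totals are rough estimates only.

RACKS = 166           # refrigerator-sized cabinets, arranged in eight rows
BLADES_PER_RACK = 64  # compute nodes per rack
BLADE_WEIGHT_LB = 70  # approximate weight of a single blade

total_blades = RACKS * BLADES_PER_RACK
total_blade_weight_tons = total_blades * BLADE_WEIGHT_LB / 2000  # US short tons

print(total_blades)                    # 10624 -- matches the article's blade count
print(round(total_blade_weight_tons))  # ~372 tons of compute blades alone
```

The 166 × 64 product reproduces the 10,624-blade total cited earlier, which is why placing the final blade marks the end of hardware installation.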
Before the system could be installed, Argonne had to carry out some major facility upgrades. This included adding new data center space to provide enough room for the supercomputer and building mechanical rooms and equipment to provide increased power and cooling capacity.
The challenge remains for users to scale their applications to take advantage of the exascale performance available to them. For the past few months, early users have been working on Sunspot, a two-rack test and development system with the same architecture as Aurora. These early user applications will also help stress-test the supercomputer and identify potential bugs that need to be resolved ahead of its full deployment.
“We’re looking forward to putting Aurora through its paces to make sure everything works as intended before we turn the system over to the broader scientific community,” Coghlan said.
Based on original story by Jim Collins