Sierra, the 125 petaflops (peak) machine based on IBM’s Power9 chip being built at Lawrence Livermore National Laboratory, sometimes takes a back seat to Summit, the ~200 petaflops system being built at Oak Ridge National Laboratory and expected to crest the Top500 list in June. Like Sierra, Summit features a heterogeneous architecture based on Power9 and Nvidia V100 GPUs, with Mellanox EDR InfiniBand connecting the nodes.
Livermore today posted a brief update on Sierra’s progress along with a short video. Trucks began delivering racks and hardware over the summer, with system acceptance scheduled for fiscal 2018. Sierra, part of the CORAL effort, is expected to provide four to six times the sustained performance of the Lab’s current workhorse system, Sequoia.
“Sierra is what we call an advanced technology platform,” says Mike McCoy, program director, Advanced Simulation and Computing, in the video. “[It] will serve the three NNSA (National Nuclear Security Administration) laboratories. So the ATS2, which is Sierra, is the second in a series of four systems that are on a roadmap to get us to exascale computing [around] 2024.”
Sierra is expected to have roughly 260 racks and will be the biggest computer installed at Livermore in size, number of racks, and speed.
“IBM analyzed our benchmark applications, showed us how the system would perform well for them, and how we would be able to achieve similar performance for our real applications,” said Bronis de Supinski, Livermore Computing’s chief technology officer and head of Livermore Lab’s Advanced Technology (AT) systems, in the Lab’s update. “Another factor was that we had a high probability, given our estimates of the risks associated with that proposal, of meeting our scheduling requirements.”
While Lab scientists have seen positive indications from their early access systems, de Supinski said that until Sierra is on the floor and running stockpile stewardship program applications, which could take up to two years, they won’t be certain how powerful the machine will be or how well it will work for them.
Sierra will feature two IBM Power9 processors and four Nvidia Volta GPUs per node. The Power9s will provide a large amount of memory bandwidth from the chips to Sierra’s DDR4 main memory, and the Lab’s workload will benefit from the use of second-generation NVLink, forming a high-speed connection between the CPUs and GPUs.
As Livermore’s first extreme-scale CPU/GPU system, Sierra has presented challenges to Lab computer scientists in porting codes, identifying what data to make available on GPUs, and moving data between the GPUs and CPUs to optimize the machine’s capability. Through the Sierra Center of Excellence, Livermore Lab code developers and computer scientists have been collaborating with on-site IBM and Nvidia employees to port applications.
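To make the data-movement trade-off concrete, here is a minimal back-of-the-envelope sketch in Python. It is not drawn from Livermore's porting work: the function names, the simple cost model (transfer time plus accelerated compute time), and every number in the usage note are illustrative assumptions. The idea is only that moving a kernel to the GPU pays off when the time saved on computation exceeds the time spent shipping data across the CPU-GPU link.

```python
def offload_pays_off(bytes_moved, cpu_seconds, gpu_speedup, link_bytes_per_s):
    """Return True if GPU execution, including the data transfer, beats the CPU.

    All parameters are hypothetical inputs to a simplified model:
      bytes_moved       -- data that must cross the CPU-GPU link
      cpu_seconds       -- time the kernel takes on the CPU
      gpu_speedup       -- assumed speedup of the kernel on the GPU
      link_bytes_per_s  -- assumed sustained link bandwidth
    """
    transfer_seconds = bytes_moved / link_bytes_per_s
    gpu_seconds = cpu_seconds / gpu_speedup
    return transfer_seconds + gpu_seconds < cpu_seconds


def break_even_bytes(cpu_seconds, gpu_speedup, link_bytes_per_s):
    """Largest transfer size (in bytes) at which offloading still breaks even."""
    seconds_saved = cpu_seconds - cpu_seconds / gpu_speedup
    return max(seconds_saved, 0.0) * link_bytes_per_s
```

For example, with an assumed 50 GB/s link and a kernel assumed to run 10 times faster on the GPU, `offload_pays_off(1e9, 1.0, 10.0, 50e9)` returns `True`, while the same kernel with 100 GB of data to move, `offload_pays_off(100e9, 1.0, 10.0, 50e9)`, returns `False`. Real codes also overlap transfers with computation and keep data resident on the GPU across kernels, which this model ignores.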
Feature Image: Sierra, LLNL