The Department of Energy (DOE) pre-exascale supercomputer Summit is not scheduled to go live until early 2018, yet support staff at the Oak Ridge Leadership Computing Facility (OLCF) have been preparing for Summit’s arrival since the contract was announced last November. That degree of planning is natural given the expense and resources involved in standing up one of the first machines in its class, with an expected 150 to 300 petaflops of performance.
To prepare for Summit, OLCF staff – including the OLCF Scientific Computing (SciComp), Technology Integration (TechInt), and High-Performance Computing Operations (HPC Ops) groups – constructed a test bed early last year. It comprises two clusters, Pike and Crest, each designed to represent elements of Summit’s hybrid CPU–GPU computing architecture. By probing the workings of Pike and Crest, staff and vendors can identify and fix problems preemptively, helping the transition to Summit go as smoothly as possible.
Both clusters employ IBM Power8 processors, the predecessors to the Power9 CPUs that will power Summit, but they otherwise differ so that separate aspects of Summit can be assessed.
Crest is a compute test bed composed of four nodes, each with the aforementioned Power8 chips and four GPUs. These are presumably the most current Tesla chips, since the NVIDIA Volta GPUs planned for Summit are a future product. Crest will be used for scaling up scientific codes and testing early versions of software.
“We’re checking out compilers and building and running codes; that’s a good outcome of this,” said HPC Ops system administrator Don Maxwell, the team lead for Crest. “We will also begin using Crest to test new software that IBM is developing for Summit to ensure it meets our requirements.”
The other test system, Pike, has 14 Power nodes attached to non-volatile memory storage. It was designed to help OLCF become familiar with Summit’s high-speed data storage system. OLCF has primarily relied on Lustre file systems, but Summit will use IBM’s Elastic Storage System (ESS), which is based on IBM’s General Parallel File System (GPFS) technology. By running benchmark jobs on Pike, OLCF staff will be able to study attributes such as metadata performance, block I/O, random and sequential performance, and data management.
While Crest and Pike are the first test systems to enable early exploration of Summit’s proposed computing architecture, they are by no means the last. The next planned test unit will incorporate NVLink, Summit’s node integration interconnect, which NVIDIA will debut with its Pascal and Volta GPUs.