Harvard’s Faculty of Arts & Sciences Research Computing (FASRC) center announced a refresh of its primary HPC resource. The new cluster, called Cannon after the pioneering American astronomer Annie Jump Cannon, is supplied by Lenovo and uses the company’s SD650 NeXtScale servers with direct-to-node water cooling. FASRC, writing about the recent acquisition, cites increased performance, density, ease of expansion, and controlled cooling as the benefits.
The main Cannon system spans 670 SD650 NeXtScale servers, equipped with Intel Xeon Platinum 8268 (24-core) “Cascade Lake” processors, for a total of 48 cores and 192 GB of RAM per node. Communication takes place over HDR 100 Gbps InfiniBand, connected in a single fat tree with a 200 Gbps InfiniBand core.
Lenovo’s Neptune liquid cooling technology enables the Platinum 8268 processors to run at a higher clock rate of about 3.4 GHz, compared to their 2.9 GHz base frequency. At the higher clock rate, the theoretical peak for the 670 dual-CPU nodes (using “AVX-512 mode”) is nearly 3.5 petaflops. FASRC is also installing a GPU partition of 16 Lenovo SR670 servers, each with four Nvidia V100 GPUs and 384 GB of RAM, connected by HDR — adding about another 450 teraflops.
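As a rough check on those figures, the CPU peak follows from nodes × cores per node × clock × double-precision FLOPs per cycle, and the GPU figure from 64 V100s at their nominal double-precision throughput. The short Python sketch below reproduces the arithmetic; the 32 FLOPs/cycle and ~7 TFLOPS-per-V100 values are standard published figures assumed here, not numbers stated by FASRC.

    # Back-of-the-envelope check on the quoted peak-performance figures.
    # Assumes 32 FP64 FLOPs/cycle/core (two AVX-512 FMA units) and
    # ~7 TFLOPS FP64 per V100 -- nominal vendor figures, not FASRC data.

    nodes = 670
    cores_per_node = 48          # dual Xeon Platinum 8268, 24 cores each
    clock_hz = 3.4e9             # sustained clock under liquid cooling
    flops_per_cycle = 32         # 2 AVX-512 FMA units x 8 FP64 lanes x 2 ops

    cpu_peak = nodes * cores_per_node * clock_hz * flops_per_cycle
    print(f"CPU partition peak: {cpu_peak / 1e15:.2f} PFLOPS")   # ~3.50 PFLOPS

    gpu_nodes = 16
    gpus_per_node = 4
    v100_fp64 = 7.0e12           # approximate FP64 throughput per V100

    gpu_peak = gpu_nodes * gpus_per_node * v100_fp64
    print(f"GPU partition peak: {gpu_peak / 1e12:.0f} TFLOPS")   # ~448 TFLOPS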
The new gear replaces FASRC’s previous cluster, Odyssey, which is being decommissioned; the switchover occurred on September 24, 2019. Odyssey had undergone continual improvements over the years, receiving an infusion of 15,000 Intel Xeon “Broadwell” cores in November 2017 and gaining Singularity container support in March 2018.
The FASRC Cannon cluster will support scientific modeling and simulation for thousands of Harvard researchers. It was installed with the support of the Faculty of Arts and Sciences and, all told, will occupy some 10,000 square feet spread across three datacenters. The primary compute partition is housed at the Massachusetts Green High Performance Computing Center (MGHPCC) in Holyoke, Mass., while Harvard’s Boston and Cambridge facilities house storage, login nodes, virtual machines, and specialty computing resources. FASRC maintains over 40 PB of storage, with Isilon, Lustre, Gluster, and NFS filesystems all in the mix.
Harvard FASRC is a CentOS shop, relying on Puppet for cluster configuration management and Slurm for workload scheduling. Approximately 29 million jobs are processed each year.
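To give a sense of what Slurm-based scheduling looks like in practice, the Python sketch below builds a minimal batch script and hands it to sbatch. It is an illustrative example of standard Slurm usage, not FASRC documentation; the partition name, resource requests, and time limit are hypothetical placeholders.

    # Minimal sketch of submitting a job on a Slurm-managed cluster.
    # Partition, resources, and time limit are hypothetical placeholders.
    import subprocess

    batch_script = """#!/bin/bash
    #SBATCH --job-name=example
    #SBATCH --partition=shared      # hypothetical partition name
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=8G
    #SBATCH --time=01:00:00

    srun hostname
    """

    # sbatch reads the script from stdin and prints the assigned job ID.
    result = subprocess.run(["sbatch"], input=batch_script, text=True,
                            capture_output=True, check=True)
    print(result.stdout.strip())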
More information at https://www.rc.fas.harvard.edu/about/cluster-architecture/