Amid the much-deserved fanfare for the exascale Frontier system at Oak Ridge National Laboratory (ORNL), the lab launched a much smaller but more laser-focused system in April: Cumulus-2, an HPC cluster serving the DOE’s Atmospheric Radiation Measurement (ARM) user facility.
Cumulus-2, built by Dell, consists of 32 chassis in total. Twenty-eight of those hold four nodes each, with each node containing dual AMD Epyc “Milan” 7713 CPUs and 256GB of memory. The remaining four chassis contain four high-memory nodes each, with 512GB of memory per node rather than 256GB. (ORNL didn’t have a flops estimate on hand, but we estimate around 524 theoretical peak teraflops of computing power.) Cumulus-2 is networked with Nvidia’s InfiniBand HDR200 and connected to a 7PB filesystem.
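For readers who want to check the 524-teraflop figure, the arithmetic can be sketched as below. The chassis, node and CPU counts come from the description above. The per-CPU numbers (64 cores, a 2.0GHz base clock and 16 double-precision flops per core per cycle) are our assumptions, taken from AMD's published specifications for the Epyc 7713 and the Zen 3 core, and the function names are purely illustrative. This is one plausible way to arrive at the estimate, not an official ORNL calculation.

```python
# Layout as described in the article.
STANDARD_CHASSIS = 28      # chassis with 256GB nodes
HIGH_MEM_CHASSIS = 4       # chassis with 512GB nodes
NODES_PER_CHASSIS = 4
CPUS_PER_NODE = 2          # dual-socket nodes

# Assumptions NOT stated in the article (AMD Epyc 7713 / Zen 3 specs).
CORES_PER_CPU = 64
BASE_CLOCK_GHZ = 2.0
FP64_FLOPS_PER_CYCLE = 16  # two 256-bit FMA units per core


def total_nodes():
    """Number of compute nodes across all chassis."""
    return (STANDARD_CHASSIS + HIGH_MEM_CHASSIS) * NODES_PER_CHASSIS


def total_cores():
    """Number of CPU cores across all nodes."""
    return total_nodes() * CPUS_PER_NODE * CORES_PER_CPU


def peak_teraflops():
    """Theoretical double-precision peak at base clock, in teraflops."""
    gigaflops = total_cores() * BASE_CLOCK_GHZ * FP64_FLOPS_PER_CYCLE
    return gigaflops / 1000.0


def total_memory_gb():
    """Aggregate memory across standard and high-memory nodes, in GB."""
    standard = STANDARD_CHASSIS * NODES_PER_CHASSIS * 256
    high_mem = HIGH_MEM_CHASSIS * NODES_PER_CHASSIS * 512
    return standard + high_mem


print(f"{total_nodes()} nodes, {total_cores()} cores, "
      f"{peak_teraflops():.1f} peak teraflops, {total_memory_gb()}GB memory")
```

Under those assumptions the calculation lands on roughly 524 teraflops. Using boost clocks instead of the base clock would give a higher number, which is why theoretical peak figures should be read as rough guides rather than measured performance.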
The ORNL team started conceptualizing the system in early 2020. “Then,” Ryan Prout, HPC data analytics engineer for ORNL, told HPCwire, “the supply chain issues hit pretty hard for a while.” Nevertheless, by 2021, things were underway with Dell, and the system was eventually delivered over the course of a year or so. It was commissioned in April of 2022.
“One of the contributing factors for such a lead time … is our workflows,” Giri Prakash, director of the ARM datacenter at ORNL, told HPCwire. “So the previous Cumulus-1 and the current, new Cumulus-2, the architecture is so different that almost all of our scripts and workflows needed to be restructured quite significantly.”
Cumulus-2 is, as the name implies, a successor to the no-longer-operational Cumulus system, which was indeed very different: a Cray XC40 cluster with Intel Broadwell CPUs and Cray Aries networking. “The first round of Cumulus was a Cray-based system that had way fewer cores,” Prout said. “Now we’re going into more cores, more of a distributed system.” The Cumulus-2 cores are, however, lower-frequency—more of a “scaling-out” than a “scaling-up,” Prout explained.
“This new cluster will greatly accelerate processing speeds for simulations and boost capabilities to interpret ARM’s storehouse of data,” Prakash said in an interview with ORNL’s Coury Turczyn. “Cumulus-2 will offer roughly 4 times the power of Cumulus-1.”
The Cumulus systems—and a third, still-operational system called Stratus—support ARM in its mission to bolster climate research.
“In general, ARM, we are providing observational data to improve the uncertainty of climate research,” Prakash said. “We collect our really high-resolution measurements—both ground-based and aerial measurements—from three of our long-term observatories … but then on top of that we have three mobile facilities, where we ship multiple containers full of instruments to various places based on the scientific interest and then we do intensive study.”
ARM’s data stretches back to 1992, comprising some 11,000 data products across 3.4PB of data. Prakash said that 3 to 5PB of data from around 3,000 active data streams trickles in every day.
Cumulus-2 serves a couple of key functions in the context of ARM’s mission. “The primary use case for the new Cumulus is our high-resolution climate modeling using the LES [Large-Eddy Simulation] model,” Prakash said. “The modeling program itself is called LASSO.” Beyond the high-resolution modeling, Cumulus-2 also helps process and quality-check ARM data before it is made available to ARM users. Researchers who wish to undertake projects using ARM data can also apply for time on Cumulus-2 via the group’s call for proposals.
“The majority of ARM users are from other labs, universities, and global research organizations. Last year, there were around 1,000 unique scientific users who produced over 200 journal articles,” Prakash said.
As for what’s next for ARM: Prakash said that ARM is still growing, with the group’s high-resolution modeling efforts expanding and a recent refresh of its aerial vehicle for data collection.
“Interestingly enough, [ARM] celebrated [its] 30-year anniversary this May,” Prakash said. “So it feels like we released the Cumulus-2 to the research community as part of celebrating the 30th year of high-quality research.”
To learn more about Cumulus-2 and ARM, read the reporting from ORNL’s Coury Turczyn.
Header image: Ryan Prout (left) and Giri Prakash (right) with the Cumulus-2 system. Image courtesy of Carlos Jones/ORNL.