Like fighting fire with fire, Lawrence Livermore National Laboratory (LLNL) is using its ‘Corona’ supercomputer to battle the coronavirus, and thanks to a boost from vendor partner AMD, Corona will now have twice the computing firepower.
Under its commodity CTS-1 contract, awarded to Penguin Computing in 2016, LLNL debuted the Corona high-performance computing cluster in late 2018. The unclassified system, named for the total solar eclipse of 2017, is dedicated to open science. It is the first A+A supercomputer (powered by AMD Epyc CPUs and AMD Radeon GPUs), and can be considered an early precursor to the forthcoming exascale systems, Frontier at Oak Ridge National Laboratory and El Capitan at LLNL itself.
The Penguin Tundra system spans 170 dual-socket nodes populated with 24-core AMD Epyc 7401 “Naples” processors. In the original configuration, half of those nodes housed four AMD Radeon Instinct MI25 accelerators. An upgrade completed just ahead of SC19 filled out the rest of those GPU-ready nodes with an infusion of AMD Radeon Instinct MI60 accelerators, four per node.
Now, under a new agreement, AMD is sponsoring the replacement of those earlier MI25s with MI50s that offer about 8.6x more double-precision floating point performance and about 10 percent more single-precision performance. The upgraded graphics accelerators, 168 of them, will nearly double the system’s peak double-precision compute power from 2.8 petaflops to 4.7 petaflops.
Corona is one machine in a virtual army of HPC resources – more than 30 systems – deployed under the COVID-19 HPC Consortium, announced by the White House and Department of Energy last month (see HPCwire coverage). AMD joined the consortium earlier this month.
LLNL researchers have been applying HPC power to finding potential antibodies and anti-viral compounds for SARS-CoV-2, the virus that causes COVID-19. (For more details, check out Addison Snell’s “This Week in HPC” interview with LLNL’s Jim Brase featured here.)
Each AMD Radeon Instinct MI50 accelerator offers 6.6 teraflops of peak double-precision performance, or 13.3 teraflops of peak single-precision performance, roughly on par with the MI60, which provides 7.4 teraflops peak double-precision and 14.7 teraflops peak single-precision. Both the MI50 and MI60 are based on AMD’s 7nm Vega GPU and support 32 GB of HBM2 ECC memory with up to 1 TB/s of memory bandwidth. The MI25 (based on 14nm Vega) comes close on single-precision, at 12.29 teraflops, but its peak double-precision rating is only 768 gigaflops.
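For readers who want to check the per-GPU comparison, the ratios can be worked out from the peak ratings quoted above. The short Python sketch below is illustrative only: the table layout and the `speedup` helper are hypothetical names introduced here, and the figures are vendor peak ratings, not measured application performance.

```python
# Peak per-GPU ratings in teraflops, copied from the figures quoted in the article.
# The dictionary layout and the helper below are illustrative, not from AMD or LLNL.
PEAK_TFLOPS = {
    "MI25": {"fp64": 0.768, "fp32": 12.29},
    "MI50": {"fp64": 6.6, "fp32": 13.3},
    "MI60": {"fp64": 7.4, "fp32": 14.7},
}


def speedup(new_gpu: str, old_gpu: str, precision: str) -> float:
    """Return the ratio of peak ratings (new / old) at the given precision."""
    return PEAK_TFLOPS[new_gpu][precision] / PEAK_TFLOPS[old_gpu][precision]


# Compare the replacement MI50 against the MI25 it displaces and the MI60 beside it.
for old_gpu in ("MI25", "MI60"):
    for precision in ("fp64", "fp32"):
        ratio = speedup("MI50", old_gpu, precision)
        print(f"MI50 vs {old_gpu} ({precision}): {ratio:.2f}x")
```

On these numbers, the MI50's double-precision advantage over the MI25 comes out near 8.6x, while its single-precision gain is under 10 percent. Against the MI60, the MI50 delivers roughly 90 percent of peak at either precision.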
AMD is providing the upgraded GPUs at no cost to LLNL in support of COVID-19 research and collaborative development efforts, as part of its $15 million COVID-19 fund. “In exchange for the upgraded GPUs, AMD is securing compute cycles that will be used for a variety of purposes, including providing time for LLNL COVID-19 research and proposals approved by the COVID-19 HPC Consortium, as well as supporting development efforts by AMD software engineers and application specialists,” notes a lab announcement.
“The addition of these new state-of-the-art GPUs on Corona will boost the capability of the teams working on COVID-19,” said Jim Brase, LLNL’s deputy associate director for Programs. “It’s going to allow us to go faster, with more throughput. We’ll have more resources, so we can run more cases and potentially get to new designs for both antibodies and small molecules faster, that may lead to better treatments. They’ll also enable some of our new software, both for simulation and machine learning applications, to run more efficiently and better.”