After announcing the launch of the COVID-19 High Performance Computing Consortium on Sunday, the Department of Energy yesterday provided more details on its scope and operation in a briefing led by Undersecretary of Energy Paul Dabbar and attended by HPC leaders from national labs. The joint public-private effort will pool 16 systems which together offer more than 330 petaflops along with extensive cloud resources. A portal has been set up to receive COVID-19 project proposals.
This excerpt is from the portal:
“Researchers are invited to submit COVID-19 related research proposals to the consortium via this online portal, which will then be reviewed for matching with computing resources from one of the partner institutions. An expert panel comprised of top scientists and computing researchers will work with proposers to assess the public health benefit of the work, with emphasis on projects that can ensure rapid results.
“Fighting COVID-19 will require extensive research in areas like bioinformatics, epidemiology, and molecular modeling to understand the threat we’re facing and form strategies to address it. This work demands a massive amount of computational capacity. The COVID-19 High Performance Computing Consortium helps aggregate computing capabilities from the world’s most powerful and advanced computers to help COVID-19 researchers execute complex computational research programs to help fight the virus.”
Key government partners so far include Argonne National Laboratory, Lawrence Livermore National Laboratory, Los Alamos National Laboratory, Oak Ridge National Laboratory, Sandia National Laboratories, National Science Foundation, and NASA. Among industry partners are IBM, HPE, Amazon Web Services, Google Cloud, and Microsoft. A few examples from academia include MIT, Rensselaer Polytechnic Institute, University of Chicago, and Northwestern University. IBM is also hosting a central portal.
As explained by Dabbar, the current plan is to reallocate resources (compute cycles and expertise) rather than attempt to acquire and stand up new resources. That said, additional resources could be made available as they come online. “We’re showing a very high degree of precedence towards this consortium and overall COVID-19 research,” said Dabbar. Systems will include leadership platforms such as Summit (ORNL), currently the fastest supercomputer in the world (Top500 List, Nov. 2019), and Sierra (LLNL). A fuller list of the computational resources available, along with information about joining the consortium, is at the end of the article.
In fact, several COVID-19 projects are already underway. Dabbar referenced an Oak Ridge National Lab project in which researchers explored 8,000 compounds of interest, narrowing that to 77 promising small-molecule drug compounds. Not surprisingly, the early COVID-19 drug research is focused on already approved drugs (~10,000) because they have already passed safety hurdles and more is known about them.
It’s worth noting that DOE and other government agencies already have aggressive computational life science projects. The CANDLE project, run by the DOE Exascale Computing Project and supported by NCI, is a good example. It’s focused on building machine learning tools for use in cancer research. There’s also the ATOM (Accelerating Therapeutics for Opportunities in Medicine) project at LLNL. Both CANDLE and ATOM are pivoting efforts toward COVID-19.
How the various supercomputing resources will be deployed varies. A key piece is working out therapies. Asked what key bottleneck the computational resources are being applied to, one of the lab directors gave an answer that nicely summarizes the directions:
“We kind of understand this virus because it’s similar to other Coronaviruses. It does have some mutations from SARS and from MERS. Computation is being used to build evolutionary trees, phylogenetic trees to understand the mutational patterns and how those are related geographically and temporally. Computation is being used to refine epitopes of small sub sequences that can be antigenic and are the first stage in trying to decide targets for vaccines. Computation is being used to design antibodies, which are also related to trying to improve a vaccine or antibiotic based treatments.
“In the case of the small molecule, there’s about 26-27 proteins that the virus codes for. Sixteen or so that are non-structural there that are involved in the virus replication inside the cell, and the rest are structural proteins that form the coat of the spike and so forth. Each of these is potentially affected by a number of pockets or sites that we can target with small molecule[s]. Most of [the] proteins form complexes, so they have many possible places you could potentially drug them.
“All told there’s probably 50 or 60 drug targets including interactions between the viral proteins and the host proteins. You have this large number of targets, many pockets potentially in each target. And you’ve got potentially billions of molecules that you want to look at in some efficient way to see whether or not they’re potentially good inhibitors of those interactions. And then screen them for toxicity, for all the things that the drugs would have to pass this criterion. So it’s a huge amount of computational work.
“We’re getting structures for the proteins from the light sources at Argonne, and neutron sources at Brookhaven and other places in the country as well as internationally. So there’s a constant flow of new protein structures that are refining the models that we already have of the 3D structures. And a number of those are coming with small molecule ligands bound into the pockets so we can understand the predictions from computation [and] how well they’re actually holding up in the laboratory when you co-crystallize with these small molecules.
“So the bottleneck is really in searching through this vast molecular space for good targets, with a particular focus on repurposing existing drugs since those are probably the fastest route to point of care. But we’re looking at many, many libraries of molecules and trying many computational methods, including AI methods to try to accelerate the search, without doing just mechanistic modeling but AI based modeling as well.”
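To give a rough sense of the scale the lab director describes, the short Python sketch below multiplies out targets, pockets, and candidate molecules. Only the target count (the low end of "50 or 60") and the "billions" of molecules (taken here as a floor of one billion) come from the quote. The pockets-per-target figure and the cost per evaluation are purely illustrative assumptions, not numbers from the briefing or from any lab's workflow.

```python
# Back-of-envelope estimate of a virtual-screening workload.
# Function and constant names are hypothetical, chosen for illustration.

def docking_evaluations(targets: int, pockets_per_target: int, molecules: int) -> int:
    """Number of (pocket, molecule) pairs to score in an exhaustive screen."""
    return targets * pockets_per_target * molecules


def worker_hours(evaluations: int, seconds_per_evaluation: float) -> float:
    """Serial compute time, in hours, for the given number of evaluations."""
    return evaluations * seconds_per_evaluation / 3600.0


TARGETS = 50                   # low end of "50 or 60 drug targets" (from the quote)
POCKETS_PER_TARGET = 3         # ASSUMPTION: the quote says only "many pockets"
MOLECULES = 1_000_000_000      # floor for "billions of molecules" (from the quote)
SECONDS_PER_EVALUATION = 1.0   # ASSUMPTION: placeholder cost per scored pair

if __name__ == "__main__":
    evaluations = docking_evaluations(TARGETS, POCKETS_PER_TARGET, MOLECULES)
    hours = worker_hours(evaluations, SECONDS_PER_EVALUATION)
    print(f"{evaluations:.2e} pocket-molecule evaluations")
    print(f"{hours:.2e} hours on a single worker")
```

Even with these conservative placeholder values the product runs to well over a hundred billion evaluations, which is why an exhaustive search is impractical on a single machine and why the quote mentions AI methods to narrow the search.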
There’s a lot going on here and across other disciplines such as epidemiology, logistics, etc.
Dabbar likened the consortium’s efforts to a three-legged stool requiring the high-power systems themselves, expertise to run the systems, and subject matter experts with suitable problems to solve. One lab leader weighed in on their intent to mobilize internal resources to help researchers:
“[It’s] important to realize that these computing resources also have teams at each of the laboratories…experts at taking application codes and software and applying software tools from the laboratories to enhance the applications and to make them run more efficiently and effectively on the systems. So that as the projects are assigned, and teams are deployed and applications are deployed to the particular systems, I would also expect that there will be collaborations with the laboratory scientists to make sure that the applications can effectively use those resources.”
Dabbar noted, “DOE national labs and the 60,000 researchers that are at the National 17 sites is the largest basic research organization in the world. And it is the largest generator of Nobel Prize winners in the world.” He emphasized that supercomputing and AI are well-funded priorities in President Donald Trump’s administration. The current administration’s enthusiasm for funding science more generally, and specifically in areas such as NIH, may be open to debate, but now is not the time.
Shown below are a screenshot of guidelines for preparing submissions (DOE web site) and a screenshot of computer resources (IBM portal):
Links to COVID-19 High Performance Computing Consortium: https://www.xsede.org/covid19-hpc-consortium; https://www.ibm.com/covid19/hpc-consortium/