For three years running, ACM has awarded not only its long-standing Gordon Bell Prize (read more about this year’s winner here!) but also its Gordon Bell Special Prize for High Performance Computing-Based Covid-19 Research. At SC22 in Dallas, ACM President Cherri Pancake announced that the third annual Covid award would go to a team that used several supercomputers (as well as Cerebras Systems’ CS-2) to adapt large language models for analyzing Covid-19 variants.
The Gordon Bell Special Prize recognizes “outstanding research achievement towards the understanding of the Covid-19 pandemic through the use of high-performance computing.” ACM explains that it selects nominees based on performance and innovation in computational methods and the researchers’ contributions toward understanding Covid’s nature, spread and treatments. Like the main Gordon Bell Prize, the Special Prize is accompanied by a $10,000 award courtesy of HPC luminary Gordon Bell.
This year, there were three nominees (of a possible six). On Wednesday, the finalists presented their research to an enthusiastic crowd – and then, on Thursday, ACM President Cherri Pancake announced that the winner of the Gordon Bell Special Prize was…
GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics
This project adapted large language models (LLMs) into GenSLMs, short for “genome-scale language models.” After training the models on 110 million prokaryotic gene sequences, the team fine-tuned one on 1.5 million SARS-CoV-2 genomes and used it to model likely mutations of SARS-CoV-2. Using data just from the first year of the pandemic, the model suggested the development of the Delta and Omicron variants. The research was run on three major supercomputers: Argonne National Lab’s Polaris system, Nvidia’s Selene system and NERSC’s Perlmutter system; beyond those three, the researchers showed impressive results on Cerebras Systems’ AI-oriented CS-2 systems. The paper was authored by a 34-person team from Argonne National Laboratory, California Institute of Technology, Harvard University, Northern Illinois University, Technical University of Munich, University of Chicago, University of Illinois Chicago, Cerebras and Nvidia.
This morning, HPCwire published extensive coverage of the winning project, which can be found at this link.
The other two finalists
While only one team could take home the prize, don’t miss out on the incredible research produced by the other two nominees. Read more about those projects below.
Running Ahead of Evolution – AI Based Simulation for Predicting Future High-Risk SARS-CoV-2 Variants
The dozen researchers behind this project – hailing from Peng Cheng Laboratory, Peking University, and Shandong University – also looked at Covid variants. Using the Peng Cheng Cloudbrain-II system, based on Huawei Ascend NPUs (neural network processing units), the team scaled across 4,096 accelerators to train a protein language model on 408 million protein sequences and build a screening process for the prediction of binding affinity and antibody escape. In their abstract, the team says that they “successfully identify mutations in the RBD regions of 5 VOCs and can screen millions of potential variants in seconds” – a result that they say will help to prepare for “a future pandemic that will inevitably take place.” The research achieved peak performance of 366.8 petaflops in mixed precision.
TwoFold: highly accurate structure and affinity prediction for protein-ligand complexes from sequences
The TwoFold project was led by eight researchers from Oak Ridge National Laboratory, who leveraged both of the lab’s flagship systems: the 148.6 Linpack petaflops Summit system and the exaflop-plus Frontier system that debuted this summer as the world’s first system to achieve exascale on Linpack. Using those behemoth systems, the researchers applied AI to study protein-ligand binding affinity prediction. More specifically, TwoFold gets its name from the two predictions it produces when given just a viral protein’s amino acid sequence: first, protein folding; and second, molecule folding. Producing both of these predictions at the same time reduces the need for protein analysis through crystallization, speeding the time to a meaningful result in drug discovery for a novel virus like Covid-19. To produce TwoFold, the researchers used Summit to train a neural network on drug-target interactions; the resulting million-plus-row equation matrix was solved on Frontier.
A new Gordon Bell Special Prize focus for 2023
As a teaser at the end of the session, Pancake unveiled a change for next year’s Gordon Bell Special Prize. While the prize was created for – and awarded to – outstanding research on Covid-19 from 2020 to 2022, Pancake shared that 2023 will be different. “Please know that next year,” she said, “the special Gordon Bell prize will be for applications addressing climate change.”