Four months ago, Rommie Amaro and her colleagues were accepting the first-ever Gordon Bell Special Prize for High Performance Computing-Based COVID-19 Research. At the time, cases were slowly ramping up in advance of what we now know was to become a devastating winter surge. When Amaro spoke to the National Science Foundation (NSF) last week, on the other hand, the setting was different: we now know the vaccines work, cases are plummeting, and looking back on the pandemic suddenly doesn’t seem like a fanciful notion.
It’s been over a year since Amaro and her co-authors published the first atomic-level simulation of the full-length spike protein – the now-notorious mechanism that allows SARS-CoV-2 to invade human cells, which is targeted by all of the approved vaccines. Just before the novel coronavirus truly began ramping up, Amaro’s lab at the University of California, San Diego was wrapping up some lengthy research.
“Until about February of last year, my lab … had been focused for a number of years in studying the influenza virus and its glycoproteins,” Amaro said. They published in early 2020 – and then as Italy began to fall to COVID-19, they realized what they had to do. By mid-February, researchers from the University of Texas at Austin and the National Institutes of Health (NIH) had provided the necessary data to get started: the first cryo-electron microscopy model of the virus’ spike protein. “The day that structure dropped into the bioRxiv … was when we really pivoted our efforts to SARS-CoV-2,” Amaro recalled.
Her lab worked quickly, bringing AI, analytics and HPC to bear with startling speed to produce that first computational model of the spike protein. “If two years ago, you would have said that you – this would have been unimaginable, to think that anybody could have accomplished [it],” Amaro said. “And it was just only possible because of just a tremendous sort of collaborative effort from a number of teams.”
The lab’s computational biology approach allowed them to “see” elements hidden from real-world microscopy. “We build these highly detailed atomic-level models and then we’re approximating that system down to its many atoms,” Amaro explained. “And so all we’re doing is defining a potential function [that] basically describes the interactions that all the atoms in our system have with each other, and then we’re simply integrating Newton’s equation of motion over time[.] And we perform this numerical integration millions and billions and trillions of times[.]”
“We want to sort of give more insight into the bits that they cannot see with the experiments,” she continued. “And this is, I think, the beautiful synergy that exists at this interdisciplinary interface between experimental science and computational science, but also together with physics and chemistry and biology and math.”
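To make the recipe Amaro describes concrete (define a potential, then integrate Newton’s equation of motion step by step), here is a toy sketch in Python. It is not the team’s code: it follows a single particle in a one-dimensional harmonic potential rather than millions of atoms, and it assumes the velocity Verlet scheme, a common integrator in molecular dynamics that the talk does not name. All function and parameter names are hypothetical.

```python
def harmonic_force(x, k=1.0):
    """Force from the toy potential U(x) = 0.5 * k * x**2, i.e. F = -dU/dx."""
    return -k * x


def velocity_verlet(x, v, force, mass=1.0, dt=0.01, steps=1000):
    """Integrate Newton's equation m * a = F(x) with the velocity Verlet scheme.

    Returns the list of (position, velocity) pairs, including the start point.
    """
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(steps):
        # Advance the position using the current velocity and acceleration.
        x = x + v * dt + 0.5 * a * dt * dt
        # Recompute the acceleration from the potential at the new position.
        a_new = force(x) / mass
        # Advance the velocity with the average of old and new accelerations.
        v = v + 0.5 * (a + a_new) * dt
        a = a_new
        trajectory.append((x, v))
    return trajectory


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy for the toy harmonic system."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


# Start at rest, displaced from the minimum of the potential.
traj = velocity_verlet(x=1.0, v=0.0, force=harmonic_force)
e_start = total_energy(*traj[0])
e_end = total_energy(*traj[-1])
print(f"energy drift over {len(traj) - 1} steps: {abs(e_end - e_start):.2e}")
```

The total energy should stay nearly constant over the run, which is one simple check that the integration respects the underlying physics. A production simulation applies the same loop to every atom at once, with a far richer potential describing bonds, electrostatics and other interactions.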
Into the abyss
So, with that first model done, the team delved deeper, producing models of the receptor-binding domain (RBD) of the spike protein in its “open” and “closed” conformations, which had been observed, but still vexed researchers. As they fleshed out more and more of the spike protein, they arrived at its glycans.
“The proteins get sort of this extra decoration, this extra flourish of sugars, or glycans,” Amaro said. “Literally, if you look at this, it sort of looks like ornaments on a Christmas tree.” These ornaments serve, by and large, to shield the protein from the scrutiny of the human immune system, which sees the sugary coatings as innocuous.
In partnership with a wide range of institutions, the researchers were able to reconstruct these glycans on the spike protein using molecular “recipes” that determine their structure. And, finally, this allowed them to reconstruct the full-length spike protein in excruciating detail: multiple states, all of its glycans, membranes with different lipids and much more.
“And then, we sort of simulate it, right?” Amaro said. “And so we start to see how these atoms move and how they wiggle and jiggle.”
And, it turns out, that wiggling and jiggling was quite revealing. Using the new simulations, the researchers saw that those mysterious open and closed conformations of the spike protein served, in fact, to expose the RBD beyond the glycan shields in preparation for binding with human cells.
“Instead of calling this ‘up’ and ‘down,’ if they had known about the sugars at the time when they were first naming, they would have called it a defending mode and an attacking mode,” Amaro said.
This invaluable work won the wide-ranging team the Gordon Bell Special Prize at SC20. Along the way, of course, they used a similarly wide range of supercomputers, including heavy hitters like Summit at the Oak Ridge Leadership Computing Facility (OLCF) and Frontera at the Texas Advanced Computing Center (TACC).
“This is more than just graphics,” Amaro said. “These are more than just pretty pictures. It’s not a video game. These are molecular dynamics simulations – this is numerical, statistical mechanics. And so what that means is that … this motion that we’re predicting is done in accordance with rigorous theoretical laws – to, of course, some approximation, but what’s powerful about this is that it allows us to extract from these microscopic properties macroscopic, experimentally testable predictions.”
While there is now – at long last – an air of finality and retrospection to discussions of the COVID-19 pandemic, Amaro and her colleagues aren’t hanging up their hats just yet.
“As we keep going, you know, we’ve also been very interested to develop models of the entire virus,” she said, adding that she was “intensely interested” in the airborne transmissibility of the virus as a research topic.
The work also provided fertile ground for some new norms – at least, for now.
“Scientists, we always hold our cards close to our chest because science rewards people being first,” Amaro said. But, of course, such siloed work was not conducive to ending a pandemic. “And so in March, we drafted a set of principles that nearly every molecular simulation group in the world committed to. This included the use of preprint servers, FAIR data, sharing of systems and … all of that. It led to the creation of the COVID-19 Molecular Structure and Therapeutics Hub, which is another NSF-sponsored investment in molecular simulation. … It’s like a clearinghouse for simulations, data, systems, methods all over the world.”
The dataset Amaro and her colleagues produced on Frontera has already been downloaded more than 4,000 times. This kind of open access to data and software, she said, was crucial to ensure that when “the next thing hits,” researchers will be ready.
The work discussed in this article involved an extensive array of institutions and individuals who produced multiple academic papers. The Gordon Bell Prize-winning paper, titled “AI-Driven Multiscale Simulations Illuminate Mechanisms of SARS-CoV-2 Spike Dynamics,” was authored by Lorenzo Casalino, Abigail Dommer, Zied Gaieb, Emilia P. Barros, Terra Sztain, Surl-Hee Ahn, Anda Trifan, Alexander Brace, Anthony Bogetti, Heng Ma, Hyungro Lee, Matteo Turilli, Syma Khalid, Lillian Chong, Carlos Simmerling, David J. Hardy, Julio D. C. Maia, James C. Phillips, Thorsten Kurth, Abraham Stern, Lei Huang, John McCalpin, Mahidhar Tatineni, Tom Gibbs, John E. Stone, Shantenu Jha, Arvind Ramanathan and Rommie E. Amaro. The paper can be accessed here.
Header image: an image from Rommie Amaro’s acknowledgements slide highlighting some of the collaborators on the research.