Decoding how the SARS-CoV-2 virus replicates has been a key research goal as the COVID-19 pandemic continues. For the scientific computing world, building accurate models of how the virus copies itself so efficiently has been an all-hands-on-deck endeavor.
One company contributing to the fight with artificial intelligence and machine learning is Cerebras Systems. The company’s second-generation wafer-scale system, the CS-2, is part of the AI Testbed at Argonne National Laboratory and contributed to a multi-institution study of SARS-CoV-2 replication that was named a finalist for the Gordon Bell Special Prize.
The research team behind the paper, made up of scientists from 12 institutions spanning national laboratories, universities and companies, started from three-dimensional cryo-electron microscopy images that show the virus at “near-atomic resolution.” However, these images lack the detail needed to study the approximately 2 million atoms that make up the virus’s intricate replication machinery.
Arvind Ramanathan, a computational biologist at Argonne and the study’s lead author, says that to replicate itself, the virus works like a “Swiss watch, with precisely organized enzymes and nanomachines that come together like tiny gears.”
To pinpoint the individual gears of this molecular “machinery,” the team used a hierarchical artificial intelligence framework to analyze the 3D images and recover the missing data needed for modeling.
These simulation experiments consume thousands of node-hours on a supercomputer, so the study’s authors sought to make them more efficient in order to analyze more of the 3D images. Machine learning helps through computational steering: halting erroneous simulations and favoring more promising ones, which frees up processing nodes and saves both time and computing power. This is done by training a machine learning model called a “convolutional variational autoencoder,” or CVAE.
“We train the model by letting it observe snapshots of the simulations. We then run the reverse transformation – or decode it,” said Vishal Subbiah, tech lead manager for ML frameworks at Cerebras and co-author of the study, in a company blog post. “If the decoded version is a good match for the original, we know the CVAE is working. That trained model can then be used during ‘real’ experiments by another algorithm that does the actual steering.”
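The validation loop Subbiah describes (encode simulation snapshots, decode them, and check that the reconstruction matches the original) can be sketched in plain Python. This is a minimal illustration under stated assumptions: `toy_encode`/`toy_decode` are hypothetical stand-ins (pairwise averaging and repetition), not a convolutional variational autoencoder, and the error threshold is arbitrary. The `steer` function is likewise hypothetical: it assumes that simulations whose embeddings sit farthest from the group centroid are the most promising, loosely in the spirit of latent-space outlier selection, and is not the steering algorithm used in the study.

```python
from math import dist, fsum
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Codec = Callable[[Sequence[float]], Vector]


def toy_encode(x: Sequence[float]) -> Vector:
    """Hypothetical stand-in encoder: halve dimensionality by averaging pairs.
    Assumes even-length snapshots."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def toy_decode(z: Sequence[float]) -> Vector:
    """Hypothetical stand-in decoder: expand each latent value back into two."""
    out: Vector = []
    for v in z:
        out.extend([v, v])
    return out


def reconstruction_mse(x: Sequence[float], encode: Codec, decode: Codec) -> float:
    """Mean squared error between a snapshot and its encode -> decode round trip."""
    x_hat = decode(encode(x))
    return fsum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def model_is_working(
    snapshots: Sequence[Sequence[float]], encode: Codec, decode: Codec, threshold: float
) -> bool:
    """Accept the model if the average reconstruction error is below threshold."""
    errors = [reconstruction_mse(s, encode, decode) for s in snapshots]
    return fsum(errors) / len(errors) <= threshold


def steer(latest: Dict[str, Sequence[float]], encode: Codec, keep: int) -> List[str]:
    """Hypothetical steering rule: continue the `keep` simulations whose latest
    snapshot embeds farthest from the centroid; the rest would be halted."""
    latents = {name: encode(x) for name, x in latest.items()}
    dim = len(next(iter(latents.values())))
    centroid = [fsum(z[i] for z in latents.values()) / len(latents) for i in range(dim)]
    ranked = sorted(latents, key=lambda n: dist(latents[n], centroid), reverse=True)
    return ranked[:keep]
```

For example, `model_is_working([[1.0, 1.0, 2.0, 2.0]], toy_encode, toy_decode, threshold=0.01)` returns `True`, because the toy round trip reconstructs that snapshot exactly. In a real workflow the encoder and decoder would be the trained network, and the steering decision would be made by the separate algorithm Subbiah mentions.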
The researchers then ran a comparison benchmark, training the CVAE model both on a single Cerebras CS-2 system and on 256 nodes (1,536 GPUs) of Oak Ridge National Laboratory’s Summit supercomputer. As stated in their paper, they found that “the CS-2 delivers out-of-the-box performance of 24,000 samples/s, or about the equivalent of 110-120 GPUs.”
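The quoted figures imply a rough per-GPU training throughput, which a few lines of arithmetic make explicit. This sketch uses only the numbers above (24,000 samples/s, an equivalence of 110 to 120 GPUs, and 1,536 GPUs across 256 Summit nodes); the per-GPU rates are implied values that assume throughput scales linearly with GPU count, not measurements reported in the study.

```python
CS2_SAMPLES_PER_S = 24_000          # CS-2 throughput quoted from the paper
GPU_EQUIVALENT_RANGE = (110, 120)   # "about the equivalent of 110-120 GPUs"
SUMMIT_NODES, SUMMIT_GPUS = 256, 1_536


def implied_per_gpu_rate(total_samples_per_s: float, gpus: int) -> float:
    """Per-GPU rate implied by an equivalence claim, assuming linear scaling."""
    return total_samples_per_s / gpus


gpus_per_node = SUMMIT_GPUS // SUMMIT_NODES
# More equivalent GPUs means each one is credited with a lower rate.
low = implied_per_gpu_rate(CS2_SAMPLES_PER_S, max(GPU_EQUIVALENT_RANGE))
high = implied_per_gpu_rate(CS2_SAMPLES_PER_S, min(GPU_EQUIVALENT_RANGE))

print(f"{gpus_per_node} GPUs per node; implied {low:.0f}-{high:.0f} samples/s per GPU")
# prints: 6 GPUs per node; implied 200-218 samples/s per GPU
```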
Cerebras touted the GPU-equivalence result but stressed that the out-of-the-box aspect matters just as much. Subbiah notes that the CS-2 “is intentionally architected as a single, ultra-powerful node with cluster-scale performance,” and that the company’s software “makes it easy to get a neural network running by changing just a couple of lines of code.”
Launched earlier this year, the CS-2 is built around Cerebras’ second-generation Wafer-Scale Engine (WSE-2), a chip made by TSMC on its 7nm process with 2.6 trillion transistors and 850,000 cores. The WSE-2 has 40GB of on-chip SRAM, 20 petabytes per second of memory bandwidth and 220 petabits per second of aggregate fabric bandwidth. Argonne National Laboratory was an early user of Cerebras’ CS-1 machine and was one of the first customers to take delivery of the CS-2.