TACC’s Supercomputers Accelerate OpenFold’s AI-Enhanced Protein Modeling Effort

August 13, 2024

Aug. 13, 2024 — Form follows function, especially for life’s building blocks: proteins. The folds and overall shape of a protein reveal its function in supporting life. Scientists have developed a new, open source software tool called OpenFold that uses artificial intelligence (AI) and harnesses the power of supercomputers to predict protein structures.

The research could help scientists develop new medicines and better understand misshapen proteins, such as those linked to neurodegenerative diseases like Parkinson’s and Alzheimer’s.

Image: Predictions by OpenFold and AlphaFold2 overlaid with an experimental structure of the Streptomyces tokunonesis TokK protein, illustrating that OpenFold matches the accuracy of AlphaFold2. Credit: DOI: 10.1038/s41592-024-02272-z.

OpenFold builds on the success of AlphaFold2, developed by Google DeepMind and used since 2021 by over two million researchers for protein predictions in vaccine development, cancer treatments, and more.

“AlphaFold2 was a breakthrough for science,” said Nazim Bouatta, a senior research fellow at Harvard Medical School who works at the interface of AI and biology. “We built a fully open source version—OpenFold—that is now helping academia and industry to move the field forward.”

Bouatta co-authored a study announcing OpenFold, a fast, memory-efficient, and trainable implementation of AlphaFold2, published in May 2024 in the journal Nature Methods. He started the project with his colleague Mohammed AlQuraishi, formerly at Harvard and now at Columbia University. The project grew into the OpenFold Consortium, a group of startup companies working in collaboration with academia.

“Extremely talented students from Harvard and Columbia also contributed to the work with Gustaf Ahdritz doing a remarkable job. They all did an amazing job implementing the code,” Bouatta said.

A core facet of AI is the large language model (LLM), which takes in vast quantities of text and generates new, meaningful text from it, as in ChatGPT’s human-like ability to answer queries based on substantial amounts of text data.

“We need about 100 graphics processing units (GPUs) to train a system like OpenFold. To put things into perspective, to train the latest ChatGPT, you need thousands and thousands of GPUs,” Bouatta said.

One of the very first applications of OpenFold came from Meta AI, previously Facebook. Meta AI recently released an atlas of more than 600 million proteins from bacteria, viruses, and other microorganisms that had not yet been characterized.

“They used OpenFold to integrate a ‘protein language model,’ very similar to ChatGPT, but where the language is the amino acids that make up proteins,” Bouatta said.

Nazim Bouatta, Senior Research Fellow, Harvard Medical School.

“In a way, the information in living organisms is organized in a language,” Bouatta explained, referring to the example of the letters A-C-G-T that represent the four bases of DNA—adenine, cytosine, guanine, and thymine. “This is the language that nature picked to build these sophisticated living organisms.”

Going even further, proteins have a second layer of language: the letters representing the 20 amino acids that make up all proteins in the human body and determine what each protein can do. Genome sequencing has generated vast amounts of data on these letters of life, but what has been missing is a ‘dictionary’ that can take those letters, yield the three-dimensional shape of a protein, and model the sites where small molecules bind to it.
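
To make that second layer of language concrete, here is a minimal Python sketch (not taken from the OpenFold codebase) of the idea: the 20 standard amino acids treated as an alphabet, with a protein sequence encoded as the integer tokens a machine learning model would ingest. The peptide string is hypothetical and chosen purely for illustration.

    # The 20 standard amino acids, written in their one-letter codes
    AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
    TOKEN_IDS = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

    def encode_sequence(seq: str) -> list[int]:
        """Map a protein sequence (one-letter codes) to integer tokens."""
        return [TOKEN_IDS[aa] for aa in seq.upper()]

    # Hypothetical short peptide, used only for illustration
    print(encode_sequence("MKTAYIAKQR"))  # [10, 8, 16, 0, 19, 7, 0, 8, 13, 14]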

“Machine learning allows us to take a string of letters, the amino acids that describe any kind of protein that you can think of, run a sophisticated algorithm, and return an exquisite three-dimensional structure that is close to what we get using experiments. The OpenFold algorithm is very sophisticated and uses new developments that we’re familiar with from ChatGPT and others,” Bouatta said, referring to the transformer architecture developed at Google that also underlies ChatGPT.
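
That “string of letters in, three-dimensional structure out” interface can be sketched as follows. The predictor below is a placeholder that returns random coordinates; it is not OpenFold, but it shows the shape of the problem a trained model such as OpenFold or AlphaFold2 solves: one x, y, z position per residue.

    import numpy as np

    def predict_structure(sequence: str) -> np.ndarray:
        """Placeholder for a trained model: returns (N, 3) coordinates for N residues."""
        rng = np.random.default_rng(0)
        return rng.normal(size=(len(sequence), 3))  # a real model returns physically meaningful coordinates

    coords = predict_structure("MKTAYIAKQR")  # hypothetical peptide
    print(coords.shape)  # (10, 3)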

A key advantage of OpenFold is that scientists can train the model on their own data, something that is not possible with the publicly available version of AlphaFold2. “Having the ability to train a system with OpenFold is opening major avenues for research both in academia and industry,” Bouatta said.
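
Because OpenFold is implemented in PyTorch and designed to be retrained, the flavor of “training on your own data” can be conveyed with a generic fine-tuning loop. The sketch below is not OpenFold’s actual training code or API; the model, dataset, and loss are toy stand-ins, and OpenFold’s real pipeline (structure-aware losses, multiple sequence alignments, distributed training) is far more involved.

    import torch
    from torch import nn
    from torch.utils.data import DataLoader, TensorDataset

    class TinyStructureModel(nn.Module):
        """Toy stand-in: maps per-residue tokens to 3D coordinates."""
        def __init__(self, vocab_size: int = 20, dim: int = 64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, dim)
            self.head = nn.Linear(dim, 3)  # x, y, z per residue

        def forward(self, tokens):
            return self.head(self.embed(tokens))

    # Hypothetical dataset: 32 sequences of length 50 with known coordinates
    tokens = torch.randint(0, 20, (32, 50))
    coords = torch.randn(32, 50, 3)
    loader = DataLoader(TensorDataset(tokens, coords), batch_size=8, shuffle=True)

    model = TinyStructureModel()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(3):  # a few passes over the toy data
        for batch_tokens, batch_coords in loader:
            pred = model(batch_tokens)
            loss = nn.functional.mse_loss(pred, batch_coords)  # real models use structure-aware losses
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        print(f"epoch {epoch}: loss {loss.item():.4f}")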

In the coming months, Bouatta expects to release a version of OpenFold able to characterize protein-ligand complexes, the intricate arrangements of small molecules bound to a protein.

“That’s how drugs achieve their mechanism of action. Understanding this is particularly important,” he explained.

TACC awarded the OpenFold team allocations on the Frontera and Lonestar6 supercomputers, in particular the GPU nodes that have been instrumental in powering AI applications worldwide.

“TACC has been an extremely good collaborator,” Bouatta said. “I would like to thank TACC for allowing us to access these resources, which allowed us to deploy machine learning and AI at the scales we needed.”

“Supercomputers in combination with AI are radically changing how we approach biology. The power of supercomputers is that they allow us to predict 100 million structures in just a few months. Once the system is trained, we can get structures in seconds. They will not replace experiments, however, because we need to go back to the lab to test our ideas.”
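
For a rough sense of that throughput, a back-of-the-envelope estimate (assuming “a few months” means roughly 90 days of sustained prediction, an assumption made only for illustration):

    structures = 100_000_000
    seconds = 90 * 24 * 3600       # ~90 days expressed in seconds
    print(structures / seconds)    # roughly 13 structures per second, sustained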

The integration of AI systems like OpenFold with more traditional physics-based systems is helping scientists understand life at the most fundamental level and opening avenues for treating neurodegenerative disease.

“Supercomputers are the microscope of the modern era for biology and drug discovery,” Bouatta concluded. “If we keep putting more resources into using the AI/computational approach with supercomputers, we can bootstrap our abilities to understand life and cure diseases.”

The study, “OpenFold: retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization,” was published in May 2024 in the journal Nature Methods. The study authors are Gustaf Ahdritz, Nazim Bouatta, Christina Floristean, Sachin Kadyan, Qinghui Xia, William Gerecke, Timothy J. O’Donnell, Daniel Berenberg, Ian Fisk, Niccolò Zanichelli, Bo Zhang, Arkadiusz Nowaczynski, Bei Wang, Marta M. Stepniewska-Dziubinska, Shang Zhang, Adegoke Ojewole, Murat Efe Guney, Stella Biderman, Andrew M. Watkins, Stephen Ra, Pablo Ribalta Lorenzo, Lucas Nivon, Brian Weitzner, Yih-En Andrew Ban, Shiyang Chen, Minjia Zhang, Conglong Li, Shuaiwen Leon Song, Yuxiong He, Peter K. Sorger, Emad Mostaque, Zhao Zhang, Richard Bonneau, and Mohammed AlQuraishi. The authors thank the Flatiron Institute, OpenBioML, Stability AI, the Texas Advanced Computing Center, and NVIDIA for providing compute for experiments in this paper. This research used resources of the National Energy Research Scientific Computing Center, which is supported by the Office of Science of the US Department of Energy under contract no. DE-AC02-05CH11231. The authors acknowledge the Texas Advanced Computing Center at the University of Texas at Austin for providing HPC resources that contributed to the research results reported in the paper.


Source: Jorge Salazar, TACC
