San Diego Supercomputer Center Helps Advance Computational Chemistry

Feb. 9, 2021 — Even though computational chemistry represents a challenging arena for machine learning, a team of researchers from the Massachusetts Institute of Technology (MIT) may have made it easier. Using Comet at the San Diego Supercomputer Center at UC San Diego and Bridges at the Pittsburgh Supercomputing Center, they succeeded in developing an artificial intelligence (AI) approach to detect electron correlation – the interaction between a system’s electrons – which is vital but expensive to calculate in quantum chemistry.

AI-based methods, however, show promise in making electron correlation detection much more tractable while improving the throughput of such computations, that is, the number of materials that can be analyzed. With Comet and Bridges, Professor Heather Kulik and her MIT colleagues developed several unique artificial neural network models, which are described in papers published in the Journal of Chemical Theory and Computation and the Journal of Physical Chemistry Letters. These simulations could help advance the development of an array of new materials through predictive modeling.

“In these two papers, we first developed supervised models to predict high-quality, high-cost diagnostics of strong correlation at low computational cost,” said Kulik, a computational chemist and chemical engineering professor at MIT. “We overcame the fact that diagnostics seldom agree to build a consensus-based classifier model, so we used various low- and predicted high-cost [diagnostics] as inputs to the virtual adversarial training of an artificial neural network model in what we believe to be the first semi-supervised learning model applied to computational chemistry.”
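The consensus idea Kulik describes can be illustrated with a toy sketch. The Python below is not the team's code or their classifier: the diagnostic names (`diag_a`, `diag_b`, `diag_c`), the cutoffs, and the sample values are all hypothetical placeholders. It only shows one simple way to turn diagnostics that seldom agree into labels, by labeling a molecule when every diagnostic agrees and leaving it unlabeled otherwise.

```python
from typing import Dict, Optional, Tuple

# Hypothetical diagnostics and cutoffs, invented for this illustration.
# Each entry maps a diagnostic name to (cutoff, direction), where direction
# says which side of the cutoff suggests strong correlation.
CUTOFFS: Dict[str, Tuple[float, str]] = {
    "diag_a": (0.10, "above"),
    "diag_b": (0.90, "below"),
    "diag_c": (0.05, "above"),
}


def flags_strong_correlation(name: str, value: float) -> bool:
    """Return True if this single diagnostic suggests strong correlation."""
    cutoff, direction = CUTOFFS[name]
    return value > cutoff if direction == "above" else value < cutoff


def consensus_label(diagnostics: Dict[str, float]) -> Optional[int]:
    """Return 1 or 0 when all diagnostics agree, else None (left unlabeled)."""
    votes = [flags_strong_correlation(n, v) for n, v in diagnostics.items()]
    if not votes:
        return None  # nothing to vote on
    if all(votes):
        return 1  # every diagnostic flags strong correlation
    if not any(votes):
        return 0  # no diagnostic flags strong correlation
    return None  # diagnostics disagree, so no label is assigned


# Made-up example values, not data from the papers.
molecules = {
    "mol_1": {"diag_a": 0.20, "diag_b": 0.70, "diag_c": 0.09},  # all flag
    "mol_2": {"diag_a": 0.02, "diag_b": 0.97, "diag_c": 0.01},  # none flag
    "mol_3": {"diag_a": 0.20, "diag_b": 0.97, "diag_c": 0.01},  # disagree
}
labels = {name: consensus_label(d) for name, d in molecules.items()}
print(labels)
```

In a semi-supervised setting, molecules such as the hypothetical `mol_3` would stay in the training set as unlabeled points alongside the labeled ones. The virtual adversarial training step itself is not shown here.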

The simulations showed that strong correlation is present in some, but not all, of the molecules typically explored during high-throughput screening of materials. This allowed the researchers to identify when more affordable computational models would be predictive.

Multi-reference character of 3,165 structures as evaluated by two of the 15 diagnostics used by experts in the field, nHOMO[MP2] (top left) and C₀² (top right). Bottom panels show all 15 diagnostics displayed using the uniform manifold approximation and projection (UMAP), with the bottom/top 10 percent for the two metrics shown as solid blue/red circles. This machine learning approach makes it possible to predict multi-reference character and determine whether computationally inexpensive techniques such as density functional theory (DFT) are sufficient. Credit: Heather Kulik et al., MIT.

Using machine learning, the models were able to predict strong correlation in the materials at a much lower computational cost than conventional methods, potentially accelerating the search for materials in a range of applications, such as finding drug-like compounds for treating diseases or new materials for improving batteries.

“This type of machine learning model is uniquely suited to this multi-stage approach because it is robust and stands up to noisy/erroneous inputs,” explained Fang Liu, an NSF Molecular Sciences Software Institute fellow who was a co-author on both papers. “We used a great deal of theoretical chemistry codes to conduct our studies and that would not have been possible without Comet and Bridges.”

The team’s workflow, MultirefPredict, interfaced with at least three electronic structure codes and used both central processing units (CPUs) and graphics processing units (GPUs) on Comet and Bridges.

Kulik and her team were allocated time on the supercomputers via the National Science Foundation’s (NSF) Extreme Science and Engineering Discovery Environment (XSEDE). “Due to our complex requirements, having resources where we could set up workflows to run in an interoperable manner with different codes was very helpful for us,” said Kulik, who also teaches a course on XSEDE resources. “Using those supercomputers firsthand allowed me to think about ways I can teach students who may just be learning computational chemistry to complement their experimental research for ways that they can use not only now but in the future.”

This research was primarily supported by the U.S. Department of Energy (DE-SC0018096). Access to Comet and Bridges was provided by XSEDE (TG-CHE140073).

Source: Kimberly Mann Bruch, UCSD