Argonne AI for Science Colloquium Marks Challenges and Progress

By John Russell

November 9, 2021

It’s an understatement to say the effort to adapt AI technology for use in scientific computing has gained steam. Last spring, the Department of Energy released a formal report – AI for Science – suggesting an AI program not unlike the exascale program reaching fruition now. There’s also the broader U.S. National Artificial Intelligence Initiative pushing for AI use throughout society. Last week, as part of a year-long celebration of its 75th founding anniversary, Argonne National Laboratory held a Director’s Special Colloquium on AI for Science: From Atoms to the Cosmos.

Rick Stevens, ANL

Led by Rick Stevens (Argonne’s associate laboratory director for computing, environment and life sciences) and featuring a keynote and panel, the virtual meeting provided an interesting glimpse into both progress and challenges for AI use in science. It was the fourth colloquium in a series. The others tackled: Decarbonization Within Reach; The Quantum Revolution; and Energy Storage for a Changing World.

Jonathan Rowe, University of Birmingham

Keynote speaker Jonathan Rowe, a prominent computer scientist and mathematician with posts at the University of Birmingham and the Alan Turing Institute, set the stage: “Often, when I give talks on AI, I concentrate on all the great things that AI methods have already done in different scientific areas. But today, I’m going to do something slightly different, and we’ll talk about a number of challenges that still remain in trying to make AI even better and more effective in helping with scientific research. We’ve made some progress on some, we’ve got some ideas about [some], and many, I think, we’re still scratching our heads on.”

With that he plunged ahead, calling out ten AI challenges. (A recording of the full ANL colloquium is posted.) Perhaps number one wasn’t surprising – Data Management.

Data management is considered boring by too many, said Rowe, but “it’s essential that we start getting to grips with issues to do with data management. There’s all sorts of basic questions about how we manage data [of] that scale. For example, a very basic question is: what should you keep, and what should you throw away when you’re producing data? As an example, CERN routinely throw away nearly all the data that they generate from each experimental run because they simply can’t store everything they produce. They can barely store the essentials. They only store what’s necessary to support the conclusion they’re writing about as a result of the experiment.”

“So what does that mean? That means if you come up with an alternative theory about what might have happened in a particular experiment, you can’t go back and check the data. You have to ask to rerun the experiment and collect what you need for it. So that’s becoming a really big issue. We can generate so much data, but you have to somehow decide what you are going to keep and what you are going to throw away. That raises the question, ‘what [does] open access to data actually mean?’ The big programs like the Square Kilometre Array [and] LSST (Large Synoptic Survey Telescope), they’re publicly funded, and they all promise their data is going to be open access. So LSST will produce something like 15 terabytes of data for one night’s observation. Could someone in the world just ask for that? It’s just not practical,” said Rowe.
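A back-of-envelope calculation shows why handing one night’s output to anyone who asks is impractical. The sketch below assumes decimal terabytes and a sustained, fully utilized network link with no protocol overhead; the link speeds are illustrative assumptions, not figures from the colloquium.

```python
def transfer_hours(terabytes: float, link_gbps: float) -> float:
    """Hours to move `terabytes` (decimal TB) over a link of `link_gbps` gigabits/s.

    Assumes the link is fully utilized with no protocol overhead,
    so real transfers would take longer.
    """
    bits = terabytes * 1e12 * 8          # TB -> bytes -> bits
    seconds = bits / (link_gbps * 1e9)   # Gbit/s -> bit/s
    return seconds / 3600


# 15 TB, the per-night LSST figure Rowe quotes, over two hypothetical links
for gbps in (1, 10):
    print(f"{gbps:>2} Gbit/s: {transfer_hours(15, gbps):.1f} hours")
```

Under these assumptions a single night of data takes about 33 hours to move over a 1 Gbit/s link, longer than the night it took to collect, before multiplying by the number of nights and the number of requesters.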

“The second challenge is the whole question of how we incorporate scientific knowledge into our AI. The whole purpose of doing science is understanding and generating more knowledge. AI is really good at predicting stuff based on data and that’s useful in a lot of situations. But scientists are not satisfied with just being able to predict stuff. They want to be able to understand the system that’s producing the prediction. That’s really tricky. How can we use our existing scientific knowledge to make sure the AI methods produce better results? How can we make sure the results are even scientifically valid?

“One idea, which I guess quite a few people do, is to incorporate the current physical constraints or scientific constraints into your loss function when you’re doing your neural network. The example I’ve actually got here is from the Laboratory of Molecular Biology, where they’re using a Bayesian system and incorporating information about molecular structure as a constraint in the Bayesian optimization system. That kind of gives you results that are, you know, physically more realistic than if you don’t do that, but still doesn’t necessarily produce stuff that really obeys the laws of physics,” he said.
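Rowe’s first idea, adding a scientific constraint to the training loss, can be sketched in a few lines. This is a hypothetical toy, not the Bayesian method he cites: it assumes the predicted quantity is split into components that must sum to a known total (a stand-in for a conservation law) and adds the squared violation to an ordinary mean-squared-error loss with an assumed weight `lam`.

```python
def mse(pred, target):
    """Ordinary data-fit term: mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def conservation_penalty(pred, total):
    """Squared violation of a hypothetical conservation law: components must sum to `total`."""
    return (sum(pred) - total) ** 2


def physics_informed_loss(pred, target, total, lam=10.0):
    """Data fit plus a weighted physics penalty; `lam` trades one against the other."""
    return mse(pred, target) + lam * conservation_penalty(pred, total)


# Noisy observations that, taken at face value, violate the constraint (they sum to 1.1).
target = [0.55, 0.55]
chases_noise = physics_informed_loss([0.55, 0.55], target, total=1.0)
obeys_physics = physics_informed_loss([0.50, 0.50], target, total=1.0)
print(chases_noise, obeys_physics)  # the constraint-respecting prediction scores lower
```

Because the constraint enters only as a weighted penalty, training is pushed toward physically valid outputs without any guarantee of reaching them, which is the same caveat Rowe raises.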

“Another idea is to incorporate your physical understanding via a model or a simulation. What I’m showing here is the output of a system we developed with the British Antarctic Survey for predicting Arctic sea ice. We trained it on hundreds of years’ worth of data, because that data was artificially generated through a physics-based model. Then we fine-tuned it with actual observations. But that meant we’ve got a system that really does tend to conform to what’s known physically about the Arctic,” said Rowe.
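The pre-train-then-fine-tune recipe Rowe describes can be illustrated with a deliberately tiny model. This is a hypothetical sketch, not the British Antarctic Survey system: a one-parameter linear model is first fitted to abundant synthetic data from an assumed “physics model,” then fine-tuned on three assumed observations from a real system that behaves slightly differently.

```python
def fit(w, data, lr, epochs):
    """Gradient descent on mean squared error for the one-parameter model y = w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


# Hypothetical "physics model" output: plentiful synthetic samples of y = 2.0 * x.
simulated = [(k / 10, 2.0 * k / 10) for k in range(1, 101)]
# Hypothetical real observations: scarce, and the real system differs (y = 2.2 * x).
observed = [(0.5, 1.1), (1.0, 2.2), (1.5, 3.3)]

w_pre = fit(0.0, simulated, lr=0.01, epochs=200)    # pre-train on simulation
w_tuned = fit(w_pre, observed, lr=0.05, epochs=10)  # fine-tune on observations
w_scratch = fit(0.0, observed, lr=0.05, epochs=10)  # same budget, no pre-training

print(f"pre-trained {w_pre:.3f}, fine-tuned {w_tuned:.3f}, scratch {w_scratch:.3f}")
```

With the same small fine-tuning budget, the pre-trained model ends much closer to the observed behavior than one trained from scratch, because the simulated data has already placed it near the right answer.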

Here’s the list of challenges that Rowe discussed: data management; scientific knowledge; uncertainty and noise; finding rare things; hidden structures; finding all of the 3D structures in a volume; counting and tracking; AI for digital twins; benchmarking; and closing the loop.

Rowe examined each challenge area and briefly discussed a few of the solution approaches under investigation. One topic that touched on all of the challenges was benchmarking.

“We’ve got lots of different AI methods now. And we’ve got lots of different scientific data sets. What you’d like to be able to do is to work out for each different kind of data set, and for each different kind of scientific question, what are the best AI methods that are available. Or if you’re a computer scientist, and you’ve come up with what you think is a really neat AI method, you’d like to know how it compares to some of the other ones. And this is really hard right now,” said Rowe. “Similarly, the datasets are produced in different labs around the world and just kind of sit there. Somehow, you need to get these together, and you need to get them together in a way that makes it very easy to do comparisons and benchmarking.”

Rowe noted that a fair amount of work is being done around benchmarking and cited work by the SciML group. “I want to call out one because this is done by the guys from the scientific machine learning group at the Rutherford [Appleton] labs, who we collaborate with. They’ve started putting forward something called SciMLBench, available on GitHub, which is a really good start at putting together a framework to do this. And they’ve got data now from environmental sciences, particle physics, astronomy, and so forth, where you can begin to do this benchmarking, if you’re interested in that. Please go and check it out and see how we might be able to add to it and help,” he said.
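The core of what Rowe describes, running every method on every dataset against a common metric, can be sketched as a small harness. The methods, datasets and metric below are hypothetical placeholders (two naive one-step forecasters, two toy series, mean absolute error); this is not SciMLBench’s interface.

```python
from statistics import mean


def mean_baseline(history):
    """Forecast the next value as the mean of the recent window."""
    return mean(history)


def last_value(history):
    """Forecast the next value as the most recent observation."""
    return history[-1]


def evaluate(method, series, window=3):
    """Mean absolute error of one-step-ahead forecasts over a sliding window."""
    errors = [abs(method(series[i - window:i]) - series[i])
              for i in range(window, len(series))]
    return mean(errors)


def benchmark(methods, datasets):
    """Score every method on every dataset with the same metric."""
    return {(d, m): evaluate(f, s)
            for d, s in datasets.items() for m, f in methods.items()}


def best_per_dataset(results):
    """Pick the lowest-error method for each dataset."""
    best = {}
    for (d, m), score in results.items():
        if d not in best or score < results[(d, best[d])]:
            best[d] = m
    return best


methods = {"mean_baseline": mean_baseline, "last_value": last_value}
datasets = {"trend": [1, 2, 3, 4, 5, 6], "oscillating": [5, 4, 5, 4, 5, 4]}

results = benchmark(methods, datasets)
print(best_per_dataset(results))  # {'trend': 'last_value', 'oscillating': 'mean_baseline'}
```

Even in this toy, neither method wins everywhere: the last-value forecaster does best on the trending series and the window mean on the oscillating one, which is why per-dataset comparisons matter.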

(An excerpt from the website of SciML, an open source scientific machine learning software organization, is included at the end of the article.)

The panel was also fascinating. Besides Stevens and Rowe, panelists included Patrick Riley, who leads the AI group at Relay Therapeutics, applying learning methods to the discovery process; Douglas Finkbeiner, professor of astronomy and physics at Harvard University; Subramanian Sankaranarayanan, group leader of the theory and modeling group in the Nanoscience and Technology division at ANL; and Rebecca Willett, a professor of statistics and computer science at the University of Chicago.

It’s best to watch the video directly to catch the interplay. One interesting current use of AI was cited by Douglas Finkbeiner of Harvard University. “For example, we want to see the dark mass of the universe. [Using] gravitational lensing, we can see the distortions in background objects behind the mass. That requires measurements of the shapes and brightnesses of billions of galaxies,” said Finkbeiner. “Two ways we’ve used AI for that are to keep the telescope system in focus and to de-blend galaxies.”

“Jonathan [Rowe] mentioned the LSST survey at the Vera Rubin telescope that will produce several petabytes of data over the years. [There will be] probably on the order of 1,000 detections [for] each of tens of billions of objects. So it’s quite a bit of data. [J]ust keeping a telescope like that, a very complex eight-meter optical system, in focus is a bit of a challenge. There are 50 control parameters in the optical system that need to be fixed correctly. It turns out there’s a nice way to do that with convolutional neural nets. Then once you’ve actually got the images, you can pull out information about the individual galaxies. That’s not so hard if a galaxy is just off by itself, isolated, but often these galaxies are kind of overlapping each other, and that is much more of a challenge. We’ve been applying convolutional neural nets to de-blending the galaxies.”

Link to video: https://www.youtube.com/watch?v=sUYCCfdJkjM

Link to ANL website hosting AI in Science colloquia material: https://www.anl.gov/event/ai-for-science-from-atoms-to-the-cosmos

EXCERPT FROM SCIML WEBSITE

SciML: Open Source Software for Scientific Machine Learning

SciML is a NumFOCUS-sponsored open source software organization created to unify the packages for scientific machine learning. This includes the development of modular scientific simulation support software, such as differential equation solvers, along with the methodologies for inverse problems and automated model discovery. By providing a diverse set of tools with a common interface, we provide a modular, easily-extendable, and highly performant ecosystem for handling a wide variety of scientific simulations.

Core Components

High Performance and Feature-Filled Differential Equation Solving. DifferentialEquations.jl is a library for solving ordinary differential equations (ODEs), stochastic differential equations (SDEs), delay differential equations (DDEs), differential-algebraic equations (DAEs), and hybrid differential equations which include multi-scale models and mixtures with agent-based simulations. The templated implementation allows arbitrary array and number types to be compatible, giving compatibility with arbitrary precision floating point numbers, GPU-based computations, unit-checked arithmetic, and other features. DifferentialEquations.jl is designed for high performance on both large-scale and small-scale problems, and routinely benchmarks at the top of the pack.
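For illustration only, the sketch below hand-rolls the classical fourth-order Runge-Kutta method for the test problem dy/dt = -y, to show the basic step an ODE solver repeats. It is an assumption-level example written in plain Python, not DifferentialEquations.jl code; that library offers many adaptive, higher-performance methods behind a common interface.

```python
import math


def rk4(f, y0, t0, t1, steps):
    """Integrate dy/dt = f(t, y) from t0 to t1 with fixed-step classical RK4."""
    h = (t1 - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h * k1 / 2)
        k3 = f(t + h / 2, y + h * k2 / 2)
        k4 = f(t + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return y


# Exponential decay, whose exact solution exp(-t) lets us check the answer.
approx = rk4(lambda t, y: -y, 1.0, 0.0, 1.0, steps=100)
print(approx, math.exp(-1.0))
```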

Physics-Informed Model Discovery and Learning. SciML contains a litany of modules for automating the process of model discovery and fitting. Tools like DiffEqParamEstim.jl and DiffEqBayes.jl provide classical maximum likelihood and Bayesian estimation for differential equation based models; DiffEqFlux.jl enables the training of embedded neural networks inside of differential equations (neural differential equations or universal differential equations) for discovering unknown dynamical equations; DataDrivenDiffEq.jl estimates Koopman operators (DMD) and utilizes methods like SINDy to turn time series data into LaTeX for the driving differential equations; and ReservoirComputing.jl provides Echo State Networks that learn to predict the dynamics of chaotic systems.
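For illustration, parameter estimation of the kind DiffEqParamEstim.jl automates comes down to choosing the model parameters that best reproduce data. The sketch below is a simplified stand-in, not that package’s API: it fits the decay rate k of dy/dt = -k y using the model’s analytic solution rather than a numerical solver, on assumed noise-free synthetic data, by minimizing the sum of squared errors (the maximum-likelihood choice under Gaussian noise) with golden-section search.

```python
import math


def model(k, t):
    """Analytic solution of dy/dt = -k * y with y(0) = 1."""
    return math.exp(-k * t)


def sse(k, data):
    """Sum of squared errors between the model and (t, y) observations."""
    return sum((model(k, t) - y) ** 2 for t, y in data)


def golden_section_min(f, lo, hi, tol=1e-8):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return (a + b) / 2


# Synthetic observations generated with an assumed "true" rate of 0.7.
data = [(t, model(0.7, t)) for t in (0.0, 0.5, 1.0, 1.5, 2.0, 3.0)]
k_hat = golden_section_min(lambda k: sse(k, data), 0.01, 5.0)
print(round(k_hat, 4))
```

Golden-section search is used here only because the one-parameter loss is unimodal; real estimation problems with many parameters and noisy data call for gradient-based or Bayesian methods like those the excerpt lists.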

A Polyglot Userbase. While the majority of the tooling for SciML is built using the Julia programming language, SciML is committed to ensuring that these methodologies can be used throughout the greater scientific community. Tools like diffeqpy and diffeqr bridge the DifferentialEquations.jl solvers to Python and R respectively, and we hope to see many more developments along these lines in the near future.

Compiler-Assisted Model Analysis and Sparsity Acceleration. Scientific models generally have structures, such as locality, that lead to sparsity in the program structures that can be exploited for major performance acceleration. SciML builds a set of interconnected tools for generating numerical solver code directly on the models that are being simulated. SparsityDetection.jl can automatically detect the sparsity patterns of Jacobians and Hessians from arbitrary source code, while ModelingToolkit.jl can rewrite differential equation models to re-arrange equations for better stability and automatically parallelize code. These tools then connect with affiliated packages like SparseDiffTools.jl to accelerate solving with DifferentialEquations.jl and training with DiffEqFlux.jl.
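Sparsity detection means finding which outputs of a function depend on which inputs, so that only the nonzero Jacobian entries need computing. SparsityDetection.jl does this by analyzing source code; the sketch below swaps in a simpler, cruder technique, perturbing one input at a time and recording which outputs change, applied to an assumed 1D finite-difference Laplacian, a textbook example of locality.

```python
def laplacian_1d(u):
    """Second-difference operator with zero boundary values: each output uses 3 neighbors."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append(left - 2.0 * u[i] + right)
    return out


def jacobian_sparsity(f, x, eps=1e-6):
    """Return the set of (output i, input j) pairs where perturbing x[j] changes f(x)[i]."""
    base = f(x)
    pattern = set()
    for j in range(len(x)):
        xp = list(x)
        xp[j] += eps
        for i, (a, b) in enumerate(zip(base, f(xp))):
            if a != b:
                pattern.add((i, j))
    return pattern


pattern = jacobian_sparsity(laplacian_1d, [1.0, 2.0, 3.0, 4.0])
print(sorted(pattern))  # tridiagonal: only |i - j| <= 1 entries appear
```

Numerical probing can miss a dependency if a perturbation happens to leave an output unchanged at the chosen point, which is one reason tools that analyze the program itself are attractive.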

ML-Assisted Tooling for Model Acceleration. SciML supports the development of the latest ML-accelerated toolsets for scientific machine learning. Methods like Physics-Informed Neural Networks (PINNs) and Deep BSDE methods for solving 1000-dimensional partial differential equations are productionized in the NeuralPDE.jl library. Surrogate-based acceleration methods are provided by Surrogates.jl.
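The physics-informed loss at the heart of a PINN can be shown without any neural network. In this hypothetical sketch, which is not NeuralPDE.jl code, candidate functions for the ODE du/dt = -u with u(0) = 1 are scored by the mean squared equation residual at collocation points plus the squared initial-condition error. Derivatives come from central finite differences rather than the automatic differentiation a real PINN would use; training a PINN amounts to adjusting network weights to drive this quantity toward zero.

```python
import math


def derivative(u, t, h=1e-4):
    """Central finite-difference approximation of du/dt."""
    return (u(t + h) - u(t - h)) / (2 * h)


def pinn_loss(u, n_points=11):
    """Mean squared residual of du/dt + u = 0 on [0, 1] plus the error in u(0) = 1."""
    ts = [i / (n_points - 1) for i in range(n_points)]
    residual = sum((derivative(u, t) + u(t)) ** 2 for t in ts) / n_points
    initial = (u(0.0) - 1.0) ** 2
    return residual + initial


print(pinn_loss(lambda t: math.exp(-t)))  # exact solution: essentially zero
print(pinn_loss(lambda t: 1.0 - t))       # meets u(0) = 1 but not the equation
```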

Differentiable Scientific Data Structures and Simulators. The SciML ecosystem contains pre-built scientific simulation tools along with data structures for accelerating the development of models. Tools like LabelledArrays.jl and MultiScaleArrays.jl make it easy to build large-scale scientific models, while other tools like NBodySimulator.jl provide full-scale simulators.

Tools for Accelerated Algorithm Development and Research. SciML is an organization dedicated to advancing state-of-the-art research in both numerical simulation methods and methodologies in scientific machine learning. Many tools throughout the organization automate the process of benchmarking and testing new methodologies to ensure they are safe and battle-tested, both to accelerate the translation of the methods to publications and to users. We invite the larger research community to make use of our tooling like DiffEqDevTools.jl and our large suite of wrapped algorithms for quickly testing and deploying new algorithms.
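One routine check that tools like DiffEqDevTools.jl automate is a convergence test: confirming that a solver’s error shrinks at the rate its theory predicts. The sketch below is an assumed, hand-rolled version, not that package’s API: it runs forward Euler, a first-order method, on dy/dt = -y at two step sizes and estimates the observed order from the ratio of the errors.

```python
import math


def euler(f, y0, t1, steps):
    """Forward Euler from t = 0 to t1 with a fixed number of steps."""
    h = t1 / steps
    t, y = 0.0, y0
    for _ in range(steps):
        y += h * f(t, y)
        t += h
    return y


def observed_order(f, y0, t1, exact, steps):
    """Estimate convergence order from the errors at `steps` and `2 * steps`."""
    e1 = abs(euler(f, y0, t1, steps) - exact)
    e2 = abs(euler(f, y0, t1, 2 * steps) - exact)
    return math.log(e1 / e2) / math.log(2)


order = observed_order(lambda t, y: -y, 1.0, 1.0, math.exp(-1.0), steps=100)
print(f"observed order ~ {order:.2f}")  # forward Euler should come out close to 1
```

Halving the step size should roughly halve a first-order method’s error; an observed order far from the theoretical one is a quick signal that a new implementation has a bug.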

HPCwire