We are on the verge of a profound transformation of the scientific enterprise, and that transformation will be driven by AI, according to Rick Stevens, Associate Laboratory Director for Computing, Environment and Life Sciences at Argonne National Laboratory. Delivering the opening keynote of the PEARC19 conference on July 30, in a presentation titled “AI for Science,” Stevens reminded his audience of both the promise and the limitations of such a revolution.
“We’re off to a good start but we gotta play it cool,” he said. “It’s emerging much faster than most people imagined possible.”
PEARC19, in progress in Chicago this week (July 28-Aug. 1), explores current practice and experience in advanced research computing including modeling, simulation and data-intensive computing. A primary focus this year is on machine learning and artificial intelligence. The PEARC organization coordinates the PEARC conference series to provide a forum for discussing challenges, opportunities and solutions among the broad range of participants in the research computing community.
The first two waves of AI, driven by symbolic methods that first encoded human knowledge for the machine to use, had their successes, Stevens explained. But they didn’t fundamentally change how computational, let alone scientific, work was done. The machine learning wave, on the other hand, stands poised to change just about everything in the scientific enterprise. Still, it would be good to bear in mind the “failure” of symbolic methods, and the limitations of machine learning, in pursuing that revolution, he warned.
“[AI is] impacting society, starting to impact science … and starting to impact computer design,” he said. Much more fundamentally, in the next 10 years “learned models will begin to replace data and the experimental discovery process will be dramatically refactored … Theory becomes data for next generation AI [as AI advances theory], and AI becomes a common part of scientific laboratory activities.”
Perhaps the least predictable intersection 12 years ago, Stevens said, was that between AI and HPC. The direction the HPC field would take—and the role machine learning would play in it—was not clear when practitioners gathered at a town hall meeting to discuss the future of exascale computing. In their 2007 report, Stevens noted, “AI was never mentioned at all; machine learning was not mentioned. Data was just barely mentioned.” How things have changed. At a repeat of the town hall this year, these topics dominated discussion. “Now it would be irresponsible not to mention them.”
In 1996, the EQP symbolic AI system proved the Robbins Problem in eight days using the full 30 MB of memory in an RS/6000 system—impressive for the time. Progress thereafter was slow, though. On the image-recognition front, before 2012 such AIs made only modest headway in identifying ImageNet images, with error rates over 25%. That year, the advent of machine learning steepened the pace of progress, such that by 2015 the AIs classified images with less than a 5% error rate—superior to human performance. The AI field has exploded, with the number of papers on arXiv growing faster than Moore’s Law.
But this progress hides some flaws, Stevens argued. While machine learning “connectionist” methods are robust to noisy data and can learn from non-symbolic data, they lack the data efficiency and interpretability of the symbolic methods.
“What we really want is something that integrates these things,” Stevens said. The field needs to “learn from both encoded symbolic theory and large-scale data, so we can leverage the vast theoretical knowledge” developed over the centuries. The goal, as he sees it, is “automated and accelerated discovery from planning to conjecture to experiment to confirmation and analysis—end-to-end automated science.”
In such AI-driven experimental science, for example, a drug discovery framework would simulate some cases on a supercomputer, gain understanding, populate a library of compounds with likely therapeutic function, and keep going largely in silico. In only a small number of cases would the researcher resort to laboratory experiment—and that would be automated using robots, perhaps in general-purpose experimentation facilities rented out to scientists much as the cloud rents out computation.
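As a rough illustration of that “experiments last” workflow, the sketch below screens a compound library with a learned surrogate model and reserves physical, robot-run assays for only a handful of candidates. All names here (predict_activity, run_robot_assay) are hypothetical placeholders, not part of any actual framework Stevens described.

```python
# Minimal sketch: score compounds in silico, send only a few to the (robot) lab.
# predict_activity and run_robot_assay are illustrative stand-ins.
import random

def predict_activity(compound: str) -> tuple[float, float]:
    """Stand-in for a trained surrogate model: returns (score, confidence)."""
    random.seed(hash(compound) % (2**32))
    return random.random(), random.random()

def run_robot_assay(compound: str) -> float:
    """Stand-in for an automated laboratory experiment (the last resort)."""
    return predict_activity(compound)[0]

library = [f"compound_{i}" for i in range(10_000)]

# 1. Screen the whole library in silico.
scored = [(c, *predict_activity(c)) for c in library]

# 2. Keep promising candidates; send only the low-confidence ones to the lab.
promising = [(c, s, conf) for c, s, conf in scored if s > 0.95]
needs_lab = [c for c, s, conf in promising if conf < 0.5]
confirmed = {c: run_robot_assay(c) for c in needs_lab}

print(f"{len(promising)} promising in silico, {len(needs_lab)} sent to the robot lab")
```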
“It’s a flipped scientific method,” he said. “It doesn’t start with experiments; experiments are the last resort. You have to do that if you want it to go fast.”
Stevens foresees this future via four clusters of activity: applications; learning systems; foundational knowledge in a number of fields including mathematics, algorithms, and general AI; and hardware.
“Deep learning is really great, it’s working amazingly well,” he said. “But in general, it requires too much data to be the long-term solution” and it needs to “tell us what we’ve learned in a kind of human way” so that human scientists can understand the rules by which it operates and learn from them. Researchers are working on interrogating deep learning models for such information, but they’re not there yet. Among other things, “we need the models to tell us how confident they are in their predictions.”
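One common way researchers pursue the kind of confidence estimate Stevens asks for is to train an ensemble of models and treat their disagreement as an uncertainty signal. The sketch below uses tiny bootstrap-fitted linear models purely for illustration; it is an assumed example, not the approach named in the keynote.

```python
# Minimal sketch: ensemble disagreement as a confidence estimate.
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=200)
y = 3.0 * x + rng.normal(scale=0.3, size=200)   # noisy "ground truth"

# Fit an ensemble of simple models on bootstrap resamples of the data.
ensemble = []
for _ in range(20):
    idx = rng.integers(0, len(x), size=len(x))
    slope, intercept = np.polyfit(x[idx], y[idx], deg=1)
    ensemble.append((slope, intercept))

# At prediction time, the spread across ensemble members signals confidence.
x_new = 0.7
preds = np.array([m * x_new + b for m, b in ensemble])
print(f"prediction: {preds.mean():.2f} +/- {preds.std():.2f}")
```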
Machine learning has the promise to touch almost every field of scientific research, including drug discovery, high energy physics, material science, astronomy, chemistry and medical imaging, Stevens noted in conclusion. Even researchers in fields that have traditionally been very labor intensive will find themselves venturing away from the workbench and starting to rely on code.