AI is the Next Exascale – Rick Stevens on What that Means and Why It’s Important

By Tiffany Trader and John Russell

August 13, 2019

Editor’s Note: Twelve years ago the Department of Energy (DOE) was just beginning to explore what an exascale computing program might look like and what it might accomplish. Today, DOE is repeating that process for AI, once again starting with science community town halls to gather input and stimulate conversation. The town hall program is being led by a trio of distinguished DOE scientists – Rick Stevens (Argonne National Laboratory), Kathy Yelick (Lawrence Berkeley National Laboratory), and Jeff Nichols (Oak Ridge National Laboratory). HPCwire’s coverage of the first AI Town Hall, held at ANL, can be found here.

The Exascale Initiative’s very name hints at its goal: scale. The expectation was, and is, to boost computing power so that important problems currently out of reach of petascale computing can be tackled. Without diminishing exascale’s immense potential impact, it’s worth noting that the problem-solving approaches are generally the same simulation and modeling techniques used for decades. AI promises something different. Scale is certainly an element, particularly in terms of handling massive datasets, but AI also brings new methodologies and new computing perspectives to science. What does that really mean? Well, for one thing, perhaps surprisingly, it may mean a significant place for experimentalists, who were previously relegated to hovering around the edges of HPC while computer scientists did the heavy lifting. More on that later.

Recently, HPCwire had an opportunity to talk at length with Rick Stevens about DOE’s effort to develop a strategy and eventually an implementation plan – an AI for Science Plan, not unlike the Exascale Initiative. It was a wide-ranging conversation touching on: infrastructure needs; constituency changes; AI’s role in the post-Moore’s law era; AI-driven code debugging and systems management; the need for scientific AI benchmarks; and much more. (Thank you, Rick.) Presented here is a portion of that conversation. It makes good midsummer reading. Enjoy.

HPCwire: Walk us through the program – give us a sense of what these AI and science town halls are all about and what they are trying to accomplish.

RS: If you remember, back in 2007 we had three town hall meetings – at Argonne, Berkeley and Oak Ridge – that launched the whole DOE Exascale project and so forth. At that time the idea was to get people together and ask them: for exascale, if we could build these faster machines, what would you do with them? It was a way to get people thinking about the possibility of that, and of course it took a long time to get the exascale computing program going. With these town halls we are kind of asking a variation on that question.

Rick Stevens, Associate Laboratory Director, Argonne National Laboratory

Now we’re asking the question of what the opportunity is for AI in science, or in the applications of science – particularly in the context of DOE, but more broadly too, because DOE’s got a lot of collaborations with NIH and other agencies. So we’re really asking the fundamental question of what we have to do in the AI space to make it relevant for science. The point of the town halls – three at the labs and one in Washington in October – is to get people thinking about what opportunities there are in different scientific domains for breakthrough science that can be accomplished by leveraging AI: working AI into simulation, bringing AI into big data, bringing AI to the facility and so forth.

So that’s the concept; it’s really to get the community moving. Now, DOE and other agencies are all part of this national AI initiative that was launched in part by the White House executive order this year on maintaining American leadership in AI. In that announcement, and in subsequent OMB budget priority letters that went out to the agencies, progress in AI was set as the number one priority across the agencies. In addition, it challenged agencies to come up with plans, to figure out resourcing levels, and to make progress on managing their data so that it’s better suited for training AI and so forth. It laid out a very high-level blueprint for what the country needs to do to maintain progress in AI, and to complement in the academic sector and government what’s going on at the internet companies.

Clearly there’s huge progress in the internet space, but the Facebooks and Googles and Microsofts and Amazons and so on, they are not going to be the primary drivers for AI in areas like high-energy physics or nuclear energy or wind power or for cancer – it’s not their business focus. So the challenge is how to leverage the investments made by the private sector to build on those to add what’s missing for scientific applications – and there’s lots of things missing. Then figure out what the computing community has to do to position the infrastructure and our investments in software and algorithms and math and so on to bring the AI opportunity closer to where we currently are.

The town halls will produce a report. It will be out by the end of the year. That report will inform program planning, budget planning, strategic planning, certainly at the Department of Energy, but the October meeting will also have eight other agencies there, so it will influence their thinking as well.

Let me pause there.

HPCwire: We’ve talked at HPCwire about how AI writ large encompasses many technologies that have been bubbling for years, but suddenly there’s a sense that AI, as it becomes refined, has the potential to deliver a step function in progress. With exascale computing, for example, the expectation is it will allow us to do things at a scale that we were unable to do before. Are we thinking that the infusion of AI, combining it as part of scientific computing, is going to have that same kind of enormous step-function impact?

RS: Absolutely. There are two or three factors to that. In and of itself, AI allows us to do things that can dramatically improve rates of discovery in science, or just to process large amounts of data using machine learning methods – processing you can’t do without them. To some degree, various communities have already been embracing machine learning and other AI techniques over the last few years, but it’s starting to ramp up. We’re starting to see exponential growth in the adoption of these things, not only in terms of the number of people but also the number of cycles being requested on the big machines.

So the first answer is yes – the step function, a non-linear acceleration – and we see that happening in various ways. One is that we are clearly getting lots of data from experimental sources, from simulations, from observations, and to gain insight into that data, to make predictions from it, and to do data-driven modeling, AI is one of the few approaches that can keep up with the scale of the data. That’s one version of it. Another version is that we’re starting to see huge opportunities in hybridizations between simulation and machine learning methods – whether you’re using machine learning to control simulation, or embedding machine learning functions in simulations to replace certain functions that we otherwise would have computed explicitly with physical models. We’re replacing those with machine learning models that are often accurate enough for what we need, but also much faster, and kind of self-improving over time.
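The surrogate idea Stevens describes – replacing an explicitly computed physical model with a learned approximation – can be illustrated with a toy sketch. The "physics kernel" and polynomial surrogate below are purely illustrative, not any production DOE code:

```python
import numpy as np

def expensive_kernel(x):
    """Stand-in for a physics sub-model we'd normally compute explicitly."""
    return np.sin(3 * x) * np.exp(-0.5 * x)

# Fit a cheap polynomial surrogate to sampled kernel evaluations.
x_train = np.linspace(0, 2, 200)
surrogate = np.poly1d(np.polyfit(x_train, expensive_kernel(x_train), 9))

# Inside a simulation loop, the surrogate would replace the explicit model;
# here we just check that it is accurate enough on unseen inputs.
x_test = np.linspace(0, 2, 1000)
error = np.max(np.abs(surrogate(x_test) - expensive_kernel(x_test)))
print(f"max surrogate error: {error:.2e}")
```

In practice the surrogate would be a neural network trained on full simulation output, and – as Stevens notes – it can keep improving as more training data accumulates.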

Simulations themselves are going to evolve in new ways, and in many cases bringing machine learning into really tight integration with simulations [makes] the simulations go faster; you’ll be getting a performance boost on top of the scaling from exascale, and we can then avoid putting cycles into things that are less useful. So with a machine learning algorithm steering a computational campaign, we can probably more accurately determine which simulations we actually have to run to achieve some kind of result.

So there’s that kind of background effect, but probably more interesting is that there are whole new ways of problem solving that we’re starting to see emerge that combine generative networks – this is happening in materials and chemistry and biology in particular. We’re using a class of methods called generative models to generate candidate objects. They can be molecules, they can be molecular configurations in materials, or they can be biological sequences or something – based on training these models on some datasets and then using machine learning to predict properties of these things.

Let’s say you are searching for a drug molecule. You generate thousands or millions of drug candidates, you predict their properties, you use active learning to figure out how good your models are, and then you prioritize, through active learning, say, a whole bunch of simulations to improve your understanding. Then you prioritize experiments to collect data where you don’t have enough, say, parameters for your simulations or machine learning. So it’s the idea that machine learning is going to be coupled with simulations and with prioritizing experiments, and that we’re going to add more automation of experiments. [One result is] we’re seeing this incredible growth of interest in robotics in laboratories – robots that can test thousands of samples per day, or do experiments in biology in an automated fashion, or screen things. [Those approaches], of course, have been used for a while in pharma, but it’s now starting to break out into more basic biology and materials science and chemistry.
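The generate–predict–prioritize loop Stevens sketches can be boiled down to a toy active-learning example. Everything here is invented for illustration – "candidates" are random 2-D descriptor vectors, the "experiment" is a hidden function, and ensemble disagreement stands in for model uncertainty:

```python
import numpy as np

rng = np.random.default_rng(0)

def true_property(x):
    """Hidden ground truth standing in for a wet-lab assay or simulation."""
    return np.sin(5 * x[:, 0]) + x[:, 1] ** 2

def features(a):
    # Simple quadratic features: [1, x0, x1, x0^2, x1^2]
    return np.c_[np.ones(len(a)), a, a ** 2]

# "Generate" candidates: here just random points in a 2-D descriptor space.
candidates = rng.uniform(-1, 1, size=(500, 2))

# Start with a few labeled examples, then iterate: fit a bootstrap ensemble,
# score unlabeled candidates by predictive disagreement, label the most
# uncertain ones ("run those experiments"), and repeat.
labeled = list(range(10))
for _ in range(5):
    X, y = candidates[labeled], true_property(candidates[labeled])
    preds = []
    for seed in range(5):
        idx = np.random.default_rng(seed).choice(len(X), len(X))
        w, *_ = np.linalg.lstsq(features(X[idx]), y[idx], rcond=None)
        preds.append(features(candidates) @ w)
    disagreement = np.std(preds, axis=0)
    disagreement[labeled] = -1.0                   # never re-pick labeled points
    labeled.extend(np.argsort(disagreement)[-5:])  # top-5 most uncertain

print(f"labels spent: {len(labeled)}")  # 10 seeds + 5 rounds x 5 picks = 35
```

The point of the pattern is label efficiency: instead of measuring all 500 candidates, the loop spends its experimental budget only where the model is least sure.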

We are seeing a convergence of all this stuff at the same time: enormous progress in simulation capability, progress in AI, coupled with progress in robotics, and new thinking about how to tie it all together. That’s one of the emerging things from these town halls in spaces like chemistry, materials and biology. In high-energy physics – at CERN, for example – those huge detectors need to filter data. The vast majority of data gets filtered down to just the events that you are interested in, and all the code that does that filtering was hand-crafted years ago. As we improve the accelerators and get new detectors and so on, the community is asking whether there’s a better way to do that software. Can much of the trigger software and the analysis software be replaced by machine learning methods? And can even the simulations of the detectors be replaced by machine learning methods?

We’re seeing it across the board, so what these town halls are doing is giving us a chance to level-set across all the disciplines. We had about 350 people at the meeting here at Argonne. The first day was in application breakouts, by science domain, and on the second day we kind of transposed everybody and it was all cross-cutting topics – ranging from data life cycles to the mathematics of uncertainty quantification to the integration of simulation and AI, to facilities issues, integrating with experiments and so on.

HPCwire: From an infrastructure perspective – the computational infrastructure required to run the AI methodologies and to run them in combination with traditional simulation and modeling – what are the key changes needed?

RS: Right now we’re at this interesting place, because the exascale machines that were designed are in general similar to the Summit and Sierra machines. So at least in the U.S., these machines are built around fat nodes with GPUs, large memory, a reasonable network, connected to a large amount of non-volatile memory. That’s more or less the same platform that people are using for training deep neural networks. So we’re at this particular moment where the platforms we’ve built for simulation also happen to be very similar to the platforms we are standing up for large neural network projects. Of course we need different precision – 64-bit precision for simulations, and 32- or 16-bit or even lower precision for the AI things. But for the most part, in the next couple of years, these things are going to be done on the same hardware platform, because a) we have it and b) it’s already pretty good – GPU-based systems are really good at training these models and pretty good for inference, and they’re tightly coupled to large amounts of memory, so we are already in a reasonable place.
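The precision split Stevens mentions is easy to demonstrate: naively accumulating many small values is essentially exact in 64-bit but stalls in 16-bit, which is one reason simulations keep fp64 while deep learning tolerates much lower precision. A minimal numpy sketch:

```python
import numpy as np

values = np.full(100_000, 0.001)

# 64-bit accumulation: essentially exact (true sum is 100).
total64 = float(np.sum(values))

# Naive 16-bit accumulation: once the running sum is large enough, the
# fp16 rounding step exceeds 0.001, each addition rounds to nothing, and
# the sum stalls far below the true 100.
total16 = np.float16(0.0)
for v in values.astype(np.float16):
    total16 += v

print(total64, float(total16))
```

Real mixed-precision training avoids this with fp32 accumulators and loss scaling; the sketch just shows why naive low precision is fine for some workloads and fatal for others.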

If we look forward to, say, 100-exaflops or zettaflop kinds of things, it may not be the case that the best architecture for AI problems is also a reasonable architecture for simulation, and vice versa. It may be that we have this kind of divergence; it’s actually quite likely we’ll have a divergence, because we know AI – at least current AI methods, current deep learning methods – can effectively use limited precision. They need different kinds of sparsity than numerical simulations, and their demands in terms of storage and I/O and memory bandwidth are different. They can be quite intensive, but they have a different kind of pattern than traditional simulations. So it’s quite possible, when we look out 5-8 years from now, that we’ll be faced with some choices: we’ll have architectures that can be optimized for simulation versus things optimized for AI – but how tightly coupled do these functionalities need to be?

Kathy Yelick of Lawrence Berkeley Laboratory is one of three leaders of the AI for Science Town Halls program.

Part of what we’re looking at from the roadmap standpoint – hardware architecture roadmap, software architecture roadmap – is what is needed for these combined simulation-plus-AI activities within the context of the DOE infrastructure, and whether it is going to be different systems optimized for different things that are somehow coupled. Are we going to still build things that are really tightly coupled? Is there an evolution of architecture? There’s a ton of private capital right now going into AI accelerator development – some 40 companies/startups in the U.S. going after this right now, including the big guys as well. Not all of those companies are going to survive; some will, and there will be some different ideas than what we have now, things that go beyond GPUs.

One of the things coming out of the town halls is the need for scientific AI benchmarks – benchmarks that reflect the kinds of neural network models, deep learning models, built for science targets like cosmology or biology, which are different from what Facebook or Google need. Then we have to understand whether or not the hardware architectures being developed by these startups – which are really optimized for computer vision and natural language – are still able to do double duty for the kinds of algorithms used in science, or whether we have to do something different. So there is a lot of discussion there. I think it’s too early to know where that is going to go. I think the main message is that the exascale machines that are going to be stood up at Argonne and Oak Ridge and Livermore and other places in the next few years are, at least for the most part, going to be pretty good platforms for doing both simulation and AI in the short term.

HPCwire: A researcher from one of the labs expressed concern that computers will not get faster.

RS: People love to worry about things. The exascale machines will be pretty good architectures, but they’re incredibly expensive machines. I think the challenge isn’t so much that we won’t know how to build faster machines – we can already see how to get faster. What’s less clear is whether or not the country can afford to continue to make the investments at the scale needed to actually build faster machines. With the slowing of Moore’s law, the only way you can get faster machines in the future is with architectural improvements and buying more transistors, but transistors aren’t getting cheaper. So we can imagine a billion-dollar machine – that’s only twice as expensive as our exaflop machines – and maybe we get some improvements, a factor of 5x or 10x or something faster, but the price isn’t going to go down, it’s going to increase – and so how many billion-dollar computers can the country afford?

HPCwire: A bigger pressure than that is what happens if the market and technology move to optimize for things like machine vision that won’t satisfy the traditional modeling and simulation requirements – and how expensive will it be to build a computer if you’re not leveraging commodity economies of scale?

RS: They were really worried about this – if you go back, and you guys probably even wrote about it 10 years ago, maybe a little further back than that, everyone was worried that we were going to have to make our supercomputers out of set-top boxes. Remember cable television? Everybody was installing set-top boxes, [and] because there was a huge market for microprocessors, the vendors were sort of fixated on it. Then there were gaming chips, the Sony-IBM project and so on, and everyone was kind of saying we’re going to have to live off of whatever architectures the computer graphics world or the gaming world [spurs], because supercomputing wasn’t going to be big enough to drive new fundamental architectures for whatever we needed. Of course that has more or less happened. Machines are all being built out of GPUs, or maybe in Japan, Arm, but we’re not – we meaning the HPC community – we’re not doing a bottom-up, ground-up, purpose-built architecture from the transistors up; we’re just not doing that; we’re leveraging a lot of commodity stuff.

The real thing people are nervous about is that if machine learning and the current algorithms kind of continue – and this is a big assumption, that there won’t be radical changes in algorithms, but let’s assume for a second there aren’t – and people get really good at training at low precision, you may have this fundamental problem: there will be a huge amount of effort put into optimizing low-precision, dense arithmetic, essentially, and for the higher-precision sparse stuff, the machines will be sub-optimal. One could argue we’re already kind of in that phase now, and it could get worse.

On the other hand, the danger for the people and the datacenters that have put literally billions of dollars into the AI architectures is that all it takes is one or two good algorithms – which could be invented tomorrow – to fundamentally change the kind of hardware that’s needed to make AI go fast, and it’s just way too early [to say much about that]. In scientific computing, in traditional PDE-solving stuff, we’ve had a 30-40 year trajectory and we have nearly asymptotically optimal algorithms for many of these systems, and we know, at least so far, we haven’t found any shortcuts; we need memory bandwidth, for example, and so on. But in AI it’s still really, really early. You could end up in this weird situation – I’m not saying this will happen – but if somebody does figure out how to use spiking neural network chips effectively… Right now we don’t know how to do that; even though we have them, we can’t get them to be competitive with GPUs for real heavy lifting. But if somebody breaks that code, and we get to very data-efficient learning, then GPUs won’t be so useful the way we are thinking of them now. I think it could go either way, but I’m not losing any sleep over that part of it. I’m losing more sleep over the scale of the federal deficit and whether or not the science budgets can support the scale of investments we need to keep the infrastructure state of the art and compete internationally, in terms of the amount of cash that we have to put into this thing.

HPCwire: Comparing developing plans for a national AI program with the Exascale Initiative, what are some of the likely differences between the two?

Jeff Nichols of Oak Ridge National Laboratory is the third leader of the AI for Science Town Halls

RS: One of the cool opportunities that is coming up is the fact that historically the DOE facilities – and in some sense the exascale computing program – have been built around what we could probably characterize as traditional modeling and simulation. Yes, it’s pushing the scale; we want to do things bigger and faster and so on, but we are still doing things like modeling the subsurface, or materials, or fluid dynamics, or climate, or wind turbines, or aircraft. Standard fare in some sense. We are pushing the scale and it’s becoming harder, but the community that’s been working on that is essentially the theory, modeling and simulation community. It does touch experiments, but usually only in this kind of tangential, validation sense.

Once you bring AI into the center, you are no longer dealing with just the part of the community that’s doing modeling and simulation; you are now dealing directly with experimentalists. People that use the light sources, people that use telescopes, people that use accelerators and all kinds of stuff – who are not modeling people or theory people; these are what you normally think of as experimentalists. These are people who go to work every day and generate data, and yeah, they use computing a little bit, but they are not big users of the DOE computing facilities. If those folks now start to see real value in analyzing their data with AI, or using their data to train AI to build predictive models that they use in their experiments, [it’s] almost like a fourth way of doing science. You think of theory and experiment, and modeling and simulation as the third way of doing science, but we are kind of inventing a fourth way: this kind of data-driven modeling.

The difference that we are seeing is that this could expand by a considerable amount the number of people and the types of applications that the DOE computing facilities would then be working with. And so at this town hall, I asked a question on Monday morning – how many of you consider yourselves experimentalists? – and about a third of the room raised their hands. That never would have been the population we were talking to, say, ten years ago. The experimentalists didn’t talk to the HPC people. But now they are first-class citizens; they own the data, they generate the data, they have immediate needs for using AI to analyze their data and make predictions. So we potentially will have another whole community segment that becomes part of HPC, that suddenly becomes users and becomes stakeholders in architecture and software and everything, because they are going to need it for the future of experimental science. That’s very cool; it’s a complete change in the composition of the community, right? And of course I think there may be some people on the simulation side who are nervous about that, because they are the VIPs, right – people doing big simulations have been the whole reason that we’ve built the centers – and now they will have to share with their experimental colleagues.

HPCwire: Along those lines – do you see AI techniques and methodologies becoming built into the software and some of the instruments, and to some extent less visible to these experimentalists?

RS: Yes, absolutely – it’s going to permeate everything. Their detectors and imagers are going to start to have AI functions, machine learning functions, built in – that’s already starting to happen. Sensors will have to integrate data from IoT-type stuff in order to do inference at the edge, if you’re training on big machines and you integrate data flows that way. We’re also going to see it in software development itself – this was one of the topics discussed in the town halls – how are machine learning and AI methods going to change high performance computing software development? We’ll have compilers and runtimes that get smarter, that can learn from all the code that they see and all the platforms that they generate code for [and] get feedback [from]. Are we going to have runtimes that auto-tune, not just in the kind of coarse way that we do now but in a really nuanced way, by having machine learning functions that can learn not just from your code but from your friend’s code and your neighbor’s code and the code on the machines next door and so forth? AI that can help us write software and help us debug software.
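The auto-tuning runtime Stevens imagines can be sketched, in miniature, as a multi-armed bandit: treat each kernel variant as an arm and let the runtime learn from its own timing measurements which one to run. The variant names and timings below are invented purely for illustration:

```python
import random

random.seed(42)

# Pretend timings (ms) for three tile-size variants of one kernel; the
# "runtime" doesn't know these means, it only observes noisy measurements.
MEANS = {"tile8": 5.0, "tile16": 3.0, "tile32": 4.2}

def measure(variant):
    return random.gauss(MEANS[variant], 0.3)

def avg_time(v):
    return sum(history[v]) / len(history[v]) if history[v] else float("inf")

# Epsilon-greedy "learning runtime": usually run the historically fastest
# variant, occasionally explore another one.
history = {v: [] for v in MEANS}
for step in range(300):
    if random.random() < 0.1:
        choice = random.choice(list(history))   # explore
    else:
        choice = min(history, key=avg_time)     # exploit
    history[choice].append(measure(choice))

best = min(history, key=avg_time)
print(best)
```

A real learned auto-tuner would condition on code and hardware features (and, as Stevens suggests, on other people’s codes too), but the feedback loop – measure, learn, re-choose – is the same.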

People talked about operating systems, even just systems architectures, that become more self-aware. With our current big machines there can be huge power swings based on efficiency and the particular things that are running at any given moment, and we don’t really model that or take it much into account. We can do a little bit of that, but you can imagine a future where systems are much more aware of how much power they are consuming, how much I/O they have, whether they’re getting errors on communication channels. They become much more reactive, in some sense, to the load. So I think you are going to see AI not just in the simple kinds of cases – building models from a data stream, or plugging something into a simulation to control the simulation – the whole environment is going to get permeated by the progress of machine learning, even to the point where we have projects where we want to use it to help evolve computer architectures: we can train models on all the codes that are currently running, get a much better understanding of the distribution of instructions and operand types and reference patterns and so on, and then ask these models, which have been learning those patterns, to generate some architectural candidates that are optimized with respect to the data they’ve been trained on – and we might end up with things that are quite different.

It’s super exciting; it’s kind of like injecting energy at all levels of the ecosystem.

HPCwire: It sounds like, altogether, AI is a driving force for computational progress – I’m wondering about that in the context of Beyond Moore’s. How do you view these AI efforts through a post-Moore’s lens?

RS: It’s somewhat orthogonal – post-Moore people mostly think about it as the underlying materials and circuits scaling problem, and it becomes an opportunity in architecture. If we can’t build faster and smaller transistors anymore – maybe we can make them smaller for a little while, but they won’t be much faster, and we’ll have to go 3D and start pushing new materials in at some level – the real opportunity in post-Moore, if you factor out things like quantum for a second and think about normal digital computing, is architecture. Architecture got boring for a long time because it was hard to compete with just the rate of improvement from Moore’s law. But now that slows down, [and] all of a sudden architecture is king again.

We are already seeing that, right – just the fact that there are all these startups trying to do AI architectures is kind of mind-blowing, because that wouldn’t have happened 10-15 years ago; any startup would have gotten blown away just by the progress of Moore’s law. But now that it’s stalled, architecture actually matters. Whether or not AI is an integral part of that, or whether it will just go on to leverage the opportunity of new architectures, we don’t really know. You can use AI, of course, to help design architectures. If you go extreme post-Moore, to where you are talking about non-silicon materials and different radical computing models – including at one end of the spectrum, say, quantum, and at the other end things like neuromorphic – then most of those are going to be hard to program, and you can imagine that AI-based tools could help us program them. Do you want to have half a million quantum computing programmers? Maybe you need to have some AI-powered tools that help people think in terms of what would work on those architectures. In that sense AI could be an empowerer of use cases.

But backing up from the hyperbole for a second, I tend to think of the Moore issues as completely orthogonal. Silicon has a long way to go before it isn’t the default thing, for all kinds of reasons. There will be a lot of architecture innovation, and AI is going to drive part of that and benefit from part of that. To the degree that we are going to be in a design-rich world, anything that helps you design something is going to be useful. AI methods can certainly help us design things, whether it’s in CMOS or some other technology.

So if I’m making predictions, I’d say CAD tools – Cadence, those things – are going to start to get smarter and smarter. They are going to take lessons from things like generative models, and we already see that – this automated synthesis stuff, [where] you can sketch out what you want and the system will synthesize most of what you need. I think AI will affect a lot of that. It could also potentially affect more fundamental work in post-Moore – that is, trying to find combinations of materials that have headroom for circuits, things like that. Of course, ultimately we want to build computers that have the power efficiency of brains – we’re orders of magnitude away from that. Whether or not AI can help us accelerate that seems possible, but it’s not yet clear how to do it.

I’m super optimistic. What I view the post-Moore stuff as is actually a way for the materials scientists to participate in this broader ecosystem of contributing to computing. That’s really what’s happening in the DOE space. If you go back 20-30 years, the kind of materials science that was done in the labs really was staying away from silicon. They focused on superconductors. Oxides. Weird, cool things. But really not overlapping with silicon microelectronics, because the companies that make a living there were investing so much money it was hard to academically compete with that. What’s happened in the last couple of years is the realization that we have to go back to basics if we’re going to find something that will ultimately succeed CMOS. That’s a huge opportunity for the materials science community to participate again. We’re starting to see DOE take a serious look. They had these basic research needs workshops earlier this year, and programs in microelectronics – and they haven’t really been in that space – but that’s because there’s opportunity now for fundamental science to make a contribution. I see all this stuff kind of converging, but I do think it’s on kind of separate tracks.

I think the ability to invent a new kind of material substrate and get it into some architecture that will then be useful to accelerate AI – that timeframe is probably a 10-to-20-year window. So for the next ten years these things are all going in parallel.

HPCwire: You spoke for an hour at the first Town Hall event; did we cover some of the same themes you touched on at the meeting?

RS: I gave you an outline – we went through all these things – chemistry, math, materials, climate, biology, high energy physics, nuclear physics, energy – of course lots of people, lots of ideas. Then we turned everybody 90 degrees and looked at fundamental math issues, fundamental software issues, data issues, understandability issues, uncertainty quantification, infrastructure, computer architectures. So that’s what we covered – lots and lots of the same things we just talked about – giving you my view and a summary of it.

There’s gonna be a ton of things coming out of each of these town halls. We are kind of rolling them one into the next, so the next one will be influenced to some degree by what we’ve learned in the previous ones and so on. We’ll have a draft report that will be discussed at the Washington meeting, so maybe you guys can come there and get deeper into it. There will be a lot of the political people there – so it’s an opportunity to help get them on board with what the community is thinking and how we see the possible ways of getting this organized. It’s exciting and we’re trying to get it going. We still have a few years in the exascale program, [and] we want this to be sort of starting up as the exascale project rolls over, so we have some kind of continuity going forward.

HPCwire: You said about 350 people – what was the representation?

The first AI for Science Town Hall took place at Argonne National Lab

RS: They were mostly from the Midwest. These are kind of regional things because lots of people don’t have travel money, but we did have people from lots of other labs – Oak Ridge and Berkeley and from PNNL and Livermore and Los Alamos and SLAC and so on – so a lot of DOE lab people but also a lot of university people – people from Chicago and Northwestern, from Urbana and the University of Michigan, and from universities out East. So people from all around the Midwest, but about 150 people coming from other parts of the country. There will be a core group that will be at all four of the town halls, probably about 40 people or so that are the organizing crew and the team that Kathy and Jeff and I have doing the writing. The people organizing this are myself, Jeff Nichols from Oak Ridge, and Kathy Yelick from Berkeley.

HPCwire: Great question list on the overarching agenda. I wanted to pose one to you: “What are the 3-5 open questions that need to be addressed to maximally contribute to AI impact in the science domains and AI impact in the enabling technologies?”

RS: There are more than three. The real question is what do I decide to prioritize. But [I’ll address it] at the super high level. One of the main things, in applying AI to science, is this notion of uncertainty quantification, or what we sometimes just call model confidence. That’s super important because if you’re classifying cat videos, nobody really cares what your confidence interval is or where your error bars are exactly. But if you are using it in some scientific or medical domain, you want to know whether that answer is likely to be correct – 95 percent likely or 40 percent likely – and we have ways of doing that. One way is to build Bayesian models that internally track their own degree of confidence. There are other ways to do it as well, so that’s a huge important thing. I’d put that up there near the top.
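One of those "other ways" to get model confidence, cheaper than a fully Bayesian network, is a deep ensemble: train several models independently and treat their disagreement as an error bar. The sketch below is purely illustrative – the `ensemble_predict` helper and the toy "models" are hypothetical, not anything from the interview:

```python
import statistics

def ensemble_predict(models, x):
    """Run the same input through several independently trained models;
    report the mean as the prediction and the spread as a rough
    confidence measure (wide spread = low confidence)."""
    preds = [m(x) for m in models]
    return statistics.fmean(preds), statistics.pstdev(preds)

# Toy "ensemble": three regressors that mostly agree near x = 3
models = [lambda x: 2.0 * x + 0.1,
          lambda x: 2.0 * x - 0.1,
          lambda x: 2.0 * x]
mean, spread = ensemble_predict(models, 3.0)
```

In practice the ensemble members would be neural networks trained from different random initializations; a Bayesian model, as Stevens mentions, carries the same information in its posterior rather than across separate models.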

The second thing we need to know: the community is moving forward on architectures to accelerate AI, but it is predominantly focused on two classes of problems. The first is computer vision – and when I say computer vision I typically mean classifying or generating pictures, two-dimensional color images. The second is natural language processing – language translation, language understanding – but on normal speech, like Wikipedia speech. [A] favorite example I like to use is the one Google is working on, which is automating restaurant reservations. Those natural language problems are very common speech. They are not scientific. They don’t involve mathematics, complex terminology, scientific terms, technical vocabulary, or acronyms. And the vision ones are what you and I would normally think of as simple vision problems – not four-dimensional imaging, not hyperspectral imaging or Fourier imaging, or any of the things we can do in science.

So the second kind of big question is, are the architectures that are being developed to accelerate general AI research – are they in fact even what we need for the types of data and the types of networks and systems we need to build for applying AI in science? We won’t know the answer to that question until we actually build a good library of scientific AI benchmarks and then try to measure how well those hardware architectures do. Of course we can do some of that theoretically, but that is the number two big question.

The number three big question—and I’ll stop at three here—is we do, in DOE in particular, massive amounts of simulation. In fact, our first way of thinking about the world is in some sense by, do we have a mechanistic model of it, a physical model to simulate? Most of the progress in AI involves non-physical modeling. If you think about natural language processing, there’s no physical model for that. If you think about computer vision, most of the kinds of things that people do with computer vision, there’s no physical model, there is no ground truth that you can generate from first principles. But in many scientific areas, we’ve had 400 years of progress—in physics and chemistry and biology and so forth—and we have a lot of physical understanding. How do we use that physical understanding combined with data to build AI models that actually internalize that physical understanding; in other words, having these models be able to make predictions in the world as opposed to in some abstract space.
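One concrete way to "internalize physical understanding," as Stevens puts it, is a physics-informed loss: fit the observed data while also penalizing violations of a known governing equation. The helper below is a hypothetical illustration, not DOE code; the "physics" here is just the toy constraint du/dx = 2x:

```python
def physics_informed_loss(model, data, physics_residual, weight=1.0):
    """Combine a data-fit term with a physics term: the residual of a
    known governing equation, evaluated at the training points."""
    data_loss = sum((model(x) - y) ** 2 for x, y in data) / len(data)
    phys_loss = sum(physics_residual(model, x) ** 2 for x, _ in data) / len(data)
    return data_loss + weight * phys_loss

# Toy setup: the model u(x) = x^2 exactly satisfies the "physics" du/dx = 2x,
# so both loss terms vanish (up to finite-difference rounding).
model = lambda x: x * x
data = [(x / 10, (x / 10) ** 2) for x in range(1, 6)]

def residual(m, x, h=1e-5):
    # central-difference estimate of du/dx, minus the known right-hand side 2x
    return (m(x + h) - m(x - h)) / (2 * h) - 2 * x

loss = physics_informed_loss(model, data, residual)
```

In real applications the model would be a neural network, the residual would come from automatic differentiation against a PDE, and the weight balances trust in the data against trust in the physics.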

This in some sense gets at the notion I was talking about earlier: hybridizing the symbolic kind of AI, where we can reason about physics and math and so on, with the data-driven AI. Those are the three big problems.

Brief bio:
Rick Stevens has been at Argonne since 1982, and has served as director of the Mathematics and Computer Science Division and also as Acting Associate Laboratory Director for Physical, Biological and Computing Sciences. He is currently leader of Argonne’s Exascale Computing Initiative, and a Professor of Computer Science at the University of Chicago Physical Sciences Collegiate Division. From 2000-2004, Stevens served as Director of the National Science Foundation’s TeraGrid Project and from 1997-2001 as Chief Architect for the National Computational Science Alliance.
