Associate Laboratory Director
Stevens has been at Argonne since 1982, and has served as director of the Mathematics and Computer Science Division and also as Acting Associate Laboratory Director for Physical, Biological and Computing Sciences. He is currently leader of Argonne’s Petascale Computing Initiative, Professor of Computer Science and Senior Fellow of the Computation Institute at the University of Chicago, and Professor at the University’s Physical Sciences Collegiate Division. From 2000-2004, Stevens served as Director of the National Science Foundation’s TeraGrid Project and from 1997-2001 as Chief Architect for the National Computational Science Alliance.
Stevens is interested in the development of innovative tools and techniques that enable computational scientists to solve important large-scale problems effectively on advanced scientific computers. Specifically, his research focuses on three principal areas: advanced collaboration and visualization environments, high-performance computer architectures (including Grids) and computational problems in the life sciences, most recently the computational problems arising in systems biology. In addition to his research work, Stevens teaches courses on computer architecture, collaboration technology, virtual reality, parallel computing and computational science.
HPCwire: In your role as a PI on the CANcer Distributed Learning Environment (CANDLE) project you’re developing deep learning tools to fight cancer. You’ve said in the past that part of DOE’s interest in funding this kind of work is eventually to be able to apply deep learning in many of its application areas. Can you give us a sense of this broader role for deep learning and where it might be used by DOE, and comment on the key technical hurdles remaining?
Rick Stevens: There are many areas of science that are starting to apply deep learning. For example, say you want to create a material that has some set of properties, and you don’t necessarily know the relationship between the input elements of that material and the properties, but you do have lots of examples of materials where people have measured the properties. With deep learning, you would use that dataset of input elements and the properties to train a model that learns that relationship. This can be applied to big problems in energy storage, photovoltaics or extreme environments.
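The pattern described here, learning an input-to-property mapping from measured examples, can be sketched with a toy surrogate. Everything below is synthetic and illustrative (the composition vectors, the “measured” property, and the tiny network are all invented); a real materials surrogate would be trained on experimental measurements:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a materials dataset: each row is an "input element"
# vector (e.g. a composition), each target a measured property. The true
# relationship below is nonlinear and unknown to the model.
X = rng.uniform(-1, 1, size=(200, 3))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]

# A tiny one-hidden-layer network trained by full-batch gradient descent
# to learn the input-to-property mapping from examples.
W1 = rng.normal(0, 0.5, (3, 16)); b1 = np.zeros(16)
W2 = rng.normal(0, 0.5, (16, 1)); b2 = np.zeros(1)
lr = 0.05

for _ in range(2000):
    h = np.tanh(X @ W1 + b1)                       # hidden activations
    pred = (h @ W2 + b2).ravel()                   # predicted property
    g_pred = (2 * (pred - y) / len(y))[:, None]    # d(MSE)/d(pred)
    g_h = g_pred @ W2.T * (1 - h**2)               # backprop through tanh
    W2 -= lr * (h.T @ g_pred); b2 -= lr * g_pred.sum(0)
    W1 -= lr * (X.T @ g_h);    b1 -= lr * g_h.sum(0)

final = (np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
mse = float(np.mean((final - y) ** 2))  # training error of the learned surrogate
```

Once trained, such a surrogate can be queried for candidate inputs far faster than running a new measurement, which is what makes the approach attractive for design problems.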
Another area of use is data analysis for simulation. We have people doing cosmology simulations where they’re simulating the origin and evolution of the universe. These are huge simulations—trillions of particles that generate petabytes of data—and they want to mine that data to look for features whose statistics can be compared with the observable features of the universe. Using simulation data, for example, we might be able to find gravitational lensing of galaxies, where you have one galaxy behind a massive object (like a black hole) that bends the light around it and creates a unique signature in the data. You can look for these things in observational data, but, in cosmology simulations, you generate petabytes of data and there is no way a human is going to be able to look at all of it. Here you can train a deep neural network to recognize these kinds of features and let it loose on the data as you’re generating it, during the simulation or in post-processing.
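A heavily simplified illustration of the feature-finding idea: below, a matched filter (a single hand-built convolution kernel, standing in for the learned filters of a trained network) locates a known feature planted in a noisy synthetic snapshot. The snapshot, the feature shape, and its location are all made up:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic "simulation snapshot": background noise with a known cross-shaped
# feature planted at row 30, column 40 (a stand-in for a lensing signature).
snap = rng.normal(0, 0.1, size=(64, 64))
template = np.zeros((5, 5))
template[2, :] = 1.0
template[:, 2] = 1.0
snap[30:35, 40:45] += template

# Matched filter: slide the template across the snapshot and score every
# position; the highest score marks the most feature-like location. A
# trained convolutional network generalizes this with learned filters.
scores = np.zeros((60, 60))
for i in range(60):
    for j in range(60):
        scores[i, j] = np.sum(snap[i:i + 5, j:j + 5] * template)

peak = np.unravel_index(np.argmax(scores), scores.shape)  # found location
```

The appeal of the learned version is that no one has to hand-design the template: the network discovers the discriminative filters from labeled examples.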
One of the things that the DOE wants to do in biology is synthetic biology, where we design an organism to accomplish some purpose – maybe to incorporate a new chemical pathway or to signal when it encounters a given molecule. An example of a generalized problem is the “genotype-phenotype prediction problem,” where the aim is to predict a phenotype—that is, the behavior of the organism—using the genotype. Again, we can’t write down the equations to do that (it’s too complicated and we don’t have enough knowledge), but we can train neural networks or other machine learning algorithms to learn that relationship for some cases. You can then compute modifications to the genome and use the machine learning method to tell you what the resulting phenotype might be. In the DOE space, that might be improving carbon fixation. In the biomedical space, it might be improved antibiotic resistance research. I could go on and on.
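As a toy sketch of the genotype-phenotype idea: here a logistic regression (a simple stand-in for the neural networks mentioned above) learns an invented genotype-to-phenotype rule from examples, then predicts the effect of a hypothetical genome edit. The gene indices, the rule, and the data are all synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented dataset: each row records presence/absence of 20 genes; the
# synthetic phenotype is "on" only when genes 3 and 7 are both present.
# The model must recover that rule from examples alone.
G = rng.integers(0, 2, size=(300, 20)).astype(float)
phen = ((G[:, 3] + G[:, 7]) >= 2).astype(float)

# Logistic regression trained by gradient descent.
w = np.zeros(20)
b = 0.0
for _ in range(3000):
    p = 1 / (1 + np.exp(-(G @ w + b)))
    grad = p - phen
    w -= 0.5 * (G.T @ grad) / len(phen)
    b -= 0.5 * grad.mean()

p = 1 / (1 + np.exp(-(G @ w + b)))
acc = float(((p > 0.5) == (phen > 0.5)).mean())   # training accuracy

# Predict the phenotype of an edited genome: knock out gene 7 everywhere
# and ask the learned model what happens.
edited = G.copy()
edited[:, 7] = 0.0
p_edit = 1 / (1 + np.exp(-(edited @ w + b)))      # predicted phenotype: mostly "off"
```

The knockout query is the point of the exercise: once the relationship is learned, candidate genome modifications can be screened in silico before anything is built in the lab.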
These challenges are showing up everywhere, from energy storage to 3D printing. In 3D printing, you may want to use machine learning to tweak the parameters to produce materials that meet certain structural properties. Again, we can’t write down the equations to do that, but we can take a lot of samples out of 3D printers, put them in a synchrotron, collect scattering data, and train the network on the relationship between the parameters that were used and the resulting atomic configurations read out of the scattering data. Then we can learn how to adjust the printer to produce the parts with the properties that we want.
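The inverse step, adjusting printer parameters to hit a target property, might look like this in miniature. The surrogate function below is an assumed analytic stand-in for a network trained on scattering data, and both the parameter names and the numbers are invented for illustration:

```python
# Hypothetical surrogate standing in for a network trained on scattering
# data: it maps two made-up printer parameters (laser power, scan speed)
# to a predicted strength. Both the function and the numbers are invented.
def surrogate(power, speed):
    return 100.0 + 40.0 * power - 2500.0 * (speed - 0.6) ** 2

# Inverse design by brute force: scan the parameter grid and keep the
# setting whose predicted strength is closest to the target spec.
target = 130.0
best = min(
    ((p / 10, s / 10) for p in range(11) for s in range(11)),
    key=lambda ps: abs(surrogate(*ps) - target),
)
```

Because the surrogate is cheap to evaluate, even a naive grid scan is feasible; in practice one would use a proper optimizer over the learned model rather than exhaustive search.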
The application space is very, very broad. We just had a workshop where 200 people came on-site to explore deep learning in their various areas of research.
HPCwire: What would you say are some of the most significant hurdles that you’re facing?
Stevens: I would say that there are two classes of hurdles. The first class contains more fundamental questions like, ‘do we have enough training data,’ ‘does it sample the distribution we expect to apply it to in the right way,’ and ‘is the data labeled well.’ These are the kinds of problems every deep learning application has. Is the training data good, and does it reflect the actual end use? When the answer is ‘maybe,’ then you start looking at ways to augment it: can you use unsupervised learning on unlabeled data to increase the coverage? Can you use simulation data to augment experimental data? One whole set of challenges has to do with data.
The second has to do with the models. Do we have good starting models? Do we have good strategies for improving those models (either by tuning the hyperparameters or searching the model space for better example models)? Can we incorporate things like uncertainty quantification into these models? One of the things we’re very excited about is finding a way to put uncertainty quantification into these deep learning models such that what you get from the model is not just a point prediction, but a distribution. There are several different ways of doing that, and we’re building those different approaches into the CANDLE toolset so that they become available to people.
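One common way to get a distribution rather than a point prediction is an ensemble: train several models on bootstrap resamples of the data and report the spread of their predictions. The interview doesn’t specify which uncertainty-quantification methods CANDLE ships, so the sketch below, which uses cheap polynomial fits as stand-ins for ensemble members, is purely illustrative of the idea:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy data: noisy observations of an underlying cubic curve.
x = np.linspace(-1, 1, 40)
y = x**3 + rng.normal(0, 0.05, size=x.shape)

# Bootstrap ensemble: fit one model per resample of the data. Each member
# (here a polynomial fit standing in for a deep network) gives its own
# prediction at a new point; the spread across members is the uncertainty.
preds = []
for _ in range(20):
    idx = rng.integers(0, len(x), size=len(x))    # bootstrap resample
    coeffs = np.polyfit(x[idx], y[idx], deg=3)    # one ensemble member
    preds.append(np.polyval(coeffs, 2.0))         # predict outside the data

preds = np.array(preds)
mean, std = float(preds.mean()), float(preds.std())
# Report mean +/- std: a distribution over predictions, not a single number.
```

The spread widens exactly where it should, on extrapolation away from the training data, which is what makes ensemble-style uncertainty useful for deciding when to trust a model.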
I’d say the problems of ‘how do you search the model space quickly and efficiently,’ ‘how do you modify the models to get uncertainty quantification,’ and ‘how do you efficiently map them onto the HPC architecture (especially on more specialized platforms)’ are all about the technology and the infrastructure. While CANDLE can’t help all that much with the data for a specific problem, we are including some methods for data augmentation and some exemplars for using unlabeled data to help transfer learning. Usually, however, the data is very tightly coupled to the problem. On the other side, strategies for improving models—for choosing better starting points, model types, building-block models, and execution strategies—are all pretty generic. That’s what we’re building into the CANDLE libraries and tools.
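Searching the model space can be as simple as random search over configurations. In this sketch the `validation_loss` function is an invented analytic stand-in for “train a model with these hyperparameters and score it on held-out data,” which is the expensive step a real search would dispatch to an HPC system:

```python
import random

random.seed(4)

# Invented stand-in for "train a model with this configuration and return
# its validation loss"; a real search would launch a training job here.
def validation_loss(lr, layers):
    return (lr - 0.01) ** 2 * 1e4 + abs(layers - 6) * 0.5

# Random search over a mixed continuous/discrete model space: sample
# configurations and keep the best seen so far.
best_cfg, best_loss = None, float("inf")
for _ in range(200):
    cfg = {"lr": 10 ** random.uniform(-4, -1),   # log-uniform learning rate
           "layers": random.randint(2, 12)}      # discrete architecture choice
    loss = validation_loss(cfg["lr"], cfg["layers"])
    if loss < best_loss:
        best_cfg, best_loss = cfg, loss
```

Note the mixed character of the space, a continuous learning rate alongside an integer layer count, which is why such searches often end up as mixed-integer optimization problems.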
HPCwire: Harmonizing computational infrastructure to accommodate traditional HPC and big data is a much-discussed topic. As one of the authors of the recently issued Big Data and Extreme-Scale Computing Pathways to Convergence Report, what’s your take on how pressing the problem is (or isn’t) and what’s the best way to address it?
Stevens: It’s pretty pressing, because the interest and opportunity from the data side of things—whether it’s machine learning or just data analysis in general—is accelerating, and the budgets that we all have aren’t expanding as much as the opportunity is. So we have to find a way to build infrastructure that can do both of these things and do them reasonably well.
There may still be cases where specialized systems or specialized accelerators would be helpful, but when we’re in exploratory mode, we want to investigate strategies for quickly optimizing model structure; that’s one of the drivers.
A second driver that I talk about a lot is that most problems are not ‘either/or.’ They’re not just simulation problems or just data problems; they’re usually a combination of these things. We’re seeing more and more simulations coupled to machine learning or to analytics, and the coupling can go both ways: the machine learning part may need the simulation to fill in gaps or to generate the cases it needs, or machine learning may be applied to the simulation output or used to control the simulation. All of these things are possible.
In any case, all of them need tight coupling of storage resources to compute resources. Typically, that’s being achieved with some sort of solid-state, high-bandwidth storage device, either fabric-mounted or embedded in the fabric, that delivers a lot of bandwidth from that storage to the compute node. In some of the cases I’ve seen, we’re getting an order of magnitude or more improvement in bandwidth, which makes it a promising strategy.
Another strategy is moving toward object-oriented storage systems or a global-memory type of environment where you eliminate explicit I/O. That not only has the advantage of not having to write things out just to read them back in, which we do all the time in scientific workflows, but it also gives you an opportunity to do memory optimization and have a much tighter coupling between the simulation processes and the analysis (and learning) processes. There are other things people are looking at, but when I think about what we’re doing in CANDLE, these are the key drivers.
HPCwire: What do you hope to see from the HPC community in the coming year? Also, what can we hope to see in the coming year from CANDLE?
Stevens: Well, there are lots of things happening. In the HPC space, one direction is taking linear algebra libraries, improving their interfaces, and making them a bit more general purpose so that they can more easily link to deep learning frameworks. Supporting mixed precision, along with the different patterns of sparsity that we’re starting to see in deep learning, is key. That’s a question of what a math library can do to help support the evolution of deep learning.
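Why mixed precision matters can be seen even in a scalar sum: accumulating half-precision data in a float16 accumulator loses badly once the running total grows past the fp16 spacing, while keeping a float32 accumulator over the same float16 inputs stays close to the exact answer. A minimal numpy demonstration of that pattern:

```python
import numpy as np

rng = np.random.default_rng(5)

# 20,000 half-precision values to sum.
vals = rng.uniform(0.1, 1.0, size=20_000).astype(np.float16)

# Pure fp16 accumulation: once the running sum passes ~2048, the fp16
# representable spacing exceeds the values being added, each addition
# rounds away, and the sum stops growing.
acc16 = np.float16(0.0)
for v in vals:
    acc16 = np.float16(acc16 + v)

# Mixed precision: same fp16 data, but a float32 accumulator.
acc32 = np.float32(0.0)
for v in vals:
    acc32 = np.float32(acc32 + np.float32(v))

exact = float(np.sum(vals.astype(np.float64)))
err16 = abs(float(acc16) - exact)   # large: the fp16 sum stalled
err32 = abs(float(acc32) - exact)   # small: fp32 accumulation stays close
```

This is the same trade hardware makes in fp16 matrix units with fp32 accumulators: low-precision storage and arithmetic for throughput, higher-precision accumulation for correctness.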
I think the flip side is also true. There’s been a lot of progress in the last couple of years on new methods in deep learning, ranging from capsule networks to different ways of handling sparsity (gated ensembles) to different ways of handling aggregation of networks, and these things haven’t yet been thought about from the standpoint of what would be needed from a math library to optimize those types of networks.
At a super high-level, we basically need people thinking about the types of deep learning networks to be talking to the linear algebra solvers community about opportunities for optimization. That’s something we’re doing – we’re collaborating with the PETSc TAO Group on this direction underneath the Exascale Computing Project (ECP) because ECP supports both the CANDLE effort and the numerical library effort.
I also think there’s opportunity for the groups working on streaming I/O and data middleware infrastructure. In deep learning, you often have large amounts of data that you have to dynamically move around the system for training — sometimes computing on-the-fly joins, sometimes doing different partitions or sampling of the data — and you want to do that without reinventing the wheel. So there are projects like CODAR and others in ECP that are building tools for that purpose and CANDLE is collaborating with them.
The third is that the HPC community builds software for large-scale optimization, and often in deep learning workloads we need an optimizer sitting on top of the process (whether it’s hyperparameter optimization or something else). We want to use the best optimizers for what often becomes a mixed-integer problem.
Much of this is being supported through the Exascale Computing Project because one thing that’s great about ECP is that it’s funding a large number of software projects and many applications, so you can find things that are in common between applications or common use cases that give the software/middleware/library developers a bit more motivation because they can serve more than one application.
HPCwire: Outside of the professional sphere, what can you tell us about yourself – personal life, family, background, hobbies, etc.? Is there anything about you your colleagues might be surprised to learn?
Stevens: Some of my friends know this, but I’m having a lot of fun with little drones – the kind that you fly around and take pictures with. I have a collection of those. My daughters and I go out and fly them and make movies. We’ve made movies in Costa Rica and all sorts of places. I’m having great fun playing with them and also thinking of projects I can do with them in terms of programming them to do things they weren’t originally designed to do.
For instance, I’d like to be able to search for fossils with them – program them using the AI they’re starting to have on board, which can look at images and recognize objects. They don’t have built-in hyperspectral imagers yet, but soon it should be possible to equip one with a hyperspectral imager, train it on signatures of interesting rocks, and have it fly a search pattern over some large area where you don’t really have to pay attention to it. Then it could come back to you and say, “over here may be some interesting things to look at.” That’s something I’d like to get working on, perhaps this summer.