Rick, congratulations on being named an HPCwire Person to Watch for the second time. 2023 is a big year for Argonne. How does it feel to be so close to the finish line on a project like Aurora?
This all started in the spring of 2007 with the first meetings in DC proposing the idea of an Exascale Initiative. In fact, the theme we proposed to drive the program was “Energy and the Environment”. It was clear then that we would need exascale and beyond to address the challenges of global climate change and the move towards sustainable energy, and of course the myriad important open science questions ranging from fundamental physics to materials science to systems biology. What was not clear in 2007 was the rise of the third big wave of AI. Our first town hall report from later in 2007 barely mentioned data and of course was silent on AI. Now, as we finalize the buildout of Aurora and the commissioning of Frontier, it is clear that we had part of the story in 2007. Energy and environment are still going to be major drivers for the use of exascale machines, but the real breakout is going to be AI and the applications of AI in many areas of science and engineering. So, it feels good to see the fruits of our long labor come into being, but it is also humbling that we didn’t see the growth of data and AI as clearly sixteen years ago when we started this journey. What else did we miss?
How is the lab preparing for the system’s availability?
We have over 180 users working today on Sunspot, which is the test and development system for Aurora, kind of a baby Aurora with 128 nodes. We have over sixteen Early Science projects gearing up for Aurora as well as many ECP projects (there are over 70 application codes from ECP, and over 80 software packages as part of the exascale software stack). The ALCF and Intel have been conducting many workshops to help users get ready for Intel GPUs and Aurora in general. ECP of course has also been running many workshops and tutorials. As the system comes up, we are also planning some large capability computations (beyond the traditional things like Linpack) to both help shake out the machine and demonstrate its capabilities, so stay tuned for those.
Where do you see Aurora and Argonne distinguishing themselves in the growing landscape of the exascale era?
We will be the second publicly acknowledged exascale machine (so Aurora is effectively doubling the public landscape), and Aurora is the first large-scale machine built on both Intel CPUs and Intel GPUs. Our system is also the first very large system to deploy DAOS (Distributed Asynchronous Object Storage), which is designed for high-bandwidth access to NVM storage. DAOS will enable very high bandwidth streaming access to the nodes, with much higher performance than a traditional filesystem and without the capacity limits of a local NVM server on the node. On the applications side, we will be pushing on the balance of traditional simulation at scale and advanced AI applications, including a major effort on demonstrating the value of AI-driven surrogates in climate, electronic structure, drug design and cosmology, to name a few. We will also be coupling Aurora to the Advanced Photon Source, to prepare for when the APS-U comes online with more than 500x improved throughput. By tightly linking the supercomputer to the light source, we will be exploring how to use AI, streaming analysis, and simulation in the loop to dramatically increase the productivity of scientists. By automatically steering experiments and processing data faster than real time, we can help create new ways of doing science.
There’s a big spotlight on Aurora, of course, but Argonne’s HPC work has been at full steam for a while now, well ahead of the system’s launch. What projects and use cases are you most interested in right now at Argonne, and what transformative workloads are you looking forward to in the coming years?
I’m personally working on two projects that will be exciting to get on Aurora. The first one is our drug design pipeline, which we have been developing for many years, aiming to enable the rapid search of virtual libraries of tens of billions of molecules to identify those that are likely to bind to a given target. We made major advances in refining this pipeline during the pandemic and used it to identify over 60 compounds that were active against SARS-CoV-2. We are using it today on a dozen drug targets in cancer and infectious disease, including emerging antimicrobial-resistant pathogens such as Staphylococcus aureus and Mycobacterium tuberculosis. This pipeline combines simulation and AI, and with Aurora we should be able to identify strong binders out of more than 20 billion molecules in less than one hour per target. The second project I’m excited about is using Aurora to pre-train transformer models specifically for problems in science and mathematics. We are looking at how to use the full machine to train models with on the order of a trillion parameters on trillions of tokens, and how to do this efficiently. These will differ from current models like ChatGPT in that they will be trained on a combination of diverse texts and very large numbers of scientific documents, including papers, presentations, etc., and not just on the text content but also on the full images of scientific and mathematics papers. In addition to the text content, our goal is to create a large training corpus of scientific datasets that can be tokenized and included in the pre-training, such as genomes and proteomes, as well as data from simulations and large-scale experiments in materials science and chemistry, and from climate and environmental science. In some sense we want to capture in an AI model as much of the world’s scientific and math understanding as possible. This also includes scientific codes.
But officially, of course, we love all the projects on Aurora equally and will work hard to get as many codes as possible ported and optimized for Aurora while the machine is being commissioned, so that once it is in production as many communities as possible benefit.
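To give a feel for the screening target mentioned in this answer (more than 20 billion molecules in under an hour per target), the following is a minimal back-of-envelope sketch. The library size and time budget come from the interview; the node count is a hypothetical value chosen only for illustration and is not a statement about how the pipeline is actually run on Aurora.

```python
# Back-of-envelope throughput for the screening target quoted in the interview.
# HYPOTHETICAL_NODES is an assumption for illustration, not a figure from the source.

def required_rate(molecules: float, seconds: float) -> float:
    """Molecules that must be scored per second to finish within the time budget."""
    return molecules / seconds

LIBRARY_SIZE = 20e9          # "more than 20 billion molecules" (from the interview)
TIME_BUDGET_S = 3600.0       # "less than one hour per target" (from the interview)
HYPOTHETICAL_NODES = 10_000  # assumed node count, illustration only

rate = required_rate(LIBRARY_SIZE, TIME_BUDGET_S)
per_node = rate / HYPOTHETICAL_NODES

print(f"aggregate: {rate:,.0f} molecules/s")
print(f"per node (assuming {HYPOTHETICAL_NODES:,} nodes): {per_node:,.0f} molecules/s")
```

Under these assumptions the aggregate rate works out to about 5.6 million molecules per second, which suggests why an approach that combines simulation with AI is attractive at this scale.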
The traditional HPC market is undergoing substantial change, most notably blending in AI technologies with quantum on the horizon. Where do you see HPC headed? What trends – and in particular emerging trends – do you find most notable? Any areas you are concerned about, or identify as in need of more attention/investment?
I’ve been giving a few talks on this recently and more are planned. Traditional HPC and AI are converging quickly at the hardware level, and perhaps more slowly at the software stack and application level. We see this in many ways. Major AI infrastructures are starting to look and behave more like HPC systems, with high-speed interconnects, gang scheduling, decently fast IO, etc. And HPC systems, especially those based on GPUs, are suddenly gaining access to fast low-precision numerics and hardware that knows about dense matrices, though how they approach sparsity is still quite different. From an applications standpoint, we see rather spontaneous integration of AI methods into simulations (surrogates, control loops, etc.) and more use of simulation to create training sets for AI models. In the future most scientific codes will be some hybrid of traditional methods and AI methods. Quantum computing is also moving quickly but has a long way to go before its utility is positive (i.e., the value of a quantum computation is greater than the cost of the computation). We need to search harder for quantum algorithms that offer potential exponential speedups, and we need to scale quantum systems up by orders of magnitude to get to enough physical qubits that we can use quantum error correction at scale to provide the hundreds or thousands of logical qubits that are needed for real applications. We also need to see dramatic improvements in the fidelity of gate operations and, perhaps most importantly, we need about a 10,000x reduction in the cost of qubits to put them in the range of the largest supercomputers we can deploy today. My sense is that AI convergence will happen much more quickly (and is already in full swing) and quantum is on more like a ten-to-twenty-year timetable to solve interesting problems.
Quantum is important, however. It is much like the early research in parallel processing, which many people thought was crazy in the 1970s and 1980s; by the 1990s it was clear that parallel processing was the only way forward. I think the biggest problem with sustaining quantum investment is that we need to not overhype it and burn out the basis for long-term support. Quantum is a long-term game. The near-term game is AI and hybrid approaches. The other long-term game (but on the proven axis) is the need to continue to push towards zettascale. It took us about twelve years to get from the first petascale systems to the first exascale systems. That next factor of 1000 is going to be much tougher. Perhaps it will take 15-20 years to hit zettascale in FP64, though I think it could be done considerably sooner in lower precision for powering AI and perhaps mixed-precision applications. It’s a big challenge, but one that we need to invest in, since it will power both simulation progress and AI for the next few decades while we work to make quantum real.
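The timescales in this answer can be turned into implied annual growth rates with a little arithmetic. The sketch below is illustrative only: it assumes smooth compound growth, which is a simplification and not a claim made in the interview, and the function name is hypothetical.

```python
# Implied compound annual performance growth for a 1000x jump over a given span.
# Assumes smooth exponential growth, which is a simplification for illustration.

def annual_growth(factor: float, years: float) -> float:
    """Per-year multiplier g such that g ** years == factor."""
    return factor ** (1.0 / years)

# 12 years: petascale to exascale (from the interview).
# 15 and 20 years: the suggested range for exascale to zettascale in FP64.
for years in (12, 15, 20):
    print(f"1000x in {years} years -> {annual_growth(1000.0, years):.2f}x per year")
```

Under this assumption, the petascale-to-exascale era corresponds to roughly 1.78x per year, while a 15 to 20 year path to zettascale corresponds to roughly 1.58x to 1.41x per year.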
What inspired you to pursue a career in STEM and what advice would you give to young people wishing to follow in your footsteps?
Like many people of my generation, I was inspired by many things, but three stand out: the space program, landing on the Moon (“Houston, Tranquility Base here. The Eagle has landed.”) and sending craft to Mars and other planets; the environmental crisis and movement of the 1950s, 1960s and 1970s (Rachel Carson: Silent Spring, Edward Abbey: Desert Solitaire, Helen Hoover: The Years of the Forest); and Star Trek. I studied physics, mathematics, philosophy, biology, computer science and English literature in college. The year I applied to be an intern at Argonne I also applied to be an intern at JPL. I think I chose Argonne in part because it was closer and my car was not so reliable. My advice is to find something that gets you up in the morning, something that you love to do. Grow that thing, feed it, and nurture it. For me it was the fascination that we could build computers and solve problems with them, and that the whole community was motivated to build faster machines so we could do more. I still love to code and try to spend some time every day coding. Now of course it is even more fun, as you can get ChatGPT to help you remember how to code against an obscure API or recall some syntax you’ve long forgotten. It can even help you write small quantum programs using Qiskit or Cirq. But really the advice is to find something that is important and exciting for you that also can make a positive impact on the world. Read books, lots of them. Refine your skills in argument, in building collaborations, and in teaming. Work out how to get other people excited about your thing. And don’t care what other people think. And most importantly, don’t pay attention to people who try to pigeonhole you or tell you that you can’t do something. The only thing that really matters is that there are interesting problems and people who want to work on them. All else is not really important.
Outside of the professional sphere, what can you tell us about yourself – unique hobbies, favorite places, etc.? Is there anything about you your colleagues might be surprised to learn?
Some people know that this year my wife (Deb) and I started keeping bees. One of our hives never got established, but the other one produced a lot (about 25 kg) of the best honey I’ve ever tasted, from the wildflowers near our house. Unfortunately, that hive also died this fall. So maybe we will try again this year. We might instrument the hives with some edge computing gear this year so we can monitor them better. I also like to camp, and I like to try out new camping gear and technology. Hot tents are lots of fun, as are solar panels and ultra-portable radios (I like to do ham radio stuff too). But I also like to spend my own time working on computing problems. I have lots of computers at home, and I often feel so lucky to have so much stuff at home that is more powerful than the fastest machines that existed when I was in school. Kind of mind-blowing. I try to use them for some worthy purpose.