The giant supercomputer Summit, based at Oak Ridge National Lab’s Leadership Computing Facility (OLCF), lives on for at least one more year, supported by a new program, SummitPLUS. Scheduled for shutdown earlier this year, the IBM-built supercomputer has been extraordinarily productive. It was the fastest supercomputer in the world, sitting atop the Top500 list in 2018 and 2019, and it remains the fifth fastest based on its November 2023 showing.
Summit’s impressive record includes being recognized in two different 2020 Gordon Bell Prize-winning efforts (work simulating the SARS-CoV-2 virus’ spike protein activity and work modeling and testing the DeePMD-kit software), and it was a Special Gordon Bell Prize finalist in 2019. It was also used by Google in an experiment to demonstrate quantum supremacy, and although that work has engendered controversy, Summit performed just as promised. This is hardly a full list.
Today, of course, Summit’s OLCF successor, Frontier, the first U.S. exascale computer, has the limelight and bears the Top500 crown.
Now, SummitPLUS is extending the life of Summit by supporting more than 100 research projects to be run on the machine this year before final shutdown at the end of 2024. Last week, HPCwire had a chance to talk to Bronson Messer, director of science at OLCF, about one of the projects: an effort to better understand the feasibility of quantum computing-HPC system integration and to build a benchmarking suite, Q-Stone, to better assess QC-HPC integrations.
But unsurprisingly, Messer didn’t want to neglect SummitPLUS more broadly. Messer’s job is perhaps one of the most exciting and taxing among the big system sites. His charge is to ensure all of OLCF’s computers and scientific instruments are working at full tilt on worthwhile science projects. Let’s not forget that includes Frontier.
Even as Summit approached sunset, Messer and colleagues thought it could do more. “You know, we’re constantly running a computer in production and looking ahead to the next computer; we have a finite amount of space as everybody does. So it was very much our plan to shut the computer down. But it’s been such a productive scientific instrument that we negotiated with IBM and Nvidia and got a sixth year of support for the machine,” said Messer.
“That took some doing, and because it took a considerable amount of effort, we got to the end of the calendar year or close to the end of the calendar year and realized, oh, my gosh, we need to figure out what we’re going to do with the machine for the coming year because we have not included [it] in any of our allocation programs for 2024. So we basically stood up an allocation program and reviewed close to 200 proposals in about four weeks. Yeah, which is the most ridiculously crazy thing that I’ve been involved with so far.”
The lives of these colossal machines are both amazing and short, in the 5-6 year range generally.
Talking about Summit, Messer said, “We have utilization rates well in excess of 80%. So we’ve been running those GPUs full out for five and a half years now. Asking a lot of them. They’re built to last for a while, but they don’t last forever. So it’s a perishable resource to some extent.
“It’s all about maintenance. It’s all about making sure that if changes go into libraries that we get reasonable updates in a reasonable amount of time. There’s hardware failing every day on the machine, right, and we run it hard. That happens on many machines. But you can tell that Summit has been working hard for a long, long time.”
He doesn’t think yet another year is in the cards for Summit and said, “I cannot say for sure, but I think it would be a minor miracle if we extended operations.”
In the meantime, SummitPLUS is a big win for many researchers. “We’ve got people jumping on the machine and using it like gangbusters. You know, the other thing we did with Summit that I didn’t mention is we didn’t allocate all the time available through SummitPLUS. We are also a resource provider for NAIRR, the National Artificial Intelligence Research Resource, and easily the biggest right now, during the pilot; we’re offering more time on Summit than any other resource provider currently. So I think we’ll have projects from that program appearing on the machine in the very near future.”
Messer said they don’t normally release the details of all allocations on the machine, but said, “I can tell you some stats:
1) “The quantum project we discussed is actually the smallest on the allocation list with 20k node-hours. That doesn’t mean it’s the least important! It’s just ‘right-sized’ for the work to be done at this point. All SummitPLUS projects enjoy the same priority on the machine.
2) “The largest projects have allocations a little more than 10x this size, with the largest allocation going to FUS155 from Noah Reddell from ZAP Energy (300k node-hours).
3) “The average allocation size is about 175k node-hours.”
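As a rough back-of-the-envelope check on the figures Messer quoted, the short Python sketch below compares the allocation sizes and estimates the total time awarded. The project count of 106 is an assumption taken from counting the rows in the project list at the end of this article (the text itself says only “more than 100”), and the function and variable names are purely illustrative. Because the average is quoted as “about” 175k node-hours, the total is an estimate, not an official figure.

```python
# Figures quoted by Messer in the article (node-hours).
SMALLEST_NODE_HOURS = 20_000    # the quantum project
LARGEST_NODE_HOURS = 300_000    # FUS155
AVERAGE_NODE_HOURS = 175_000    # "about 175k"

# Assumption: number of rows in the project list at the end of the article.
LISTED_PROJECTS = 106


def allocation_summary(smallest, largest, average, projects):
    """Return simple ratios and an implied total from the quoted figures."""
    return {
        "largest_to_smallest": largest / smallest,
        "average_to_smallest": average / smallest,
        "implied_total_node_hours": average * projects,
    }


summary = allocation_summary(
    SMALLEST_NODE_HOURS, LARGEST_NODE_HOURS, AVERAGE_NODE_HOURS, LISTED_PROJECTS
)
for name, value in summary.items():
    print(f"{name}: {value:,.2f}")
```

By this arithmetic the largest allocation is 15 times the smallest, and the implied total across the listed projects is roughly 18.6 million node-hours.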
He did provide the list of projects on SummitPLUS and that list is included below.
It’s worth recalling OLCF’s ground-breaking history. Titan, Summit’s predecessor, was decommissioned in 2019 and was one of the first large-scale heterogeneous systems. OLCF program director Buddy Bland recalled at the time, “Choosing a GPU-accelerated system was considered a risky choice.”
The Top500 List organizers reported in 2012:
“When the 40th edition of the list was released at the start of SC12, the No. 1 position was claimed by Titan, a 560,640 processor system with a Linpack performance of 17.6 petaflop/s. Oak Ridge National Laboratory’s Titan is a Cray XK7 system that relies on a combination of GPUs and traditional CPUs to make it the world’s most powerful supercomputer. Each of Titan’s 18,688 nodes contains an NVIDIA Tesla K20 GPU along with a 16-core AMD Opteron 6274 CPU processor, giving the system a peak performance of more than 27 petaflops. Titan also has more than 700 terabytes of memory.
“Titan’s use of GPUs also points the way for future scientific supercomputers. Because GPUs provide high-performance and energy-efficient computing power, they will allow supercomputing systems to become ever more powerful while avoiding the obstacles inherent in growing size and power consumption.”
Now, as Messer pointed out, “We rode the crest of hybrid CPU-GPU computing, for large scale computing, we were at the top of that wave. And now my gosh, it’s the world, literally the world at this point.”
FYI, this article was originally intended to cover the QC-HPC integration project being run as part of SummitPLUS, but it seemed a stretch to squeeze that project and the broader story together given Messer’s enthusiasm for the SummitPLUS program generally and Summit’s remarkable history. HPCwire will post an article on the SummitPLUS QC-HPC integration project shortly.
Stay tuned.
Top image: Bronson Messer, director of science at OLCF, with the IBM-built Summit supercomputer. The photo was taken in 2019, when Summit was atop the Top500 list.
LIST OF SUMMITPLUS PROJECTS
Project ID | Name | PI | |
ARD163 | High Order Wall-Modeled Large Eddy Simulation of High-Lift Configurations | Dr. Zhi Jian Wang | |
ARD166 | Shock Unsteadiness in Transonic Flow over Supercritical Laminar Flow Control Airfoil | Dr. Sanjiva K. Lele | |
ARD167 | Aerodynamic and Aeroacoustic Simulations of a Regional Air Mobility Aircraft with Distributed Electric Propulsion | Dr. Vineet Ahuja | |
ARD168 | Development of LES-informed AI/ML models for vortical flows in gas turbines | Dr. Michal Osusky | |
ARD169 | Deep Learning Closure for Large Eddy Simulation of Transitional Hypersonic Shockwave-Boundary Layer Interactions | Dr. Jonathan Francis MacArt | |
ARD172 | Nonequilibrium effects in hypersonic boundary layers: DNS and data-driven RANS modeling | Dr. Akanksha Baranwal | |
AST196 | Metal Loading of Galactic Winds | Dr. Mark Krumholz | |
AST198 | Radiative MHD of bright transients from neutron stars | Dr. Bart Ripperda | |
AST199 | Ensemble Surveys of Core-collapse Supernovae | Dr. Christian Cardall | |
AST200 | From Supernovae to Galactic Winds: The ISM-Halo Connection | Dr. Evan Schneider | |
AST203 | Simulations of core-collapse supernovae in 3D with rotation | Dr. Austin Harris | |
AST204 | PIC Simulations of the Polarized X-ray Emission from Magnetars | Dr. Yuran Chen | |
ATM145 | High-Fidelity Physics-Based Modeling of the Ionosphere-Thermosphere | Dr. Ngoc-Cuong Nguyen | |
ATM146 | Energy Exascale Earth System Model Project | Dr. Walter Michael Hannah | |
ATM148 | Understanding Extreme Weather Events with AI Forecast Emulators | Dr. Rahul Ramachandran | |
BIF143 | Contrastive Learning for Drug Discovery | Dr. Jens Glaser | |
BIF144_MDE | Cincinnati Children’s Hospital Medical Center (CCHMC) Mental Health Trajectories | Dr. Mayanka Chandra Shekar | |
BIP236 | Microscopic Characterization of the Full Transport Cycle of a Major Neurotransmitter Transporter in Human Brain | Dr. Emad Tajkhorshid | |
BIP237 | Molecular mechanisms of GPCR-mediated phospholipid scrambling | Dr. George Khelashvili | |
BIP240 | Modeling key cell cycle processes in bacteria | Dr. Jaan Mannik | |
BIP242 | Dynamics and energetics of bacterial pili extension and retraction | Dr. James C. Gumbart | |
BIP243 | Revealing the Structural Basis of Functional Selectivity to Create Safe, Effective Drugs | Dr. Ron O. Dror | |
BIP244 | Molecular simulation of monoclonal antibody binding to protein A | Dr. Abraham Lenhoff | |
BIP245 | Interrogating Virus Aerostability in High pH Conditions | Dr. Rommie Amaro | |
BIP246 | Decoding Sequence-Dependent Dynamics of Holliday Junctions in DNA Self-Assembly and Beyond | Dr. Aleksei Aksimentiev | |
CFD180 | HPC4EI-CapraBiosciences | Dr. Ishan Srivastava | |
CFD184 | Massively Parallel Large Eddy Simulations for High-Efficient Gas Turbines Operating with Hydrogen and High Aerodynamic Loading | Dr. Alexander Stein | |
CFD185 | Interface-resolved simulations of scalar transport in turbulent bubbly flows | Dr. Parisa Mirbod | |
CFD186 | Particle-Resolved Direct Numerical Simulation of Particulate Buoyancy-Driven Turbulent Convection | Dr. Myoungkyu Lee | |
CFD188 | High-Fidelity Simulations of Sustainable Propulsion and Power Generation Systems | Dr. Muhsin Mohammed Ameen | |
CFD189 | Direct numerical simulations of hypersonic boundary layer receptivity and transition | Dr. Wesley Harris | |
CFD190 | VERTEX—Advanced Multiphysics Simulations for Core Applications | Dr. Marc Olivier Gerard Delchini | |
CFD191 | High-fidelity computation of high-Reynolds number multiphysics problems | Dr. Aditya Nair | |
CHM196 | Permeability of Gases in Polymers at Cryogenic Conditions from Molecular Simulation | Dr. Walter G Chapman | |
CHM198 | Comparative Performance Analysis of Programming Models Used in GronOR | Dr. Tjerk Straatsma | |
CHM202 | Reactive, Generalizable Machine Learning Potentials for Molten Salts Modeling at Scale | Dr. Vyacheslav Bryantsev | |
CHM203 | Modeling and Simulation of the Non-Equilibrium Energy Transfer for Efficient Reactions | Dr. Ramanan Sankaran | |
CHM205 | Enabling high-accuracy exascale ab initio molecular dynamics | Dr. Giuseppe Barca | |
CHM206 | Developing a workflow for the automation of large-scale parallel tempering MD simulations for advancing drug discovery | Dr. Thanh D. Do | |
CHM207 | Revealing Promethium Aqua Ion Chemistry Using Relativistic Calculations and Machine Learning Approaches | Dr. Alex Ivanov | |
CHM208 | Structure and reactivity at complex interfaces | Dr. Vanda Glezakou | |
CHM209 | Multi-determinant diffusion Monte Carlo calculations of isomers of C20 fullerene | Dr. Kenneth D Jordan | |
CHP125 | Advanced Numerical Simulation for Measurement based Computing of Quantum Chemistry | Dr. Ang Li | |
CHP126 | Deep potential molecular dynamics of electrochemical and atmospherically-relevant aqueous interfaces | Dr. Roberto Car | |
CHP129 | Excess entropy strategy for constraining AI parameterized force fields from ab initio simulation | Dr. Jonathan D Nickels | |
CLI180 | High-Resolution E3SM Land Model on GPUs | Dr. Peter Thornton | |
CLI187 | Measurement-based high fidelity wind farm simulations for realistic, complex atmospheric conditions | Dr. Lawrence Cheung | |
CLI188 | Saving PetaBytes in Earth System Model Outputs using Stochastic Approximations | Dr. David Elliot Keyes | |
CMB153 | Benchmark Simulations of Turbulent Multiphysics Processes in a Laboratory-Scale Supersonic Combustor | Dr. Joseph C. Oefelein | |
CMB156 | Thermal and chemical nonequilibrium effects in detonation waves revealed by high-fidelity simulations | Dr. Jorge Sebastian Salinas | |
CMB157 | Fundamental study of soot formation and flame dynamics of sustainable aviation fuel using DNS | Dr. Bruno Souza Soriano | |
CMB159 | Molecular level simulations of reacting flows under thermal and chemical non-equilibrium | Dr. Shrey Trivedi | |
CPH156 | Photophysics of Excitons in Low-Dimensional Organic-Inorganic Semiconductors | Dr. Marina Rucsandra Filip | |
CPH159 | Computing many-body dispersion and superradiance effects in biomacromolecular dynamics in aqueous environments | Dr. Philip Kurian | |
CPH160 | Probing dynamical correlations and information scrambling in Quantum Annealing devices using GPU optimized Tensor Network Methods | Dr. Alberto Nocera | |
CSC452 | Performance Analysis and Tuning of HPC and AI Applications | Dr. Abhinav Bhatele | |
CSC555 | ILLUMINE | Dr. Jana Bozena Thayer | |
CSC556 | SciMLBench | Dr. Juri Papay | |
CSC559 | Exploring the Frontiers of Simulated Quantum Computing: Performance and Limitations of Tensor Network Simulators in Solving MaxCut and Many-Body Problems | Dr. Vicente Leyton Ortega | |
CSC562 | Exploring and benchmarking prospects for HPC-Quantum integration on a leadership-scale computing platform | Mr. Peter Groszkowski | |
CSC564 | Scalable Simulation and Data Analytics with PETSc | Dr. Richard Tran Mills | |
CSC565 | Large-Scale Multimodal AI Foundation Models | Dr. Irina Rish | |
ENG142 | High Fidelity Operational Reliability Modeling | Dr. Slaven Peles | |
FUS155 | Study of Z Pinch Plasma by 3D Kinetic Model on slimmed-memory GPU | Dr. Noah Reddell | |
FUS156 | Stellarator performance predictions | Dr. Walter Guttenfelder | |
FUS157 | High-fidelity Coupled SOL Impurity Transport Simulations in 3D Complex Geometry Fusion Devices | Dr. Jacob Merson | |
FUS158 | Kinetic Simulations of Quasi-Parallel Collisionless Shocks in Laboratory Plasmas | Dr. Derek Schaeffer | |
FUS159 | Computational Design of Multi-principal element alloys for fusion energy | Dr. Kevin Woller & Dr. Sara E Ferry | |
FUS160 | Tungsten Erosion Modeling in WEST through Synthetic Diagnostic | Dr. Abdourahmane Diaw | |
FUS161 | Frameworks for Multiscale Transport Modeling in Fusion Plasmas | Dr. Noah Mandell | |
FUS164 | Magnetic Field Generation and Reconnection in High Energy Density Plasmas | Dr. William Randolph Fox II | |
GEO153 | SCEC Earthquake Ground Motion Modeling Research | Mr. Philip James Maechling | |
GEO155 | EQSIM regional earthquake simulations for San Francisco Bay Area | Dr. David McCallen | |
LGT127 | Searching for the critical end point using lattice QCD | Dr. Hai-Tao Shu | |
LGT128 | Electromagnetic and strong isospin breaking corrections to strong dynamics | Dr. Henry Monge Camacho | |
LRN038 | Autonomously Driven Software – SummitPLUS Pursuit of Level 4 Autonomy | Mr. Allan Grosvenor | |
LRN044 | Building Foundational and Surrogate Models for Experiment Steering at LCLS | Dr. Frederic Poitevin | |
LRN045 | Dynamic Information Flow for Secure and Real-Time Integration of Edge and HPC | Dr. Ryan Neal Coffee | |
LRN046 | Neural Operators for Learning Multi-Scale Multi-Physics Processes | Dr. Anima Anandkumar | |
LRN047 | Scalable Swarm Intelligence | Dr. Robert Patton | |
LSC119 | Development of foundational AI models for Agriculture | Dr. Aditya Balu | |
MAT267 | Modeling plasma facing and structural materials for fusion applications | Dr. Sophie Blondel | |
MAT270 | Multiscale Modeling of Subgrain Cellular Structure across Melt Pools in Additive Manufacturing | Dr. Lang Yuan | |
MAT272 | Domain dynamics of ferroelectric heterostructures at large scale using causal-informed scientific machine learning and atomistic simulations | Dr. Ayana Ghosh | |
MAT273 | Foundational graph neural network models for chemistry and materials science | Dr. Victor Fung | |
MAT274 | Rational Design of High-Performing electrodes in Energy Storage Devices | Dr. Kwangnam Kim | |
MAT275 | First-Principles Study of NMC-Carbon Interfaces | Dr. Zongtang Fang | |
MAT276 | First-principles understanding of the electronic and magnetic properties of doped-magnetic quantum topological materials | Dr. Swarnava Ghosh | |
MAT277 | Interplay between length scales as spot melts solidify at varied power profiles | Dr. Stephen Joseph DeWitt | |
MED123 | Modelling new materials for hydrogen storage applications | Dr. Dario Alfe | |
MED125 | Large-Scale Investigation of CSF Dynamics in the Human Optic Nerve | Dr. Diego Rossinelli | |
MPH118 | 6,000 kinase simulations for a new molecular dynamics repository | Dr. Travis Wheeler | |
MPH119 | Developing a Generative AI Model for Protein Disorder | Dr. Julie Carol Mitchell | |
MPH120 | Inverse Design of Near Infrared Fluorophores for Quantum Network Repeaters | Dr. Pilsun Yoo | |
NFI126 | Microscopic Framework for Fission Dynamics of Odd-Mass Nuclei | Dr. Aurel Bulgac | |
NFI127 | CFD for Advanced Nuclear Reactors | Dr. Dillon Shaver | |
NFI128 | Advanced Computing for Scientific Discovery of Molten Salt Reactor Dynamics | Dr. April Novak | |
NPH160 | pion and kaon twist-3 GPDs | Dr. Martha Constantinou | |
NPH161 | Gravitational Form Factors from Lattice QCD | Dr. Keh-Fei Liu | |
NPH162 | Nuclear interactions from QCD | Dr. Andre Walker-Loud | |
NRO109 | PeakBrain: Generalizable segmentation models for connectomics | Mr. Thomas Uram | |
PHY182 | Quantum Supremacy | Dr. Travis Humble | |
PHY185 | First-principles QED-PIC Simulations of High Energy Emission from Pulsars | Dr. Revathi Jambunathan | |
SYB112 | Building Ensembles of Single Cell Predictive Expression Networks | Dr. Daniel Jacobson | |
TUR144 | Direct Numerical Simulation of Smooth-Body Flow Separation at a High Reynolds Number | Dr. Ali Uzun | |
TUR145 | High-fidelity simulations of particle-laden turbulent separating flows | Dr. Suhas Jain Suresh |