NERSC and the HPC Community Bid Farewell to Cori Supercomputer

By Kathy Kincade

May 17, 2023

After nearly seven years of service, thousands of user projects, and tens of billions of compute hours, the Cori supercomputer at the National Energy Research Scientific Computing Center (NERSC) will be retired at the end of May. With its first cabinets installed in 2015 and the system fully deployed by 2016, Cori has been in service longer than any supercomputer in NERSC’s 49-year history and enabled more than 10,000 scientific publications. And its technological innovations reflect the dynamic evolution of high performance computing (HPC) over the past decade, paving the way for the next generation of scientific computing.

Cori was developed through a partnership with Intel, Cray (now HPE), and Los Alamos and Sandia National Laboratories. The Cray XC40 system was named in honor of biochemist Gerty Cori, the first American woman to win a Nobel Prize in science and the first woman to win a Nobel Prize for Physiology or Medicine. It comprises 2,388 Intel Xeon Haswell processor nodes, 9,688 Intel Xeon Phi Knights Landing (KNL) nodes, and a 1.8 PB Cray DataWarp Burst Buffer and has a peak performance of ~30 petaflops; when it debuted in 2016, it ranked fifth on the TOP500 list.

Gerty and Carl Cori. Image credit: The Smithsonian Institution.

Cori was also the first supercomputer to be installed from the ground up in Lawrence Berkeley National Laboratory’s (Berkeley Lab’s) then-new Shyh Wang Hall, influencing the building’s infrastructure and prompting the implementation of numerous energy-efficiency innovations on the system and throughout the facility. In addition, the introduction of Cori’s manycore KNL architecture changed the way NERSC interacts with users, leading to the implementation of the NERSC Exascale Science Applications Program (NESAP) and the NESAP post-doc program, both of which are still running strong as NERSC moves into the Perlmutter GPU era.

“Cori has been an exciting system for a number of reasons, including the fact that it was the first energy-efficient architecture that NERSC deployed,” said NERSC Director Sudip Dosanjh. “It was clear to us with the advent of exascale computing that to get more computational power we needed to go to an energy-efficient, manycore architecture.”

“Cori has been a workhorse for our center and a very productive environment for the user community,” added Katie Antypas, NERSC division deputy who has been involved with multiple procurements at NERSC, including Cori and Perlmutter. “It is also where we developed our data strategy, did Jupyter at scale for the first time, and were able to prototype and test a lot of the capabilities that are now on Perlmutter.”

The Impact of KNL, and More

Other innovative features introduced on Cori that have influenced current and next-generation architectures include the Burst Buffer (which laid the foundation for Perlmutter’s all-flash file system), high-bandwidth memory, increased vector capability, real-time queues, deep learning library support, Globus sharing connections, and workflow service nodes.

“With Cori, we deployed the manycore KNL, a more specialized processor that could yield higher performance,” said Jack Deslippe, who leads the Application Performance Group at NERSC, was involved with the procurement and development of Cori, and oversees the NESAP program. “It gave users the opportunity to use the high-bandwidth memory that the KNL processors have right on their chip and the manycore aspect of the chip, which had 68 cores – significantly higher than anything before that.”

The design and stability of the Cray Dragonfly interconnect and the machine’s cooling system, along with the modernization of the software stack, also enhanced its utility for users, noted Tina Declerck, division deputy for operations and project lead on the Cori installation and deployment.

“In terms of interconnectedness, the design team did a really good job of finding the right path through the system,” she said. “Cori has handled all kinds of network failures effectively, which allowed the science to continue without interruption via much faster communication, fewer hops, and faster node-to-node communication.”

Shyh Wang Hall, the building that has housed Cori and now Perlmutter, was also integral to this success, added Jeff Broughton, former deputy of operations at NERSC, who retired in 2022 after 13 years at NERSC.

“In 2015, we were building the new building and looking at what we had to do to get from the Oakland Scientific Facility (where NERSC had been located since 2001) to the Berkeley Lab campus with minimum impact on our users,” he said. “So we decided to acquire Cori Phase 1 (the Haswell partition) and install it in the new building so we could then turn off Edison and move it from Oakland to Berkeley Lab with essentially no disruption to service.”

The design of the building was uniquely influenced by the needs of the Cori system, Broughton added. “Normally, when we do site prep for a machine, all the electrical and plumbing work, etc., is done as part of the project. But in this case, it was done as part of the building construction.”

One key result of this was that the building was designed to run without any mechanical refrigeration for Cori and NERSC’s future supercomputing systems.

“Running big compressors to produce 50-degree F water is what consumes a huge amount of energy in a data center, so we decided early on to do the building without compressors, which is what allows NERSC to get its extraordinarily high energy efficiency,” Broughton said. “Serendipitously, Cray enabled us to do that by delivering a machine capable of running at 80 degrees F. Had they not been able to do that, we might have had to put refrigeration into the building or find an alternative.”

A New Design Approach

Cori’s impact on HPC and scientific computing goes well beyond its technological innovations. It also changed how NERSC and others in the HPC community thought about configuring supercomputers to better serve users. For example, prior to Cori, NERSC typically had a main supercomputer with several smaller clusters around it that served specific user communities, noted Pete Ungaro, former CEO of Cray, who continues to work in HPC as an industry consultant.

“Cori was where we worked to aggregate a lot of those unique capabilities in those unique systems into the main supercomputer itself,” he said. “NERSC had done this really interesting study for the Department of Energy about the cost of computing, and it showed that the small clusters around the supercomputer were more expensive to run and maintain than the main supercomputer platform. So we had a lot of discussions around how to bring these capabilities into the main supercomputer and make it more flexible and less monolithic. As a result, Cori made a dent in doing some unique things that people just weren’t able to do on supercomputers at the time.”

David Trebotich – a staff scientist in the Applied Numerical Algorithms Group at Berkeley Lab – has been one such user, and a prolific one at that. Cori has been instrumental in enabling him to scale up his research in entirely new ways, yielding previously unattainable results with his Chombo-Crunch software and beyond. His team’s numerous projects that have involved computing on Cori include subsurface flow and transport, paper manufacturing, and water desalination. He is also the principal investigator (PI) on the Chombo-Crunch project and application code development and performance portability lead on the ECP Subsurface project.

“I found Cori to be way more productive than I thought it was going to be early on, and the NESAP program had a lot to do with getting us up to production capability,” said Trebotich, whose simulations from some of his work are colorfully displayed in the Cori system panel art. “Among other things, we’ve been able to achieve really great performance with reduced memory footprint for high-resolution simulations of subsurface reactive transport processes and, in general, simulations of flow and transport in heterogeneous materials.”

“Cori gave the scientific community access to much bigger capability and resource,” Ungaro said. “Instead of having to use a smaller specialty cluster, they now could leverage this huge supercomputer to do bigger datasets with much higher throughput and try many more different experimentations.”

Users Take a Deeper Dive

Over the past decade, as the HPC and scientific communities began to move toward exascale and energy-efficient architectures, NERSC wanted to make sure the scientific community wasn’t left behind and could effectively use these next-generation systems, Deslippe noted. “A lot of what we have done with Cori, and now Perlmutter, has paved the way for the HPC ecosystem to continue transitioning.”

Former Secretary of Energy Rick Perry during an official visit to the Cori supercomputer at NERSC. Image credit: Berkeley Lab.

NESAP has been a key component of this evolution, and Cori was the catalyst for this program, Deslippe and others emphasized. Through NESAP, NERSC initially partnered with code teams and library and tool developers to prepare their codes for Cori’s manycore architecture. More recently, the program has done the same to help users optimize their applications for Perlmutter’s GPU architecture.

“Cori was the first time that NERSC worked with users to optimize for a new platform,” said Rebecca Hartman-Baker, who leads NERSC’s User Engagement Group. “The KNL manycore architecture was very unique and innovative at the time, and people needed some help in understanding how to use it. Once they did, it really took off.”

This dynamic has had a lasting impact on how NERSC engages with its users, Deslippe added. “It caused us to rethink how we engage with the user community and the application developer community and to work with them at a much deeper level than we had before,” he said. “We’d always had a consulting team to answer user questions and help them compile and build applications. But with Cori, we formed a new team to work with users directly on their codes and go into the trenches with them as they were preparing their applications for the new architecture.”

Part of this effort involved partnering with colleagues in Berkeley Lab’s Computational Research Division (now two divisions: Applied Math and Computational Research, and Scientific Data) to design tools and performance models – such as Roofline – that users could apply to understand performance in an absolute sense and better determine which directions would be profitable for them to target on the new processors, Deslippe added.

“Our users are typically scientists first and not necessarily computer architects, so what we wanted to do and were successful in doing was coming up with tools that would lower the barrier of entry and make it easier for users to understand performance on the system,” he said.
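For readers unfamiliar with Roofline, the basic form of the model caps a kernel's attainable performance at the lesser of the machine's peak compute rate and the product of the kernel's arithmetic intensity (floating-point operations per byte moved) and the peak memory bandwidth. The short Python sketch below illustrates that bound only; it is not NERSC's tooling, the function names are hypothetical, and the machine numbers are round placeholders rather than measurements from Cori.

```python
def roofline_bound(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Upper bound on attainable GFLOP/s under the basic Roofline model.

    arithmetic_intensity: FLOPs performed per byte moved to or from memory.
    peak_gflops:          peak compute rate of the machine, in GFLOP/s.
    peak_bandwidth_gbs:   peak memory bandwidth of the machine, in GB/s.
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity at which the bandwidth and compute ceilings meet."""
    return peak_gflops / peak_bandwidth_gbs


def limiting_resource(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Name the ceiling that bounds a kernel with the given intensity."""
    if arithmetic_intensity < ridge_point(peak_gflops, peak_bandwidth_gbs):
        return "memory bandwidth"
    return "compute"


if __name__ == "__main__":
    # Placeholder machine: round numbers chosen for illustration only.
    PEAK_GFLOPS = 1000.0
    PEAK_BANDWIDTH_GBS = 100.0

    for ai in (0.5, 20.0):
        bound = roofline_bound(ai, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
        limit = limiting_resource(ai, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)
        print(f"intensity {ai} FLOP/byte: at most {bound} GFLOP/s, limited by {limit}")
```

On this placeholder machine, a kernel below the ridge point of 10 FLOPs per byte is bounded by memory bandwidth, so restructuring it to reuse data is likely to pay off more than adding vectorization; above the ridge point the reverse holds. That is the sense in which such a model points users toward profitable optimization directions.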

Panorama view of the Cori Supercomputer

Cori also changed the way NERSC interacts with vendors, particularly from a design perspective, noted Nick Wright, who leads the Advanced Technologies Group at NERSC and was chief architect on Cori (NERSC-8), Perlmutter (NERSC-9), and now on NERSC-10 (in development).

“Cori was the first machine where we partnered with the vendors more than just bought the machine from them,” he said. “It was also the first system we procured jointly with LANL and Sandia, and the first one where we did a non-recurring engineering (NRE) project (the Burst Buffer). The experience with the NRE taught us the value of strong and deep co-design with vendors.”

Wright sees many of these trends continuing for current and future system procurements, and he considers the Cori procurement pivotal to NERSC’s progression to exascale and beyond. “It is really clear, looking out into the future, that tighter co-design partnerships with vendors will be even more necessary,” he said.

In the long run, Cori laid the groundwork for a new generation of HPC architectures and served as a testing ground for many features that are now on Perlmutter and other supercomputing systems. It also enabled numerous ground-breaking scientific achievements, from environmental, chemical, energy, materials science, applied physics, and nuclear physics research to climate, biology, cosmology, and quantum computing simulations.

“Cori was a very exciting system to bring in,” Deslippe said. “It was an all-hands-on-deck activity to make Cori a productive system for users, and it took every bit of NERSC’s expertise. For NERSC staff, there was a lot of excitement around the challenge of deploying a first-of-its-kind system like this.”

“I look at Cori as another step in the evolution of HPC,” Broughton added. “Basically, it shows that NERSC continues to be on the leading edge of deploying new and novel systems that will help our users maintain their advantage in scientific computing.”

NERSC is a DOE Office of Science user facility.

About NERSC and Berkeley Lab
The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high-performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, the NERSC Center serves more than 7,000 scientists at national laboratories and universities researching a wide range of problems in combustion, climate modeling, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy.


Source: NERSC, Photos courtesy NERSC/Berkeley Lab
