Exascale Computing Project: Building a Capable Computing Ecosystem for Exascale and Beyond

September 18, 2023

Sept. 18, 2023 — With the delivery of the U.S. Department of Energy’s (DOE’s) first exascale system, Frontier, in 2022, and the deployment of the Aurora and El Capitan systems expected by next year, researchers will have the most sophisticated computational tools at their disposal for groundbreaking research. Exascale machines, which can perform more than a quintillion operations per second, are 1,000 times more powerful than their petascale predecessors, enabling simulations of complex physical phenomena in unprecedented detail and pushing the boundaries of scientific understanding.

This incredible feat of research, development, and deployment has been made possible through a national effort to maximize the benefits of high-performance computing (HPC) for strengthening U.S. economic competitiveness and national security. The Exascale Computing Project (ECP) has been an integral part of that endeavor.

Seven years ago, DOE’s Office of Science and National Nuclear Security Administration embarked on a fundamentally different approach to advancing HPC capabilities in the national interest. Within the HPC community, application developers, software technology experts, and hardware vendors tend to work independently toward products that are integrated later. While effective, this process can create an implementation gap between the software tools that are delivered and the applications that must use them to exploit the full performance of new, more advanced machines, slowing the realization of full computing capability. ECP recognized this challenge and strategically brought these different groups together as one community from the outset, fostering a computing ecosystem that supports co-design of applications, software, and hardware to accelerate scientific innovation and technical readiness for exascale systems.

Fast forward to 2023—the project’s final year—and ECP collaborations have involved more than 1,000 team members working on 25 different mission-critical applications for research in areas ranging from energy and environment to materials and data science; 70 unique software products; and integrated continuous testing and delivery of ECP products on targeted DOE systems. The results achieved as part of the ECP ecosystem development reflect the synergy, interdependency, and collaboration forged between the project’s three focus areas—Application Development, Software Technology, and Hardware and Integration—and the close working relationships with DOE HPC facilities and the vendors that are fielding the exascale machines. “ECP emphasizes the commonalities between each of the focus areas and provides an environment where we can identify with each other, share experiences and ideas, and understand one another while still being unique in our abilities,” says Andrew Siegel, a senior scientist at Argonne National Laboratory and the director for ECP Applications Development. “ECP has provided the stability and needed vision for a diverse community to work together to achieve targets we all care about.”

A Computing Symphony

ECP is an extensive, seven-year, $1.8 billion project that harnesses the collective brainpower of computer science experts from DOE national laboratories, universities, and industrial partners under a single funding umbrella.

With this funding paradigm, integrated teams have been able to surpass their target goal of 50 times the application performance of the 20-petaflop (floating-point operations per second) systems in use when ECP began in 2016 and 5 times the performance of the 200-petaflop Summit supercomputer (ranked the world’s most powerful computer in 2018 and 2019). Mike Heroux, a senior scientist at Sandia National Laboratories and the director for ECP Software Technology, says “ECP is unique in that everyone involved in the project has the same mission, and we have healthy funding that is holistic across all the participating organizations, so we can collaborate in ways that have been essential to our success.”
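Both figures of merit point at the same threshold: delivered application performance of roughly one exaflop. A back-of-the-envelope check (not an official ECP metric, just the arithmetic implied by the targets above) makes this explicit:

% Both ECP performance targets reduce to roughly one exaflop
% (1 EF = 10^18 floating-point operations per second).
\[
  50 \times 20\ \text{PF} = 1{,}000\ \text{PF} = 1\ \text{EF},
  \qquad
  5 \times 200\ \text{PF} = 1{,}000\ \text{PF} = 1\ \text{EF}.
\]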

At a basic level, software technology products such as math libraries, input/output (I/O) libraries, and performance analysis tools provide the building blocks for applications—sophisticated computer programs that run the complex underlying mathematical calculations needed to deliver predictive capabilities. Applications depend on the available software products, and both must be developed with computing architectures in mind—what types of processors are used, for example—so that they run efficiently and effectively when integrated. Together, these three pieces—applications, software technology, and hardware—orchestrate the computing symphony that enables advanced scientific simulation.
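To make the layering concrete, here is a minimal sketch in C++ (illustrative only, not drawn from any ECP code): the application owns the problem setup and its data, while a community-maintained math library, accessed here through the standard CBLAS interface, supplies the architecture-tuned dense matrix-multiply kernel. Swapping in a GPU-optimized BLAS would change the build line, not the application logic.

// layering_sketch.cpp -- illustrative only: an "application" that delegates its
// heavy numerical kernel to a math library (the standard CBLAS interface),
// mirroring the application / software-technology split described above.
// Build (assuming a CBLAS implementation is installed):
//   g++ layering_sketch.cpp -lcblas -o layering_sketch
#include <cblas.h>
#include <vector>
#include <cstdio>

int main() {
  const int n = 512;                  // problem size chosen by the application
  std::vector<double> A(n * n, 1.0);  // application-owned data
  std::vector<double> B(n * n, 2.0);
  std::vector<double> C(n * n, 0.0);

  // The library supplies the tuned kernel: C = 1.0 * A * B + 0.0 * C.
  cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
              n, n, n,
              1.0, A.data(), n,
              B.data(), n,
              0.0, C.data(), n);

  std::printf("C[0] = %g (expect %g)\n", C[0], 2.0 * n);
  return 0;
}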

According to Erik Draeger, the Scientific Computing group leader in the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory and the deputy director for ECP Applications Development, the functional model that has existed for most of computational science can be likened to frontier homesteading. “In general, homesteaders supply for their own needs and build their own structures. They may get input, but they still want to build and fix it themselves. They want to understand how it works,” he says. “In this case, the homesteader isn’t going to be comfortable with having a service where someone shows up and fertilizes his field once a week. He would have a hard time trusting that the person would do it the way he would want it done because the situation is counter to the model.” As the computing landscape becomes more complex, this homesteader mentality becomes less tractable. HPC has become so involved that it’s not an efficient practice for one person to be an expert in every subdomain. Draeger continues, “ECP enabled these different groups—applications, software, and hardware—to establish healthy, collaborative working relationships where specialists in each area came together to create something greater than the sum of its parts.”

In the past, development paths for applications and software technologies have often been somewhat disconnected from one another. Part of the reason for this disconnect is that technology products are not typically created with specific applications in mind. Although this more isolated approach has produced software products that have become extremely useful for many applications with large user communities, these are the exceptions rather than the rule. With ECP, working together was a prerequisite for participation. “From the beginning, the teams had this so-called ‘shared fate,’” says Siegel. When incorporating new capabilities, applications teams had to consider relevant software tools developed by others that could help meet their performance targets, and if they didn’t choose to use them, they needed to justify why not. Simultaneously, software technology teams had their success measured by the number of sustainable integrations they achieved with applications and other users of the products. “This early communication incentivized teams to be knowledgeable of each other’s work and identify gaps between what the application teams needed and what the software technologies could provide,” says Siegel. “Initially, we had to foster these types of collaborations, but eventually the process gained momentum. Teams wanted to help each other and demonstrate that effort quantitatively.”

Creating this type of push–pull effect, where teams can iterate back and forth, offers other benefits in addition to improved applications performance. Heroux says, “A substantial level of effort is required to integrate a library or utilize a tool, which results in a short-term loss in productivity because you first must learn how to do it. However, once you’ve made the investment, then you reap the benefits going forward if those libraries and tools are high quality, and for us, quality was a top priority.” Such collaborations also boost confidence in the products being provided.

By having the application developers leverage the libraries and tools from the software technology teams, software experts gleaned important information about how to adapt and build upon existing technologies to meet the needs of the exascale user. ECP enabled this type of interaction by providing an environment where that creative problem solving could occur in a collaborative space, and the benefits of that paradigm are evident in the types of projects that have thrived over the last seven years.

 



Source: Caryn Meissner, ECP
