As Exascale Frontier Opens, Science Application Developers Share Pioneering Strategies

By Jonathan Hines

December 19, 2017

In November 2015, three colleagues representing the US Department of Energy (DOE) Office of Science’s three major supercomputing facilities struck up a conversation with a science and technology book publisher about a book on the future of application development for pre-exascale and exascale supercomputers and the challenges such systems pose.

Two years later, the fruits of that discussion became tangible in the form of a new book, which debuted at SC17. Exascale Scientific Applications: Scalability and Performance Portability captures programming strategies being used by leading experts across a wide spectrum of scientific domains to prepare for future high-performance computing (HPC) resources. The book’s initial collaborators and eventual coeditors are Tjerk Straatsma, Scientific Computing Group leader at the Oak Ridge Leadership Computing Facility (OLCF); Katerina Antypas, Data Department Head at the National Energy Research Scientific Computing Center (NERSC); and Timothy Williams, Deputy Director of Science at the Argonne Leadership Computing Facility (ALCF).

Twenty-four teams, including many currently participating in early science programs at the OLCF, ALCF, and NERSC, contributed chapters on preparing codes for next-generation supercomputers, in which they summarized approaches to make applications performance portable and to develop applications that align with trends in supercomputing technology and architectures.

In this interview, Straatsma, Antypas, and Williams discuss the significance of proactive application development and the benefits this work portends for the scientific community.

How did this book come to be written?

Tjerk Straatsma: When we proposed writing the book, the intent was to provide application developers with an opportunity to share what they are doing today to take advantage of pre-exascale machines. These are the people doing the actual porting and optimization work. Through their examples, we hope that others will be inspired and get ideas about how to approach similar problems for their applications to do more and better science.

For quite some time, the three DOE ASCR [Advanced Scientific Computing Research] supercomputing facilities have been the leaders when it comes to working on performance portability for science applications. For our users, it’s very important that they can move from one system to another and continue their research at different facilities. That’s why DOE is very much interested in the whole aspect of portability—not just architectural portability but also performance portability. You want high performance on more than just a single system.

Katerina Antypas: As the three of us discussed the different application readiness programs within our centers, it was clear that despite architectural differences between the systems at each center, the strategies to optimize applications for pre-exascale systems were quite similar. Sure, if a system has a GPU, a different semantic might be needed, but the processes of finding hot spots in codes, increasing data locality, and improving thread scalability were the same. And in fact, teams from NERSC, OLCF, and ALCF talked regularly about best practices and lessons learned preparing applications. We thought these lessons learned and case studies should be shared more broadly with the rest of the scientific computing community.

Timothy Williams: Nothing instructs the developer of scientific applications more clearly than an example. Capturing the efforts of our book’s authors as examples was an idea that resonated with us. Measuring and understanding the performance of applications at large scale is key for those developers, so we were glad we could include discussions about some of the tools that make that possible across multiple system architectures. Libraries supporting functions common to many applications, such as linear algebra, are an ideal approach to performance portability, so it made good sense to us to include this as a topic as well.

Why is it important for these programming strategies to be shared now?

Straatsma: It’s important because DOE’s newest set of machines is starting to arrive. In 2016, NERSC delivered Cori, which comprises 9,688 Intel Xeon Phi Knights Landing processors, each with 68 cores. As we speak, the OLCF is building Summit—which will be around eight times more powerful than our current system, Titan, when it debuts in 2018. The ALCF is working to get its first exascale machine, Aurora, and the OLCF and NERSC are already working on the machines to follow their newest systems, at least one of which is likely to be an exascale machine.

It takes a long time to prepare codes for these new machines because their hierarchies of processing elements, memory space, and communication networks are becoming more and more complex. Effectively using these resources requires significant effort porting applications. If you do that in a way that makes them portable between current machines, there’s a better chance that they will also be portable to future machines, even if you don’t know exactly what those systems will look like.

This is what this book is all about: providing a set of practical approaches that are currently being used by application development teams with the goal of getting applications to run effectively on future-generation architectures.

Antypas: There are three key technologies that applications need to take advantage of to achieve good performance on exascale systems: longer vector units, high bandwidth memory, and many low-powered cores. Regardless of vendor or specific architecture, future exascale systems will all have these features. The pre-exascale systems being deployed today—Cori at NERSC, Theta at ALCF, and Summit at OLCF—have early instances of exascale technologies that scientists can use to optimize their applications for the coming exascale architectures. Preparing applications for these changes now means better performing codes today and a smoother transition to exascale systems tomorrow.

Williams: Exascale computing is coming to the US in an accelerated timeframe—by 2021. This makes the work on applications, tools, and libraries documented in this book all the more relevant. Today is also a time of extraordinary innovation in both hardware and software technologies. Developing applications that are up to today’s state of the art, and well-positioned to adapt to those new technologies, is effort well spent.

What other major challenges are science and engineering application developers grappling with?

Straatsma: The biggest challenge is expressing parallelism across millions and millions, if not billions, of compute elements. That’s an algorithmic challenge. Then you have the hardware challenge, mapping those algorithms onto the specific hardware that you are targeting. Whether you have NVIDIA GPUs as accelerators together with IBM Power CPUs like on Summit or you’re looking at NERSC’s Cori system with its Intel Knights Landing processors, the basic story is the same: taking the parallelism you’ve expressed and mapping it onto that hardware.

It’s a tall order, but, if done right, there is an enormous payoff because things that are being developed for these large pre-exascale machines tend to also lead to more efficient use of traditional architectures. In that sense, we’re at the forefront of the hardware with these machines, but we’re also at the forefront of the software. The benefits trickle down to the wider community.

Antypas: Besides the challenges associated with expressing on-node parallelism and improving data locality, scientists are grappling with the huge influx of data from experiments and observational facilities such as light sources, telescopes, sensors, and detectors, and how to incorporate data from these experiments into models and simulations. In the not too distant past, workflows started and ended within a supercomputing facility. Now, many user workflows start from outside of a computing facility and end with users needing to share data with a large collaboration. Data transfer, management, search, analysis, and curation have become large challenges for users.

Williams: Whether you view it as a challenge or an opportunity is a matter of perspective, but those developers who are themselves computational scientists are now more tightly coupled to the work of experimentalists and theorists. They are increasingly codependent. For example, cosmological simulations inform observational scientists of specific signs to look for in sky surveys, given an assumed set of parameter values for theoretical models. Particle-collider event simulations inform detectors at the experiment about what to look for, and what to ignore, in the search for rare particles—before the experiment is run.

How is scientific application development, which has traditionally entailed modeling and simulation, being influenced by data-driven discovery and artificial intelligence?

Straatsma: Most of the applications that we have in our current application readiness programs at the DOE computing facilities use traditional modeling and simulation, but artificial intelligence, machine learning, and deep learning are rapidly affecting the way we do computational science. Because of growth in datasets, it’s now possible to use these big machines to analyze data to discover underlying models. This is the broad area of data analytics. In our book, one such project is using seismic data analysis to derive models that are being used to get a better understanding of the Earth’s crust and interior.

In a sense, it’s doing computational science in the opposite direction from what has traditionally been done. Instead of having a model and simulating that model to create a lot of data that you use to learn things from your system, you start with potentially massive datasets (experimental or observational) and use inference methods to derive models, networks, or other features of interest.

Antypas: Machine learning and deep learning have revolutionized many fields already and are increasingly being used by NERSC users to solve science challenges important to the mission of the Department of Energy’s Office of Science. As part of a requirements-gathering process with the user community, scientists from every field represented noted they were exploring new methods for data analysis, including machine learning. We also expect scientists will begin to incorporate the inference step of learning directly into simulations.

Williams: Computational scientists now increasingly employ data-driven and machine learning approaches to answer the same science and engineering questions addressed by simulation. Fundamental-principles–based simulation and machine learning have some similarities. They can both address problems where there is no good, high-level theory to explain phenomena. For example, behavior of materials at the nanoscale, where conventional theories don’t apply, can be understood either by simulating the materials atom-by-atom or by using machine learning approaches to generate reduced models that predict behavior.

In the foreword, the contributors to this book are referred to as “the pioneers who will explore the exascale frontier.” How will their work benefit the larger scientific community?

Straatsma: In multiple ways. The most obvious benefit is that we get a set of applications that run very well on very large machines. If these are applications used by broad scientific communities, many researchers will benefit from them. The second benefit is in finding methodologies that can be translated to other codes or other application domains and be used to make these applications run very well on these new architectures. A third benefit is that application developers get a lot of experience doing this kind of work, and based on that experience, we have better ideas on how to approach the process of application readiness and performance portability.

Williams: With each step forward in large-scale parallel computing, a cohort of young scientists comes along for the ride, engaged in these pioneering efforts. The scale of this computing, and the sophistication of the software techniques employed, will become routine for them going forward. This is really just a manifestation of the advance of science, which builds on successes and corrects itself to be consistent with what we learn.

After coediting this volume, are there any key lessons that you hope readers take from this work?

Straatsma: I hope that people who are wondering about HPC at the scale we’re talking about will get inspired to think about what these future resources could do for their science or think bigger than what they’re thinking now. To draw one example from the book, astrophysicists are developing techniques for exascale systems that are projected to enable simulation of supernova explosions that include significantly larger kinetic networks than can be used today, and these systems can do this faster and more accurately. That’s just one example of the many described in this publication of exascale-capable applications with the promise of enabling computational science with more accurate models and fewer approximations, leading to more reliable predictions.

Oak Ridge National Laboratory is supported by the US Department of Energy’s Office of Science. The single largest supporter of basic research in the physical sciences in the United States, the Office of Science is working to address some of the most pressing challenges of our time. For more information, please visit science.energy.gov.

Jonathan Hines is a science writer at Oak Ridge National Laboratory.
