Rice Oil and Gas Keynote Offers Exascale Lessons and Thoughts on Technology Uptake

By John Russell

March 11, 2021

For the better part of a decade the U.S. Exascale Computing Initiative (ECI) has been churning along vigorously. The first exascale supercomputer – Frontier – is expected this year, with Aurora and El Capitan to follow. How much of the exascale-derived technology will diffuse through the broader HPC landscape, and how soon? Andrew Siegel, director of application development for the Exascale Computing Project (ECP), the software arm of ECI, took a stab at those questions, as well as summarizing overall ECP progress, in his keynote at last week’s annual Rice Oil and Gas HPC Conference.

Andrew Siegel, ECP and ANL

“I’ll update you [on] how things have gone generally in the (ECP) project, what’s been harder than expected, what’s gone surprisingly well, what remains to be done, [and] some implications for the future. [But] before we begin, let me start by posing some fundamental questions that might be on your minds during the talk. It’s very important for me to remember that not everybody is operating at the bleeding edge of high performance computing, and that most of what happens is at the sort of mid-range,” said Siegel.

“One question is, to what degree will this initial U.S. exascale technology impact what people see at mid-range high performance computing? And is this inevitable? Or might we not see such an impact? What are viable alternatives for people making procurements in the next several years? I’ve talked to lots of different groups who wonder, is it time to buy, for example, a GPU-based system now? Should I wait? What are the implications of waiting? How do I make this decision? What about alternatives to the technologies that have been chosen by the U.S. for its first push to exascale? For example, the ARM64 system in Fugaku? How long will these architectures be relevant? So what is next after what we see in this first wave of exascale?”

Good questions.

Siegel’s answers, perhaps predictably, were more guarded. It’s early days for much of the technology, and picking broadly usable winners isn’t easy, but Siegel’s fast-moving reprise of the ECP experience and lessons learned is nevertheless valuable. Near the top of the list, for example, was the role of domain experts in adapting applications for the forthcoming exascale systems, all of which are GPU-accelerated.

“In almost all cases the AD (application development) teams are led by domain scientists. The domain scientist obviously understands the modeling problem, how it relates to validation and verification, and the numerics. They don’t understand anything close to all of the complexity of the hardware, and the sort of hardware-algorithm interfaces necessary to pull this off. So the teams themselves are hybrid [and] have people with expertise in applied math and computer science and in sort of software engineering on them. [The] most successful have [put] all of this together [in] a very diverse team,” Siegel said.

To give you a sense of the challenge:

“There are lessons that I’ve learned in overseeing these projects for five or six years now. The first is that one has to be able to extract massive parallelism from the algorithm. That goes without saying, but sometimes we lose a sense of how massive ‘massive’ is. [If] we just think about Summit (a pre-exascale system), to literally map everything to all available parallelism would be 73 million degrees of parallelism, and that does not account for the need to over-subscribe a GPU-type architecture so that it can schedule efficiently. So you can imagine how, going into future systems, billion-way parallelism is the starting point for being able to get efficient use out of those systems,” said Siegel.
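To give a sense of how massive ‘massive’ is, here is a minimal back-of-the-envelope sketch in C. The Summit figures (4,608 nodes, six Nvidia V100 GPUs per node, 2,560 double-precision cores per GPU) are public specifications and land near Siegel’s 73 million; the eight-way oversubscription factor is purely an illustrative assumption, not an ECP number.

    /* A back-of-the-envelope accounting of "massive" parallelism, in the
       spirit of Siegel's Summit example. Machine numbers are public Summit
       specs; the 8x oversubscription factor is an illustrative assumption. */
    #include <stdio.h>

    int main(void) {
        long nodes         = 4608;  /* Summit compute nodes */
        long gpus_per_node = 6;     /* Nvidia V100s per node */
        long cores_per_gpu = 2560;  /* FP64 cores per V100 */
        long oversub       = 8;     /* assumed threads per core to hide latency */

        long hw_lanes = nodes * gpus_per_node * cores_per_gpu;
        long threads  = hw_lanes * oversub;

        printf("hardware lanes:    %ld (~%.0f million)\n", hw_lanes, hw_lanes / 1e6);
        printf("threads to expose: %ld (~%.2f billion)\n", threads, threads / 1e9);
        return 0;
    }

Even a modest oversubscription assumption pushes the required parallelism past half a billion, which is why Siegel treats billion-way parallelism as the starting point for exascale machines.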

ECP, of course, has been the main vehicle charged with ensuring there is a software ecosystem able to take advantage of the coming exascale systems. This includes three interrelated areas of focus: hardware and integration; software technology; and application development. Much of the early work was done on the pre-exascale Summit and Sierra systems, which share the same architecture and rely on Nvidia GPUs. That relative simplicity will change, as the exascale portfolio will include systems that also use AMD and Intel GPUs.

Siegel’s AD group has been focused on preparing applications for the systems. ECP settled on six application areas (national security, energy security, economic security, scientific discovery, earth systems, and health care) and 24 applications, with a significant focus on both simulation and data-driven (AI) approaches.

“We were looking for a certain number of guinea pigs who were willing to sort of work on the cutting edge and take a risk, and help both understand how to use these systems [and] do science on these systems, as well as contribute to the maturation of the systems at the same time. So there was a difficult RFP process, [as] you can imagine. But in the end 24 applications were chosen to be part of this push, and we see them as kind of leading the way into the exascale era,” said Siegel.

“Over 10 billion lines of code were represented. One thing that is very critical is that many of these codes supported, at least in our field, what we consider to be large user communities. [That] might be up to 10,000 people or so if [we’re] thinking about computational chemistry, but [it] can easily be in the hundreds [of thousands]. For other applications, molecular dynamics could be a lot, astrophysics could still be 100 research teams, computational fluid dynamics could be more,” he said.

Clearly that’s a daunting task which is now nearing completion. Siegel discussed both specific applications as well as more general software issues.

“All of the 24 applications I mentioned have gone through the following transition. They’ve gone from the sort of CPU or multi-threaded CPU, to the CPU plus a single GPU, to the CPU working with multiple GPUs – and that brings in new challenges – to diverse multi-GPU architectures. That includes early hardware that we have access to from Intel and AMD, and the new Nvidia hardware features that are targeting AI workflows. All projects have ported to the Summit and Sierra architecture, and they have performance increases, quantified by fairly complex figures of merit (FOMs) that are unique to each of these applications, of between a factor of 20 and a factor of 300. Our successes on the Summit platform have been a major story of the project. And that’s a different talk,” said Siegel.
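The FOM framing deserves a brief aside. Each ECP application defines its own metric, but a FOM is typically some form of useful work per unit time, and the reported factor is the ratio of the accelerated FOM to the baseline FOM. A minimal sketch of the general idea, with entirely hypothetical numbers:

    /* A minimal sketch of how a figure-of-merit (FOM) speedup might be
       computed. Real ECP FOMs are application-specific and more complex;
       the "useful work per second" form and all numbers here are
       hypothetical illustrations. */
    #include <stdio.h>

    int main(void) {
        double base_work = 1.0e12, base_time = 2000.0; /* hypothetical CPU baseline */
        double new_work  = 8.0e13, new_time  = 1600.0; /* hypothetical GPU run */

        double fom_base = base_work / base_time;
        double fom_new  = new_work / new_time;

        /* falls within the 20x-300x range Siegel reported */
        printf("FOM speedup: %.0fx\n", fom_new / fom_base); /* prints 100x */
        return 0;
    }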

“One thing that we learned, that was a surprise to me and that I can’t emphasize enough, is that there’s really a hierarchy of porting approaches that touches all aspects of simulation. We think of code porting as reordering loops or changing data structures or memory coalescing, whatever the case might be. But we also have things that are more fundamental algorithmic restructuring; that could include things like communication-avoiding algorithms, reduced synchronization, or use of specialized hardware. And we think of alternate discretizations, like approaching a problem using higher-order methods because they are more amenable to the hardware,” said Siegel.
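To make one rung of that hierarchy concrete, below is a sketch of the kind of data-structure change Siegel alludes to: moving from an array-of-structs layout to struct-of-arrays so that adjacent loop iterations touch adjacent memory, which coalesces on GPUs and vectorizes on CPUs. The particle example is illustrative and not drawn from any ECP code.

    #include <stdlib.h>

    #define N 1000000

    /* AoS: the fields of one particle sit together; consecutive particles'
       x-coordinates are three doubles apart, so GPU threads reading p[i].x
       in lockstep issue strided (uncoalesced) loads. */
    typedef struct { double x, y, z; } ParticleAoS;

    /* SoA: each field is its own contiguous array; a loop over x reads
       consecutive memory, which coalesces on GPUs. */
    typedef struct { double *x, *y, *z; } ParticlesSoA;

    static void push_aos(ParticleAoS *p, const double *vx, double dt) {
        for (long i = 0; i < N; i++)
            p[i].x += dt * vx[i];      /* stride-3 access pattern */
    }

    static void push_soa(ParticlesSoA *p, const double *vx, double dt) {
        for (long i = 0; i < N; i++)
            p->x[i] += dt * vx[i];     /* unit-stride access pattern */
    }

    int main(void) {
        double *vx = calloc(N, sizeof *vx);
        ParticleAoS *aos = calloc(N, sizeof *aos);
        ParticlesSoA soa = { calloc(N, sizeof(double)),
                             calloc(N, sizeof(double)),
                             calloc(N, sizeof(double)) };

        push_aos(aos, vx, 0.1);
        push_soa(&soa, vx, 0.1);

        free(vx); free(aos);
        free(soa.x); free(soa.y); free(soa.z);
        return 0;
    }

Both loops do identical arithmetic; only the memory layout differs, which is precisely why Siegel classes this as porting work rather than algorithm redesign.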

“Now we think of entirely new physical models [because] we have all this new computing power. So an interesting consequence of this big shift in computing hardware is [it has] had a significant impact on all aspects of simulation strategy. It’s been, in most cases, difficult to simply port the same approach and take full advantage of the accelerator-based systems.”

Not surprisingly, porting apps to the new hardware was challenging and sometimes posed critical choices for dealing with the strengths and drawbacks associated with weak scaling and strong scaling.

“There were a lot of clever strategies for mitigating the negative impacts of strong scaling on accelerator-based systems. There were a lot of issues with the maturity of the software ecosystem that HPC depends on [with] the Early Access machines. So, things like dense matrix operations, things that need to perform well. When you think about running on one of these machines, you have to think about the maturity of everything around the hardware, not [just] the hardware itself: the performance of OpenMP offload, strategies for GPU residence, and the role of unified virtual memory in achieving that,” he said.
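For readers on the sidelines, here is a minimal sketch of the OpenMP offload pattern behind ‘GPU residence’: arrays are mapped into a target data region once, so successive kernels reuse the device copies rather than paying a host-device transfer per loop. It assumes an offload-capable compiler (flags vary by vendor, e.g. clang’s -fopenmp -fopenmp-targets=...) and is illustrative, not ECP code.

    #include <stdio.h>
    #define N 1000000

    int main(void) {
        static double a[N], b[N];
        for (int i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

        /* 'target data' maps a and b to the device once; both kernels
           below reuse the device copies ("GPU residence"), avoiding a
           host-device round trip per loop. */
        #pragma omp target data map(tofrom: a[0:N]) map(to: b[0:N])
        {
            #pragma omp target teams distribute parallel for
            for (int i = 0; i < N; i++)
                a[i] += b[i];

            #pragma omp target teams distribute parallel for
            for (int i = 0; i < N; i++)
                a[i] *= 0.5;
        }   /* a is copied back to the host here */

        printf("a[0] = %f\n", a[0]);  /* 1.5 */
        return 0;
    }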

“A really interesting question that’s begun to emerge, as we’ve gotten more GPUs on the node and the nodes have become more and more complex, is the relatively increased cost of internode communication. So MPI implementations, which weren’t really an issue at 10,000 nodes, now have to keep up with the incredible performance of a single node. People are starting to say that’s where our real bottleneck is. That was not the case until this point in the project,” said Siegel.
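A common mitigation for that internode bottleneck is overlapping communication with computation via nonblocking MPI, sketched below with a simple ring exchange standing in for a real halo swap; the neighbor pattern and buffer sizes are illustrative.

    /* Overlap MPI communication with local computation: post the
       exchange, do interior work that needs no remote data, then wait.
       Compile with mpicc; run with mpirun. */
    #include <mpi.h>
    #include <stdio.h>

    #define N 4096

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double halo_out[N], halo_in[N], interior[N];
        for (int i = 0; i < N; i++) { halo_out[i] = rank; interior[i] = i; }

        int right = (rank + 1) % size;         /* ring neighbors (illustrative) */
        int left  = (rank - 1 + size) % size;

        MPI_Request reqs[2];
        /* post communication first... */
        MPI_Irecv(halo_in,  N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(halo_out, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

        /* ...then do interior work that needs no halo data while the
           network makes progress. */
        double sum = 0.0;
        for (int i = 0; i < N; i++) sum += interior[i] * interior[i];

        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
        /* boundary work that consumes halo_in would go here */

        if (rank == 0) printf("interior sum: %f\n", sum);
        MPI_Finalize();
        return 0;
    }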

Though he moved quickly, Siegel dug into many of the challenges encountered. In that sense, his talk is best watched/listened to directly, and there are plans to post it. Of note, incidentally, are plans to slightly modify the conference name. Next year will be the 15th year, and the event will become the Rice Energy High Performance Computing Conference.

Circling back to his opening comments, Siegel closed with broad thoughts on how quickly technology developed for the exascale program will filter throughout the broader HPC world. On balance, he said, it’s fine to wait and smart to prepare.

“If I go back to the original questions (slide repeated below) I started with, I do not have answers to these questions. So much depends on your own circumstance. But if I ask, to what degree is this technology going to impact midrange computing? I’d say a significant impact is highly likely, and there’s an impact already. Are [there] viable alternatives? Absolutely. [There] doesn’t have to be [a] huge rush. x86- or Arm-based systems, with or without the special vector extensions, are certainly viable alternatives.

“I would say learn about and reason about how one’s code would map to these types of systems before you dive in headfirst. I’m speaking to people who are sort of on the sidelines. One of the important questions is, even if you can port more easily, what’s the cost of ported performance relative to a multi-GPU system? I think understanding and evaluating codes is important, even though it’s perfectly reasonable to take a wait-and-see attitude if you’re doing a procurement. The software cost [of] porting can be very low if you have localized operations [and] high intensity, [where] most of your performance is [in] a very small part of your code. But it could be quite high when you’re doing things that are not computationally intensive, when you have performance spread out all around your code, [and] when you have very complex data structures,” he said.
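Siegel’s porting-cost heuristic can be restated in roofline terms: kernels with high arithmetic intensity (flops per byte moved) and concentrated hotspots port cheaply, while bandwidth-bound code with performance spread everywhere does not. A small illustration with hypothetical kernels; in practice these numbers would come from a profiler.

    /* A rough arithmetic-intensity comparison behind the porting-cost
       heuristic. All numbers are hypothetical illustrations. */
    #include <stdio.h>

    int main(void) {
        double n = 1024.0;

        /* hotspot kernel: dense matrix multiply, 2*n^3 flops over
           3*n^2 doubles moved (each matrix touched once, idealized) */
        double mm_flops = 2.0 * n * n * n;
        double mm_bytes = 3.0 * n * n * 8.0;
        printf("matmul intensity: %.1f flops/byte (GPU-friendly)\n",
               mm_flops / mm_bytes);

        /* streaming kernel: a[i] = b[i] + c[i] is 1 flop per
           3 x 8 bytes moved -- bandwidth-bound */
        printf("stream intensity: %.2f flops/byte (hard to speed up)\n",
               1.0 / 24.0);
        return 0;
    }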

“One has to also remember that the things that I list below are all evolving and they’re still relatively immature and they’ll be much better soon. So we’ll begin to coalesce around programming models; we will see a thinning of the number of options and a hardening of the best ones.”

Link to the 2021 Rice Oil and Gas High Performance Computing Conference: https://rice2021oghpc.rice.edu/programs/

Slides are from Siegel’s keynote
