Rice Oil and Gas Keynote Offers Exascale Lessons and Thoughts on Technology Uptake

By John Russell

March 11, 2021

For the better part of a decade the U.S. Exascale Computing Initiative (ECI) has been churning along vigorously. The first exascale supercomputer – Frontier – is expected this year, with Aurora and El Capitan to follow. How much of the exascale-derived technology will diffuse through the broader HPC landscape and how soon? Andrew Siegel, director of application development for the Exascale Computing Project (ECP), the software arm of ECI, took a stab at that question as well as summarizing overall ECP progress in his keynote at last week’s annual Rice Oil and Gas HPC Conference.

Andrew Siegel, ECP and ANL

“I’ll update you [on] how things have gone generally in the (ECP) project, what’s been harder than expected, what’s gone surprisingly well, what remains to be done, [and] some implications for the future. [But] before we begin, let me start by posing some fundamental questions that might be on your minds during the talk. It’s very important for me to remember that not everybody is operating at the bleeding edge of high performance computing, and that most of what happens is at the sort of mid-range,” said Siegel.

“One question is, to what degree will this initial U.S. exascale technology impact what people see at mid-range high performance computing? And is this inevitable? Or might we not see such an impact? What are viable alternatives for people making procurements in the next several years? I’ve talked to lots of different groups who wonder, is it time to buy, for example, a GPU-based system now? Should I wait? What are the implications of waiting? How do I make this decision? What about alternatives to the technologies that have been chosen by the U.S. for its first push to exascale? For example, the ARM64 system in Fugaku? How long will these architectures be relevant? So what is next after what we see in this first wave of exascale?”

Good questions.

Siegel’s answers, perhaps predictably, were more guarded. It’s early days for much of the technology, and picking broadly usable winners isn’t easy, but Siegel’s fast-moving reprise of the ECP experience and lessons learned is nevertheless valuable. Near the top of the list, for example, was the role of domain experts in adapting applications for the forthcoming exascale systems, all of which are GPU-accelerated.

“In almost all cases the AD (application development) teams are led by domain scientists. The domain scientist obviously understands the modeling problem, how it relates to validation and verification, and the numerics. They don’t understand anything close to all of the complexity of the hardware, and the sort of hardware-algorithm interfaces necessary to pull this off. So the teams themselves are hybrid, [with] people with expertise in applied math, computer science, and sort of software engineering on them. [The] most successful have put all of this together [in] a very diverse team,” Siegel said.

To give you a sense of the challenge:

“There are lessons that I’ve learned in overseeing these projects for five or six years now. The first is that one has to be able to extract massive parallelism from the algorithm. That goes without saying, but sometimes we lose a sense of how massive massive is. [If] we just think about Summit (a pre-exascale system), to literally map everything to all available parallelism [requires] 73 million degrees of parallelism, and that does not account for the need to oversubscribe a GPU-type architecture so that it can schedule efficiently. So you can imagine how, going into future systems, billion-way parallelism is the starting point for being able to get efficient use out of those systems,” said Siegel.
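For a rough sense of where a number that size comes from, the back-of-envelope sketch below multiplies Summit’s GPU count by the double-precision cores on each GPU. Siegel did not spell out his exact accounting, so the breakdown is an illustrative assumption rather than his arithmetic.

// Back-of-envelope estimate of fine-grained parallelism on Summit.
// The breakdown is an illustrative assumption, not Siegel's exact accounting.
#include <cstdio>

int main() {
    const long nodes         = 4608;  // Summit compute nodes
    const long gpus_per_node = 6;     // Nvidia V100 GPUs per node
    const long fp64_per_gpu  = 2560;  // FP64 cores per V100
    const long lanes = nodes * gpus_per_node * fp64_per_gpu;
    std::printf("~%ld million independent lanes\n", lanes / 1000000);
    // Prints ~70 million -- the same order as the "73 million degrees of
    // parallelism" Siegel cites, before any oversubscription needed to keep
    // the GPU schedulers busy.
    return 0;
}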

ECP, of course, has been the main vehicle charged with ensuring there is a software ecosystem able to take advantage of the coming exascale systems. This includes three interrelated areas of focus: hardware and integration; software technology; and application development. Much of the early work was done on the pre-exascale Summit and Sierra systems, which share the same basic architecture and rely on Nvidia GPUs. That relative simplicity will change, as the exascale portfolio includes systems that use AMD and Intel GPUs.

Siegel’s AD group has been focused on preparing applications for the systems. ECP settled on six application areas (national security, energy security, economic security, scientific discovery, earth systems, and health care) and 24 applications with a significant focus on both simulation and data-driven (AI) approaches.

“We were looking for a certain number of guinea pigs who were willing to sort of work on the cutting edge and take a risk, and help both understand how to use these systems [and] do science on these systems, as well as contribute to the maturation of the systems at the same time. So there was a difficult RFP process, [as] you can imagine. But in the end 24 applications were chosen to be part of this push, and we see them as kind of leading the way into the exascale era,” said Siegel.

“Over 10 billion lines of code were represented. One thing that is very critical is that many of these codes supported, at least in our field, what we consider to be large user communities. So it might be up to 10,000 people or so [if we’re] thinking about computational chemistry, but [that] can easily be in the hundreds [of thousands]. For other applications, molecular dynamics could be a lot, astrophysics could still be 100 research teams, [and] computational fluid dynamics could be more,” he said.

Clearly that’s a daunting task which is now nearing completion. Siegel discussed both specific applications as well as more general software issues.

“All of the 24 applications I mentioned have gone through the following transition. They’ve gone from the sort of CPU, or multi-threaded CPU, [version] to the CPU plus single GPU, to the CPU working with multiple GPUs, and that brings in new challenges, to diverse multi-GPU architectures. That includes early hardware that we have access to from Intel and AMD, and the new Nvidia hardware features that are targeting AI workflows. All projects have ported to the Summit and Sierra architecture, and they have performance increases of between a factor of 20 and a factor of 300, quantified by fairly complex figures of merit (FOM) that are unique to each of these applications. Our successes on the Summit platform have been a major story of the project. And that’s a different talk,” said Siegel.

“One thing that we learned, that was a surprise to me and that I can’t emphasize enough, is that there’s really a hierarchy of porting approaches that touches all aspects of simulation. We think of code porting as reordering loops, or changing data structures, or memory coalescing, whatever the case might be. But we also have things that are more fundamental algorithmic restructuring; that could include things like communication-avoiding algorithms, reduced synchronization, or use of specialized hardware. And we think of alternate discretizations, like approaching a problem using higher-order methods because they are more amenable to the hardware,” said Siegel.
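As a concrete illustration of the lowest rung of that hierarchy, the sketch below shows the kind of data-structure change Siegel alludes to: switching from an array-of-structs to a struct-of-arrays layout so that neighboring threads or vector lanes read contiguous, coalesced memory. It is a generic example, not code drawn from any ECP application.

// Generic illustration of a data-layout change that favors coalesced access;
// not code from any ECP application.
#include <cstddef>
#include <vector>

// Array-of-structs: updating only x touches every fourth double -- strided
// access that wastes bandwidth on wide-vector CPUs and GPUs alike.
struct ParticleAoS { double x, y, z, mass; };

// Struct-of-arrays: updating x walks contiguous memory, so adjacent threads
// (or vector lanes) issue unit-stride, coalesced loads and stores.
struct ParticlesSoA { std::vector<double> x, y, z, mass; };

void push_x(ParticlesSoA& p, const std::vector<double>& vx, double dt) {
    for (std::size_t i = 0; i < p.x.size(); ++i)
        p.x[i] += vx[i] * dt;   // unit stride: friendly to accelerators
}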

“Now we think of entirely new physical models [because] we have all this new computing power. So an interesting consequence of this big shift in computing hardware is [it has] had a significant impact on all aspects of simulation strategy. It’s been, in most cases, difficult to simply port the same approach, and take full advantage of the accelerator based systems.”

Not surprisingly, porting apps to the new hardware was challenging and sometimes posed critical choices for dealing with the strengths and drawbacks associated with weak scaling and strong scaling.

“There were a lot of clever strategies for mitigating the negative impacts of strong scaling with accelerator-based systems. There were a lot of issues with the maturity of the software ecosystem that HPC depends on, on the Early Access machines. So, things like dense matrix operations, things that need to perform well. When you think about running on one of these machines, you have to think about the maturity of everything around the hardware, not [just] the hardware itself. [That includes] the performance of OpenMP offload, strategies for GPU residency, and the role of unified virtual memory in achieving that,” he said.
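To make the OpenMP-offload and GPU-residency point concrete, here is a minimal, generic sketch (assumed names, not ECP code) of mapping an array to the device once and keeping it resident across repeated kernel launches, rather than paying a host-device transfer on every step.

// Minimal sketch of OpenMP target offload with device-resident data;
// a generic example under assumed names, not taken from any ECP code.
#include <vector>

void relax(std::vector<double>& u, int steps) {
    double* d = u.data();
    const long n = static_cast<long>(u.size());

    // Copy the array to the GPU once and leave it there across iterations.
    #pragma omp target enter data map(to: d[0:n])
    for (int s = 0; s < steps; ++s) {
        #pragma omp target teams distribute parallel for
        for (long i = 0; i < n; ++i)
            d[i] = 0.999 * d[i] + 1.0e-6;   // placeholder kernel; no host copies
    }
    // Bring the result back only when the time loop is done.
    #pragma omp target exit data map(from: d[0:n])
}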

“A really interesting question that’s begun to emerge, as we’ve gotten more GPUs on the node and the nodes have become more and more complex, is the increased cost, relatively, of internode communication. So MPI implementations, which weren’t really an issue at 10,000 nodes, now have to keep up with the incredible performance on a single node. People are starting to say that’s what our real bottleneck is. That was not the case until this point in the project,” said Siegel.
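One common response to that node-versus-network imbalance is to overlap communication with computation. The fragment below is a generic halo-exchange-style sketch, with assumed names and no connection to any particular ECP code, that posts nonblocking MPI messages and does local work while they are in flight.

// Generic sketch of hiding internode communication behind local work;
// buffer names and the placeholder kernel are illustrative assumptions.
#include <mpi.h>
#include <vector>

void exchange_and_compute(std::vector<double>& halo_send,
                          std::vector<double>& halo_recv,
                          std::vector<double>& interior,
                          int left, int right) {
    MPI_Request reqs[2];

    // Post the boundary exchange first...
    MPI_Irecv(halo_recv.data(), static_cast<int>(halo_recv.size()), MPI_DOUBLE,
              left, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(halo_send.data(), static_cast<int>(halo_send.size()), MPI_DOUBLE,
              right, 0, MPI_COMM_WORLD, &reqs[1]);

    // ...then update the interior (the part a fast node finishes quickly)
    // while the messages are still on the wire.
    for (double& v : interior) v *= 0.999;   // placeholder interior kernel

    // Only the boundary work has to wait for the network.
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
}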

Despite moving quickly, Siegel dug into many of the challenges encountered. In that sense, his talk is best watched/listened to directly, and there are plans to post it. Of note, incidentally, are plans to slightly modify the conference name. Next year will be the conference’s 15th, and it will become the Rice Energy High Performance Computing Conference.

Circling back to his opening comments, Siegel closed with broad thoughts on how quickly technology developed for the exascale program will filter throughout the broader HPC world. On balance, he said, it’s fine to wait and smart to prepare.

“If I go back to the original questions (slide repeated below) I started with, I do not have answers to these questions. So much depends on your own circumstance. But if I say, to what degree is this technology going to impact midrange computing? I’d say a significant impact is highly likely, and there’s an impact already. Are [there] viable alternatives? Absolutely. [There] doesn’t have to be [a] huge rush. x86 or Arm-based systems, with or without the special vector extensions, are certainly viable alternatives.

“I would say learn about and reason about how one’s code would map to these types of systems before you dive in headfirst. I’m speaking to people who are sort of on the sidelines. One of the important questions is, even if you can port more easily, what’s the cost of porting performance relative to a multi-GPU system? I think understanding and evaluating codes is important, even though it’s perfectly reasonable to take a wait-and-see attitude if you’re doing a procurement. The software cost [of] porting can be very low if you have localized operations, high [arithmetic] intensity, [and] most of your performance [coming from] a very small part of your code. But it could be quite high when you’re doing things that are not computationally intensive, when you have performance spread out all around your code, [and] when you have very complex data structures,” he said.
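One crude way to “reason about how one’s code would map” is a roofline-style arithmetic-intensity estimate for the kernels that dominate runtime. The sketch below is a hypothetical illustration with assumed machine numbers, not a methodology from the talk.

// Hypothetical roofline-style check for one hot kernel; the kernel shape and
// machine numbers are illustrative assumptions, not figures from the talk.
#include <algorithm>
#include <cstdio>

int main() {
    // Example kernel: y[i] = a * x[i] + y[i]  -> 2 flops, 3 doubles of traffic.
    const double flops_per_elem = 2.0;
    const double bytes_per_elem = 3.0 * sizeof(double);
    const double intensity = flops_per_elem / bytes_per_elem;    // flop/byte

    // Assumed accelerator: ~1.6 TB/s memory bandwidth, ~10 Tflop/s FP64 peak.
    const double bw_peak = 1.6e12, flop_peak = 1.0e13;
    const double attainable = std::min(flop_peak, intensity * bw_peak);

    std::printf("intensity %.3f flop/byte -> attainable ~%.2f Tflop/s\n",
                intensity, attainable / 1e12);
    // A low-intensity, bandwidth-bound kernel still gains from GPU memory
    // bandwidth, but if performance is spread over many such kernels the
    // porting cost Siegel warns about climbs quickly.
    return 0;
}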

“One has to also remember that the things that I list below are all evolving and they’re still relatively immature and they’ll be much better soon. So we’ll begin to coalesce around programming models; we will see a thinning of the number of options and a hardening of the best ones.”

Link to the 2021 Rice Oil and Gas High Performance Computing Conference: https://rice2021oghpc.rice.edu/programs/

Slides are from Siegel’s keynote
