What’s After Exascale? The Internet of Workflows Says HPE’s Nicolas Dubé

By John Russell

July 29, 2021

With the race to exascale computing in its final leg, it’s natural to wonder what the Post Exascale Era will look like. Nicolas Dubé, VP and chief technologist for HPE’s HPC business unit, agrees and shared his vision at Supercomputing Frontiers Europe 2021 held last week. The next big thing, he told the virtual audience at SFE21, is something that will connect HPC and (broadly) all of IT – into what Dubé calls The Internet of Workflows.

“It’s not just the exascale capability, which in itself will allow us to address some problems we’ve never been able to tackle before, but it’s the exascale technologies that are going to trickle down to the broad IT community, and it’s computing the enormous amount of data [being generated]. We’re no longer just doing classical simulation HPC, but also machine learning and data analytics. What’s even more interesting is that these three spheres are getting more and more coupled and even pipelined. You might have data go through multiple stages, or you might use machine learning to [steer] a classical simulation,” said Dubé.

Nicolas Dubé, HPE

He ticked through the underlying enablers – accelerator chips and their burgeoning diversity, packaging innovation and the rise of multichip-everything, tightly-integrated memory-processor communications based on speedy fabrics and co-location. Stir in the infusion of AI throughout HPC (and computing writ large) and the resulting diversity of workflows, and what will emerge, contends Dubé, is a computing landscape dominated by what he calls the Internet of Workflows, spanning edge-to-supercomputer environments.

Welcome to Dubé’s vision of the Post Exascale Era.

OK, and at the risk of introducing a new acronym, what is this IoW?

“The Internet of Workflows is the idea that data gets produced all over and then flows and gets processed at different points and then gets analyzed and visualized all across the internet. [It’s] more than the Internet of Things, which is all about addressing [devices]. The Internet of workflows will actually deliver some value,” said Dubé.

“First of all, it’s going to be from edge to exascale. Why? Because the workflows are going to be executed from sensor data that starts from, you know, all of the sensors from cell phones and cars and all of that. Then they can go to tiny inference engines or planetary-size problems that are going to run on exascale supercomputers. This requires a lot more capability to extract the data, and you might not want to send all of that data over, or overseas, to a storage system. The ATLAS [particle physics] project had already started looking at those things more than a decade ago.”

The IoW, said Dubé, is about “applying those principles to a much broader set of scientific fields because we’re convinced that is where this is going.”

It’s an intriguing idea, with echoes of grid computing and IoT smashed together. Presented here are six takeaways from Dubé’s talk, briefly touching on recent relevant advances as well as the requirements for developing the IoW. Many of the challenges are familiar.

1. First the Basics. The effort to achieve exascale and the needs of heterogeneous computing generally were catalysts in producing technologies needed for the IoW. Dubé also noted the “countless silicon startups doing accelerators” to tackle diverse workloads. Still, lots more work is needed. Here’s a snippet on MCMs’ expected impact on memory.

“Multi-chip modules (MCMs) are also becoming a de facto standard. If you look, AMD was early on in embracing the MCM path. And now the next-generation motherboards, you might think of them as MCMs or high-speed substrates that will have the compute silicon, the memory, and before too long, some network interconnects positioned directly on the substrate. Because of that, you’re going to put very high-speed data on the MCM. Most of the data capacity will no longer just be loaded in RAM, but may get loaded into some farther-ahead, fabric-attached memory that may be non-volatile, but that can be accessed at the right time and at the right throughput,” he said.

“If you’re interested, we’ve done some demonstrations, for instance, using XSBench [a neutron transport mini-app] that shows that if you’re placing the right data structures on fabric-attached, much higher-latency memory structures, you’re actually getting roughly the same performance percentage on that benchmark even if the data has a much higher latency, as long as you understand your data substructure and you place it well on your memory tiers.”
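
The intuition behind that XSBench result can be captured with a back-of-the-envelope model: if the data structures placed on far memory account for only a small fraction of accesses, total runtime barely moves even when far-memory latency is several times higher. The sketch below is illustrative only; the access fractions and latencies are assumed numbers, not measurements from HPE’s demonstration.

```python
# Toy latency model for tiered memory placement (illustrative numbers only,
# not HPE's XSBench measurements).

def stall_time(accesses_near, accesses_far, lat_near_ns, lat_far_ns):
    """Total memory stall time if each tier's accesses pay that tier's latency."""
    return accesses_near * lat_near_ns + accesses_far * lat_far_ns

total_accesses = 1_000_000_000
hot_fraction   = 0.95          # accesses hitting structures kept in local RAM (assumed)
lat_local_ns   = 100           # assumed local-DRAM latency
lat_fabric_ns  = 400           # assumed fabric-attached memory latency (4x worse)

baseline = stall_time(total_accesses, 0, lat_local_ns, lat_fabric_ns)
tiered   = stall_time(int(total_accesses * hot_fraction),
                      int(total_accesses * (1 - hot_fraction)),
                      lat_local_ns, lat_fabric_ns)

print(f"slowdown from tiering: {tiered / baseline:.2f}x")
# ~1.15x -- placing the cold 5% of accesses on 4x-slower memory costs ~15%,
# which is why careful placement on memory tiers can keep performance roughly flat.
```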

2. White Hats & Data Sovereignty. A key issue, currently not fully addressed, is data sovereignty. Dubé agrees it’s a critical challenge now and will be even more so in an IoW world. He didn’t offer specific technology or practice guidelines.

“Another key precondition in the post-exascale world is data sovereignty. The hyperscalers will drive you a truck, right? A FedEx truck loaded with hard drives, for you to put the data on, and for them to load that into their premises, but they’re never going to send that FedEx truck back loaded with your data once it gets processed. At HPE, we see ourselves as being one of the white hats in the industry; we want to enable the community to have access to the data with the right permissions and the right identification mechanism, the right encryption end-to-end and all of that,” said Dubé.

“We’re not into a play where the data gets locked into a kingdom and then you can never get it out. Data sovereignty is something I think we’re going to hear more and more talk about in the next decade. Data is the new currency; it’s by far the most important asset of all of your organizations. We need to make sure your data is not only secure, but that it’s used for its intended purpose, and because the data feeds the compute, we’ll have to make sure that the right compute gets positioned close enough to the data so it can get processed in the right environment,” he said.

3. New Runtimes for a Grand Vision. It’s one thing to dream of the IoW; it’s another to build it. Effective parallel programming for diverse devices and reasonably performant runtime systems able to accommodate device diversity are both needed.

“We’ll have to find the right workflow execution engine that is really close to that data source. Edge-to-exascale is kind of a great vision, but it needs to get enabled through new runtime and deployment models. Today, we’re deploying systems in a very static way, and we’re executing always within the confines of the datacenter. We need to enable a much more fluid execution environment that can take data from the edge to the output of exascale supercomputers, but in a way that they can flow between sites, between organizations, again, always with the right authentication and security mechanisms, but in a not so confined way,” he said.
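
To make the idea of a fluid, edge-to-exascale execution model concrete, here is a minimal, hypothetical sketch of a workflow described as composable stages tagged with where they should run. None of the names, tags, or the in-process “scheduler” correspond to HPE or Cray tooling; a real engine would dispatch each stage to the named site with authentication, encryption, and data movement.

```python
# Hypothetical sketch of an edge-to-exascale workflow described as tagged stages.
# Stage names, the `site` tags, and the scheduler stub are illustrative only.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Stage:
    name: str
    site: str                    # where the stage should execute: "edge", "regional", "exascale"
    fn: Callable[[Any], Any]

def run_workflow(stages, payload):
    """Naive in-process stand-in for a workflow execution engine."""
    for stage in stages:
        print(f"[{stage.site}] running {stage.name}")
        payload = stage.fn(payload)
    return payload

pipeline = [
    Stage("filter_sensor_data", "edge",     lambda d: [x for x in d if x > 0.5]),
    Stage("aggregate",          "regional", lambda d: sum(d) / max(len(d), 1)),
    Stage("simulate",           "exascale", lambda mean: mean * 1000),   # stand-in for a big solver
    Stage("visualize",          "regional", lambda r: f"result={r:.1f}"),
]

print(run_workflow(pipeline, [0.2, 0.7, 0.9, 0.4]))
```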

“So that leads us to the democratization of parallel runtime environments. Fortran and MPI and OpenMP are very powerful tools, but the proportion of graduates that can use them is on a steady decline. We need to enable new languages like Python, for instance. Think of Project Dragon, which came from Cray and which we inherited here at HPE; it’s about writing a real, very capable parallel Python execution engine. Chapel and Arkouda are two other examples. But ultimately, we need development and runtime environments that can enable a growing ensemble of users to compute larger and larger problem sizes.”
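
As a flavor of what “NumPy-like Python at supercomputer scale” looks like, here is a minimal Arkouda-style sketch. It assumes an Arkouda server is already running and reachable on the default port; the connection arguments and available operations should be checked against the Arkouda documentation for your version.

```python
# Minimal Arkouda-style sketch: NumPy-like Python driving a parallel Chapel backend.
# Assumes an arkouda_server is already running on localhost:5555; verify connection
# details and function names against your Arkouda release.
import arkouda as ak

ak.connect()                          # attach to the running server (default localhost:5555)

a = ak.randint(0, 2**32, 10**9)       # a billion random integers, held server-side
perm = ak.argsort(a)                  # parallel sort happens on the server, not the laptop
top = a[perm][-10:]                   # only the small slice comes back to Python

print(top.to_ndarray())               # materialize the 10 largest values locally
ak.disconnect()
```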

4. Chasing Performance Portability…Still. Tight vertical software integration as promoted by some (pick your favorite target vendor) isn’t a good idea, argued Dubé. This isn’t a new controversy, and it may yet prove a hard-stop roadblock for the IoW. We’ll see. Dubé argues for openness and says HPE (Cray) is trying to make the Cray Programming Environment a good choice.

“We need performance portability (in order) to enable alternative compute. So again, the vertical integration of a software platform all the way to silicon that some are gunning for might sound appealing at first sight, but it really locks in anyone that embraces such a model, and it prevents you from adopting alternative options down the line. We see performance portability as a foundational pillar of the Internet of Workflows. It allows for a single codebase to be targeted and optimized for multiple silicon underpinnings. To do that we are evolving the Cray Programming Environment as a key asset to have a much broader reach, and positioning it as that foundational asset to this broad vision,” said Dubé.

“In a way, we’d like CPE (Cray Programming Environment) to become kind of the TensorFlow of parallel models for parallel workloads. When you’re an undergrad, if you want to program machine learning, there are plenty of TensorFlow undergrad courses. We’re working to enable CPE to be used by a broad set of people, in undergrad courses and all of that. So people have a way to go to develop their code for parallel environments that scales and that today might run on x86, on GPUs, and on Arm. That’s the whole idea of performance portability. To make it more easily consumable, we’ve even packaged it in a Docker container so that anyone can run it on a laptop. This is now going into proof of concept.”
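
CPE itself is not shown here, but the “single codebase, multiple silicon” idea can be illustrated in Python with the common NumPy/CuPy duck-typing pattern: the same kernel runs on an x86 or Arm CPU, or on an Nvidia GPU, depending only on where the input array lives. This is an illustrative stand-in for performance portability, not an example of how CPE works.

```python
# Illustrative "single codebase, multiple backends" kernel using the NumPy/CuPy
# duck-typing pattern. A stand-in for the performance-portability idea, not CPE.
import numpy as np

try:
    import cupy as cp                 # present only on machines with an Nvidia GPU
except ImportError:
    cp = None

def smooth(x):
    """3-point moving average; works unchanged on NumPy (CPU) or CuPy (GPU) arrays."""
    xp = cp.get_array_module(x) if cp is not None else np
    return (xp.roll(x, -1) + x + xp.roll(x, 1)) / 3.0

cpu_data = np.random.rand(1_000_000)
print(smooth(cpu_data)[:3])           # executes on the CPU

if cp is not None:
    gpu_data = cp.asarray(cpu_data)
    print(smooth(gpu_data)[:3])       # the same source line executes on the GPU
```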

5. “A Combinatorial Explosion of Configurations”. Now there’s an interesting turn of phrase. The avalanche of new chips from established players and newcomers is a blessing and curse. Creating systems to accommodate the new wealth of choices is likewise exciting but daunting and expensive. Dubé argues we need to find ways to cut the costs of silicon innovation and subsequent systems to help bring the IoW into being.

“We need to do a better job of enabling silicon innovation. Right now, to build a new chip, it’s on the order of $100 million, and that’s excluding software and all the enablement that comes after that. When you include software, it’s over $200 million at 5nm. So that makes it very difficult to enable silicon innovation. On top of that, building a new platform for every new chip is very cumbersome for every system integrator. We need to come to a place where not only are we going to have a route to fab for people that want to build new silicon [through] initiatives like IMEC in Europe or MOSIS in the U.S., but we also have ways for vendors to adopt standard form factors for platforms, so that when that new silicon gets built it can have a motherboard to enable it, instantiate it,” said Dubé.

“We as vendors — and not just HPE but all of the other systems vendors — can take it and really lower our adoption cost, because right now, building a motherboard on top of the silicon is making it really expensive to do everything custom every time there’s new silicon coming in. That works when you have one or two CPU vendors, and maybe one or two GPU vendors, but now we have [many] — there’s Intel, AMD, multiple Arm versions on the CPU side, and then Nvidia, Intel, AMD on the GPU side, and then add all of the machine learning accelerators. It’s a combinatorial explosion of configurations that, as a vendor, makes it very challenging to support that breadth of opportunities. So we as an industry need to figure out how we’re going to enable doing that going forward.”

6. Worldwide Data Hub? If one is going to set goals, they may as well be big ones. Creating an infrastructure with reasonable governance and practices to support an IoW is a big goal. Data is at the core of nearly everything, Dubé argued.

“The next key thing we see in the Internet of Workflows is kind of a worldwide data web. Go back to how Google revolutionized the content web by indexing the whole thing that was on there. People got access to all of that content without being locked in, like into AOL, for instance. If we could do that for metadata, again, with the right access and permissions, that would be awesome, because then people will be able to free the data so people can compute that [data] and throw that into their workflow, wherever they are. That will lead into hybrid execution pipelines. Think about SmartSim, for instance, which is a code we’ve built along with NCAR (the National Center for Atmospheric Research). We’ve been able to accelerate planetary-scale ocean models by augmenting classical simulation HPC with a machine learning approach, and got a 10x speedup to insight,” said Dubé.
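
SmartSim’s own API is not reproduced here; the sketch below only illustrates the hybrid-pipeline pattern Dubé describes: a cheap learned surrogate replaces most calls to an expensive solver step, with periodic full-fidelity corrections. All of the functions, the toy “model,” and the numbers are hypothetical.

```python
# Hypothetical sketch of an ML-augmented simulation loop (the pattern behind
# tools like SmartSim, not SmartSim's actual API). Solver and surrogate are toys.
import numpy as np

def expensive_solver_step(state):
    """Stand-in for a costly physics kernel."""
    return state * 0.99 + np.sin(state) * 0.01

def surrogate_step(state, model):
    """Cheap learned approximation of the solver step."""
    return model(state)

# Toy 'trained model': in practice a neural net trained on solver output.
model = lambda s: s * 0.99 + s * 0.01 * (1 - s**2 / 6)   # small-angle approx of sin

state = np.random.rand(1000)
for step in range(1000):
    if step % 100 == 0:
        state = expensive_solver_step(state)   # occasional full-fidelity correction
    else:
        state = surrogate_step(state, model)   # fast ML path most of the time

print("final mean state:", float(state.mean()))
```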

“All of that, ultimately, is about having something that is open. As I said, HPE is about being the white-hat system vendor and system integrator in the industry. We’re about being open, providing choice, being a trusted advisor. We’ve been a strong contributor to the open source community; SmartSim, which I was talking about, is an example. I know this is a very high-level talk, but we see the Internet of Workflows as the future of HPC, and really as a true rebirth of the internet, where workloads and data will drive new insight. And that’s where we’re much higher in the value chain, all of us as an HPC community, because we deliver that outcome to the scientists, and to the world ultimately.”

Wrap-up
As noted in the Q&A, there are many technical and governance/practice issues facing construction of an IoW. Whether, as Dubé contends, what was once loosely thought of as the Internet of Things (IoT), a device-centric concept, instead becomes the Internet of Workflows will be fascinating to watch.
