ISC 2021 Keynote: Thomas Sterling on Urgent Computing, Big Machines, China Speculation

By John Russell

July 1, 2021

In a somewhat shortened version of his annual ISC keynote surveying the HPC landscape, Thomas Sterling lauded the community’s effort in bringing HPC to bear in the fight against the pandemic, welcomed the start of the exascale – if not yet exaflops – era with a quick tour of some big machines, speculated a little on what China may be planning, and paid tribute to new and ongoing efforts to bring fresh talent into HPC.

Sterling is a longtime HPC leader, professor at Indiana University, and one of the co-developers of the Beowulf cluster. Let’s jump in (with apologies for any garbling of quotes).

The pandemic affected everything.

Thomas Sterling

“It has been a tragedy. There have been more than 200 million COVID cases worldwide, and almost 4 million deaths. And frankly, those numbers are probably conservative, and the actual numbers are much greater. We may never know. In the U.S., shockingly, more than half a million people – 600,000 – have been killed by this virulent disease. And we’ve experienced over 34 million cases just in the U.S. alone, and our case rate is greater than 10 percent of the population,” he said.

“One of the things that came out of this is an appreciation for what has been called urgent computing: the ability for high performance computing in general – and its resources, both in terms of facilities and talent – to be rapidly brought to bear on a problem, even a problem as challenging as that of COVID-19. Over the year, across the international community, HPC resources were very quickly freed up and made available to scientists. In addition, expert assistance in code development and optimization was offered to the scientific community to minimize the time to deployment of their codes and their applications, from drug discovery to exploration and analysis of possible new candidate cures. In this sense, the high-performance computing community can be proud of the job [done] yet humbled by its own limitations in attacking this problem.”

Fugaku is an impressive machine

“Much of this slide I have used before. The core design is an Arm processor done by Fujitsu, and added to that is the use of significant vector extensions that have demonstrated, in their view, that a homogeneous machine can compete with accelerator machines, and that future designs will be more varied than a singular formula. Is the jury done and the verdict in [on this]? No. As rapid changes take place, we’ll still see this constructive tension among those [approaches]. But what we are finding is that a broader range of applications – not just in high performance computing, per se, but in AI, in machine learning, and in big data and analytics – all of these can be done on machines that are intended for extreme scale,” said Sterling.

“Now, I said extreme scale. Fugaku is not an exaflops Rmax machine, but it comes close; it’s somewhere around 700*. I apologize to our friend Satoshi Matsuoka, who is standing there in front of his machine. But in the area of lower precision, for intelligence computing, it is indeed an exascale machine. So we are now in an era of exascale, if not yet classic exaflops.”

The age of big machines

This era of exascale and exaflops is rapidly dawning around the globe, and Sterling briefly reviewed several systems now rolling out or soon to do so. Importantly, he emphasized, the line between AI and HPC is blurring fast, and that fusion is greatly influencing HPC computer architecture.

About Frontier, which is expected to be the first U.S. exascale system to be stood up, he said:

“The Frontier machine has been announced as going to be the U.S.’s first exaflops [system], and by exaflops, I mean an Rmax somewhere around – we don’t have the measurements, of course – but the estimates are about one and a half exaflops Rmax. This will be operated at Oak Ridge National Laboratory, in the Oak Ridge Leadership Computing Facility in Tennessee, where the current Summit machine is, and this will be deployed toward the end of this year or the very beginning of the next. It is being integrated by the Cray division of Hewlett Packard Enterprise and incorporates AMD chips, providing substantial performance and energy efficiency, although it’s predicted that the power consumption will be on the order of 30 megawatts, in a somewhat modest footprint of just over 100 racks. The cost is $600 million. That’s a lot of money. [I’m] looking forward to this machine being operated and to the science and the data analytics that can be performed with it.”
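Those quoted figures imply a striking energy-efficiency number, which a quick back-of-the-envelope sketch can make explicit. Note that both inputs below are the keynote's estimates, not measurements:

```python
# Implied energy efficiency of Frontier, from the estimates quoted above.
# Both inputs are projections, not measured values.
rmax_flops = 1.5e18        # estimated Rmax: ~1.5 exaflops
power_watts = 30e6         # estimated power draw: ~30 megawatts
gflops_per_watt = rmax_flops / power_watts / 1e9
print(f"{gflops_per_watt:.0f} gigaflops per watt")  # 50 gigaflops per watt
```

For comparison, Fugaku’s June 2021 Top500 entry (442 petaflops at roughly 29.9 megawatts) works out to about 15 gigaflops per watt.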

Sterling gave a brief tour of several of the forthcoming large systems, most of whose names are familiar to the HPC community. Despite being largely accelerator-based architectures, there is diversity among the approaches. He singled out the UK Met Office-Microsoft project to build the Met Office’s next system for weather forecasting in the cloud – a first. He also looked at the EuroHPC Joint Undertaking’s LUMI project, which will be a roughly half-exaflops system.

“[The system] will be in Finland but there are 10 different countries that are involved in the consortium that together will share this machine. You have the list (on the slide below) of such countries starting with Finland and going down to Switzerland. There are multiple partitions for different purposes. So, I think that this is a slightly different way of organizing machines, where distinct countries will be managing different partitions and have different responsibilities,” said Sterling.

About the UK Met-Microsoft project, he noted, “They’re saying that this will be the world’s largest [cloud-based] climate modeling supercomputer, and this will be deployed a year from now, in the summer of 2022. Its floating-point performance will be 60 petaflops, distributed among an organization of four quadrants of 15 petaflops each. There’ll be one and a half million CPU [cores] of the AMD Epyc type, and eventually – I don’t know the year – there will be a midlife kicker, giving it a performance increase by a factor of three. So this will have a long life, indeed a life of about 10 years. What I find extraordinary is that this is a commitment of about one and a half billion dollars over a 10-year period. This is a very serious, very significant dedication to a single domain of application.”
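The figures in that quote hang together, as a quick consistency check shows. The sketch below uses only the numbers Sterling cited, so it inherits whatever uncertainty they carry:

```python
# Consistency check of the Met Office system figures quoted above.
quadrants = 4
petaflops_per_quadrant = 15
total_petaflops = quadrants * petaflops_per_quadrant   # 60 petaflops, as stated
post_kicker_petaflops = total_petaflops * 3            # after the 3x midlife upgrade
annual_cost_dollars = 1.5e9 / 10                       # ~$150M/year over the 10-year commitment
print(total_petaflops, post_kicker_petaflops, annual_cost_dollars)
```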

Here are a few of his slides on the coming systems.



China is the Dragon in the room

“Okay, so I talked about big machines. And there’s obviously one really big hole, and, you know, maybe what we should say is that’s the big dragon in the room. It’s China, of course. China has deployed over the last decade more than one Top500-leading machine. And over their evolution of machines they’ve taken a strong, organized and, frankly, I’d call it a disciplined approach. In fact, it’s a three-pronged strategy that they have moved forward. [The prongs] include the National University of Defense Technology, the National Research Center of Parallel Computer Engineering and Technology (NRCPC), and third, Sugon, which those old graybeards such as myself remember as Dawning,” said Sterling.

“All three of these organizations are pursuing different approaches, and I don’t know who’s in the lead or when their next big machine will hit the floor, but recently some hints have been exposed for one of them. And this is the NRCPC Sunway custom architecture. Now, you’ll remember the Sunway TaihuLight. Well, I didn’t know this, but in fact, TaihuLight was designed all along to be scalable, truly scalable. It was delivering something over 100 petaflops when it was deployed and led the [Top500] list, and their intent is to bring that up to exascale. Now, I use the term exascale as opposed to exaflops for the same reasons I did before. Their peak floating-point performance will be four exaflops for single precision and one exaflops for double precision. That’s peak performance. It’s anticipated that their Linpack Rmax will be around 700 petaflops.

“You know, the Sunway architecture is really interesting because of its use of an enormous number of very lightweight processing elements organized in conjunction with a main processing element to handle sequential execution. The expectation is that, as opposed to 28 nanometers for TaihuLight, this will be 14 nanometers, as SMIC, the semiconductor fabrication company, will provide this at just under one and a half gigahertz, which is about the same clock rate as TaihuLight. Why? Well, of course, to try to keep the power down. In doing this, they will have eight core groups**, as opposed to the four core groups you see in the lower black and white schematic (slide below); they will double the width of the words, or multi-word lines, from 256 bits to 512 bits; and they will increase the total size of the machine from somewhere around 40,000 nodes to 80,000 nodes. I don’t know when. But we can certainly wish our friends in China the best of luck as they push the edge of the envelope,” he said.
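Taken together, the rumored numbers imply a respectable Linpack efficiency. The sketch below works out what they suggest, with the caveat that every input is speculative and drawn only from the keynote:

```python
# Implied Linpack efficiency and scaling for the rumored Sunway exascale
# system, using only the speculative figures quoted in the keynote.
peak_dp_flops = 1.0e18          # rumored double-precision peak: one exaflops
rmax_flops = 700e15             # anticipated Linpack Rmax: ~700 petaflops
hpl_efficiency = rmax_flops / peak_dp_flops
taihulight_nodes, new_nodes = 40_000, 80_000
print(f"{hpl_efficiency:.0%} of DP peak; {new_nodes // taihulight_nodes}x the node count")
```

TaihuLight itself achieved roughly 74 percent of peak on Linpack (93 of 125 petaflops), so a ~70 percent figure would be in line with its predecessor.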

QUICK HITS – MPI Still Strong; In Praise of STEM

“Within the next small number of months – exactly when, I don’t know – MPI 4.0 will be released with a number of improvements that have been carefully considered, examined and debated, including such things as, but not limited to, persistent collective operations for significant improvements in efficiency, and improvements in error handling. A number of others, as you can see, either are going to be in [4.0] or are going to be considered for later extension in 4.1. And if you thought that was it – no, there will be an MPI 5.0. The committee is open for new ideas. I don’t know how long this is going to go. But MPI 4.0 [is] coming to an internet place near you,” said Sterling.

Sterling gave nods to various efforts to support HPC students and STEM efforts generally. He noted the establishment of the new Moscow State University branch at the Russian National Physics and Mathematics Center, near Nizhny Novgorod. “I’ve been there; [it’s] a lovely small city. The MSU Sarov branch is intended, frankly, to attract the best scientists, students and faculty. No, I haven’t gotten my invitation letter yet. And it [the branch] will be directed by our good friend and respected colleague, Vladimir Voevodin, shown here,” he said.

Sterling had praise for the Texas Advanced Computing Center, which helped South Africa by bringing its student cluster team over to Austin for training and “really giving them sort of a turbocharged experience in this area. Dan Stanzione (TACC director), shown here (slide below), also managed to make possible the repurposing of one of their earlier machines, giving it a second life at CHPC in South Africa.”

He concluded with kudos for the STEM-Trek organization led by Elizabeth Leake:

“The final person here is one who, frankly, we really need to acknowledge, and that is Elizabeth Leake. Now, many of you know Elizabeth; she is part of our community and always with a friendly smile. But she is much more than that. She is the founder of STEM-Trek, a nonprofit organization that is intended to – and let me read this – support scholarly travel, mentoring and advanced skills training for STEM scholars and students from underrepresented demographics in the use of 21st-century cyberinfrastructure. I can’t read to you the long list of accomplishments, but through STEM-Trek, students are encouraged and engaged in high performance computing. She has single-handedly managed to acquire travel grants for students who otherwise, frankly, would never get to see conferences like ISC. You see a picture of her with students I met a couple of years ago. Elizabeth deserves very high praise for all of her contributions.”

NOTES

*  Fugaku’s Top500 Rmax is 442 petaflops and Rpeak is 537 petaflops.

** One observer noted in the ISC chat window during the keynote that Sunway would have six, not eight, core groups.
