Jack Dongarra on SC21, the Top500 and His Retirement Plans

By Tiffany Trader

November 29, 2021

HPCwire’s Managing Editor sits down with Jack Dongarra, Top500 co-founder and Distinguished Professor at the University of Tennessee, during SC21 in St. Louis to discuss the latest Top500 list, the outlook for global exascale computing and what exactly is going on in that Viking helmet photo. Plus what’s in store for 2022.

Transcript (lightly edited):

Tiffany Trader: Hello, I’m Tiffany Trader, Managing Editor, HPCwire. And here with me at SC21 in St. Louis is Jack Dongarra. Jack Dongarra needs no introduction. But he is the Top500 co-founder, he’s the originator of the Linpack benchmark, and he is also a distinguished professor at the University of Tennessee with affiliations at Oak Ridge National Laboratory and the University of Manchester. So, Jack, we’re here at SC21. How many SCs does that make for you?

Perennials pose for group photo at SC16.

Jack Dongarra: Hello. I’m one of a select group of people who have been to all of the SCs. SC started in 1988. And there are 18 of us who have been to all of the conferences. This year is a little unusual in the sense that not all of us are here in person; some are attending virtually. But we’ve been to every one of them since the beginning in 1988.

Trader: And you all have the designation of perennials…

Dongarra: And we have the designation of perennials, that’s right. So that’s our claim to fame.

Trader: Speaking of Supercomputings from years past, there is an iconic photo of you from a little while ago in a Viking helmet, and I’ve always wondered what the backstory is on that.

Dongarra: So that was during the Supercomputing held in Minneapolis, Minnesota [SC92]. It was in November, of course, and it was freezing cold. And there was a party that Intel threw, and part of the party was to give away those helmets. And I still have that helmet; it’s used by my grandchildren now, and they do enjoy it.

The Intel party was a little bit unusual in a number of ways. One way was that they had hired the Vikings cheerleaders, the NFL team’s cheerleaders, to come in and perform at the party, and that was probably not the most politically correct thing to have happen. And the other memorable thing that sticks in my mind is that one of the parties, maybe the exhibitor party, was held at Prince’s club in Minneapolis. There was a lot of loud music, a lot of drinking, a lot of things going on. At about one o’clock, I decided to leave Prince’s club, and as I was leaving through the entrance that we were using, somebody told me to step aside. I was a little shocked to be asked that. Then a car pulls up, and Prince jumps out and goes into his club.

Trader: I think a lot of people will be jealous of that story. That’s a pretty good story. Another important title that you’re wearing this year, and for this show, is that you are the keynote chair. And we had a really nice plenary on Monday night. It was the HPC, AI and ethics plenary chaired by Dan Reed. And then there was the keynote this morning with Vint Cerf, the father of the internet, who is also the Chief Internet Evangelist for Google. So, I mean, how did you pick those folks?

Dongarra: Well, this year, I was asked by Bronis [de Supinski, SC General Chair] to be part of the team that put together the conference. I’m on the executive committee, and my responsibility this year is to drive the keynote, to choose a keynote speaker. And part of that responsibility is also to put together this panel, the plenary panel for SC. I’ve known Dan Reed for many years; he was the chair of that. He suggested a bunch of names. We wanted to make sure that the panel was in person; we didn’t want to have virtual panelists. So it did take a little bit of a struggle to put together. But we had a terrific panel. I think it’s the first time we had an astronaut on the panel. We had a lawyer, we had an MD, we had a scientist. So it made for a very rounded group of people to talk about ethics and the impact on supercomputing and what the future might look like. So I thought it was a very interesting panel, very engaging.

This morning, we had our keynote from Vint Cerf. I’ve known Vint for many years. And when I asked him to do it, he very graciously said he would. I had heard him speak before about a number of things, and because the theme of this conference relates to ‘science and beyond’ and the humanities and so on, I suggested he give a talk I had heard him give at the University of Tennessee, which was related to what he talked about today. I thought it was a great talk. He was able to pitch things at the right level, to capture the interest of the audience. From where I was sitting, it looked like the whole auditorium was full, which I was very impressed with. We were concerned, of course, that because of the pandemic it wouldn’t be, but Vint being a very well-known name, he was able to attract a very large crowd. And I think he did very well in his presentation today. I think a lot of people enjoyed it.

Trader: Yes, it was nicely attended, as was the Intersection of Ethics and HPC panel, which, as you referenced, had a really rich and distinguished set of panelists. And because of the hybrid nature of this event, those were livestreamed to all of the attendees. And they will also be made available on demand. So if you haven’t seen them yet, I really recommend people check them out.

Dongarra: Absolutely. They’re roughly 45 minutes to an hour long. And they are online, as we say, and they are worth listening to, to capture the spirit of what was talked about.

Trader: So one of the flagship events of SC, and of its European counterpart, ISC, is the unveiling of the twice-yearly Top500 list. And this one was no exception. We were in the Top500 press briefing yesterday, on Monday, with you and the other list authors and the Green500 list author. If you’d like, we can dive a little into the list in a minute, and you can see our coverage on HPCwire for more of the feeds and speeds and the full coverage of that. But what I thought, again, for this hybrid event, was that it was, I wouldn’t say surprisingly, but very rich and robust. I mean, I’ve been to many of these, and it was a good one.

Dongarra: It was a good one.

Trader: It was a little different… the format, we had questions coming in from the people who were livestreaming in. And so they had some interesting questions. And I asked some questions too. So these are some of the questions that came up. We were talking about the different systems and the different geopolitical considerations, with China specifically holding back systems, and, you know, that brings up the relevance or the usefulness of the list, or what the implications are for the list if some big players are not on there.

Dongarra: That’s right. So yeah, let me just start by saying, this year’s list is not terribly exciting at the top end, anyway; there don’t seem to be many new machines at the top end. In the top 10, there’s one new entry in the list…

Trader: Number 10.

Dongarra: Number 10.

Trader: Fugaku is still on top.

Dongarra: Fugaku is still on top. Yeah. So that’s a very impressive machine. It’s the fourth time that it’s been number one on the list.

Trader: Cheers to Satoshi Matsuoka.

Dongarra: Satoshi Matsuoka deserves a lot of credit for that architecture, and that machine and how it’s run. And we were hoping that there would be new entries, which would be at the exascale point. We know that there’s a machine being put together at Oak Ridge. I’ve seen it; it’s there.

A Frontier 2-node sled on display at AMD’s booth at SC21

Trader: They stole one of the nodes and put it on the show floor. It’s in the AMD booth.

Dongarra: I didn’t know that… the machine isn’t quite ready yet. It’s a big machine. It’s a challenge, of course, whenever you assemble a machine like that at scale, to bring it up for the first time. And I think they’re going through some of those early stages, trying not only to get the hardware up, but also to get the software working, and that presents some of the challenges that they’re facing. And that’s not unusual; that happens for all machines. The timing is such that they couldn’t get the entry in in time. I’m sure that they will have one in for June, for the June list. So we’ll see an exascale machine there.

China is reported to have at least two machines which are at exascale. That comes from rumors, of course; we haven’t seen the results. But it comes from good sources. So the question might be: why haven’t those machines been introduced into the list? Why haven’t they announced those machines? Why are they holding back? And you know, there are probably a number of reasons we can give for that. One reason could be related to China not wanting to upset some kind of balance that’s in place now with the U.S. government. And I tend to feel that’s the reason: China’s a little concerned about coming up with technology that supersedes the technologies that are in the U.S. [Their] exascale machine would probably be five times faster than the machine at Oak Ridge today [i.e. Summit]. And that could cause some people to react in a way that might upset China. It’s unclear where China makes its parts, fabs its chips. They probably do it at TSMC, so Taiwan is a source for their chips. And the U.S. government may react in some way against that.

Jack Dongarra being interviewed by some young Chinese reporters in Wuxi (May 2017) Credit: HPCwire

So that’s one argument I think that could be made for why they haven’t disclosed the machines. Those machines are not secret. There’s an entry in the Gordon Bell Prize which was run on one of those machines. And that’s a very impressive result that they have. The architecture is described in some detail, so we have a good understanding of what that machine looks like. And it looks to be a very powerful system for solving high performance computing problems. You know, sometimes people think of machines that are put together and appear at number one as being stunt machines, put together just to get a Linpack number. That was said about the machine from Wuxi, the Sunway machine, when it first came out. But the reality is: it’s a very powerful machine, it is being used for science, they were able to win Gordon Bell Prizes with it. And I think that demonstrates that the machine is not just a one-off stunt to reach number one in the Top500.

So I feel that the Chinese machines, when they are announced officially… So what does it mean to be announced? Well, in order to get into the Top500 list, they have to submit. They have to actively submit something to us which shows the benchmark results and proves, in a sense, that they achieved a certain level of performance. It’s simple to do: they click on a website, they upload a file, and then it becomes part of the Top500 list the next time the list is released. And they have not done that at this point. So we’re waiting for it, we’re hoping for it. Maybe in June we’ll see three machines at exascale. That would really be a substantial change to the list, in the sense that the current number-one machine on the list is 440 petaflops. If we have three machines at an exaflop, that would bump it way up. And so that would be an exciting time for high performance computing, showing that the progression is still there. Right now it looks rather as if it’s in the doldrums; it’s very, very quiet.
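(Editor’s note: the scale of that jump is easy to put in numbers. Here is a minimal sketch in Python, assuming the approximate Linpack Rmax figures cited in the conversation: 440 petaflops for the current number-one system and 1,000 petaflops, i.e., one exaflop, for a hypothetical new entry.)

```python
# Rough arithmetic behind the jump described above, using the approximate
# Linpack (HPL) Rmax figures cited in the conversation.
fugaku_rmax_pflops = 440      # current number one on the November 2021 list
exaflop_pflops = 1_000        # one exaflop expressed in petaflops

ratio = exaflop_pflops / fugaku_rmax_pflops
print(f"An exaflop system would be roughly {ratio:.1f}x the current number one.")
# Prints: An exaflop system would be roughly 2.3x the current number one.
```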

Trader: Yeah, the list has some natural steps in it. But that will be the largest step.

Dongarra: It will be a big jump.

Trader: A big jump… You offered a lot of really interesting information there. And one of the things that caught my attention was this pivot that we are seeing [from China] away from the Top500. But then they redirected those efforts into the Gordon Bell list. And that’s actually given some interesting research results, some nice research.

Dongarra: And I think that’s a great thing. We need many kinds of metrics to understand how these machines can be used, what benefit they can provide, and what levels we can expect for applications. So showing the Top500 is one. Showing the Gordon Bell Prize certainly exhibits the fact that they can solve real problems on a machine at scale, doing things which cannot be done on any other machine. You know, that really shows the importance of that scientific instrument. And, you know, I view these supercomputers as scientific instruments, much like the Hubble Telescope or the Webb Telescope, you know, something that is unique. It provides tremendous insight for people that get to use it. It provides the opportunity to push back the frontiers of science when it’s used appropriately. And that’s really what supercomputing does.

Trader: Well said. And another thought I had: as you said, there are potentially two exascale machines coming up on the list from the U.S., Oak Ridge and Argonne National Laboratories. And then we’ll see what China ends up doing. You know, that could be four. In the U.S., there’s kind of an interesting consideration. Intel and Argonne announced two or three weeks ago that they’re essentially doubling the size of the forthcoming Aurora system. So I think what some folks are looking at now is what the timeline will be, and whether, even though Oak Ridge had the head start with an installation on the floor, they could somehow come in on the same list, which would be November 2022 [I meant June -tt], with potentially Argonne coming in above.

Dongarra: I think it’d be great to see the Oak Ridge and the Argonne machines emerge next year; I think that would be good. You know, I have my questions, my doubts, whether or not the Argonne machine will make it. But, you know, I think that’s certainly in the realm of possibilities. I’m pretty confident that the Oak Ridge machine will be on the list though. That machine is in place, running. And it’s just a question of tuning.

Trader: Yeah. And I mean, it’s just speculation, but it’s possible that Argonne would come in at half size rather than the full size. I don’t have any specific knowledge of that. But it’s a reasonable thing that happens often, where there’s sort of a phase one and a phase two. That’s very common.

Dongarra: And then there’s another machine too, the Livermore machine, coming a little bit later.

Trader: Yes. Coming around the corner.

Dongarra: Yep. So three big machines.

Trader: That’s pretty exciting. Talking about geopolitics and the new systems, there are four additional new systems that were interesting: four Russian systems that came into the latest list. And I think all four of them were within the top 50, mostly used for hyperscale and cloud. I don’t have the speeds and feeds at hand, but they were pretty significant, and had decent interconnects.

Dongarra: Decent interconnects. I think they’re Nvidia-based systems, with A100 accelerators being used. I think the official website says something about AI-based applications, cloud things, of course, coming from that context. And yeah, I don’t have any more details about the exact placement. They appear to be commodity-based systems in that sense. There doesn’t seem to be anything that was specially customized. I’m sure there’s a lot of software that has been customized, but I don’t think there’s any customized hardware going into them.

Trader: Any other thoughts on the list or highlights from the show?

Dongarra: Well, I haven’t been able to walk around the whole show floor here. It’s a booming place, not quite as large as it has been in the past. But a lot of vendors are here. Some vendors are not. But there seems to be enough activity here that it resembles a regular show. I’m reminded that this conference this year is about the same size as the conference held in Germany, the ISC conference, maybe a little bit bigger now. I think the number is 3,500 people, I think I heard said this morning. And I think the German conference has maybe about 2,000 people.

Trader: In-person attendees?

Dongarra: Yeah. So 3,500 in person here in St. Louis. And it looks like it’s a valid, thriving show. There have been some hiccups with some of the virtual aspects of things, links not working correctly, but I’m sure we’ll sort it out. This is the second day of the conference.

Trader: And the other thing is SCinet was really good here. Raised floor. I thought that was pretty cool. That’s pretty appropriate and on brand.

Dongarra: Yeah, on brand. So of course, those guys do a remarkable job. They come in a week ahead of time, maybe two weeks ahead of time, and get everything set up, put in the network, put in all the wiring, put in everything that’s needed to make this event work, including the wireless. And then they have to tear it down in a day. So it’s just incredible.

Trader: And all volunteer.

Dongarra: Yeah, all volunteer. It’s just a remarkable story.

Trader: Yeah. So you recently revealed some changes coming down the path in your career and some retirement plans. Do you want to tell us a little bit more about that?

Dongarra: I announced that I’m going to retire. I’ll retire in the summer. I’ve been at the University of Tennessee for 32 years. And I’ll be 72, I guess, at that point. So it’s time for me to step down and transition in some sense. So I’ll be an emeritus professor at the University of Tennessee. What does that mean? Well, emeritus means I have no teaching duties. I have no committee work. I can still have an office. I can still run a research group. I can still come in whenever I want. I think I get a free parking spot. I don’t get any pay, and I will be able to be co-PI on grants. So there are a lot of things that will remain the same.

I have a group at Tennessee: the Innovative Computing Laboratory. My group has 45 people, composed of research professors, postdocs, graduate students, programmers, and some administrative staff; and that group of 45 people is on what we call soft money. Soft money means we get grants, and that’s how they get paid, and if we don’t get a grant, there may not be money to pay people. So we strive very hard. We have a pretty good system for putting grants in place, and we have a good success rate as well, not 100 percent, but enough to sustain us in our activities. My hope is that when I leave, when I officially step down as distinguished professor, we’ll be able to hire someone to come in and fill the position. The MathWorks Corporation has given us a donation, so we can have a named professor, the MathWorks Professor in Scientific Computing, and I’m hoping that individual will come and help run the activities that have been put in place over the last 32 years.

So I’m looking forward to retirement and stepping down from some of the jobs that I don’t like to do, and maybe continuing on with the jobs I enjoy doing, the ones I consider to be my hobbies. One of my hobbies is benchmarking, so I’m not going to abandon ship. In terms of the benchmarks, we have the Top500, we have the HPCG benchmark, we have the HPL-AI benchmark, and there are other things too that we’re dabbling with. So there are a number of things that will continue. Given more free time, I’ll probably be able to help out with those things a little bit more.
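(Editor’s note: for readers unfamiliar with HPL-AI, since renamed HPL-MxP, the benchmark solves the same kind of dense linear system as Linpack/HPL but allows the expensive factorization to run in reduced precision, with iterative refinement recovering double-precision accuracy. Below is a minimal NumPy sketch of that idea, not the benchmark’s actual code: float32 stands in for the half precision used on real accelerators, and the helper name and test matrix are illustrative.)

```python
import numpy as np

def mixed_precision_solve(A, b, refinement_steps=5):
    """Illustrative sketch of the HPL-AI idea: solve in low precision, then
    use FP64 iterative refinement to recover full accuracy.
    (The real benchmark factors the matrix once and reuses the LU factors;
    np.linalg.solve here re-factors each time to keep the sketch short.)"""
    A_lo = A.astype(np.float32)

    # Low-precision solve: the expensive O(n^3) work the benchmark times.
    x = np.linalg.solve(A_lo, b.astype(np.float32)).astype(np.float64)

    # Cheap refinement steps drive the residual down toward FP64 levels.
    for _ in range(refinement_steps):
        r = b - A @ x                                    # FP64 residual
        d = np.linalg.solve(A_lo, r.astype(np.float32))  # low-precision correction
        x += d.astype(np.float64)
    return x

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)   # diagonally dominant test matrix
b = rng.standard_normal(n)

x = mixed_precision_solve(A, b)
print("relative residual:", np.linalg.norm(b - A @ x) / np.linalg.norm(b))
```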

Trader: Even more, yeah. So you’ll be maintaining your benchmarking responsibilities and activities, maybe putting even a little bit more focus into those. And I can probably expect to see you next year as part of the Top500 press briefing. That show, SC22, is in Dallas. Will you be there?

Dongarra: Hey, I’m a perennial.

Trader: There you go. I look forward to next year. Thanks for joining us.


To view more HPCwire exclusive SC21 video interviews, including our joint interview with Raja Koduri and Satoshi Matsuoka, go here. Interviews with SiPearl and Preferred Networks will be out soon.
