Blue Waters Opts Out of TOP500

By Tiffany Trader

November 16, 2012

The NCSA Blue Waters system is one of the fastest supercomputers in the world, but it won’t be appearing on the TOP500 list – nor will it be taking part in the HPC Challenge (HPCC) awards. While it’s generally understood that there are an unknown number of classified and commercial systems that don’t show up on the list, this is the first time an open science system has opted out in such a fashion.

According to the folks at the National Center for Supercomputing Applications (NCSA), there’s a good reason for this. In the days leading up to the 24th annual Supercomputing Conference (SC12) in Salt Lake City, HPCwire spoke with Blue Waters Project Director Bill Kramer to find out what went into this decision.

HPCwire: How long has Blue Waters been up and running? Would there have been enough time to run Linpack benchmark and submit to the TOP500 list?

Bill Kramer: Oh sure, and we would have had good results if we had chosen to run it. We even had an early science system that was a resource in the US academic world going back to January last year, and we chose not to submit that for the June list.

The system has been up and running full-scale applications in test mode and debugging and scaling platforms and so on from mid-summer on, and particularly since Linpack is such a simple test and does not require I/O, we had plenty of time to run the test.

In fact we have run the test across the entire system and the HPCC test as well, so this was a very conscious decision not to do it – it does not reflect any problems or issues.

HPCwire: Did you get the results you would have expected and are you going to release them?

Kramer: We don’t see any reason to publicize it, but there were requirements in the contract. These tests obtained very good results, but we’d rather exercise the system with real applications. For example, there are some full-scale science codes that have run over 25,000 nodes for multiple days, and they’re actually doing a science problem as opposed to a trivial problem.

We’d much rather use real applications with all the I/O and everything else in there to vet the system and accomplish a real result along the way and those are at least as stressful on the system as Linpack would be because they exercise all parts of the system not just the floating point units. Our focus is reflecting what the real scientists do not a very small subset of what some teams do.

HPCwire: So the contract with Cray did specify Linpack?

Kramer: HPCC was specified [editor’s note: HPCC includes Linpack], and that was one of hundreds of points – all of the others are much more relevant tests. For historical purposes, that was in there from the original NSF release, so we are meeting that, but it’s not relevant to whether the system is a quality system for sustained performance.

HPCwire: Are you releasing the HPCC results?

Kramer: No, and for the same reason. It’s better, but still doesn’t really reflect what to expect for real sustained performance for real applications. It’s better because it has multiple categories, but HPCC still lacks anything that has to do what to do with I/O, which is one of the major bottlenecks, so testing interconnect and testing memory performance.

Our challenge is not with Linpack as a benchmark and not with having a list, our concern is using a very simplified benchmark that has value in its own right, but not for the purpose of indicating usefulness of the system, or productivity of the system or effectiveness of the system.

HPCwire: How and when was the decision arrived at?

Kramer: Our entire project focus has been on sustained petascale performance, and it’s not one-dimensional, it’s not peak performance, it’s not Linpack performance – it’s performance for sustained real-world applications. If you go back to the original NSF solicitation, they encapsulated that into a set of six applications that they projected far forward to the challenging scientific problems that required this type of system and they set their metric to solving that problem within a certain amount of wall-clock time.

Going back to the very beginning, the philosophical nature of how this project came to be was all about delivering effective petascale computing. The investment strategy was to have a very large amount of memory, a very large amount of storage rather than trying to obtain a high single metric.

As we progressed, we have with National Science Foundation and many reviews developed a much more meaningful metric from our point of view called the Sustained Petascale Performance (SPP) test. The way we crafted that was by going to the science teams that we know and have been working with on the system and getting their real applications and their real science problems and using those as the measure of performance.

There are 12 application combinations that we are using to establish the performance of the system over a sustained petaflop in addition to the original NSF six applications. So we are actually going back to first principles: what are the scientists trying to do and making sure they’re able to do their required work within a reasonable amount of elapsed time.

The other part of this is enabling a diverse science base. The NSF, computational and data analytics community have a diverse portfolio of science, arguably the most diverse, and that diverse portfolio requires systems that perform well on that wide range of codes.

That’s really what our measures are and that’s what we remain focused on, so the decision to not list it is very consistent with what the project’s been about and what NSF’s goals have been going back to day one. The decision was made well before we needed to do any work to even submit the early system back in last January. It’s been a long–term process; it was made mutually by the university and NSF as being the right thing to do for the real goals of our project, and we’re very comfortable with it.

Next >>

HPCwire: Do you think we need a ranking system?

Kramer: I think lists are good, and I think as a focused, purposed benchmark, Linpack is good. I think the TOP500 list, though, combines those two things in a way that was interesting at some point, a while ago, but that now in some ways may be doing detriment to the community.

I have no trouble with lists and I think actually the community needs some idea of how we’re progressing, but we really need to be clear on what these lists mean, so for example, for much of the high-level systems on TOP500, what really determines how high they are is how much money is spent, not how well they perform on real applications.

There have been systems that never really get out to perform on real applications, but are on the list. There are ways to submit systems well before they are able to run many scientific or engineering applications. The historical nature of the list is perturbed by those other attributes and maybe those are what the lists measure. I can say for sure it doesn’t measure the progress in real sustained performance because there’s a severe disconnect between what the list says and what real sustained performance measures indicate.

HPCwire: Do we need something new or could we improve our current metrics to your satisfaction?

Kramer: I think there are ways to improve on relevance under the Linpack measurement. The people who put together the original list and maintain the list also talk about these things. Everybody’s afraid to take the first step. In the hallways everybody talks about the issues and the risks for misinterpretation for people who are not in our community, but then everyone says, “but I have to do it.”

Well we’re fortunate enough that we don’t have to do it, and we’re talking the first step by saying this is enough, we need to go to do something else. We are committed to working with others in the community to come up with a better way to describe how effective supercomputing is for solving unsolvable problems and that’s really the important thing.

HPCwire: If the benchmarks are very complex or we have too many of them, is that practical for a wide range of systems?

Kramer: Yes, I’m convinced it is. The NAS parallel benchmarks were very effective in their time. I’m not saying that they’re the right ones now, but in their time period, for a decade or so… There were eight tests that everybody ran. They were pseudo-applications; they didn’t have I/O in them for example, and I/O was less of a challenge in those days, but they gave you a much better picture of what you could expect out of systems.

Other benchmark suites that have between 8–12 tests are being used. The DoD has a pretty good suite that represents a reasonable workload. NERSC has a good persistence suite that has evolved over time, but I think there are enough proofs of existence that yes, you can have a much more dynamic set of things. HPCC might be a place to go leverage with those codes, but that’s also still difficult to figure how it translates into real world applications and how much you can get out of that.

If you look at the graph of real measured performance, say with the NERSC suite of codes, and look at that through 15 years of history and you look at the TOP500 lists, you see that there’s a strong disconnect between what really is achievable with systems and what the list says.

The list also correlates with the amount of funding available to pay for things. The challenges that bottleneck real performance are not being addressed. So I think yes, you can craft those processes in a tractable amount of time that is portable and expandable and that’s been done several different ways.

Next >>

HPCwire: Who are you directing this statement at? What outcome are you hoping for?

Kramer: Blue Waters is a leader in the community in many different ways, and this was another way we felt we could lead to get a more explicit dialogue going in the community about whether this is the way we want to use our metric for say exascale computing and whether this is still relevant.

HPCwire: What about push-back, both in general and your vendors, Cray and NVIDIA?

Kramer: We’ve been very clear with all of our partners and others who may have been partners, that spending tremendous effort to get a number on a list is not indicative of what’s really important to the project is not our priority so we’ve been very open with the partners and they have no objection to this.

HPCwire: In an article on the NCSA website, you write that “the TOP500 list and its associated Linpack have multiple serious problems,” and you’ve covered some of those already, would you like to highlight the ones you feel are most problematic?

Kramer: The main concerns are that it does not give an indication of value and particularly it doesn’t give an indication of value for sustained performance. Value is really the potential of a system to do work divided by its cost, so you can’t tell anything about the value; all you can tell is if you spend a lot of money on a system, you can get up high on the list.

Blue Waters is a project that is spending a significant amount of money, but it’s going into a very balanced system, not one that could have high FLOPS rates. I can tell you that if we had put all our money into peak performance and Linpack, we would have been number one on the list, for sure, for awhile.

If I had not done the investment in the world’s largest memory or the world’s most intense storage system, and just said I want to have the most number of peak FLOPS that directly translate into Linpack FLOPS that directly translates to this number and I don’t care about how hard it is for the science community to make use of those and how many science projects get disenfranchised because they’re not able to use GPUs at scale for a while, then we easily could have been on the top of the list for a number of cycles.

But that’s not our mission. It’s not what we designed our system for and it’s not what many people design their systems for. It could have led to a very poor choice for the real mission by paying attention to where the position is on the TOP500 list.

There are other aspects: the fact that you spend an awful lot of effort on getting something to work that you use once and throw away essentially all that effort. Some places have had to spend multiple weeks or months trying to get a number instead of doing science and engineering.

The improvements that we’re going to make to these SPP codes are actually improvements that go back to the science teams, so it’s a permanent improvement rather than a lot of that effort just going into a test case. It’s not a good way of allocating resources because you can’t reuse those resources.

HPCwire: Why now?

Kramer: The algorithmic space, the application space has changed dramatically from when the major implementation issues were dense linear algebra. There are many more things that are at least as important if not more important now in the way that systems are designed and what we’re trying to deal with.

Many methods have gone to sparse rather than dense, for example. As an indicator of what is really important in a system – we’re saying it’s time to relook at that and it’s not in the mission of our project to continue in that mode.

Last year at Supercomputing, there was a theme of sustained-performance and there were many parties that took part in this discussion. There were panel sessions and papers, etc. and this year, we hope we’ll be able to start the dialogue about how we do a better job of metrics that we can easily explain, but are much more much more meaningful for the real missions of our HPC systems.

Maybe by SC13 there’s a way to report back to the community – a better way that parts of the community, or hopefully the whole community, can say … after 20 years of doing it this way it’s time to do something different.

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

What’s New in Computing vs. COVID-19: Fugaku, Congress, De Novo Design & More

July 2, 2020

Supercomputing, big data and artificial intelligence are crucial tools in the fight against the coronavirus pandemic. Around the world, researchers, corporations and governments are urgently devoting their computing reso Read more…

By Oliver Peckham

OpenPOWER Reboot – New Director, New Silicon Partners, Leveraging Linux Foundation Connections

July 2, 2020

Earlier this week the OpenPOWER Foundation announced the contribution of IBM’s A21 Power processor core design to the open source community. Roughly this time last year, IBM announced open sourcing its Power instructio Read more…

By John Russell

HPC Career Notes: July 2020 Edition

July 1, 2020

In this monthly feature, we'll keep you up-to-date on the latest career developments for individuals in the high-performance computing community. Whether it's a promotion, new company hire, or even an accolade, we've got Read more…

By Mariana Iriarte

Supercomputers Enable Radical, Promising New COVID-19 Drug Development Approach

July 1, 2020

Around the world, innumerable supercomputers are sifting through billions of molecules in a desperate search for a viable therapeutic to treat COVID-19. Those molecules are pulled from enormous databases of known compoun Read more…

By Oliver Peckham

HPC-Powered Simulations Reveal a Looming Climatic Threat to Vital Monsoon Seasons

June 30, 2020

As June draws to a close, eyes are turning to the latter half of the year – and with it, the monsoon and hurricane seasons that can prove vital or devastating for many of the world’s coastal communities. Now, climate Read more…

By Oliver Peckham

AWS Solution Channel

Maxar Builds HPC on AWS to Deliver Forecasts 58% Faster Than Weather Supercomputer

When weather threatens drilling rigs, refineries, and other energy facilities, oil and gas companies want to move fast to protect personnel and equipment. And for firms that trade commodity shares in oil, precious metals, crops, and livestock, the weather can significantly impact their buy-sell decisions. Read more…

Intel® HPC + AI Pavilion

Supercomputing the Pandemic: Scientific Community Tackles COVID-19 from Multiple Perspectives

Since their inception, supercomputers have taken on the biggest, most complex, and most data-intensive computing challenges—from confirming Einstein’s theories about gravitational waves to predicting the impacts of climate change. Read more…

Hyperion Forecast – Headwinds in 2020 Won’t Stifle Cloud HPC Adoption or Arm’s Rise

June 30, 2020

The semiannual taking of HPC’s pulse by Hyperion Research – late fall at SC and early summer at ISC – is a much-watched indicator of things come. This year is no different though the conversion of ISC to a digital Read more…

By John Russell

OpenPOWER Reboot – New Director, New Silicon Partners, Leveraging Linux Foundation Connections

July 2, 2020

Earlier this week the OpenPOWER Foundation announced the contribution of IBM’s A21 Power processor core design to the open source community. Roughly this time Read more…

By John Russell

Hyperion Forecast – Headwinds in 2020 Won’t Stifle Cloud HPC Adoption or Arm’s Rise

June 30, 2020

The semiannual taking of HPC’s pulse by Hyperion Research – late fall at SC and early summer at ISC – is a much-watched indicator of things come. This yea Read more…

By John Russell

Racism and HPC: a Special Podcast

June 29, 2020

Promoting greater diversity in HPC is a much-discussed goal and ostensibly a long-sought goal in HPC. Yet it seems clear HPC is far from achieving this goal. Re Read more…

Top500 Trends: Movement on Top, but Record Low Turnover

June 25, 2020

The 55th installment of the Top500 list saw strong activity in the leadership segment with four new systems in the top ten and a crowning achievement from the f Read more…

By Tiffany Trader

ISC 2020 Keynote: Hope for the Future, Praise for Fugaku and HPC’s Pandemic Response

June 24, 2020

In stark contrast to past years Thomas Sterling’s ISC20 keynote today struck a more somber note with the COVID-19 pandemic as the central character in Sterling’s annual review of worldwide trends in HPC. Better known for his engaging manner and occasional willingness to poke prickly egos, Sterling instead strode through the numbing statistics associated... Read more…

By John Russell

ISC 2020’s Student Cluster Competition Winners Announced

June 24, 2020

Normally, the Student Cluster Competition involves teams of students building real computing clusters on the show floors of major supercomputer conferences and Read more…

By Oliver Peckham

Hoefler’s Whirlwind ISC20 Virtual Tour of ML Trends in 9 Slides

June 23, 2020

The ISC20 experience this year via livestreaming and pre-recordings is interesting and perhaps a bit odd. That said presenters’ efforts to condense their comments makes for economic use of your time. Torsten Hoefler’s whirlwind 12-minute tour of ML is a great example. Hoefler, leader of the planned ISC20 Machine Learning... Read more…

By John Russell

At ISC, the Fight Against COVID-19 Took the Stage – and Yes, Fugaku Was There

June 23, 2020

With over nine million infected and nearly half a million dead, the COVID-19 pandemic has seized the world’s attention for several months. It has also dominat Read more…

By Oliver Peckham

Supercomputer Modeling Tests How COVID-19 Spreads in Grocery Stores

April 8, 2020

In the COVID-19 era, many people are treating simple activities like getting gas or groceries with caution as they try to heed social distancing mandates and protect their own health. Still, significant uncertainty surrounds the relative risk of different activities, and conflicting information is prevalent. A team of Finnish researchers set out to address some of these uncertainties by... Read more…

By Oliver Peckham

[email protected] Turns Its Massive Crowdsourced Computer Network Against COVID-19

March 16, 2020

For gamers, fighting against a global crisis is usually pure fantasy – but now, it’s looking more like a reality. As supercomputers around the world spin up Read more…

By Oliver Peckham

[email protected] Rallies a Legion of Computers Against the Coronavirus

March 24, 2020

Last week, we highlighted [email protected], a massive, crowdsourced computer network that has turned its resources against the coronavirus pandemic sweeping the globe – but [email protected] isn’t the only game in town. The internet is buzzing with crowdsourced computing... Read more…

By Oliver Peckham

Global Supercomputing Is Mobilizing Against COVID-19

March 12, 2020

Tech has been taking some heavy losses from the coronavirus pandemic. Global supply chains have been disrupted, virtually every major tech conference taking place over the next few months has been canceled... Read more…

By Oliver Peckham

Supercomputer Simulations Reveal the Fate of the Neanderthals

May 25, 2020

For hundreds of thousands of years, neanderthals roamed the planet, eventually (almost 50,000 years ago) giving way to homo sapiens, which quickly became the do Read more…

By Oliver Peckham

DoE Expands on Role of COVID-19 Supercomputing Consortium

March 25, 2020

After announcing the launch of the COVID-19 High Performance Computing Consortium on Sunday, the Department of Energy yesterday provided more details on its sco Read more…

By John Russell

Steve Scott Lays Out HPE-Cray Blended Product Roadmap

March 11, 2020

Last week, the day before the El Capitan processor disclosures were made at HPE's new headquarters in San Jose, Steve Scott (CTO for HPC & AI at HPE, and former Cray CTO) was on-hand at the Rice Oil & Gas HPC conference in Houston. He was there to discuss the HPE-Cray transition and blended roadmap, as well as his favorite topic, Cray's eighth-gen networking technology, Slingshot. Read more…

By Tiffany Trader

Honeywell’s Big Bet on Trapped Ion Quantum Computing

April 7, 2020

Honeywell doesn’t spring to mind when thinking of quantum computing pioneers, but a decade ago the high-tech conglomerate better known for its control systems waded deliberately into the then calmer quantum computing (QC) waters. Fast forward to March when Honeywell announced plans to introduce an ion trap-based quantum computer whose ‘performance’ would... Read more…

By John Russell

Leading Solution Providers

Contributors

Neocortex Will Be First-of-Its-Kind 800,000-Core AI Supercomputer

June 9, 2020

Pittsburgh Supercomputing Center (PSC - a joint research organization of Carnegie Mellon University and the University of Pittsburgh) has won a $5 million award Read more…

By Tiffany Trader

‘Billion Molecules Against COVID-19’ Challenge to Launch with Massive Supercomputing Support

April 22, 2020

Around the world, supercomputing centers have spun up and opened their doors for COVID-19 research in what may be the most unified supercomputing effort in hist Read more…

By Oliver Peckham

Australian Researchers Break All-Time Internet Speed Record

May 26, 2020

If you’ve been stuck at home for the last few months, you’ve probably become more attuned to the quality (or lack thereof) of your internet connection. Even Read more…

By Oliver Peckham

Nvidia’s Ampere A100 GPU: Up to 2.5X the HPC, 20X the AI

May 14, 2020

Nvidia's first Ampere-based graphics card, the A100 GPU, packs a whopping 54 billion transistors on 826mm2 of silicon, making it the world's largest seven-nanom Read more…

By Tiffany Trader

15 Slides on Programming Aurora and Exascale Systems

May 7, 2020

Sometime in 2021, Aurora, the first planned U.S. exascale system, is scheduled to be fired up at Argonne National Laboratory. Cray (now HPE) and Intel are the k Read more…

By John Russell

10nm, 7nm, 5nm…. Should the Chip Nanometer Metric Be Replaced?

June 1, 2020

The biggest cool factor in server chips is the nanometer. AMD beating Intel to a CPU built on a 7nm process node* – with 5nm and 3nm on the way – has been i Read more…

By Doug Black

Summit Supercomputer is Already Making its Mark on Science

September 20, 2018

Summit, now the fastest supercomputer in the world, is quickly making its mark in science – five of the six finalists just announced for the prestigious 2018 Read more…

By John Russell

TACC Supercomputers Run Simulations Illuminating COVID-19, DNA Replication

March 19, 2020

As supercomputers around the world spin up to combat the coronavirus, the Texas Advanced Computing Center (TACC) is announcing results that may help to illumina Read more…

By Staff report

  • arrow
  • Click Here for More Headlines
  • arrow
Do NOT follow this link or you will be banned from the site!
Share This