Blue Waters Opts Out of TOP500

By Tiffany Trader

November 16, 2012

The NCSA Blue Waters system is one of the fastest supercomputers in the world, but it won’t be appearing on the TOP500 list – nor will it be taking part in the HPC Challenge (HPCC) awards. While it’s generally understood that there are an unknown number of classified and commercial systems that don’t show up on the list, this is the first time an open science system has opted out in such a fashion.

According to the folks at the National Center for Supercomputing Applications (NCSA), there’s a good reason for this. In the days leading up to the 24th annual Supercomputing Conference (SC12) in Salt Lake City, HPCwire spoke with Blue Waters Project Director Bill Kramer to find out what went into this decision.

HPCwire: How long has Blue Waters been up and running? Would there have been enough time to run Linpack benchmark and submit to the TOP500 list?

Bill Kramer: Oh sure, and we would have had good results if we had chosen to run it. We even had an early science system that was a resource in the US academic world going back to January last year, and we chose not to submit that for the June list.

The system has been up and running full-scale applications in test mode and debugging and scaling platforms and so on from mid-summer on, and particularly since Linpack is such a simple test and does not require I/O, we had plenty of time to run the test.

In fact we have run the test across the entire system and the HPCC test as well, so this was a very conscious decision not to do it – it does not reflect any problems or issues.

HPCwire: Did you get the results you would have expected and are you going to release them?

Kramer: We don’t see any reason to publicize it, but there were requirements in the contract. These tests obtained very good results, but we’d rather exercise the system with real applications. For example, there are some full-scale science codes that have run over 25,000 nodes for multiple days, and they’re actually doing a science problem as opposed to a trivial problem.

We’d much rather use real applications with all the I/O and everything else in there to vet the system and accomplish a real result along the way and those are at least as stressful on the system as Linpack would be because they exercise all parts of the system not just the floating point units. Our focus is reflecting what the real scientists do not a very small subset of what some teams do.

HPCwire: So the contract with Cray did specify Linpack?

Kramer: HPCC was specified [editor’s note: HPCC includes Linpack], and that was one of hundreds of points – all of the others are much more relevant tests. For historical purposes, that was in there from the original NSF release, so we are meeting that, but it’s not relevant to whether the system is a quality system for sustained performance.

HPCwire: Are you releasing the HPCC results?

Kramer: No, and for the same reason. It’s better, but still doesn’t really reflect what to expect for real sustained performance for real applications. It’s better because it has multiple categories, but HPCC still lacks anything that has to do what to do with I/O, which is one of the major bottlenecks, so testing interconnect and testing memory performance.

Our challenge is not with Linpack as a benchmark and not with having a list, our concern is using a very simplified benchmark that has value in its own right, but not for the purpose of indicating usefulness of the system, or productivity of the system or effectiveness of the system.

HPCwire: How and when was the decision arrived at?

Kramer: Our entire project focus has been on sustained petascale performance, and it’s not one-dimensional, it’s not peak performance, it’s not Linpack performance – it’s performance for sustained real-world applications. If you go back to the original NSF solicitation, they encapsulated that into a set of six applications that they projected far forward to the challenging scientific problems that required this type of system and they set their metric to solving that problem within a certain amount of wall-clock time.

Going back to the very beginning, the philosophical nature of how this project came to be was all about delivering effective petascale computing. The investment strategy was to have a very large amount of memory, a very large amount of storage rather than trying to obtain a high single metric.

As we progressed, we have with National Science Foundation and many reviews developed a much more meaningful metric from our point of view called the Sustained Petascale Performance (SPP) test. The way we crafted that was by going to the science teams that we know and have been working with on the system and getting their real applications and their real science problems and using those as the measure of performance.

There are 12 application combinations that we are using to establish the performance of the system over a sustained petaflop in addition to the original NSF six applications. So we are actually going back to first principles: what are the scientists trying to do and making sure they’re able to do their required work within a reasonable amount of elapsed time.

The other part of this is enabling a diverse science base. The NSF, computational and data analytics community have a diverse portfolio of science, arguably the most diverse, and that diverse portfolio requires systems that perform well on that wide range of codes.

That’s really what our measures are and that’s what we remain focused on, so the decision to not list it is very consistent with what the project’s been about and what NSF’s goals have been going back to day one. The decision was made well before we needed to do any work to even submit the early system back in last January. It’s been a long–term process; it was made mutually by the university and NSF as being the right thing to do for the real goals of our project, and we’re very comfortable with it.

Next >>

HPCwire: Do you think we need a ranking system?

Kramer: I think lists are good, and I think as a focused, purposed benchmark, Linpack is good. I think the TOP500 list, though, combines those two things in a way that was interesting at some point, a while ago, but that now in some ways may be doing detriment to the community.

I have no trouble with lists and I think actually the community needs some idea of how we’re progressing, but we really need to be clear on what these lists mean, so for example, for much of the high-level systems on TOP500, what really determines how high they are is how much money is spent, not how well they perform on real applications.

There have been systems that never really get out to perform on real applications, but are on the list. There are ways to submit systems well before they are able to run many scientific or engineering applications. The historical nature of the list is perturbed by those other attributes and maybe those are what the lists measure. I can say for sure it doesn’t measure the progress in real sustained performance because there’s a severe disconnect between what the list says and what real sustained performance measures indicate.

HPCwire: Do we need something new or could we improve our current metrics to your satisfaction?

Kramer: I think there are ways to improve on relevance under the Linpack measurement. The people who put together the original list and maintain the list also talk about these things. Everybody’s afraid to take the first step. In the hallways everybody talks about the issues and the risks for misinterpretation for people who are not in our community, but then everyone says, “but I have to do it.”

Well we’re fortunate enough that we don’t have to do it, and we’re talking the first step by saying this is enough, we need to go to do something else. We are committed to working with others in the community to come up with a better way to describe how effective supercomputing is for solving unsolvable problems and that’s really the important thing.

HPCwire: If the benchmarks are very complex or we have too many of them, is that practical for a wide range of systems?

Kramer: Yes, I’m convinced it is. The NAS parallel benchmarks were very effective in their time. I’m not saying that they’re the right ones now, but in their time period, for a decade or so… There were eight tests that everybody ran. They were pseudo-applications; they didn’t have I/O in them for example, and I/O was less of a challenge in those days, but they gave you a much better picture of what you could expect out of systems.

Other benchmark suites that have between 8–12 tests are being used. The DoD has a pretty good suite that represents a reasonable workload. NERSC has a good persistence suite that has evolved over time, but I think there are enough proofs of existence that yes, you can have a much more dynamic set of things. HPCC might be a place to go leverage with those codes, but that’s also still difficult to figure how it translates into real world applications and how much you can get out of that.

If you look at the graph of real measured performance, say with the NERSC suite of codes, and look at that through 15 years of history and you look at the TOP500 lists, you see that there’s a strong disconnect between what really is achievable with systems and what the list says.

The list also correlates with the amount of funding available to pay for things. The challenges that bottleneck real performance are not being addressed. So I think yes, you can craft those processes in a tractable amount of time that is portable and expandable and that’s been done several different ways.

Next >>

HPCwire: Who are you directing this statement at? What outcome are you hoping for?

Kramer: Blue Waters is a leader in the community in many different ways, and this was another way we felt we could lead to get a more explicit dialogue going in the community about whether this is the way we want to use our metric for say exascale computing and whether this is still relevant.

HPCwire: What about push-back, both in general and your vendors, Cray and NVIDIA?

Kramer: We’ve been very clear with all of our partners and others who may have been partners, that spending tremendous effort to get a number on a list is not indicative of what’s really important to the project is not our priority so we’ve been very open with the partners and they have no objection to this.

HPCwire: In an article on the NCSA website, you write that “the TOP500 list and its associated Linpack have multiple serious problems,” and you’ve covered some of those already, would you like to highlight the ones you feel are most problematic?

Kramer: The main concerns are that it does not give an indication of value and particularly it doesn’t give an indication of value for sustained performance. Value is really the potential of a system to do work divided by its cost, so you can’t tell anything about the value; all you can tell is if you spend a lot of money on a system, you can get up high on the list.

Blue Waters is a project that is spending a significant amount of money, but it’s going into a very balanced system, not one that could have high FLOPS rates. I can tell you that if we had put all our money into peak performance and Linpack, we would have been number one on the list, for sure, for awhile.

If I had not done the investment in the world’s largest memory or the world’s most intense storage system, and just said I want to have the most number of peak FLOPS that directly translate into Linpack FLOPS that directly translates to this number and I don’t care about how hard it is for the science community to make use of those and how many science projects get disenfranchised because they’re not able to use GPUs at scale for a while, then we easily could have been on the top of the list for a number of cycles.

But that’s not our mission. It’s not what we designed our system for and it’s not what many people design their systems for. It could have led to a very poor choice for the real mission by paying attention to where the position is on the TOP500 list.

There are other aspects: the fact that you spend an awful lot of effort on getting something to work that you use once and throw away essentially all that effort. Some places have had to spend multiple weeks or months trying to get a number instead of doing science and engineering.

The improvements that we’re going to make to these SPP codes are actually improvements that go back to the science teams, so it’s a permanent improvement rather than a lot of that effort just going into a test case. It’s not a good way of allocating resources because you can’t reuse those resources.

HPCwire: Why now?

Kramer: The algorithmic space, the application space has changed dramatically from when the major implementation issues were dense linear algebra. There are many more things that are at least as important if not more important now in the way that systems are designed and what we’re trying to deal with.

Many methods have gone to sparse rather than dense, for example. As an indicator of what is really important in a system – we’re saying it’s time to relook at that and it’s not in the mission of our project to continue in that mode.

Last year at Supercomputing, there was a theme of sustained-performance and there were many parties that took part in this discussion. There were panel sessions and papers, etc. and this year, we hope we’ll be able to start the dialogue about how we do a better job of metrics that we can easily explain, but are much more much more meaningful for the real missions of our HPC systems.

Maybe by SC13 there’s a way to report back to the community – a better way that parts of the community, or hopefully the whole community, can say … after 20 years of doing it this way it’s time to do something different.

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

At SC18: AMD Sets Up for Epyc Epoch

November 16, 2018

It’s been a good two weeks, AMD’s Gary Silcott and Andy Parma told me on the last day of SC18 in Dallas at the restaurant where we met to discuss their show news and recent successes. Heck, it’s been a good year. Read more…

By Tiffany Trader

Dell EMC’s HPC Chief on Strategy and Emerging Processor Diversity

November 16, 2018

Last January Thierry Pellegrino, a long-time Dell/Dell EMC veteran, became vice president of HPC. His tenure comes at a time when the very definition of HPC is blurring with AI writ large (data analytics, machine learnin Read more…

By John Russell

IBM’s AI-HPC Combine for ‘Intelligent Simulation’: Eliminating the Unnecessary 

November 16, 2018

A powerhouse concept in attaining new knowledge is the notion of the “emergent property,” the combination of formerly stovepiped scientific disciplines and exploratory methods to form cross-disciplinary intelligence Read more…

By Doug Black

HPE Extreme Performance Solutions

AI Can Be Scary. But Choosing the Wrong Partners Can Be Mortifying!

As you continue to dive deeper into AI, you will discover it is more than just deep learning. AI is an extremely complex set of machine learning, deep learning, reinforcement, and analytics algorithms with varying compute, storage, memory, and communications needs. Read more…

IBM Accelerated Insights

From Deep Blue to Summit – 30 Years of Supercomputing Innovation

This week, in honor of the 30th anniversary of the SC conference, we are highlighting some of the most significant IBM contributions to supercomputing over the past 30 years. Read more…

How the United States Invests in Supercomputing

November 14, 2018

The CORAL supercomputers Summit and Sierra are now the world's fastest computers and are already contributing to science with early applications. Ahead of SC18, Maciej Chojnowski with ICM at the University of Warsaw discussed the details of the CORAL project with Dr. Dimitri Kusnezov from the U.S. Department of Energy. Read more…

By Maciej Chojnowski

At SC18: AMD Sets Up for Epyc Epoch

November 16, 2018

It’s been a good two weeks, AMD’s Gary Silcott and Andy Parma told me on the last day of SC18 in Dallas at the restaurant where we met to discuss their show news and recent successes. Heck, it’s been a good year. Read more…

By Tiffany Trader

Dell EMC’s HPC Chief on Strategy and Emerging Processor Diversity

November 16, 2018

Last January Thierry Pellegrino, a long-time Dell/Dell EMC veteran, became vice president of HPC. His tenure comes at a time when the very definition of HPC is Read more…

By John Russell

IBM’s AI-HPC Combine for ‘Intelligent Simulation’: Eliminating the Unnecessary 

November 16, 2018

A powerhouse concept in attaining new knowledge is the notion of the “emergent property,” the combination of formerly stovepiped scientific disciplines and Read more…

By Doug Black

How the United States Invests in Supercomputing

November 14, 2018

The CORAL supercomputers Summit and Sierra are now the world's fastest computers and are already contributing to science with early applications. Ahead of SC18, Maciej Chojnowski with ICM at the University of Warsaw discussed the details of the CORAL project with Dr. Dimitri Kusnezov from the U.S. Department of Energy. Read more…

By Maciej Chojnowski

At SC18: Humanitarianism Amid Boom Times for HPC

November 14, 2018

At SC18 in Dallas, the feeling on the ground is one of forward-looking buoyancy. Like boom times that cycle through the Texas oil fields, the HPC industry is en Read more…

By Doug Black

Nvidia’s Jensen Huang Delivers Vision for the New HPC

November 14, 2018

For nearly two hours on Monday at SC18, Jensen Huang, CEO of Nvidia, presented his expansive view of the future of HPC (and computing in general) as only he can do. Animated. Backstopped by a stream of data charts, product photos, and even a beautiful image of supernovae... Read more…

By John Russell

New Panasas High Performance Storage Straddles Commercial-Traditional HPC

November 13, 2018

High performance storage vendor Panasas has launched a new version of its ActiveStor product line this morning featuring what the company said is the industry’s first plug-and-play, portable parallel file system that delivers up to 75 Gb/s per rack on industry standard hardware combined with “enterprise-grade reliability and manageability.” Read more…

By Doug Black

SC18 Student Cluster Competition – Revealing the Field

November 13, 2018

It’s November again and we’re almost ready for the kick-off of one of the greatest computer sports events in the world – the SC Student Cluster Competitio Read more…

By Dan Olds

Cray Unveils Shasta, Lands NERSC-9 Contract

October 30, 2018

Cray revealed today the details of its next-gen supercomputing architecture, Shasta, selected to be the next flagship system at NERSC. We've known of the code-name "Shasta" since the Argonne slice of the CORAL project was announced in 2015 and although the details of that plan have changed considerably, Cray didn't slow down its timeline for Shasta. Read more…

By Tiffany Trader

TACC Wins Next NSF-funded Major Supercomputer

July 30, 2018

The Texas Advanced Computing Center (TACC) has won the next NSF-funded big supercomputer beating out rivals including the National Center for Supercomputing Ap Read more…

By John Russell

IBM at Hot Chips: What’s Next for Power

August 23, 2018

With processor, memory and networking technologies all racing to fill in for an ailing Moore’s law, the era of the heterogeneous datacenter is well underway, Read more…

By Tiffany Trader

Requiem for a Phi: Knights Landing Discontinued

July 25, 2018

On Monday, Intel made public its end of life strategy for the Knights Landing "KNL" Phi product set. The announcement makes official what has already been wide Read more…

By Tiffany Trader

House Passes $1.275B National Quantum Initiative

September 17, 2018

Last Thursday the U.S. House of Representatives passed the National Quantum Initiative Act (NQIA) intended to accelerate quantum computing research and developm Read more…

By John Russell

CERN Project Sees Orders-of-Magnitude Speedup with AI Approach

August 14, 2018

An award-winning effort at CERN has demonstrated potential to significantly change how the physics based modeling and simulation communities view machine learni Read more…

By Rob Farber

Summit Supercomputer is Already Making its Mark on Science

September 20, 2018

Summit, now the fastest supercomputer in the world, is quickly making its mark in science – five of the six finalists just announced for the prestigious 2018 Read more…

By John Russell

New Deep Learning Algorithm Solves Rubik’s Cube

July 25, 2018

Solving (and attempting to solve) Rubik’s Cube has delighted millions of puzzle lovers since 1974 when the cube was invented by Hungarian sculptor and archite Read more…

By John Russell

Leading Solution Providers

SC 18 Virtual Booth Video Tour

Advania @ SC18 AMD @ SC18
ASRock Rack @ SC18
DDN Storage @ SC18
HPE @ SC18
IBM @ SC18
Lenovo @ SC18 Mellanox Technologies @ SC18
NVIDIA @ SC18
One Stop Systems @ SC18
Oracle @ SC18 Panasas @ SC18
Supermicro @ SC18 SUSE @ SC18 TYAN @ SC18
Verne Global @ SC18

US Leads Supercomputing with #1, #2 Systems & Petascale Arm

November 12, 2018

The 31st Supercomputing Conference (SC) - commemorating 30 years since the first Supercomputing in 1988 - kicked off in Dallas yesterday, taking over the Kay Ba Read more…

By Tiffany Trader

At SC18: AMD Sets Up for Epyc Epoch

November 16, 2018

It’s been a good two weeks, AMD’s Gary Silcott and Andy Parma told me on the last day of SC18 in Dallas at the restaurant where we met to discuss their show news and recent successes. Heck, it’s been a good year. Read more…

By Tiffany Trader

TACC’s ‘Frontera’ Supercomputer Expands Horizon for Extreme-Scale Science

August 29, 2018

The National Science Foundation and the Texas Advanced Computing Center announced today that a new system, called Frontera, will overtake Stampede 2 as the fast Read more…

By Tiffany Trader

HPE No. 1, IBM Surges, in ‘Bucking Bronco’ High Performance Server Market

September 27, 2018

Riding healthy U.S. and global economies, strong demand for AI-capable hardware and other tailwind trends, the high performance computing server market jumped 28 percent in the second quarter 2018 to $3.7 billion, up from $2.9 billion for the same period last year, according to industry analyst firm Hyperion Research. Read more…

By Doug Black

Intel Announces Cooper Lake, Advances AI Strategy

August 9, 2018

Intel's chief datacenter exec Navin Shenoy kicked off the company's Data-Centric Innovation Summit Wednesday, the day-long program devoted to Intel's datacenter Read more…

By Tiffany Trader

Germany Celebrates Launch of Two Fastest Supercomputers

September 26, 2018

The new high-performance computer SuperMUC-NG at the Leibniz Supercomputing Center (LRZ) in Garching is the fastest computer in Germany and one of the fastest i Read more…

By Tiffany Trader

Houston to Field Massive, ‘Geophysically Configured’ Cloud Supercomputer

October 11, 2018

Based on some news stories out today, one might get the impression that the next system to crack number one on the Top500 would be an industrial oil and gas mon Read more…

By Tiffany Trader

Nvidia’s Jensen Huang Delivers Vision for the New HPC

November 14, 2018

For nearly two hours on Monday at SC18, Jensen Huang, CEO of Nvidia, presented his expansive view of the future of HPC (and computing in general) as only he can do. Animated. Backstopped by a stream of data charts, product photos, and even a beautiful image of supernovae... Read more…

By John Russell

  • arrow
  • Click Here for More Headlines
  • arrow
Do NOT follow this link or you will be banned from the site!
Share This