Blue Waters Opts Out of TOP500

By Tiffany Trader

November 16, 2012

The NCSA Blue Waters system is one of the fastest supercomputers in the world, but it won’t be appearing on the TOP500 list – nor will it be taking part in the HPC Challenge (HPCC) awards. While it’s generally understood that an unknown number of classified and commercial systems don’t show up on the list, this is the first time an open science system has opted out in such a fashion.

According to the folks at the National Center for Supercomputing Applications (NCSA), there’s a good reason for this. In the days leading up to the 24th annual Supercomputing Conference (SC12) in Salt Lake City, HPCwire spoke with Blue Waters Project Director Bill Kramer to find out what went into this decision.

HPCwire: How long has Blue Waters been up and running? Would there have been enough time to run the Linpack benchmark and submit to the TOP500 list?

Bill Kramer: Oh sure, and we would have had good results if we had chosen to run it. We even had an early science system that was a resource in the US academic world going back to January last year, and we chose not to submit that for the June list.

The system has been up and running full-scale applications in test mode, along with debugging, scaling and so on, since mid-summer. Particularly since Linpack is such a simple test and does not require I/O, we had plenty of time to run it.

In fact we have run the test across the entire system and the HPCC test as well, so this was a very conscious decision not to do it – it does not reflect any problems or issues.

HPCwire: Did you get the results you would have expected and are you going to release them?

Kramer: We don’t see any reason to publicize them, but there were requirements in the contract. These tests obtained very good results, but we’d rather exercise the system with real applications. For example, some full-scale science codes have run on over 25,000 nodes for multiple days, and they’re actually solving a science problem as opposed to a trivial problem.

We’d much rather use real applications, with all the I/O and everything else in there, to vet the system and accomplish a real result along the way. Those applications are at least as stressful on the system as Linpack would be, because they exercise all parts of the system, not just the floating point units. Our focus is on reflecting what the real scientists do, not a very small subset of what some teams do.

HPCwire: So the contract with Cray did specify Linpack?

Kramer: HPCC was specified [editor’s note: HPCC includes Linpack], and that was one of hundreds of points – all of the others are much more relevant tests. For historical purposes, that was in there from the original NSF release, so we are meeting that, but it’s not relevant to whether the system is a quality system for sustained performance.

HPCwire: Are you releasing the HPCC results?

Kramer: No, and for the same reason. It’s better, but it still doesn’t really reflect what to expect for real sustained performance on real applications. It’s better because it has multiple categories that test interconnect and memory performance, but HPCC still lacks anything to do with I/O, which is one of the major bottlenecks.

Our issue is not with Linpack as a benchmark, nor with having a list. Our concern is with using a very simplified benchmark that has value in its own right, but not for the purpose of indicating the usefulness, productivity or effectiveness of the system.

HPCwire: How and when was the decision arrived at?

Kramer: Our entire project focus has been on sustained petascale performance, and it’s not one-dimensional. It’s not peak performance, and it’s not Linpack performance; it’s performance for sustained real-world applications. If you go back to the original NSF solicitation, they encapsulated that in a set of six applications, projected far forward to the challenging scientific problems that required this type of system. They set their metric as solving those problems within a certain amount of wall-clock time.

Going back to the very beginning, the philosophy behind this project was all about delivering effective petascale computing. The investment strategy was to have a very large amount of memory and a very large amount of storage, rather than trying to obtain a high single metric.

As we progressed, we worked with the National Science Foundation and went through many reviews to develop what is, from our point of view, a much more meaningful metric: the Sustained Petascale Performance (SPP) test. We crafted it by going to the science teams that we know and have been working with on the system. We took their real applications and their real science problems and used those as the measure of performance.

There are 12 application combinations that we are using to establish that the system sustains over a petaflop, in addition to the six original NSF applications. So we are actually going back to first principles: what are the scientists trying to do, and are they able to do their required work within a reasonable amount of elapsed time?

The other part of this is enabling a diverse science base. The NSF computational and data analytics community has a diverse portfolio of science, arguably the most diverse, and that portfolio requires systems that perform well across a wide range of codes.

That’s really what our measures are, and that’s what we remain focused on. So the decision not to list the system is very consistent with what the project has been about and what NSF’s goals have been since day one. The decision was made well before we needed to do any work to submit even the early system last January. It’s been a long-term process. The university and NSF made the decision jointly as the right thing to do for the real goals of our project, and we’re very comfortable with it.

HPCwire: Do you think we need a ranking system?

Kramer: I think lists are good, and I think Linpack is good as a focused, purposed benchmark. The TOP500 list, though, combines those two things in a way that was interesting at one point, a while ago, but that may now in some ways be detrimental to the community.

I have no trouble with lists, and I think the community actually needs some idea of how we’re progressing, but we really need to be clear on what these lists mean. For example, for many of the top systems on the TOP500, what really determines how high they rank is how much money was spent, not how well they perform on real applications.

There have been systems on the list that never really went on to perform on real applications. There are ways to submit systems well before they are able to run many scientific or engineering applications. The historical nature of the list is perturbed by those other attributes, and maybe those are what the lists measure. I can say for sure that it doesn’t measure progress in real sustained performance, because there’s a severe disconnect between what the list says and what real sustained performance measures indicate.

HPCwire: Do we need something new or could we improve our current metrics to your satisfaction?

Kramer: I think there are ways to improve on relevance under the Linpack measurement. The people who put together the original list and maintain the list also talk about these things. Everybody’s afraid to take the first step. In the hallways everybody talks about the issues and the risks for misinterpretation for people who are not in our community, but then everyone says, “but I have to do it.”

Well, we’re fortunate enough that we don’t have to do it, and we’re taking the first step by saying this is enough; we need to do something else. We are committed to working with others in the community to come up with a better way to describe how effective supercomputing is for solving unsolvable problems, and that’s really the important thing.

HPCwire: If the benchmarks are very complex or we have too many of them, is that practical for a wide range of systems?

Kramer: Yes, I’m convinced it is. The NAS parallel benchmarks were very effective in their time. I’m not saying that they’re the right ones now, but in their time period, for a decade or so… There were eight tests that everybody ran. They were pseudo-applications; they didn’t have I/O in them for example, and I/O was less of a challenge in those days, but they gave you a much better picture of what you could expect out of systems.

Other benchmark suites with between 8 and 12 tests are being used. The DoD has a pretty good suite that represents a reasonable workload. NERSC has a good persistence suite that has evolved over time. I think there are enough proofs of existence that, yes, you can have a much more dynamic set of things. HPCC might be a place to leverage those codes, but it is still difficult to figure out how it translates into real-world applications and how much you can get out of it.

If you look at a graph of real measured performance, say with the NERSC suite of codes, over 15 years of history and compare it with the TOP500 lists, you see a strong disconnect between what is really achievable with systems and what the list says.

The list also correlates with the amount of funding available to pay for things. The challenges that bottleneck real performance are not being addressed. So I think, yes, you can craft those processes in a tractable amount of time, in a way that is portable and expandable, and that’s been done several different ways.

HPCwire: Who are you directing this statement at? What outcome are you hoping for?

Kramer: Blue Waters is a leader in the community in many different ways, and this was another way we felt we could lead. We wanted to get a more explicit dialogue going in the community about whether this is the metric we want to use for, say, exascale computing, and whether it is still relevant.

HPCwire: What about push-back, both in general and from your vendors, Cray and NVIDIA?

Kramer: We’ve been very clear with all of our partners, and with others who may have been partners, that spending tremendous effort to get a number on a list is not our priority, because that number is not indicative of what’s really important to the project. We’ve been very open with the partners, and they have no objection to this.

HPCwire: In an article on the NCSA website, you write that “the TOP500 list and its associated Linpack have multiple serious problems.” You’ve covered some of those already; would you like to highlight the ones you feel are most problematic?

Kramer: The main concern is that it does not give an indication of value, and particularly not of value for sustained performance. Value is really the potential of a system to do work divided by its cost, and the list tells you nothing about that. All you can tell is that if you spend a lot of money on a system, you can get up high on the list.
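To make the value definition above concrete, here is a minimal sketch that ranks two systems by a Linpack rate and then by value. Value is taken as work potential divided by cost. It assumes that "potential to do work" is measured as a sustained rate on a real application mix. The systems, their names and all of their figures are hypothetical and chosen only for illustration; they are not measurements of Blue Waters or any real machine.

```python
from dataclasses import dataclass


@dataclass
class System:
    """A hypothetical system; every figure here is illustrative, not measured."""
    name: str
    linpack_pflops: float    # Linpack rate in PFLOPS (assumed figure)
    sustained_pflops: float  # sustained rate on a real application mix (assumed figure)
    cost_musd: float         # cost in millions of US dollars (assumed figure)


def value(s: System) -> float:
    """Value = potential to do work / cost.

    Assumption: 'potential to do work' is the sustained application rate.
    """
    return s.sustained_pflops / s.cost_musd


systems = [
    System("peak-heavy", linpack_pflops=20.0, sustained_pflops=0.5, cost_musd=200.0),
    System("balanced", linpack_pflops=10.0, sustained_pflops=1.0, cost_musd=150.0),
]

# A list ordered by Linpack puts the more expensive, peak-heavy machine first...
by_linpack = sorted(systems, key=lambda s: s.linpack_pflops, reverse=True)
# ...while ordering by sustained value per dollar reverses the order.
by_value = sorted(systems, key=value, reverse=True)

print([s.name for s in by_linpack])  # ['peak-heavy', 'balanced']
print([s.name for s in by_value])    # ['balanced', 'peak-heavy']
```

With these made-up numbers, the peak-heavy system delivers 0.0025 sustained PFLOPS per million dollars. The balanced system delivers about 0.0067. A list keyed only on Linpack therefore rewards spending, not value.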

Blue Waters is a project that is spending a significant amount of money, but it’s going into a very balanced system, not one that could have high FLOPS rates. I can tell you that if we had put all our money into peak performance and Linpack, we would have been number one on the list, for sure, for a while.

Suppose I had not made the investment in the world’s largest memory or the world’s most intense storage system. Suppose I had just said I want the largest number of peak FLOPS, which directly translate into Linpack FLOPS, which directly translate into this number. Suppose I didn’t care how hard it is for the science community to make use of them, or how many science projects get disenfranchised because they can’t use GPUs at scale for a while. Then we easily could have been at the top of the list for a number of cycles.

But that’s not our mission. It’s not what we designed our system for, and it’s not what many people design their systems for. Paying attention to our position on the TOP500 list could have led to a very poor choice for the real mission.

There are other aspects. You spend an awful lot of effort getting something to work that you use once, and then you essentially throw all that effort away. Some places have had to spend multiple weeks or months trying to get a number instead of doing science and engineering.

The improvements that we’re going to make to these SPP codes are actually improvements that go back to the science teams, so it’s a permanent improvement rather than a lot of that effort just going into a test case. It’s not a good way of allocating resources because you can’t reuse those resources.

HPCwire: Why now?

Kramer: The algorithmic space and the application space have changed dramatically since the days when the major implementation issues were dense linear algebra. Many other things are now at least as important, if not more important, in the way systems are designed and in what we’re trying to deal with.

Many methods have gone to sparse rather than dense, for example. As for Linpack as an indicator of what is really important in a system, we’re saying it’s time to take another look at that, and it’s not in the mission of our project to continue in that mode.

Last year at Supercomputing, there was a theme of sustained performance, and many parties took part in the discussion. There were panel sessions, papers and so on. This year, we hope to start the dialogue about how we can do a better job of metrics that we can easily explain but that are much more meaningful for the real missions of our HPC systems.

Maybe by SC13 there’s a way to report back to the community – a better way that parts of the community, or hopefully the whole community, can say … after 20 years of doing it this way it’s time to do something different.