HPCwire

Since 1986 - Covering the Fastest Computers
in the World and the People Who Run Them

TOP500: The Missing Puzzle Pieces


[Editor's Note: This article was modified to address a comment from Oak Ridge National Laboratory. It no longer says that the Titan supercomputer "failed" its acceptance test. The comment from ORNL is at the end of the article.]

 

According to recent news reports, Titan, the Cray XK7 located at the Department of Energy’s Oak Ridge National Laboratory and currently sitting at the top of the Top500 List, “isn’t working like it should.” This has puzzled many folks in the HPC community. How could Titan win the Top500 race last November, but be reported in February to have “bugs” that have “prevented users from getting access to the full Titan so far”? Some details are beginning to emerge, and the folks at Oak Ridge do expect Titan to pass its acceptance testing after Cray finishes repairing it. However, this situation does serve to raise some interesting questions about the Top500 List - and, in particular, about some pieces of the Top500 puzzle that are requested by the List keepers but are absent from the List. Let’s take a closer look.

Top500 Project

The Top500 Project is very clear about its objectives and the methodology it uses to accomplish them: 

The main objective of the TOP500 list is to provide a ranked list of general purpose systems that are in common use for high end applications.

As a yardstick of performance we are using the `best' performance as measured by the LINPACK Benchmark. LINPACK was chosen because it is widely used and performance numbers are available for almost all relevant systems.

The benchmark used in the LINPACK Benchmark is to solve a dense system of linear equations. For the TOP500, we used that version of the benchmark that allows the user to scale the size of the problem and to optimize the software in order to achieve the best performance for a given machine.

Since the problem is very regular, the performance achieved is quite high, and the performance numbers give a good correction of peak performance.

By measuring the actual performance for different problem sizes n, a user can get not only the maximal achieved performance Rmax for the problem size Nmax but also the problem size N1/2 where half of the performance Rmax is achieved. These numbers together with the theoretical peak performance Rpeak are the numbers given in the TOP500.


They’ve been collecting data for more than 20 years and, through the Top500 Lists, they provide a valuable information resource to the HPC community. 

Some relevant data for the top ten computers on the November 2012 Top500 List are presented in Table 1 below.

Table 1 – Selected Data from the November 2012 Top500 List

 

Missing Puzzle Pieces

Acceptance Testing

The conditions for the Top500 competition are spelled out in its Call for Participation. Among them are these statements (the emphases are ours): 

The authors of the Top500 reserve the right to independently verify submitted LINPACK results, and exclude systems from the list which are not valid or not general purpose in nature. By general purpose system we mean that the computer system must be able to be used to solve a range of scientific problems. Any system designed specifically to solve the LINPACK benchmark problem or have as its major purpose the goal of a high Top500 ranking will be disqualified.

The systems in the Top500 list are expected to be persistent and available for use for an extended period of time. Any system assembled to run a LINPACK benchmark only, and set up specifically to gain an entry in the Top500 will be excluded from the list. The TOP500 authors will reserve the right to deny inclusion in the list if it is suspected that the system violates these conditions. 

If a general purpose computing system hasn't successfully completed its acceptance testing, the rules of the Top500 competition could be interpreted as precluding it from the competition.  So, it’s probably worth waiting until after acceptance to compete.  Otherwise, the case that a computer system is “general purpose” and “persistent and available for use for an extended period of time” would appear to be weak.


Unreported Data

Recall that, as cited above, the objectives of the Top500 Project include reporting not only Rmax but also Nmax, the problem size where Rmax is achieved, and Nhalf, where half of the performance Rmax is achieved. These numbers, together with the theoretical peak performance Rpeak, are to be reported in the TOP500 List.
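To make these definitions concrete, here is a minimal Python sketch of how Rmax, Nmax and Nhalf might be pulled out of a sweep of Linpack runs at different problem sizes. The function name `summarize_sweep`, the sample measurements, and the use of linear interpolation to estimate Nhalf are all our own assumptions for illustration; they are not the Top500 Project's procedure, and the numbers are made up.

```python
def summarize_sweep(measurements):
    """Return (r_max, n_max, n_half) from (problem_size, gflops) pairs.

    Hypothetical helper: n_half is estimated by linear interpolation
    between the two measured sizes that bracket half of r_max.
    """
    ordered = sorted(measurements)                    # ascending problem size
    n_max, r_max = max(ordered, key=lambda p: p[1])   # best measured run
    half = r_max / 2.0

    n_half = None
    previous = None
    for n, r in ordered:
        if r >= half:
            if previous is None:
                # Even the smallest run already reaches half of r_max.
                n_half = float(n)
            else:
                n0, r0 = previous
                # Interpolate between the last run below half and this one.
                n_half = n0 + (half - r0) * (n - n0) / (r - r0)
            break
        previous = (n, r)
    return r_max, n_max, n_half


if __name__ == "__main__":
    # Made-up measurements: (problem size n, performance in Gflop/s).
    sweep = [(1000, 10.0), (2000, 30.0), (4000, 60.0), (8000, 80.0)]
    r_max, n_max, n_half = summarize_sweep(sweep)
    print(f"Rmax = {r_max} Gflop/s at Nmax = {n_max}; Nhalf is roughly {n_half:.0f}")
```

A real submission would come from a much finer sweep on the actual machine; the point is only that all three values fall out of the same set of runs.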

From Table 1, we see that all top ten systems have reported their Rmax. Since without this datum there is no basis for being on the List, this is not a surprise. What is surprising, however, is that four of the top ten systems do not show an entry for Nmax and nine of the top ten have no entry for Nhalf. Note that among those reporting neither value is the number one system: Titan.

This raises a couple of obvious questions:

If the missing data were not reported, why were those systems included in the Top500 List?

If the missing data were reported, why are they not disclosed in the Top500 List?

If the answers have something to do with “confidentiality”, we note that all of the top ten systems appear to have been acquired with public money – and complete reporting and disclosure are clearly in the public interest.

Furthermore, incomplete reporting and/or disclosure serve to limit the utility of the Top500 List and erode public confidence in it. Given the sustained value that the List has provided over the past couple of decades, this would be a shame.

Time to Completion

Computers are for solving problems – not just running fast. Even in automobile racing it’s not just about maximum speed – it’s about crossing the finish line first. So, wouldn’t it be a good idea to add a couple of data points to the Top500 List:

Tmax – the time required to complete the Linpack Rmax run

Thalf – the time required to complete the Linpack Nhalf run

In fact, we suspect that some folks in the HPC community would be more interested in these numbers than in the maximum speed ones.

We strongly suspect that Tmax and Thalf data are available for the machines on the current Top500 List. Some number of people in our HPC community have these numbers (you know who you are). So, how about providing them – and also filling in the blanks in the Nmax and Nhalf columns?

To seed the process of supplementing the List, we’ve provided Table 2 below. In it we’ve included some anecdotal and unverified – but presumed roughly accurate – data for a few of the top ten systems. The times listed are given in hours. If you can improve on these rough estimates or fill in any of the other blanks, please send us the data.


Table 2 – Supplementary Data for the November 2012 Top500 List

Going Forward

The next Top500 List is scheduled to be released in June at the International Supercomputing Conference in Leipzig, Germany. The submission deadline is May 18th. By ensuring that all competing systems have passed their acceptance tests and that all traditionally disclosed data are complete, and perhaps by adding the Tmax and Thalf data, the next release of the Top500 List could be made even more valuable to the HPC community.

 

Postscript

As noted in comments below, the times to completion, while not explicitly included in the published lists, may be calculated from Rmax and Nmax as follows: 

Tmax = 2.0/3.0 * Nmax^3 / (Rmax * 10^9)

This yields Tmax in seconds when Rmax is expressed in Gflop/s, as the factor of 10^9 implies. Table 3 below shows the results of this calculation, with Tmax converted to hours. Note that, since some of the Nmax data are approximate, so are the corresponding Tmax calculations. As mentioned above, you are invited to fill in the blanks and correct any errors you may find in this Table.
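For readers who want to run the numbers themselves, the formula above can be sketched in a few lines of Python. The function names and the sample inputs are ours and purely illustrative; they are not taken from any system on the List. The sketch assumes Rmax is given in Gflop/s and uses the same leading-order operation count of 2/3 * Nmax^3 as the formula, so it ignores lower-order terms.

```python
def linpack_time_seconds(n_max, r_max_gflops):
    """Estimate Tmax in seconds as 2/3 * Nmax^3 / (Rmax * 10^9).

    Assumes r_max_gflops is in Gflop/s and keeps only the
    leading-order term of the Linpack operation count.
    """
    operations = (2.0 / 3.0) * n_max ** 3
    return operations / (r_max_gflops * 1e9)


def seconds_to_hours(seconds):
    """Convert seconds to hours."""
    return seconds / 3600.0


if __name__ == "__main__":
    # Hypothetical machine: Nmax of one million, Rmax of 1 Pflop/s.
    n_max = 1_000_000
    r_max_gflops = 1_000_000.0
    t_max = linpack_time_seconds(n_max, r_max_gflops)
    print(f"Tmax is about {t_max:.1f} s, or {seconds_to_hours(t_max):.3f} hours")
```

Because the estimate grows with the cube of Nmax, even a small error in an approximate Nmax is roughly tripled in the resulting Tmax.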

Table 3 – Supplementary Data Calculated for the November 2012 Top500 List

 

 

Comment from ORNL:

We are writing to address a factual error in Gary Johnson’s February 27th article in HPCwire “Top500: The Missing Puzzle Pieces.”  In his article, Mr. Johnson states that “Titan, the Cray XK7 sitting at the top of the current TOP500 List, recently failed its acceptance test…”  This statement  is incorrect.  Titan has not yet completed the full suite of acceptance tests but has successfully passed both the functionality and performance phases of acceptance testing.  Moreover, Titan is within 1% of passing its stability test, the last component of the acceptance test suite. The original project schedule called for fully completing acceptance testing by June of 2013, a schedule we expect to meet. And, as we proceed through this complex testing procedure, users are making productive use of the system.

Thank you for the opportunity to correct the record.

James J. Hack

National Center for Computational Sciences, Director

Arthur S. Bland

Oak Ridge Leadership Computing Facility, Project Director

 
