The SC07 Cluster Challenge was held in conjunction with the SC07 conference in Reno, Nevada. The event sought to create an exhibition and competition in which teams of undergraduate students would compete in a demonstration of talent, technology and accessibility of entry-level supercomputing. The activity was intended to highlight the gains in hardware performance, ease of use of clusters and the power and availability of simulation software.
To whet people's appetites, the Cluster Challenge Committee claimed that a half rack of a modern cluster would be competitive with the number one system on the TOP500 list from only 10 years ago! In fact, the top Linpack score realized in the Challenge was 420 gigaflops, which would have made the TOP500 list only three years ago. This result was announced during the TOP500 BoF session at SC07 and was met with loud cheers and applause from the attendees.
The rules of the Challenge were simple: a team could draw a maximum of 26 amps (at 110 volts), and no team member could have completed an undergraduate degree. Six teams and their vendor partners chose to compete:
- Stony Brook University + Dell.
- National Tsing Hua University + ASUSTek.
- University of Colorado + Aspen Systems.
- University of Alberta + SGI.
- Indiana University + Apple.
- Purdue University + HP.
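To put the power rule in concrete terms, 26 amps at 110 volts works out to roughly 2,860 watts per team. The short sketch below does that arithmetic and estimates how many nodes would fit under the cap. The per-node and overhead wattages in the usage line are purely hypothetical placeholders, not figures from any competing team, and the calculation assumes a power factor of 1.

```python
# Power limits taken from the Challenge rules.
VOLTS = 110
MAX_AMPS = 26


def power_budget_watts(volts, amps):
    """Power budget in watts (P = V * I), assuming a power factor of 1."""
    return volts * amps


def max_nodes(budget_watts, node_watts, overhead_watts=0):
    """Whole number of nodes that fit in the budget.

    node_watts and overhead_watts (switches, storage, etc.) are
    hypothetical inputs supplied by the caller, not measured values.
    """
    return int((budget_watts - overhead_watts) // node_watts)


budget = power_budget_watts(VOLTS, MAX_AMPS)  # 2860 watts
# Hypothetical example: 350 W per node under load, 200 W of overhead.
nodes = max_nodes(budget, node_watts=350, overhead_watts=200)
```

In practice a team would measure draw under load rather than rely on nameplate figures, since the limit applies to what the system actually pulls.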
Each team partnered with a vendor who loaned the equipment for the event and, in some cases, provided travel funds to the team. In most cases, the teams had access to the equipment for a few months, but some only had it for a few weeks prior to shipping. On Saturday, Nov. 10, teams arrived to find (or not, in some cases) their equipment waiting in the contest area. Saturday and Sunday were then spent rebuilding systems, optimizing power, and finalizing the benchmarks and applications for the start of the competition.
Teams were asked to run the HPC Challenge benchmarks at the beginning of the event at 8:00 PM on Monday. Once the results were submitted, teams were given access to data sets for the three previously announced applications: GAMESS, POP and POVRay. Teams then spent the remainder of their time (up to 4:00 PM on Wednesday) completing as many of the data sets as they could. At the end of the competition, judges interviewed the teams and awarded points based on this interaction. The judges were led by Jack Dongarra (University of Tennessee and ORNL) and included Dona Crawford (LLNL), Satoshi Matsuoka (Tokyo Tech) and Tim Lyons (Morgan Stanley).
For about 44 straight hours, teams worked on the benchmarks and applications in shifts. About halfway through the event, around Tuesday at noon, there was a general power interruption to the section of Reno where the convention center is located. All teams experienced a hard crash and had to scramble to recover. The event couldn’t have asked for a more brutal real-life experience. Some of the teams lost upwards of ten hours of compute time and others lost hardware and time associated with debugging the failure. In the end, however, all teams were back online within a couple of hours and some chose to run with automatic checkpoint restart, as available in some of the applications, to protect against further interruption.
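The checkpoint-restart idea mentioned above can be illustrated with a minimal, generic sketch. It is not how GAMESS, POP or POVRay implement the feature. The file format, function names and the `fail_at` parameter are all assumptions made for illustration. The program saves its state periodically, and on restart it resumes from the last saved state instead of starting over.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "total": 0}


def save_checkpoint(path, state):
    """Write the state to a temp file, then rename it into place.

    The rename is atomic, so a crash mid-write cannot corrupt the
    previous checkpoint.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run(path, n_steps, every=10, fail_at=None):
    """Run n_steps of stand-in work, checkpointing every `every` steps.

    fail_at is a hypothetical hook that simulates a power loss.
    """
    state = load_checkpoint(path)
    while state["step"] < n_steps:
        if fail_at is not None and state["step"] == fail_at:
            raise RuntimeError("simulated power loss")
        state["total"] += state["step"]  # stand-in for real computation
        state["step"] += 1
        if state["step"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

The checkpoint interval is a trade-off: saving more often costs I/O time but bounds how much work a single outage can destroy.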
Ultimately, the winner of the contest was the team from The University of Alberta in Edmonton, Canada. While that team did not have the fastest system on paper, a combination of good preparation and good fortune during the brief power outage gave them the advantage.
The quality of teams, systems and computational work exceeded all expectations. While one could attribute this to individuals rising to the challenge, the committee’s opinion is that a larger force is also at work: the entry barrier to supercomputing has dropped significantly. The results show that if you have a need for simulation computing, it is reasonable to believe that you can use local college or university talent and commonly available software and applications to get started on that work.
Another significant outcome of the event is its impact on the curricula of the participating institutions. Half of these schools decided to modify their undergraduate offerings in the future to include cluster and parallel computing classes.
Computational simulation, driven by continued advances in hardware, by the availability and maturity of cluster OS software, and enabled by parallel application software, has reached a point where it is clearly accessible. These tools are now available to industry, and we predict the technology will soon be considered critical to enhancing the competitiveness of businesses of all sizes and in all markets.
In closing, there are numerous people and organizations to credit for the event itself, starting with the ACM and IEEE (sponsors of SC07), the individual team vendor partners (Dell, ASUSTek, Aspen Systems, SGI, Apple and HP) and the event partners (Chevron, WesternGeco and Morgan). The results are exciting and we are already planning for the next event at SC08, Nov. 15-21, in Austin, Texas.