February 27, 2013

World’s Fastest Supercomputer Hits Speed Bump

Tiffany Trader

Reigning TOP500 champ, Titan, is not performing as expected. Jeff Nichols, head of Oak Ridge National Laboratory’s scientific computing division, told Knoxville News that the massive supercomputer encountered technical issues that halted the final acceptance test.

This means that the DOE’s Oak Ridge National Laboratory (ORNL) won’t yet be taking official ownership of the $100 million machine, and payments to Cray will be put on hold.

On the bright side, the problem has been identified and both parties are working on a solution.

“We’ve found a few bugs that have held us back,” Nichols said, “and we’re doing some repair work with Cray in order to get the stability tests where we want them to be.”

The problems were traced to the interconnect fabric that enables the CPU and GPU components to communicate. The CPU side of this hybrid supercomputer is operational, but applications that call on GPUs have encountered sporadic faults. ORNL is sending sections of the system back to Cray on a rotating basis for repair.

Even with these issues, Titan came close to meeting the goals for a successful acceptance test. A passing score is awarded for completing 95 percent of the jobs in the test, and the Cray supercomputer came in at 92-93 percent, only a few percentage points shy.

From what Nichols told Knoxville News, the issues sound more like a speed bump than a fatal flaw. Nichols expects final acceptance of Titan to be delayed by no more than a month or two. He believes that once the connectors are repaired, the rest of the process should be a “slam dunk.”

Despite recent setbacks, Titan passed initial testing in time for the November 2012 TOP500 list. This 27-petaflops (peak) Cray XK7 scored 17.59 petaflops on the Linpack benchmark, earning it bragging rights as the “world’s fastest supercomputer.”

The DOE’s Oak Ridge National Laboratory describes Titan as “the world’s most powerful supercomputer for open science with a theoretical peak performance exceeding 20 petaflops (quadrillion calculations per second).” This unprecedented level of power opens up new possibilities for ground-breaking research, including complex climate change models and sophisticated nuclear reactor simulations.
