June 5, 2014

Remembering Kraken

Tiffany Trader

In supercomputing, top systems often last only about half a decade. With the 2008 breaking of the petascale barrier now firmly in the rearview mirror, we are starting to see some of those ground-breaking systems reach retirement age. It happened to Roadrunner, the world’s first petaflopper, not too long ago, and at noon on April 30, 2014, it happened to Kraken, the first academic petaflop computer.

To mark this occasion and highlight the important contributions that Kraken enabled, Troy Baer, Senior HPC System Administrator at the National Institute for Computational Sciences, has written a few words about the “end of an era.”

“In total, Kraken ran 4.2 million jobs on behalf of 2,660 users in 1,121 projects over its lifetime, using 4.12 billion core-hours at an average utilization of 85.1%,” Baer shares. “Of these, the vast majority were from projects from the National Science Foundation’s TeraGrid and later XSEDE allocation process.”

Kraken was a 1.17-petaflop (peak) Cray XT5 system with 18,816 compute sockets, 112,896 compute cores, and more than 147 terabytes of memory. When it came online in November 2009, it was the third fastest computer in the world.
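For a sense of scale, the headline figures can be broken down per socket and per core. The short Python sketch below uses only the numbers quoted above. The per-core values are derived here and are not stated in the article, and the memory figure is treated as a decimal lower bound because the article says only "more than 147 terabytes."

```python
# Figures quoted in the article.
PEAK_FLOPS = 1.17e15    # 1.17 petaflops (peak)
SOCKETS = 18_816        # compute sockets
CORES = 112_896         # compute cores
MEMORY_BYTES = 147e12   # assumption: "more than 147 terabytes" taken as a decimal lower bound

# Derived values (not stated in the article).
cores_per_socket = CORES / SOCKETS
peak_gflops_per_core = PEAK_FLOPS / CORES / 1e9
memory_gb_per_core = MEMORY_BYTES / CORES / 1e9

print(f"cores per socket: {cores_per_socket:g}")
print(f"peak GFLOPS per core: {peak_gflops_per_core:.2f}")
print(f"memory per core in GB (lower bound): {memory_gb_per_core:.2f}")
```

The socket and core counts divide evenly into six cores per socket, and the peak rating works out to roughly 10 gigaflops per core with a little over 1.3 GB of memory behind each one.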

Kraken stood apart as both a capacity machine and a capability machine. The capability side was a natural fit for the Cray XT5 architecture, yet Kraken was also the primary NSF computing resource for the majority of its operational life. This meant that Kraken was called on to support a wide range of job types. Facilitating this flexibility was a unique bi-modal scheduling approach, championed by NICS’ first director, the late Dr. Phil Andrews.

“The default mode in Kraken’s bi-modal scheduling supported capacity computing; capability jobs were saved up and run one after another following maintenance windows or outages, using the Moab scheduler’s advanced reservation and trigger features,” Baer explains. “This approach allowed Kraken to reach an unheard-of ~90%+ utilization for over three years, and that approach has subsequently been adopted by other systems such as Georgia Tech’s Keeneland.”
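The idea Baer describes can be illustrated with a toy model in Python. This is only a sketch of the general policy, not NICS’ configuration and not the Moab scheduler’s interface: the `Job` class, the `plan` function, the job names, and the 50,000-core cutoff are all hypothetical, chosen purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoff: jobs needing at least this many cores count as "capability" jobs.
CAPABILITY_THRESHOLD = 50_000


@dataclass
class Job:
    name: str
    cores: int


def split_by_mode(jobs, threshold=CAPABILITY_THRESHOLD):
    """Separate jobs into capacity (normal queue) and capability (held) lists."""
    capacity = [j for j in jobs if j.cores < threshold]
    capability = [j for j in jobs if j.cores >= threshold]
    return capacity, capability


def plan(jobs, maintenance_just_ended):
    """Return the order in which jobs would be started in this toy model.

    In the default mode only capacity jobs are started; capability jobs stay held.
    Right after a maintenance window, the held capability jobs run one after
    another first, and capacity work resumes afterwards.
    """
    capacity, capability = split_by_mode(jobs)
    if maintenance_just_ended:
        return capability + capacity
    return capacity


# Made-up example queue.
queue = [Job("climate_full", 98_000), Job("md_small", 1_200),
         Job("cfd_mid", 12_000), Job("astro_full", 110_000)]

print([j.name for j in plan(queue, maintenance_just_ended=False)])
print([j.name for j in plan(queue, maintenance_just_ended=True)])
```

In this model the near-full-machine jobs never interrupt day-to-day capacity work. They wait for a moment when the machine has presumably already been emptied for maintenance, so no running jobs need to be drained to make room for them.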

Although Kraken has been officially retired, its login nodes and Lustre file system will be available to users through the end of August to allow data transfers to other systems. After August 27, 2014, it will no longer be accessible. The University of Tennessee, which managed Kraken, has dedicated a webpage to the system’s decommissioning.

Baer ends his eulogy with a reminder of all the other UT HPC resources that have benefitted from some of the same technology that went into Kraken and from the same support and user community, including “Nautilus, an SGI UV1000 system used for remote data analysis and visualization; Beacon, a Cray CS300 system used for porting and optimization of applications to Intel Xeon Phi coprocessors; Darter, a Cray XC30 system used for general purpose computing; and Thunderhead, a Cray GreenBlade system used as a cloud testbed.”