After seven years of faithful service, and a long reign as the United States’ fastest supercomputer, the Cray XK7-based Titan supercomputer at the Oak Ridge Leadership Computing Facility (OLCF) will be decommissioned on August 1.
Leveraging 18,688 AMD Opteron CPUs and 18,688 Nvidia K20 GPUs, Titan hit a peak performance of nearly 18 Linpack petaflops, dethroning Lawrence Livermore National Laboratory’s Sequoia to take the top spot on the November 2012 Top500 list. An upgrade begun in late 2011 transformed the Cray XT5 “Jaguar” system into the Cray XK7 Titan, providing a 10-fold boost in performance while consuming only 14.6 percent more energy.
“Choosing a GPU-accelerated system was considered a risky choice,” said OLCF Program Director Buddy Bland. “A DOE independent project review committee insisted that we demonstrate that our users would be able to effectively use Titan for the broad range of modeling and simulation applications we support. We spent 6 months working with Cray, Nvidia, and our users to convince the reviewers, DOE, and ourselves that GPUs would deliver what we needed. Yes, there was risk, but we developed effective ways to manage the risks and educate both our staff and users in how to use the system. The result has been a remarkably productive system that has led the way for many GPU-accelerated systems.”
Titan was a workhorse for DOE scientific computing and held strong among the Top500’s top 10 until last month, when it placed 12th. It was the United States’ leading supercomputer from its debut in November 2012 until June 2018, when it was eclipsed at Oak Ridge by Summit, the IBM-Nvidia-built machine that is currently the world’s number-one ranked supercomputer at 148 Linpack petaflops.
“Titan has run its course,” said Operations Manager Stephen McNally. “The components of Titan are now seven years old, and it’s really impressive that users have been successfully producing high-impact science results since the system became available to them. But the reality is, in electronic years, Titan is ancient. Think of what a cell phone was like seven years ago compared to the cell phones available today. Technology advances rapidly, including supercomputers.”
Decommissioning a supercomputer of Titan’s scale requires careful planning and collaboration.
“We’ve communicated shutdown deadlines to users so they can be prepared while still getting high-quality research done,” McNally said. “One big task for users has been cleaning up 32 petabytes of data and moving data from [Titan’s Atlas file system] to other storage systems.”
Once its jobs are done and the data has been transferred, electricians will shut down the 9-megawatt system, and the components will be recycled.
“People ask why we can’t split up Titan and donate sets of cabinets to different research groups, but the answer is that it’s simply not worth the cost to a data center or university of powering and cooling even fragments of Titan,” McNally said. “Titan’s value lies in the system as a whole.”
Titan’s decommissioning is part of a 20,000-square-foot retrofit of data center space, which will make way for OLCF’s planned 2021 exascale system: Frontier, another Cray-AMD partnership.
This article relies on a report by Katie Elyce Jones with the Oak Ridge Leadership Computing Facility.