DOE Supercomputing Aims for 100-200 PFLOPS in 2017
In a recent post on Atomic City Underground, Frank Munger examines the progress made by the CORAL project, which HPCwire detailed last year. CORAL, which stands for Collaboration of Oak Ridge, Argonne and Livermore, was formed so that Oak Ridge National Laboratory (ORNL), Argonne National Laboratory (ANL) and Lawrence Livermore National Laboratory (LLNL) could combine forces when purchasing their next major supercomputing installations.
In the next 2-3 years, all three Department of Energy (DOE) centers will be seeking to deploy their first 100-plus-petaflop systems. The collaboration enables the labs to combine experience and buying power. The three-way partnership includes about 100 experts who will participate in the acquisition process. The systems are expected to carry a hefty price tag of about $125 million, which will buy about 100-200 petaflops of computing power. According to the December 2012 Request for Information (RFI), “the expectation is that the proposed 2016-2017 system will be roughly an order of magnitude less in time-to-solution than today’s systems at our facilities.”
Frank Munger spoke with Buddy Bland, director of the Oak Ridge Leadership Computing Facility (OLCF) at Oak Ridge National Laboratory and the project director for the Oak Ridge side of CORAL, for an update.
CORAL was initiated because all three labs will be in the market for next-generation so-called “pre-exascale” supercomputers at about the same time. ORNL will be looking to replace Titan, Argonne will want a follow-on to Mira, and Livermore will be preparing for Sequoia’s successor. The DOE’s Office of Science and the National Nuclear Security Administration are helping to coordinate the project.
CORAL will result in two distinct architectures. Most likely, the two solutions will come from two different vendors, but there is a chance that one company could offer both options, notes Bland. “We’re going to see what the marketplace will bring,” he told Munger.
IBM and Cray are the likely candidates. ORNL has a longstanding relationship with Cray as the vendor for both Jaguar and Titan, but that does not guarantee that Cray will win the next contract. As Bland notes, this is an open competition.
While the Cray partnership has worked out very well for ORNL, Bland notes that Argonne and Livermore have had similarly good experiences with IBM.
To help offset the research and development costs associated with these next-generation machines, CORAL is providing NRE (non-recurring engineering) money to the tune of $25 million to each of the primary vendors. The contracts will be issued by Lawrence Livermore with input from the other two labs. The funding is there to help “accelerate technology, improve capabilities, improve application performance, and lower the total cost of ownership of the delivered systems.”
Both the NRE and build subcontracts will be decided this calendar year, Bland said, and delivery is anticipated in the 2017 timeframe.
The director of OLCF also said that the new Oak Ridge supercomputer – expected to be about the same size as Titan – will not require a new building, nor will it require the Titan system to be disassembled. It will, however, require upgrades for electrical power and chilled water.
The plan is for the new supercomputer to be housed on the bottom floor of the annex that was constructed at the back of Building 5600 a few years ago, a space that has never been occupied.
Titan will continue to operate through 2017, according to Bland. While it will no longer be the world’s second-fastest computer by then, it should still be a valuable resource.
HPCwire covered the initial CORAL announcement here.