The National Center for Supercomputing Applications (NCSA) has awarded Cray a $188 million contract to complete the NSF-funded Blue Waters supercomputer project at the University of Illinois. An 11.5 petaflops Cray XE6/XK6 hybrid system outfitted with AMD CPUs and NVIDIA GPUs will be deployed next year and become the center’s petascale resource for open science and engineering. The much-anticipated deal was announced on Monday, just as the Supercomputing Conference (SC11) in Seattle got underway.
This is NCSA’s second shot at Blue Waters. In 2007, IBM was selected to build the petascale machine as part of the NSF Track 1 leadership system program. That system, a Power7-based supercomputer built on IBM’s DARPA-funded PERCS architecture, was to deliver 10 peak petaflops and one petaflop of sustained performance. The IBM machine was on track to be deployed in 2012.
Some IBM cabinets had already been delivered in 2011 when, in August, the company abruptly terminated the contract after determining that the effort required to complete the work would ultimately be unprofitable. Following IBM’s embarrassing withdrawal, NCSA and NSF re-solicited the work, hoping to put the project back on schedule with a new vendor.
According to NCSA Director Thom Dunning, the solicitation attracted Cray, along with three other bidders, whom he declined to name. Dunning told HPCwire that Cray’s approach lined up very well with the Blue Waters mission. “As we have always done, we didn’t pick a system that was just focused on peak performance, but a system that focused on sustained performance and had the memory and disk performance that is really needed by the science and engineering community,” said Dunning.
Cray CEO Peter Ungaro reiterated that point, noting that the Blue Waters project is a great fit for his company’s vision of adaptive and heterogeneous computing at scale, and that NCSA is a customer focused on sustained performance rather than raw flops or Linpack benchmarks. “NCSA and NSF could have made a lot of tradeoffs to build a much bigger machine from a peak flops standpoint and to get a better ranking on the TOP500,” said Ungaro.
GPU acceleration was added at the behest of the researchers who are gearing up to use the Blue Waters system. In fact, according to Dunning, two-thirds of the researchers who are in line to run their applications on the machine are now asking for these accelerators, which influenced NCSA’s choice to go with Cray’s XE6/XK6 hybrid supercomputer. Over the past five years, some of these researchers have ported portions of their science codes to take advantage of GPGPUs. “That was the one major change that occurred between 2006 and 2011,” said Dunning.
That said, the supercomputer will mostly rely on CPUs. Cray estimates the system will have more than 235 CPU-only XE6 cabinets and over 30 CPU-GPU XK6 cabinets. In both cases, the CPU will be AMD’s “Interlagos” Opteron 6200 processor, which was officially launched on Monday. Specifically, the machine will be outfitted with the 16-core 2.3 GHz Opteron 6276.
In aggregate, more than 49,000 of these CPUs will be used in the machine, representing about two-thirds of the total flops. The remaining third will be supplied by more than 3,000 “Kepler” GPUs, NVIDIA’s next-generation graphics processor that is expected to go into production in 2012. Dunning said the CPUs alone will be enough to sustain one petaflop of performance on science applications capable of scaling to that level. If those codes can employ GPUs effectively, an additional performance boost will be possible.
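The two-thirds CPU share can be sanity-checked with a back-of-envelope calculation. The sketch below uses only the round figures quoted above (49,000 CPUs, 16 cores, 2.3 GHz, 3,000 GPUs, 11.5 petaflops) plus one assumption that is not in the article: that each Interlagos core can retire 4 double-precision flops per cycle. The function and constant names are illustrative, and because the article gives the counts as "more than," the results are rough estimates, not official specifications.

```python
# Back-of-envelope peak-flops estimate from the figures quoted in the article.
CPU_COUNT = 49_000        # "more than 49,000" CPUs, so this is a lower bound
CORES_PER_CPU = 16        # Opteron 6276
CLOCK_GHZ = 2.3           # Opteron 6276 base clock
GPU_COUNT = 3_000         # "more than 3,000" GPUs, also a lower bound
SYSTEM_PEAK_PF = 11.5     # quoted system peak, in petaflops

# ASSUMPTION (not stated in the article): 4 double-precision flops
# per cycle per core.
FLOPS_PER_CYCLE_PER_CORE = 4


def cpu_peak_petaflops(cpus, cores, ghz, flops_per_cycle):
    """Aggregate CPU peak in petaflops (GHz x flops/cycle gives gigaflops)."""
    gigaflops = cpus * cores * ghz * flops_per_cycle
    return gigaflops / 1e6  # 1 petaflop = 1e6 gigaflops


def implied_gpu_gigaflops(system_pf, cpu_pf, gpus):
    """Per-GPU peak implied by whatever the CPUs do not account for."""
    return (system_pf - cpu_pf) * 1e6 / gpus


cpu_pf = cpu_peak_petaflops(
    CPU_COUNT, CORES_PER_CPU, CLOCK_GHZ, FLOPS_PER_CYCLE_PER_CORE
)
cpu_share = cpu_pf / SYSTEM_PEAK_PF
gpu_gf = implied_gpu_gigaflops(SYSTEM_PEAK_PF, cpu_pf, GPU_COUNT)

print(f"CPU peak:         {cpu_pf:.2f} PF")
print(f"CPU share:        {cpu_share:.0%}")
print(f"Implied GPU peak: {gpu_gf:.0f} GF each")
```

Under these assumptions the CPUs contribute a little over 7 petaflops, or roughly 63 percent of the quoted peak, which is in line with the "about two-thirds" figure. The implied per-GPU number is only an upper-end estimate, since both device counts are stated as minimums.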
The supercomputer will be impressive in nearly every other dimension as well. The machine will have more than 1.5 PB of total memory, an aggregate I/O bandwidth of over 1 TB/s, and an enormous interconnect network, with about 4,500 km of wires. Cray will also be supplying more than 25 petabytes of external storage integrated with the Lustre file system. “It’s going to be the biggest supercomputer we’ve ever built,” said Ungaro.
The application work for Blue Waters will span the breadth of big science applications, in particular, those in molecular science, climate/weather forecasting, earth science, life sciences, and astrophysics. As Ungaro implied, NCSA and the NSF could have built a much larger machine from a pure flops perspective if they had maximized the GPU components, but instead felt that the CPU-heavy mix matches the current state of these science codes much more closely at this point.
NCSA and Cray are planning to stick to the same deployment schedule as was being pursued with the IBM PERCS machine, with the final system up and running by next fall. The general plan is to deploy the CPU nodes first, with the GPU components installed during the last stage of the build-out.
Cray expects to book most of the $188 million contract money in 2012, but the funding includes five years of Blue Waters services and support. This time around there is no termination clause in the contract, which from Ungaro’s point of view is not an issue. Delivering supercomputing to scientists is essentially Cray’s whole business, so there was really no consideration that they wouldn’t follow through. “It was the easiest part of the negotiation,” laughed Ungaro.
In the next couple of weeks, a small Interlagos-based test system will be installed, followed in early 2012 by a much larger machine. This will allow the researchers to work on optimizing and scaling their applications, at least with the CPU components. By the middle of the summer, they will have the full system deployed, with the exception of the “Kepler” GPUs, which are expected to arrive in early fall. If all goes as planned, the entire system should be up and running by this time next year.
With the project in limbo since August, Dunning is eager to move forward, adding that he and his team are delighted to work with Cray. “We’ve been waiting for four years to put hardware on the floor that the science and engineering teams could use, and it’s finally happening,” he said.