HPCwire

Since 1986 - Covering the Fastest Computers
in the World and the People Who Run Them

NCSA Signs Up Cray for Blue Waters Redo


The National Center for Supercomputing Applications (NCSA) has awarded Cray a $188 million contract to complete the NSF-funded Blue Waters supercomputer project at the University of Illinois. An 11.5 petaflops Cray XE6/XK6 hybrid system outfitted with AMD CPUs and NVIDIA GPUs will be deployed next year and become the center's petascale resource for open science and engineering. The much-anticipated deal was announced on Monday, just as the Supercomputing Conference (SC11) in Seattle got underway.

This is NCSA's second shot at Blue Waters. In 2007, IBM was selected to build the petascale machine as part of the NSF Track 1 leadership system program. That system, a Power7-based supercomputer built on IBM's DARPA-funded PERCS architecture, was to deliver 10 peak petaflops and one petaflop of sustained performance. The IBM machine was on track to be deployed in 2012.

Some IBM cabinets had already been delivered in 2011 when, in August, the company abruptly terminated the contract after determining that the effort required to complete the work would ultimately be unprofitable. Following IBM's embarrassing withdrawal, NCSA and NSF re-solicited the work, hoping to put the project back on schedule with a new vendor.

According to NCSA Director Thom Dunning, the solicitation attracted Cray, along with three other bidders, whom he declined to name. Dunning told HPCwire that Cray's approach lined up very well with the Blue Waters mission. "As we have always done, we didn't pick a system that was just focused on peak performance, but a system that focused on sustained performance and had the memory and disk performance that is really needed by the science and engineering community," said Dunning.

Cray CEO Peter Ungaro reiterated that point, noting that the Blue Waters project is a great fit for his company's vision of adaptive and heterogeneous computing at scale, with a customer that is focused on sustained performance rather than raw flops or Linpack benchmarks. "NCSA and NSF could have made a lot of tradeoffs to build a much bigger machine from a peak flops standpoint and to get a better ranking on the TOP500," said Ungaro.

GPU acceleration was added at the behest of the researchers who are gearing up to use the Blue Waters system. In fact, according to Dunning, two-thirds of the researchers who are in line to run their applications on the machine are now asking for these accelerators, which influenced NCSA's choice to go with Cray's XE6/XK6 hybrid supercomputer. Over the past five years, some of these researchers have ported portions of their science codes to take advantage of GPGPUs. "That was the one major change that occurred between 2006 and 2011," said Dunning.

That said, the supercomputer will mostly rely on CPUs. Cray estimates the system will have more than 235 CPU-only XE6 cabinets and over 30 CPU-GPU XK6 cabinets. In both cases, the CPU will be AMD's "Interlagos" Opteron 6200 processor, which was officially launched on Monday. Specifically, the machine will be outfitted with the 16-core 2.3 GHz Opteron 6276.

In aggregate, more than 49,000 of these CPUs will be used in the machine, representing about two-thirds of the total flops. The remaining third will be supplied by more than 3,000 "Kepler" GPUs, NVIDIA's next-generation graphics processor that is expected to go into production in 2012. Dunning said the CPUs alone will be enough to sustain one petaflop of performance on science applications capable of scaling to that level. If those codes can employ GPUs effectively, an additional performance boost will be possible.
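The two-thirds figure can be sanity-checked with a little arithmetic. The sketch below multiplies out the CPU count, core count, and clock speed quoted above and compares the result with the 11.5 petaflops system total. The figure of 4 double-precision flops per core per cycle is an assumption about the Interlagos floating-point unit, not a number from NCSA or Cray, and the constant and function names are purely illustrative.

```python
# Back-of-envelope estimate of the CPU share of peak flops.
# Counts are the article's lower bounds ("more than 49,000 CPUs").
CPU_COUNT = 49_000
CORES_PER_CPU = 16            # Opteron 6276
CLOCK_GHZ = 2.3               # Opteron 6276
SYSTEM_PEAK_PFLOPS = 11.5     # total system peak quoted in the article

# ASSUMPTION (not from the article): each core retires 4
# double-precision flops per cycle at peak.
FLOPS_PER_CORE_PER_CYCLE = 4


def cpu_peak_pflops(cpus, cores, clock_ghz, flops_per_cycle):
    """Return aggregate peak in petaflops for a set of identical CPUs."""
    gigaflops = cpus * cores * clock_ghz * flops_per_cycle
    return gigaflops / 1e6  # 1 petaflop = 1,000,000 gigaflops


cpu_peak = cpu_peak_pflops(
    CPU_COUNT, CORES_PER_CPU, CLOCK_GHZ, FLOPS_PER_CORE_PER_CYCLE
)
cpu_share = cpu_peak / SYSTEM_PEAK_PFLOPS

print(f"CPU peak:  {cpu_peak:.2f} petaflops")   # CPU peak:  7.21 petaflops
print(f"CPU share: {cpu_share:.0%}")            # CPU share: 63%
```

Under that assumption the CPUs come to roughly 7.2 petaflops, or about 63 percent of the total, which is in line with the "about two-thirds" cited above. Because the article gives only lower bounds for the part counts, the result should be read as approximate.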

The supercomputer will be impressive in nearly every other dimension as well. The machine will have more than 1.5 PB of total memory, an aggregate I/O bandwidth of over 1 TB/s, and an enormous interconnect network, with about 4,500 km of wires. Cray will also be supplying more than 25 petabytes of external storage integrated with the Lustre file system. "It's going to be the biggest supercomputer we've ever built," said Ungaro.

The application work for Blue Waters will span the breadth of big science applications, in particular, those in molecular science, climate/weather forecasting, earth science, life sciences, and astrophysics. As Ungaro implied, NCSA and the NSF could have built a much larger machine from a pure flops perspective if they had maximized the GPU components, but instead felt that the CPU-heavy mix matches the current state of these science codes much more closely at this point.

NCSA and Cray are planning to stick to the same deployment schedule as was being pursued with the IBM PERCS machine, with the final system up and running by next fall. The general plan is to deploy the CPU nodes first, with the GPU components installed during the last stage of the build-out.

Cray expects to book most of the $188 million contract money in 2012, but the funding includes five years of Blue Waters services and support. This time around there is no termination clause in the contract, which from Ungaro's point of view is not an issue. Delivering supercomputing to scientists is essentially Cray's whole business, so there was really no consideration that they wouldn't follow through. "It was the easiest part of the negotiation," laughed Ungaro.
 
In the next couple of weeks, a small Interlagos-based test system will be installed, followed in early 2012 by a much larger machine. This will allow the researchers to work on optimizing and scaling their applications, at least with the CPU components. By the middle of the summer, they will have the full system deployed, with the exception of the "Kepler" GPUs, which are expected to arrive in early fall. If all goes as planned, the entire system should be up and running by this time next year.

With the project in limbo since August, Dunning is eager to move forward, adding that he and his team are delighted to work with Cray. "We've been waiting for four years to put hardware on the floor that the science and engineering teams could use, and it's finally happening," he said.
