Here’s a collection of highlights, selected totally subjectively, from this week’s HPC news stream as reported at insideHPC.com and HPCwire.
>>10 words and a link
Promise for the viability of Java in HPC;
Eadline on what nature can teach us regarding big clusters;
SGI announces Q4 results, posts continuing losses as backlog grows;
Six part series on managing multicore projects;
Blue Waters gets green light: 200k cores, 1PB globally addressable memory;
NSF puts $6M in multicore research program;
Dr. Dobb’s interview with SDSC’s infrastructure director;
MSC in the house of Lamborghini;
Colfax introduces quiet compute cluster;
SGI providing support for Verari in EMEA;
UT Knoxville wins $16M math and biology center;
Kotura supports Sun’s DARPA-funded nanophotonics research.
>>PRACE picks prototypes for petaflops proof
Back in June I wrote an article for HPCwire that described some of the high-level things going on in publicly funded HPC in Europe. In that article I talked about PRACE:
The goal of PRACE is to establish three to five European ‘tier 0’ centers, each with petascale resources, that will serve broader EU science and industrial research goals. PRACE is still very much a planning exercise, currently funded at 40M euros, with the budget expected to grow to an estimated 200M euros for operations alone.
While PRACE’s organizational work continues, technical progress is being made. This week the organization announced that it has chosen six systems to evaluate as prototypes for the petaflops systems to be installed in 2009 and 2010. The systems will be installed or evaluated as follows:
– BSC (Barcelona Supercomputing Center, Spain): hybrid prototype combining IBM Cell and Power6 processors.
– CEA (French Atomic Energy Commission, France) and FZJ (Forschungszentrum Jülich, Germany) will jointly evaluate systems based on Intel Nehalem/Xeon processors. Two shared-memory multiprocessor systems (thin-node clusters) will be split across the two sites: a prototype built by Bull at CEA and a larger system of the same architecture at FZJ.
– FZJ is also adding its already installed IBM BlueGene/P system for evaluation.
– CSC (the Finnish IT Center for Science, Finland) and CSCS (Swiss National Supercomputing Centre, Switzerland): Cray XT5.
– HLRS (High Performance Computing Center Stuttgart, Germany): an NEC SX-9 and an x86-based cluster.
– NCF (Netherlands Computing Facilities Foundation, The Netherlands): IBM Power6 architecture, a shared-memory multiprocessor (fat-node cluster).
These are mostly machines with a high degree of special-purpose technology. Noticeably absent: SGI, HP and, to a lesser degree, Sun. I’m glad to see the NEC in there; we don’t see many of those in the US.
The prototypes will be used to evaluate performance, scalability, and “total cost of ownership” (including energy costs).
They will also make possible the evaluation of software for managing the distributed infrastructure, the preparation of benchmarks for future Petascale systems (allowing a better understanding of user requirements), the scaling and optimisation of libraries and codes, and the definition of technical requirements and procurement procedures for the PRACE Petaflop/s production systems for 2009/2010.
>>Talent bifurcation in CS?
Intel’s Michael Wrinn comments on views expressed by panelists at the Academic Community Multi-core Programming Roundtable:
Which brings us to the recurrent theme: performance. The audience, mainly from industry, certainly picked it up; several identified themselves as hiring managers, and lamented the general ignorance of performance and architecture details. At least one of them said he prefers to interview only EE graduates — for software jobs — since CS students typically do not bring what his company needs (the industries represented here were quite varied: search engine, medical instruments, cluster consulting, etc.).
At the conclusion of the article, he wonders if we’re entering a talent bifurcation:
I wonder if the academic computing universe is splitting into two camps: those where students deal directly with architecture, low-level languages, concurrency, and performance, and those where students stay at a higher level of abstraction (typically expressed with Java or Python)?
>>WRF CUDA Benchmarks
John Michalakes of the National Center for Atmospheric Research (NCAR) has just announced the release of updated benchmark pages for the Weather Research and Forecasting (WRF) model. In his post, he also pointed to his latest page of benchmark data for GPU-accelerated numerical weather prediction. Those results were compiled from recent work presented at the Workshop on Large-Scale Parallel Processing (LSPP), held in conjunction with the IEEE International Parallel and Distributed Processing Symposium (IPDPS) in April 2008.
The code, recently updated by John and colleagues, implements the WSM5 (WRF Single-Moment 5-class) microphysics module in CUDA. On a GTX 280 GPU the module clocks in at just over 64 Gflops, compared with a measly 1.6 Gflops on a 2.4 GHz Opteron: roughly a 40x difference for this module. Wowzers!
The team at NCAR is currently working on a CUDA port of a module that “computes 5th order positive definite tracer advection using finite difference approximation.” They’re also porting some of the radiation physics to the GPU.
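For readers who haven’t run into the technique, here’s a minimal 1D sketch in Python of what 5th-order, positive-definite, flux-form tracer advection can look like. It is my own illustration, not NCAR’s WRF or CUDA code: it assumes a periodic grid, a constant wind `u`, a single forward-Euler step, and a simple outflow-scaling limiter, and the function names `flux5` and `advect_step` are hypothetical. A production model would use a multistage Runge-Kutta integrator and its own limiter.

```python
# Hypothetical 1D sketch (not NCAR's code): 5th-order upwind flux-form
# advection on a periodic grid with constant wind u, plus a simple
# outflow-scaling limiter that keeps the tracer non-negative.

def flux5(q, u, i):
    """5th-order upwind flux through face i-1/2 (between cells i-1 and i)."""
    n = len(q)
    qm3, qm2, qm1, q0, qp1, qp2 = (q[(i + k) % n] for k in (-3, -2, -1, 0, 1, 2))
    # 6th-order centered flux minus a |u|-weighted dissipation term,
    # which biases the stencil toward the upwind side.
    centered = (37 * (q0 + qm1) - 8 * (qp1 + qm2) + (qp2 + qm3)) / 60.0
    dissipation = ((qp2 - qm3) - 5 * (qp1 - qm2) + 10 * (q0 - qm1)) / 60.0
    return u * centered - abs(u) * dissipation


def advect_step(q, u, dx, dt):
    """One forward-Euler step of the limited scheme (assumes q >= 0)."""
    n = len(q)
    flux = [flux5(q, u, i) for i in range(n)]  # flux[i] sits on face i-1/2

    # Per-cell factor so that a cell's total outflow never exceeds its contents.
    scale = []
    for i in range(n):
        outflow = max(0.0, flux[(i + 1) % n]) - min(0.0, flux[i])
        available = max(q[i], 0.0) * dx / dt  # clamp round-off negatives
        scale.append(available / outflow if outflow > available else 1.0)

    # Each face flux is limited by the factor of its donor (upwind) cell.
    for i in range(n):
        donor = (i - 1) % n if flux[i] > 0 else i
        flux[i] *= scale[donor]

    # Flux-form update: the total tracer amount is conserved on a periodic grid.
    return [q[i] - dt / dx * (flux[(i + 1) % n] - flux[i]) for i in range(n)]


if __name__ == "__main__":
    q = [1.0 if 5 <= i < 10 else 0.0 for i in range(20)]  # square pulse
    for _ in range(5):
        q = advect_step(q, u=1.0, dx=1.0, dt=0.5)
    print(f"total={sum(q):.6f} min={min(q):.3e}")
```

Each cell’s update reads only a handful of neighbors, which is one reason stencil kernels like this are natural candidates for a GPU port.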
For more on the GPU benchmarks, check out the WRF GPU benchmark page here.
For info on more traditional WRF benchmarks, check out the updated benchmark page here.