Manycore chip designs are in the works from top-tier companies IBM, Intel and AMD, and H-online’s Andreas Stiller has uncovered some recent developments. IBM presented the “heptadecacore” BlueGene/Q processor at SC10, but pinning down the exact number of cores on this baby can get a little bit complicated. There are actually 18 cores in all, but only 16 of them are intended for computing. The 17th core will run a Linux kernel, and a spare 18th core introduces redundancy to improve yield and reliability. This wunderchip is intended to power the 20-petaflop Sequoia computer, which IBM is scheduled to deliver to Lawrence Livermore National Laboratory in 2012.
H-online provides further details on the next BlueGene:
Unlike its BlueGene predecessors, the Q-version is upgraded to 64-bit processing and the SIMD unit widened so that now it can execute four double precision fused-multiply-add commands with eight floating-point operations per clock. Accordingly, at 1.6 GHz clock speed, the processor would manage 205 Gflops — but resourceful software engineers may still improve the performance even further by making the seventeenth core calculate too. Additionally, the processor supports four-way SMT and so, for instance, provides the operating systems (RHEL6 on the I/O nodes, special compute OS on the computing nodes) with 64 “logical” cores, or threads.
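The 205 Gflops figure follows from simple peak arithmetic, which a short Python sketch makes explicit. The function and variable names here are illustrative only, and the calculation assumes each fused multiply-add counts as two floating-point operations, as the quoted "eight floating-point operations per clock" implies.

```python
def peak_gflops(cores, flops_per_clock_per_core, clock_ghz):
    """Theoretical peak in Gflops: cores x flops per clock per core x GHz."""
    return cores * flops_per_clock_per_core * clock_ghz


# Figures taken from the quoted H-online description of BlueGene/Q.
FMA_PER_CLOCK = 4                    # four double-precision FMAs per clock
FLOPS_PER_CLOCK = FMA_PER_CLOCK * 2  # each FMA = one multiply + one add
CLOCK_GHZ = 1.6
SMT_WAYS = 4                         # four-way SMT

compute_only = peak_gflops(16, FLOPS_PER_CLOCK, CLOCK_GHZ)
with_17th_core = peak_gflops(17, FLOPS_PER_CLOCK, CLOCK_GHZ)
logical_threads = 16 * SMT_WAYS

print(f"16 compute cores: {compute_only:.1f} Gflops")
print(f"17 cores (if the 17th also computes): {with_17th_core:.1f} Gflops")
print(f"Logical cores seen by the OS: {logical_threads}")
```

The 16-core result rounds to the 205 Gflops cited above; the 17-core line only illustrates the headroom the quote alludes to and is not a figure from IBM.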
AMD revealed its own manycore chip plans at SC10, hinting that its 16-core Interlagos chip with the new Bulldozer architecture could debut earlier than expected, putting it on track for delivery in the third quarter of 2011. The company also countered concern about the “halved” FPU design, where each Bulldozer module contains two integer cores but only one floating point unit (FPU). According to H-online, AMD made “the argument that the ‘Flex FP’ is capable of executing two 128-bit commands simultaneously (SSE, AVX). In particular, this is true for the multiply-add commands (FMA) — which are much valued for HPC. These are not supported by Intel’s Sandy Bridge and will probably be lacking from the feature list of its successor, the Ivy Bridge, too.”
With eight modules, or 16 cores, Interlagos manages 64 double-precision floating-point operations per clock, for 224 Gflops at 3.5 GHz. Such specs put it head-to-head with Intel’s planned 8-core Sandy Bridge processor, which will achieve the same theoretical peak value.
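As a rough check on these numbers, the per-clock figure can be rebuilt from the Flex FP description. The sketch below is in Python with illustrative names. The Interlagos breakdown uses only the figures quoted above. The Sandy Bridge breakdown is an assumption not stated in the article: one 256-bit AVX add and one 256-bit AVX multiply per core per clock, at the same 3.5 GHz clock.

```python
DOUBLES_PER_128_BIT = 128 // 64   # two 64-bit doubles per 128-bit operand
DOUBLES_PER_256_BIT = 256 // 64   # four 64-bit doubles per 256-bit operand

# Interlagos: 8 Bulldozer modules, each Flex FP unit issuing two
# 128-bit FMA commands per clock; an FMA counts as two operations.
interlagos_flops_per_module = 2 * DOUBLES_PER_128_BIT * 2
interlagos_flops_per_clock = 8 * interlagos_flops_per_module
interlagos_gflops = interlagos_flops_per_clock * 3.5

# Sandy Bridge (ASSUMPTION, not from the article): 8 cores, each issuing
# one 256-bit add plus one 256-bit multiply per clock, at 3.5 GHz.
sandy_bridge_flops_per_core = 2 * DOUBLES_PER_256_BIT
sandy_bridge_flops_per_clock = 8 * sandy_bridge_flops_per_core
sandy_bridge_gflops = sandy_bridge_flops_per_clock * 3.5

print(f"Interlagos: {interlagos_flops_per_clock} flops/clock, "
      f"{interlagos_gflops:.0f} Gflops")
print(f"Sandy Bridge (assumed): {sandy_bridge_flops_per_clock} flops/clock, "
      f"{sandy_bridge_gflops:.0f} Gflops")
```

Under these assumptions both chips land on 64 operations per clock, which is consistent with the claim that they share the same theoretical peak; sustained performance on real workloads would depend on much more than this arithmetic.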
Further details on Intel’s processor plans can be found in the advance program guide (PDF) for the upcoming International Solid-State Circuits Conference (ISSCC) in February of 2011.
The abstracts in the program guide provide details on the Sandy Bridge EP and Westmere-EX as well as specs for the next-generation Itanium processor, code-named Poulson, the follow-on processor to Tukwila. The Itaniums solutions blog, also pointing to the ISSCC program guide, characterizes the Poulson processor, due out next year, as “a 32nm, 3.1 billion transistor, 12-Wide-Issue Itanium processor for mission-critical servers.” In addition, the processor has 8 multi-threaded cores, a ring-based system interface, and 50MB of combined on-die cache. High-speed links will support 128 GB/s of processor-to-processor bandwidth and 45 GB/s of memory bandwidth.