In the past couple of weeks we've been entertained by a plethora of announcements describing “breakthrough” semiconductor/microprocessor technology that promises far-reaching effects on the industry. These include P.A. Semi's unveiling of its ultra-low-power dual-core Power processor, AMD's Barcelona quad-core chip re-announcement, Intel's demo of its 80-core teraflop processor, the unveiling of a new massively parallel stream processor, and the introduction of IBM's new embedded DRAM technology. And that's just a sample. February has ushered in a veritable Cambrian explosion of semiconductor gadgetry.
Most of these advancements won't make their effect felt until 2008 — and well beyond that in the case of Intel's 80-core wonder. How much effect? Separating hype from substance is always a challenge when the announcements are still warm, but I'll take a shot at it.
Eighty Cores, No Waiting
Intel demonstrated its 80-core prototype this week at the International Solid-State Circuits Conference (ISSCC). The 1+ teraflop chip is implemented in 65nm technology. The version demonstrated at the conference was made up of rather simple floating-point cores in order to hit the headline-grabbing teraflop mark. But to claim equivalence to a rack of HPC servers or a mainframe, as some analysts have done, is just hyperventilation. Producing a commercially viable version with Intel Architecture-based cores will undoubtedly require a much smaller feature size and more ingenious engineering. Nevertheless, the 62-watt power consumption is nothing short of amazing, while the on-chip routers and 3-D memory stacking offer an innovative approach to a many-core architecture.
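To put the power figure in perspective, here is a back-of-the-envelope calculation using only the numbers quoted above. It is a rough sketch, not a benchmark: it assumes the "1+ teraflop" rating is exactly one teraflop (so the results are lower bounds) and that the work is spread evenly across all 80 cores. The function and variable names are my own.

```python
# Back-of-the-envelope efficiency for the 80-core prototype.
# Assumption: "1+ teraflop" is treated as exactly 1.0e12 flop/s (a lower bound).
# Assumption: throughput is divided evenly across the cores.
PEAK_FLOPS = 1.0e12   # floating-point operations per second
POWER_WATTS = 62.0    # power consumption quoted for the demo chip
CORES = 80

def gflops_per_watt(flops, watts):
    """Energy efficiency in gigaflops per watt."""
    return flops / watts / 1e9

def gflops_per_core(flops, cores):
    """Average throughput per core in gigaflops."""
    return flops / cores / 1e9

efficiency = gflops_per_watt(PEAK_FLOPS, POWER_WATTS)
per_core = gflops_per_core(PEAK_FLOPS, CORES)

print(f"{efficiency:.1f} gigaflops per watt")   # 16.1 gigaflops per watt
print(f"{per_core:.1f} gigaflops per core")     # 12.5 gigaflops per core
```

Roughly 16 gigaflops per watt is the number that makes the 62-watt figure worth the attention, even if the cores themselves are simple.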
More Barcelona
Also taking advantage of the ISSCC stage, AMD released some additional details of its upcoming quad-core “Barcelona” Opteron processor, scheduled to be released in mid-2007. In what should make high performance computing applications especially happy, the new processor will double the width of the Opteron's floating-point execution pipeline to 128 bits, allowing twice as many FP instructions and data to flow through. Most of the other announced features are aimed at conserving energy. For example, the PowerNow! technology will be enhanced to provide dynamic adjustment of core frequencies, so that individual cores don't run hot if they're idle or have reduced loads. The system memory interface will also include the capability to power down memory logic when not in use. Additionally, the design takes advantage of “clock gating” to enable automatic shut-down of logic areas that are not being utilized.
Extreme Clock Gating
Clock gating, the feature that dynamically turns off circuits that are not being used, is becoming more commonplace as designers obsess about energy conservation. The feature is taken to a new level in P.A. Semi's new dual-core PA6T-1682M PWRficient processor, which sips just 5-13 watts of power at 2 GHz. In this implementation, clock gating is used systematically to shut off unused circuits throughout the processor. With the new emphasis on energy conservation throughout IT, clock gating seems likely to become a more popular technique for reducing power consumption. According to Mark Hayter, Chief System Architect at P.A. Semi, clock gating has to be built into the processor architecture during the design phase; it's not something that can be retrofitted.
“I think most people have done some degree of clock gating,” observed Hayter, “but certainly not to the level of granularity that we've done.”
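For a rough sense of why granularity matters, here is a toy model of clock gating. It is only a sketch under loud assumptions: the block names and wattages are invented for illustration and have nothing to do with P.A. Semi's actual design. The model assumes each block burns a fixed amount of clock power whenever its clock is toggling, whether or not it has work to do, plus leakage that clock gating cannot remove. Switching power from useful work is left out because it is the same in both cases.

```python
from dataclasses import dataclass

@dataclass
class Block:
    """One independently gateable region of a hypothetical chip."""
    name: str
    clock_watts: float    # power burned whenever the block's clock toggles (made-up value)
    leakage_watts: float  # static power, unaffected by clock gating (made-up value)
    busy: bool            # whether the block has work to do in this interval

def chip_power(blocks, clock_gating):
    """Total power in watts; with gating on, idle blocks have their clock stopped."""
    total = 0.0
    for block in blocks:
        clock_running = block.busy or not clock_gating
        total += block.leakage_watts
        if clock_running:
            total += block.clock_watts
    return total

# Hypothetical workload: only the integer unit is doing anything.
blocks = [
    Block("integer unit", clock_watts=3.0, leakage_watts=0.5, busy=True),
    Block("floating-point unit", clock_watts=4.0, leakage_watts=0.5, busy=False),
    Block("memory controller", clock_watts=2.0, leakage_watts=0.25, busy=False),
]

ungated = chip_power(blocks, clock_gating=False)
gated = chip_power(blocks, clock_gating=True)
print(f"without clock gating: {ungated} W")  # 10.25 W
print(f"with clock gating:    {gated} W")    # 4.25 W
```

The finer the blocks are sliced, the more often some slice is idle and can be stopped, which is presumably what Hayter means by granularity. The leakage term is a reminder that gating the clock does nothing about static power.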
A New Stream Processor
Another high performance embedded architecture that garnered some attention at ISSCC was SPI's new Stream Processor. Bill Dally, the CS chair at Stanford, co-founded SPI (Stream Processors Inc.) in 2004 with the aim of developing a highly parallelized stream processor for digital signal processing. The processor consists of a heterogeneous core set, including a data-parallel unit (DPU) and two MIPS cores: one for the OS and one to manage DSP threads and offload compute-intensive functions to the DPU. The current incarnation of the DPU runs billions of operations per second (gigaops), but the young startup is already looking towards a teraops version for the next generation.
According to Matthew Papakipos, PeakStream chief technology officer, “Bill Dally at SPI has been a leader in the academic research developing stream processor hardware architectures. This research has been very influential in a wide variety of modern hardware designs including programmable graphics processors, the IBM/Sony/Toshiba Cell processor and upcoming many-core CPU designs. SPI's processor is likely to make a significant impact on digital signal processing for embedded applications.”
A standard C programming interface (along with some stream processing extensions) is included to provide developers with a familiar software environment. Access to the memory hierarchy is managed by the compiler/runtime system to take advantage of data locality. Sounds promising.
Embedded DRAM Challenges SRAM
Also announced at ISSCC was IBM's new embedded dynamic RAM (DRAM) for on-chip memory. The new DRAM is designed to replace the static RAM (SRAM) currently being used for on-chip cache in most processors. Up until now, because it was so much slower than SRAM, DRAM was mostly relegated to off-chip memory. Although still not quite as speedy as SRAM, the new embedded DRAM has the advantages of smaller dimensions, lower leakage, and better overall performance characteristics.
Especially for multi-core, high performance chips, the imbalance between memory and processor speeds continues to be one of the fundamental problems that limit overall application performance. Processors that consist of cores floating in a vast sea of SRAM cache are the result. The Itanium processor is probably the most extreme example of this type of arrangement, but it is by no means the only one.
In the past, IBM used embedded DRAM in its PowerPC-based processor for the Blue Gene/L supercomputers. This new technology will enable on-chip DRAM to be used in mainstream IBM chips in 2008 as part of its 45nm offerings. IBM claims that embedded DRAM will effectively double processor performance beyond what could have been achieved with traditional scaling.
For IBM at least, this seems like a can't-miss technology. They've been singing the praises of embedded DRAM for a while, and the technology circumvents many of the shortcomings of SRAM, especially for high-end chips. AMD is looking at Z-RAM technology from Innovative Silicon for denser caches, so in this case it may choose not to leverage the IBM partnership. Intel is researching “floating-body cell” memory to replace SRAM, but is still uncommitted as far as commercial production is concerned.
Super FPGAs
Researchers at Worcester Polytechnic Institute (WPI) want to develop a new kind of reconfigurable computing device that has the superior performance and energy consumption characteristics of an ASIC, but the programmability of an FPGA. According to this week's announcement: “Using a type of parallel computing called stream processing, the chip will complete hundreds of calculations simultaneously, enabling it to perform up to 300 times faster than microprocessors and about 15 times faster than FPGAs.”
DARPA is funding this with a very modest 18-month, $150,000 award, so don't expect any miracles in the next year or two.
Quantum Leaping
Perhaps the biggest processor-related news of recent weeks came from a company that has very little to do with digital computing. On Tuesday, a small Canadian startup named D-Wave demonstrated its prototype quantum computer. In front of a few hundred people in Mountain View, California, D-Wave's 16-qubit computer (the actual quantum hardware was off-site) ran three different applications. The company then proceeded to put forth its vision of how QC will change the nature of computing forever. As the first commercial QC vendor, D-Wave is looking to become the Cray of quantum computing.
To really understand quantum computing requires a course in graduate physics and perhaps a belief in parallel universes as well. But even mere mortals can appreciate the possibilities for this new technology. Read our feature coverage by me and Bob Feldman (no relation) in this week's issue to see what all the hubbub is about.
And for an overall perspective of the week's HPC news, chips or otherwise, check out John West's excellent wrap-up: The Week in Review.
-----
As always, comments about HPCwire are welcomed and encouraged. Write to me, Michael Feldman, at [email protected].