February 16, 2007
In the past couple of weeks we've been entertained by a plethora of announcements describing "breakthrough" semiconductor/microprocessor technology that promises far-reaching effects on the industry. These include P.A. Semi's unveiling of its ultra-low-power dual-core Power processor, AMD's re-announcement of its Barcelona quad-core chip, Intel's demo of its 80-core teraflop processor, the unveiling of a new massively parallel stream processor, and the introduction of IBM's new embedded DRAM technology. And that's just a sample. February has ushered in a veritable Cambrian explosion of semiconductor gadgetry.
Most of these advancements won't make their effects felt until 2008, and well beyond that in the case of Intel's 80-core wonder. How much effect? Separating hype from substance is always a challenge when the announcements are still warm, but I'll take a shot at it.
Eighty Cores, No Waiting
Intel demonstrated its 80-core prototype this week at the International Solid-State Circuits Conference (ISSCC). The 1+ teraflop chip is built on a 65nm process. The version demonstrated at the conference was made up of rather simple floating-point cores, designed to grab teraflop headlines. But claiming equivalence to a rack of HPC servers or a mainframe, as some analysts have done, is just hyperventilation. Producing a commercially viable version with Intel Architecture-based cores will undoubtedly require a much smaller feature size and more ingenious engineering. Nevertheless, the 62-watt power consumption is nothing short of amazing, while the on-chip routers and 3-D memory stacking offer an innovative approach to a many-core architecture.
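A quick back-of-the-envelope calculation puts the demo figures in perspective. This minimal sketch uses only the numbers quoted above (80 cores, roughly 1 teraflop, 62 watts); treating "1+ teraflop" as exactly 1.0 teraflop, and assuming both figures describe the same operating point, are simplifications.

```python
def per_core_gflops(total_tflops: float, cores: int) -> float:
    """Average throughput per core, in gigaflops."""
    return total_tflops * 1000.0 / cores


def gflops_per_watt(total_tflops: float, watts: float) -> float:
    """Energy efficiency of the whole chip, in gigaflops per watt."""
    return total_tflops * 1000.0 / watts


# Assumption: "1+ teraflop" is rounded down to exactly 1.0 teraflop.
TFLOPS, CORES, WATTS = 1.0, 80, 62.0

print(f"{per_core_gflops(TFLOPS, CORES):.1f} GFLOPS per core")
print(f"{gflops_per_watt(TFLOPS, WATTS):.1f} GFLOPS per watt")
```

That works out to 12.5 gigaflops per core and about 16 gigaflops per watt, figures that reflect the prototype's simple FP cores rather than what general-purpose IA cores would deliver.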
More Barcelona
Also taking advantage of the ISSCC stage, AMD released additional details of its upcoming quad-core "Barcelona" Opteron processor, scheduled for release in mid-2007. In a change that should especially please high performance computing users, Barcelona doubles the width of its floating-point execution pipeline to 128 bits, allowing twice as many FP instructions and data to flow through. Most of the other announced features relate to energy conservation. For example, PowerNow! technology will be enhanced to adjust core frequencies dynamically, so that individual cores don't run hot when they're idle or lightly loaded. The memory interface will also be able to power down memory logic when not in use. Additionally, the design uses "clock gating" to automatically shut down logic areas that are not being utilized.
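To see what the wider path buys, the sketch below counts how many IEEE 754 values (64-bit doubles or 32-bit singles) fit in one pass of a 64-bit versus a 128-bit floating-point datapath, and how many packed operations a 1,000-element double-precision workload would need. It is an idealized model that ignores latency, dependencies, and memory bandwidth; the function names are hypothetical.

```python
import math

DOUBLE_BITS = 64  # IEEE 754 double precision
SINGLE_BITS = 32  # IEEE 754 single precision


def lanes(datapath_bits: int, element_bits: int) -> int:
    """How many floating-point values fit side by side in one datapath pass."""
    return datapath_bits // element_bits


def passes_needed(n_values: int, datapath_bits: int,
                  element_bits: int = DOUBLE_BITS) -> int:
    """Packed operations needed for n_values elements (idealized: no stalls)."""
    return math.ceil(n_values / lanes(datapath_bits, element_bits))


for width in (64, 128):
    print(f"{width}-bit path: {lanes(width, DOUBLE_BITS)} doubles or "
          f"{lanes(width, SINGLE_BITS)} singles per op; "
          f"{passes_needed(1000, width)} ops for 1,000 doubles")
```

Under these assumptions the 128-bit path halves the operation count (500 instead of 1,000). Real gains depend on whether the code is vectorized to use packed instructions in the first place.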
Extreme Clock Gating
Clock gating, which dynamically turns off circuits that are not being used, is becoming more commonplace as designers obsess about energy conservation. P.A. Semi takes the feature to a new level in its dual-core PA6T-1682M PWRficient processor, which sips just 5 to 13 watts at 2 GHz. In this implementation, clock gating is applied systematically to shut off unused circuits throughout the processor. With the new emphasis on energy conservation throughout IT, the technique seems likely to become more popular. According to Mark Hayter, Chief System Architect at P.A. Semi, the processor architect has to build clock gating into the design methodology from the start; it's not something that can be retrofitted.
"I think most people have done some degree of clock gating," observed Hayter, "but certainly not to the level of granularity that we've done."
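As a rough illustration of why granularity matters, the toy model below counts how many clock edges reach each functional unit with and without gating. Clock edges serve only as a crude stand-in for switching activity (and thus dynamic power); the unit names and activity traces are hypothetical, and real clock gating is done in hardware, not software.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Unit:
    name: str
    busy_cycles: Set[int]  # cycles in which this unit has work (hypothetical trace)


def clock_edges(units: List[Unit], cycles: int, gated: bool) -> int:
    """Count clock edges delivered to all units over a run.

    Without gating, every unit is clocked every cycle. With per-unit,
    per-cycle gating, a unit's clock is enabled only when it has work.
    """
    edges = 0
    for cycle in range(cycles):
        for unit in units:
            if not gated or cycle in unit.busy_cycles:
                edges += 1
    return edges


units = [
    Unit("integer ALU", set(range(0, 100))),     # busy every cycle
    Unit("FP unit", set(range(0, 100, 4))),      # busy one cycle in four
    Unit("vector unit", set()),                  # idle for this workload
]
ungated = clock_edges(units, 100, gated=False)
gated = clock_edges(units, 100, gated=True)
print(ungated, gated, f"{1 - gated / ungated:.0%} fewer clock edges")
```

In this trace gating removes 175 of 300 clock edges (about 58%). The savings depend entirely on how often logic sits idle, and the finer the gating, the more idle logic can be switched off.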
A New Stream Processor
Another high performance embedded architecture that garnered some attention at ISSCC was SPI's new Stream Processor. Bill Dally, chair of computer science at Stanford, co-founded SPI (Stream Processors Inc.) in 2004 to develop a highly parallel stream processor for digital signal processing. The processor consists of a heterogeneous set of cores: a data-parallel unit (DPU) and two MIPS cores, one to run the OS and one to manage DSP threads and offload compute-intensive functions to the DPU. The current incarnation of the DPU performs billions of operations per second (gigaops), but the young startup is already looking toward a teraops version for the next generation.
According to Matthew Papakipos, PeakStream chief technology officer, "Bill Dally at SPI has been a leader in the academic research developing stream processor hardware architectures. This research has been very influential in a wide variety of modern hardware designs including programmable graphics processors, the IBM/Sony/Toshiba Cell processor and upcoming many-core CPU designs. SPI's processor is likely to make a significant impact on digital signal processing for embedded applications."
A standard C programming interface (along with some stream processing extensions) is included to provide developers with a familiar software environment. Access to the memory hierarchy is managed by the compiler/runtime system to take advantage of data locality. Sounds promising.
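The stream programming model can be sketched in a few lines. This is a generic illustration of stream-style execution, not SPI's actual C extensions or toolchain; the function names, block size, and gain-and-clip kernel are hypothetical. The point is the shape: data arrives as a stream, is staged block by block in a small local buffer, and the same kernel is applied to every element.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def stream_blocks(source: Iterable[float], block_size: int) -> Iterator[List[float]]:
    """Split an input stream into fixed-size blocks (the last may be short)."""
    it = iter(source)
    while block := list(islice(it, block_size)):
        yield block


def run_kernel(kernel: Callable[[float], float], source: Iterable[float],
               block_size: int = 64) -> Iterator[float]:
    """Stage one block in a local buffer, apply the kernel to every element,
    then emit the results before fetching the next block."""
    for local_buffer in stream_blocks(source, block_size):
        # Every element gets the identical kernel; on a data-parallel unit
        # these would run in parallel lanes rather than in this loop.
        yield from (kernel(x) for x in local_buffer)


def gain_and_clip(x: float, gain: float = 2.0, limit: float = 1.0) -> float:
    """Hypothetical DSP kernel: amplify a sample, then saturate to [-limit, limit]."""
    return max(-limit, min(limit, gain * x))


samples = [0.1, -0.3, 0.6, -0.9, 0.2]
print(list(run_kernel(gain_and_clip, samples, block_size=2)))
```

Because each kernel call touches only data already staged locally, transfers between memory and the local buffer can be scheduled in bulk, which is the kind of locality a compiler/runtime can exploit.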
Embedded DRAM challenges SRAM
Also announced at ISSCC was IBM's new embedded dynamic RAM (DRAM) for on-chip memory. The new DRAM is designed to replace the static RAM (SRAM) currently used for on-chip cache in most processors. Until now, because it was so much slower than SRAM, DRAM was mostly relegated to off-chip memory. Although not quite as speedy as SRAM, embedded DRAM has the advantages of a smaller cell size and lower leakage, which can add up to better overall performance characteristics because much more cache fits on a chip.
Especially for multi-core, high performance chips, the imbalance between memory and processor speeds remains one of the fundamental problems limiting application performance. The result is processors whose cores float in a vast sea of SRAM cache. Itanium is probably the most extreme example of this arrangement, but it is by no means the only one.
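The standard way to reason about this trade-off is average memory access time: hit time plus miss rate times miss penalty. The sketch below compares a faster but smaller SRAM cache with a slower but larger embedded DRAM cache of the same area. All latencies and miss rates are made-up numbers chosen only to illustrate the mechanism, not IBM figures.

```python
def amat(hit_time_ns: float, miss_rate: float, miss_penalty_ns: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical numbers: the eDRAM cache hits more slowly, but because it
# holds more data in the same area it misses less often, so fewer accesses
# pay the (assumed) 100 ns trip to off-chip memory.
sram = amat(hit_time_ns=2.0, miss_rate=0.10, miss_penalty_ns=100.0)
edram = amat(hit_time_ns=3.0, miss_rate=0.04, miss_penalty_ns=100.0)
print(f"SRAM cache: {sram:.1f} ns, eDRAM cache: {edram:.1f} ns")
```

With these assumed figures the slower cache wins (7 ns versus 12 ns). With a smaller miss-rate gap the result could go the other way, which is why density matters as much as raw speed.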
In the past, IBM used embedded DRAM in its PowerPC-based processor for the Blue Gene/L supercomputers. This new technology will enable on-chip DRAM to be used in mainstream IBM chips in 2008 as part of its 45nm offerings. IBM claims that embedded DRAM will effectively double processor performance beyond what could have been achieved with traditional scaling.
For IBM at least, this seems like a can't-miss technology. The company has been singing the praises of embedded DRAM for a while, and the technology circumvents many of the shortcomings of SRAM, especially for high-end chips. AMD is looking at Z-RAM technology from Innovative Silicon for denser caches, so it may choose not to leverage its IBM partnership in this case. Intel is researching "floating-body cell" memory to replace SRAM, but has not yet committed to commercial production.
Super FPGAs
Researchers at Worcester Polytechnic Institute (WPI) want to develop a new kind of reconfigurable computing device that has the superior performance and energy-consumption characteristics of an ASIC but the programmability of an FPGA. According to this week's announcement: "Using a type of parallel computing called stream processing, the chip will complete hundreds of calculations simultaneously, enabling it to perform up to 300 times faster than microprocessors and about 15 times faster than FPGAs."
DARPA is funding this with a very modest 18-month, $150,000 award, so don't expect any miracles in the next year or two.
Quantum Leaping
Perhaps the biggest processor-related news of recent weeks came from a company that has very little to do with digital computing. On Tuesday, a small Canadian startup named D-Wave demonstrated its prototype quantum computer. In front of a few hundred people in Mountain View, California, D-Wave's 16-qubit computer (the actual quantum hardware was off-site) ran three different applications. The company then laid out its vision of how quantum computing (QC) will change the nature of computing forever. As the first commercial QC vendor, D-Wave is looking to become the Cray of quantum computing.
To really understand quantum computing requires a course in graduate physics and perhaps a belief in parallel universes as well. But even mere mortals can appreciate the possibilities for this new technology. Read our feature coverage by me and Bob Feldman (no relation) in this week's issue to see what all the hubbub is about.
And for an overall perspective of the week's HPC news, chips or otherwise, check out John West's excellent wrap-up: The Week in Review.
----
As always, comments about HPCwire are welcomed and encouraged. Write to me, Michael Feldman, at editor@hpcwire.com.
Posted by Michael Feldman - February 16 @ 12:00AM
Michael Feldman is the editor of HPCwire.