February 16, 2007
In the past couple of weeks we've been entertained by a plethora of announcements describing "breakthrough" semiconductor/microprocessor technology that promises far-reaching effects on the industry. These include P.A. Semi's unveiling of its ultra-low-power dual-core Power processor, AMD's Barcelona quad-core chip re-announcement, Intel's demo of its 80-core teraflop processor, the unveiling of a new massively parallel stream processor, and the introduction of IBM's new embedded DRAM technology. And that's just a sample. February has ushered in a veritable Cambrian explosion of semiconductor gadgetry.
Most of these advancements won't make their effects felt until 2008 -- and well beyond that in the case of Intel's 80-core wonder. How much effect? Separating hype from substance is always a challenge when the announcements are still warm, but I'll take a shot at it.
Eighty Cores, No Waiting
Intel demonstrated its 80-core prototype this week at the International Solid-State Circuits Conference (ISSCC). The 1+ teraflop chip is implemented in 65nm technology. The version demonstrated at the conference was made up of rather simple floating-point cores to achieve the teraflop-grabbing headlines. But to claim equivalence to a rack of HPC servers or a mainframe, as some analysts have done, is just hyperventilation. Producing a commercially viable version with Intel Architecture-based cores will undoubtedly require a much smaller feature size and more ingenious engineering. Nevertheless, the 62 watt power consumption is nothing short of amazing, while the on-chip routers and 3-D memory stacking offer an innovative approach to a many-core architecture.
Also taking advantage of the ISSCC stage, AMD released some additional details of its upcoming quad-core "Barcelona" Opteron processor, scheduled to be released in mid-2007. In what should make high performance computing applications especially happy, the new processor will double the width of the floating-point execution pipeline to 128 bits, allowing twice as many FP instructions and data to flow through. Most of the other announced features are related to energy conservation. For example, the PowerNow! technology will be enhanced to provide dynamic adjustment of core frequencies, so that individual cores don't run hot if they're idle or have reduced loads. The system memory interface will also include the capability to power down memory logic when not in use. Additionally, the design takes advantage of "clock gating" to enable automatic shut-down of logic areas that are not being utilized.
Extreme Clock Gating
Clock gating, the feature that dynamically turns off circuits that are not being used, is becoming more commonplace as designers obsess about energy conservation. The feature is taken to a new level in P.A. Semi's new dual-core PA6T-1682M PWRficient processor, which sips just 5-13 watts of power at 2 GHz. In this implementation, clock gating is used systematically to shut off unused circuits throughout the processor. With the new emphasis on energy conservation throughout IT, it seems likely to become a more popular technique to reduce power consumption. According to Mark Hayter, Chief System Architect at P.A. Semi, using clock gating requires that the processor architect integrate the methodology during the design phase; it's not something that can be retrofitted.
"I think most people have done some degree of clock gating," observed Hayter, "but certainly not to the level of granularity that we've done."
A New Stream Processor
Another high performance embedded architecture that garnered some attention at ISSCC was SPI's new Stream Processor. Bill Dally, the CS chair at Stanford, co-founded SPI (Stream Processor Inc.) in 2004 with the idea of developing a highly parallelized stream processor for digital signal processing. The processor consists of a heterogeneous core set, including a data-parallel unit (DPU) and two MIPS cores -- one for the OS and one to manage DSP threads and offload compute-intensive functions to the DPU. The current incarnation of the DPU runs billions of operations per second (gigaops), but the young startup is already looking towards a teraops version for the next generation.
According to Matthew Papakipos, PeakStream chief technology officer, "Bill Dally at SPI has been a leader in the academic research developing stream processor hardware architectures. This research has been very influential in a wide variety of modern hardware designs including programmable graphics processors, the IBM/Sony/Toshiba Cell processor and upcoming many-core CPU designs. SPI's processor is likely to make a significant impact on digital signal processing for embedded applications."
A standard C programming interface (along with some stream processing extensions) is included to provide developers with a familiar software environment. Access to the memory hierarchy is managed by the compiler/runtime system to take advantage of data locality. Sounds promising.
Embedded DRAM Challenges SRAM
Also announced at ISSCC was IBM's new embedded dynamic RAM (DRAM) for on-chip memory. The new DRAM is designed to replace the static RAM (SRAM) currently being used for on-chip cache in most processors. Up until now, because it was so much slower than SRAM, DRAM was mostly relegated to off-chip memory. Although not quite as speedy as SRAM, DRAM has the advantages of a much smaller cell size, lower leakage, and, for large caches, better overall performance characteristics.
Especially for multi-core, high performance chips, the imbalance between memory and processor speeds continues to be one of the fundamental problems limiting overall application performance. The result is processors whose cores float in a vast sea of SRAM cache. The Itanium processor is probably the most extreme example of this type of arrangement, but it is by no means the only one.
In the past, IBM used embedded DRAM in its PowerPC-based processor for the Blue Gene/L supercomputers. The new technology will enable on-chip DRAM to be used in mainstream IBM chips in 2008 as part of its 45nm offerings. IBM claims that embedded DRAM will effectively double processor performance beyond what could have been achieved with traditional scaling.
For IBM at least, this seems like a can't-miss technology. The company has been singing the praises of embedded DRAM for awhile, and the technology circumvents many of the shortcomings of SRAM -- especially for high-end chips. AMD is looking at Z-RAM technology from Innovative Silicon for denser caches, so in this case it may choose not to leverage the IBM partnership. Intel is researching "floating-body cell" memory to replace SRAM, but is still uncommitted as far as commercial production.
Researchers at Worcester Polytechnic Institute (WPI) want to develop a new kind of reconfigurable computing device that has the superior performance and energy consumption characteristics of an ASIC, but the programmability of an FPGA. According to this week's announcement: "Using a type of parallel computing called stream processing, the chip will complete hundreds of calculations simultaneously, enabling it to perform up to 300 times faster than microprocessors and about 15 times faster than FPGAs."
DARPA is funding this with a very modest 18-month, $150,000 award, so don't expect any miracles in the next year or two.
Perhaps the biggest processor-related news of recent weeks came from a company that has very little to do with digital computing. On Tuesday, a small Canadian startup named D-Wave demonstrated its prototype quantum computer. In front of a few hundred people in Mountain View, California, D-Wave's 16-qubit computer (the actual quantum hardware was off-site) ran three different applications. The company then proceeded to put forth its vision of how QC will change the nature of computing forever. As the first commercial QC vendor, D-Wave is looking to become the Cray of quantum computing.
Really understanding quantum computing requires a graduate physics course, and perhaps a belief in parallel universes as well. But even mere mortals can appreciate the possibilities of this new technology. Read our feature coverage by me and Bob Feldman (no relation) in this week's issue to see what all the hubbub is about.
And for an overall perspective of the week's HPC news, chips or otherwise, check out John West's excellent wrap-up: The Week in Review.
As always, comments about HPCwire are welcomed and encouraged. Write to me, Michael Feldman, at email@example.com.
Posted by Michael Feldman - February 15, 2007 @ 9:00 PM, Pacific Standard Time
Michael Feldman is the editor of HPCwire.