HPCwire


Blog: From the Editor


Chips Galore


In the past couple of weeks we've been entertained by a plethora of announcements describing "breakthrough" semiconductor and microprocessor technologies that promise far-reaching effects on the industry. These include P.A. Semi's unveiling of its ultra-low-power dual-core Power processor, AMD's Barcelona quad-core chip re-announcement, Intel's demo of its 80-core teraflop processor, the unveiling of a new massively parallel stream processor, and the introduction of IBM's new embedded DRAM technology. And that's just a sample. February has ushered in a veritable Cambrian explosion of semiconductor gadgetry.

Most of these advancements won't make their effect felt until 2008 -- and well beyond that in the case of Intel's 80-core wonder. How much effect? Separating hype from substance is always a challenge when the announcements are still warm, but I'll take a shot at it.

Eighty Cores, No Waiting

Intel demonstrated its 80-core prototype this week at the International Solid-State Circuits Conference (ISSCC). The 1+ teraflop chip is implemented in 65nm technology. The version demonstrated at the conference was made up of rather simple floating-point cores, the better to reach the headline-grabbing teraflop mark. But to claim equivalence to a rack of HPC servers or a mainframe, as some analysts have done, is just hyperventilation. To produce a commercially viable version with Intel Architecture-based cores will undoubtedly require a much smaller feature size and more ingenious engineering. Nevertheless, the 62 watt power consumption is nothing short of amazing, while the on-chip routers and 3-D memory stacking offer an innovative approach to a many-core architecture.
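To put the headline numbers in perspective, here is a back-of-the-envelope calculation in Python. It uses the figures quoted above (80 cores, 62 watts) and treats "1+ teraflop" as exactly 1.0 teraflops, an assumption that makes the results lower bounds. The function and variable names are purely illustrative.

```python
# Rough efficiency figures for the 80-core prototype.
# Assumption: "1+ teraflop" is treated as exactly 1.0e12 flops/s (a lower bound).
CORES = 80
POWER_WATTS = 62.0
PEAK_FLOPS = 1.0e12


def gflops_per_core(peak_flops: float, cores: int) -> float:
    """Average throughput each core must deliver, in gigaflops."""
    return peak_flops / cores / 1e9


def gflops_per_watt(peak_flops: float, watts: float) -> float:
    """Energy efficiency, in gigaflops per watt."""
    return peak_flops / watts / 1e9


per_core = gflops_per_core(PEAK_FLOPS, CORES)
per_watt = gflops_per_watt(PEAK_FLOPS, POWER_WATTS)
print(f"{per_core:.1f} gigaflops per core")
print(f"{per_watt:.1f} gigaflops per watt")
```

Under that assumption each core contributes about 12.5 gigaflops, and the chip as a whole delivers roughly 16 gigaflops per watt.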

More Barcelona

Also taking advantage of the ISSCC stage, AMD released some additional details of its upcoming quad-core "Barcelona" Opteron processor, scheduled to be released in mid-2007. In what should make high performance computing applications especially happy, Barcelona will double the width of the floating-point execution pipeline to 128 bits, allowing twice as many FP instructions and data to flow through. Most of the other announced features are related to energy conservation. For example, the PowerNow! technology will be enhanced to provide dynamic adjustment of core frequencies, so that individual units don't run hot if they're idle or have reduced loads. The system memory interface will also include the capability to power down memory logic when not in use. Additionally, the design takes advantage of "clock gating" to enable automatic shut-down of logic areas that are not being utilized.
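A quick way to see what the wider pipeline buys is to count how many operands fit in one pass. The sketch below assumes the earlier pipeline was 64 bits wide (which is what "double" to 128 bits implies) and uses the standard 32-bit and 64-bit floating-point sizes. The helper name is hypothetical.

```python
# How many floating-point operands fit in one pass through the pipeline.
# Assumption: the earlier pipeline was 64 bits wide (implied by "double").
OLD_WIDTH_BITS = 64
NEW_WIDTH_BITS = 128


def operands_per_pass(width_bits: int, operand_bits: int) -> int:
    """Number of whole operands of the given size that fit in the pipeline."""
    return width_bits // operand_bits


for name, bits in (("single precision", 32), ("double precision", 64)):
    old = operands_per_pass(OLD_WIDTH_BITS, bits)
    new = operands_per_pass(NEW_WIDTH_BITS, bits)
    print(f"{name}: {old} -> {new} operands per pass")
```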

Extreme Clock Gating

Clock gating, the feature that dynamically turns off circuits that are not being used, is becoming more commonplace as designers obsess about energy conservation. The feature is taken to a new level in P.A. Semi's new dual-core PA6T-1682M PWRficient processor, which sips just 5-13 watts of power at 2 GHz. In this implementation, clock gating is used systematically to shut off unused circuits throughout the processor. With the new emphasis on energy conservation throughout IT, it seems likely to become a more popular technique to reduce power consumption. According to Mark Hayter, Chief System Architect at P.A. Semi, using clock gating requires that the processor architect integrate the methodology during the design phase; it's not something that can be retrofitted.

"I think most people have done some degree of clock gating," observed Hayter, "but certainly not to the level of granularity that we've done."
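For readers who want a feel for why granularity matters, here is a toy model in Python. It uses the textbook approximation that a block's dynamic power scales with switched capacitance times voltage squared times clock frequency, and it assumes a gated block draws no dynamic power at all. The block names, capacitances and voltage are invented for illustration and are not P.A. Semi's figures; leakage power, which clock gating does not remove, is ignored.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    name: str
    switched_capacitance_f: float  # effective switched capacitance, in farads
    busy: bool                     # True if the block is doing useful work


def dynamic_power(blocks: List[Block], voltage: float, freq_hz: float,
                  gated: bool) -> float:
    """Sum of C * V^2 * f over all blocks whose clock is running."""
    total = 0.0
    for block in blocks:
        if gated and not block.busy:
            continue  # clock stopped: no switching, so no dynamic power
        total += block.switched_capacitance_f * voltage ** 2 * freq_hz
    return total


# Hypothetical chip: the floating-point unit is idle in this workload.
chip = [
    Block("integer unit", 1.0e-9, busy=True),
    Block("floating-point unit", 1.0e-9, busy=False),
    Block("L2 cache", 2.0e-9, busy=True),
]

ungated = dynamic_power(chip, voltage=1.0, freq_hz=2.0e9, gated=False)
gated = dynamic_power(chip, voltage=1.0, freq_hz=2.0e9, gated=True)
print(f"without gating: {ungated:.1f} W, with gating: {gated:.1f} W")
```

The finer the blocks, the more often some of them can be marked idle, which is the point Hayter is making about granularity.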

A New Stream Processor
 
Another high performance embedded architecture that garnered some attention at ISSCC was SPI's new Stream Processor. Bill Dally, the CS chair at Stanford, co-founded SPI (Stream Processors Inc.) in 2004 with the idea to develop a highly parallelized stream processor for digital signal processing. The processor consists of a heterogeneous core set, including a data-parallel unit (DPU) and two MIPS cores -- one for the OS and one to manage DSP threads and offload compute-intensive functions to the DPU. The current incarnation of the DPU runs billions of operations per second (gigaops), but the young startup is already looking towards a teraops version for the next generation.

According to Matthew Papakipos, PeakStream chief technology officer, "Bill Dally at SPI has been a leader in the academic research developing stream processor hardware architectures. This research has been very influential in a wide variety of modern hardware designs including programmable graphics processors, the IBM/Sony/Toshiba Cell processor and upcoming many-core CPU designs. SPI's processor is likely to make a significant impact on digital signal processing for embedded applications."

A standard C programming interface (along with some stream processing extensions) is included to provide developers with a familiar software environment. Access to the memory hierarchy is managed by the compiler/runtime system to take advantage of data locality. Sounds promising.
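To give a flavor of the stream programming model in general, here is a minimal sketch in Python. It is not SPI's C interface, whose details are not covered here; every name in it is hypothetical. The idea it illustrates is that a small "kernel" is applied to a stream of samples one block at a time, so that each block can sit in fast local memory while it is processed and its elements can be handled independently.

```python
from typing import Callable, Iterable, Iterator, List


def blocks_of(stream: Iterable[float], size: int) -> Iterator[List[float]]:
    """Split a stream into fixed-size blocks (the last one may be shorter)."""
    block: List[float] = []
    for sample in stream:
        block.append(sample)
        if len(block) == size:
            yield block
            block = []
    if block:
        yield block


def run_kernel(kernel: Callable[[float], float], stream: Iterable[float],
               block_size: int) -> Iterator[float]:
    """Apply the kernel to every sample, one block at a time."""
    for block in blocks_of(stream, block_size):
        # Samples within a block are independent of each other, so a
        # data-parallel unit could process them all at once.
        for sample in block:
            yield kernel(sample)


def scale_and_clip(x: float) -> float:
    """Example DSP-style kernel: apply a gain of 2, then clip to [-1, 1]."""
    return max(-1.0, min(1.0, 2.0 * x))


samples = [0.1, 0.4, -0.7, 0.25, 0.6]
result = list(run_kernel(scale_and_clip, samples, block_size=2))
print(result)
```

In a real stream processor the blocking and data movement would be handled by the compiler and runtime rather than written out by hand, which is the convenience the paragraph above describes.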

Embedded DRAM challenges SRAM

Also announced at ISSCC was IBM's new embedded dynamic RAM (DRAM) for on-chip memory. The new DRAM is designed to replace the static RAM (SRAM) currently being used for on-chip cache in most processors. Up until now, because it was so much slower than SRAM, DRAM was mostly relegated to off-chip memory. Although not quite as speedy as SRAM, DRAM has the advantages of smaller dimensions, less memory leakage, and overall better performance characteristics.

Especially for multi-core, high performance chips, the imbalance between memory and processor speeds continues to be one of the fundamental problems that limits overall application performance. Processors that consist of cores floating in a vast sea of SRAM cache are the result. The Itanium processor is probably the most extreme example of this type of arrangement, but it is by no means the only one.

In the past, IBM used embedded DRAM in its PowerPC-based processor for the Blue Gene/L supercomputers. This new technology will enable on-chip DRAM to be used in mainstream IBM chips in 2008 as part of its 45nm offerings. IBM claims that embedded DRAM will effectively double processor performance beyond what could have been achieved with traditional scaling.

For IBM at least, this seems like a can't-miss technology. They've been singing the praises of embedded DRAM for a while, and the technology circumvents many of the shortcomings of SRAM -- especially for high-end chips. AMD is looking at Z-RAM technology from Innovative Silicon for denser caches, so in this case it may choose not to leverage the IBM partnership. Intel is researching "floating-body cell" memory to replace SRAM, but is still uncommitted as far as commercial production.

Super FPGAs

Researchers at Worcester Polytechnic Institute (WPI) want to develop a new kind of reconfigurable computing device that has the superior performance and energy consumption characteristics of an ASIC, but the programmability of an FPGA. According to this week's announcement: "Using a type of parallel computing called stream processing, the chip will complete hundreds of calculations simultaneously, enabling it to perform up to 300 times faster than microprocessors and about 15 times faster than FPGAs."
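Taken together, the two figures in that quote imply something about FPGAs themselves. The short Python calculation below works it out, on the assumption (not stated in the announcement) that both speedups refer to the same workload; since the first figure is an "up to" number, the result is best read as an upper estimate.

```python
# Speedups claimed in the WPI announcement.
CHIP_VS_MICROPROCESSOR = 300.0  # "up to 300 times faster than microprocessors"
CHIP_VS_FPGA = 15.0             # "about 15 times faster than FPGAs"


def implied_speedup(a_vs_c: float, a_vs_b: float) -> float:
    """If A is a_vs_c times faster than C and a_vs_b times faster than B,
    then B is (a_vs_c / a_vs_b) times faster than C.

    Assumption: both ratios were measured on the same workload.
    """
    return a_vs_c / a_vs_b


fpga_vs_microprocessor = implied_speedup(CHIP_VS_MICROPROCESSOR, CHIP_VS_FPGA)
print(f"Implied FPGA advantage over microprocessors: {fpga_vs_microprocessor:.0f}x")
```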

DARPA is funding this with a very modest 18-month, $150,000 award, so don't expect any miracles in the next year or two.

Quantum Leaping

Perhaps the biggest processor-related news of recent weeks came from a company that has very little to do with digital computing. On Tuesday, a small Canadian startup named D-Wave demonstrated its prototype quantum computer. In front of a few hundred people in Mountain View, California, D-Wave's 16-qubit computer (the actual quantum hardware was off-site) ran three different applications. The company then proceeded to put forth its vision of how QC will change the nature of computing forever. As the first commercial QC vendor, D-Wave is looking to become the Cray of quantum computing.

To really understand quantum computing requires a course in graduate physics and perhaps a belief in parallel universes as well. But even mere mortals can appreciate the possibilities for this new technology. Read our feature coverage by me and Bob Feldman (no relation) in this week's issue to see what all the hubbub is about.

And for an overall perspective of the week's HPC news, chips or otherwise, check out John West's excellent wrap-up: The Week in Review.

----

As always, comments about HPCwire are welcomed and encouraged. Write to me, Michael Feldman, at editor@hpcwire.com.

Posted by Michael Feldman - February 16 @ 12:00AM

Michael Feldman

Michael Feldman is the editor of HPCwire.
