HPCwire

Blog: From the Editor

Commodity Processor Chaos or Convergence?


In this week's issue of HPCwire, Scott Michel's feature article -- GPGPU Computing And the Heterogeneous Multi-Core Future -- does a nice job of discussing how commodity accelerators like GPUs and the Cell BE processor are helping to set the stage for heterogeneous multi-core computing. In doing so he provides some context for the emerging model of heterogeneous processing. He also talks about some of the important challenges that are being confronted, including software compatibility, compiler technologies and language environments. Scott hosted a general-purpose GPU computing tutorial workshop at last month's Supercomputing conference and was kind enough to share his thoughts on this evolving topic.

Reading Scott's article got me thinking about the "disruptive" nature of new technologies. Incompatible architectures, including multi-core x86 processors, the Cell BE processor, and GPU co-processors from ATI (now AMD) and NVIDIA, are all tempting targets, waiting to be exploited for their performance prowess. The adoption of these new processors is making for exciting times in the world of high performance computing, but from the software developer's point of view, it seems chaotic.

For these new processors to be successful, the average programmer must have access to a familiar development environment. This is especially important for architectures such as GPUs and the Cell, which until recently were programmable only through low-level software environments aimed at game developers and graphics coders. In one sense, however, all these architectures are converging: they are all going parallel. So the techniques used to program a GPU or Cell are similar to those used to program a standard homogeneous multi-core processor.

Two companies, PeakStream and RapidMind, are taking advantage of this commonality and each has built a software platform that targets these parallel architectures. PeakStream introduced their product back in September. RapidMind's offering is currently in beta, but seems to be close to a release date. I recently talked with the founders of both companies, Matthew Papakipos at PeakStream and Michael McCool at RapidMind, to get a sense of why these new parallel architectures are being mainstreamed now and where this trend is taking us.

Matthew Papakipos, PeakStream's founder and chief technology officer, has been intimately involved with GPUs for almost 10 years. He ran the GPU architecture group at NVIDIA from 1997 to 2003, the period when GPUs grew from simple graphics engines to general-purpose processors. This parallels the rise of graphics processing in the computer and electronic games industry. Papakipos told me that when he started at NVIDIA in 1997, there were 70 people. When he left, there were 2,500.

At the beginning, the GPU logic was all in hardware. The programmability was added later to get more generalized graphics functionality. Papakipos said that during the early years, NVIDIA was being inundated with requests for new features from all the game developers, like new fog modes, new color interpolation or bump mapping. Microsoft was leading the charge by demanding that games be more interesting looking.

"We realized it would be easier to make the chips programmable rather than give them all the crazy features they were asking for," said Papakipos. "We were going down this path of adding all these bells and knobs and whistles that individual developers were asking for to differentiate the way their games looked."

Making the devices programmable enabled game developers to create their own visual effects in software. In 2000, NVIDIA introduced its first programmable chip, the NV20, which ended up in the first Xbox. ATI was going down the same path as NVIDIA with its own GPU devices. Over the years, the graphics engines evolved to become more powerful and even more general-purpose.

"It's not like we set out to make a chip for high performance computing," explained Papakipos, "but after adding enough features, we had a pretty general-purpose processor. And suddenly it became possible to do some interesting things with it in HPC."

By 2003, people started to realize that GPUs might serve as commodity replacements for proprietary floating point vector processors, representing a real opportunity to bring these devices into the HPC world. Subsidized by legions of game enthusiasts, supercomputing hardware became "almost free."

"The spark that set this off was a bunch of folks at Stanford who did some really good research in late 2004, on getting a real application to run on these GPUs," said Papakipos. "That was the first time anybody had taken a real HPC application and gotten it to run on these graphic processors."

The application was called ClawHMMER, which performs protein sequence matching. That work was done by Pat Hanrahan and was demonstrated over a year ago at SC05. A flurry of other applications were ported by the graphics research community. But Papakipos realized that only graphics programmers could figure out how to get the devices to do anything.

"There was a software gap and that's what led us to create PeakStream," said Papakipos.

The PeakStream platform provides HPC-type APIs (similar to the Intel Math Kernel Library or the MATLAB interfaces) and developer tools (debuggers and profilers) for a C/C++ programming environment. Some real compiler work was required to make that happen. The API is the front door to a virtual machine that provides the JIT (just-in-time) compiler. The virtual machine retargets the code to the particular processor the user is running on.

The RapidMind software platform has a similar model. Like the PeakStream offering, it provides C++ programmers with a high-level interface to data parallelism. RapidMind's runtime compiler generates the appropriate machine code for the target processor type.
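
To make the shared model concrete, here is a minimal sketch in Python of the general idea: the programmer writes ordinary array expressions, the library records them instead of computing right away, and a backend chosen at run time decides how to execute the recorded work. This is not PeakStream's or RapidMind's API. Every name here (`Array`, `source`, `evaluate`, the two backends) is hypothetical, and where a real platform would compile the expression to machine code for the target processor, this toy version simply interprets it.

```python
import operator

OPS = {"add": operator.add, "mul": operator.mul}


class Array:
    """A node in a deferred expression graph; building one computes nothing."""

    def __init__(self, op, *args):
        self.op = op
        self.args = args

    def __add__(self, other):
        return Array("add", self, other)

    def __mul__(self, other):
        return Array("mul", self, other)


def source(values):
    """Wrap plain data as a leaf of the expression graph."""
    return Array("input", list(values))


def run_scalar(node):
    """Hypothetical backend 1: element-by-element evaluation, like a CPU loop."""
    if node.op == "input":
        return list(node.args[0])
    left, right = (run_scalar(arg) for arg in node.args)
    return [OPS[node.op](x, y) for x, y in zip(left, right)]


def run_chunked(node, lanes=4):
    """Hypothetical backend 2: the same graph processed in fixed-width chunks,
    a stand-in for hardware that works on several elements at once."""
    if node.op == "input":
        return list(node.args[0])
    left, right = (run_chunked(arg, lanes) for arg in node.args)
    out = []
    for start in range(0, len(left), lanes):
        stop = start + lanes
        out.extend(OPS[node.op](x, y)
                   for x, y in zip(left[start:stop], right[start:stop]))
    return out


BACKENDS = {"scalar": run_scalar, "chunked": run_chunked}


def evaluate(node, target):
    """Pick the execution strategy at run time; user code does not change."""
    return BACKENDS[target](node)


a = source([1.0, 2.0, 3.0, 4.0, 5.0])
b = source([10.0, 10.0, 10.0, 10.0, 10.0])
expr = a * b + a  # recorded, not yet computed

print(evaluate(expr, "scalar"))   # [11.0, 22.0, 33.0, 44.0, 55.0]
print(evaluate(expr, "chunked"))  # same values, different execution strategy
```

The point of the design is that the expression `a * b + a` is written once, and only the `target` argument changes when the code moves to a different kind of processor.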

Like Papakipos, Michael McCool, co-founder and chief scientist at RapidMind, realized that non-graphics programmers would require a more familiar development environment to be able to apply GPUs and the Cell to a broader set of applications. McCool, a professor at the Computer Graphics Lab at the University of Waterloo, has done research into advanced programming interfaces for graphics processors. This research, funded by the CITO, resulted in a programming system called Sh. The Sh system enabled developers to use the GPU co-processors in a PC for both graphics and general-purpose computing applications. In 2004, McCool and Stefanus Du Toit co-founded Serious Hack Inc. to commercialize this technology. Since then, the company has been renamed from Serious Hack to RapidMind.

And like his PeakStream counterpart, McCool also sees GPUs evolving towards greater and greater generality. With each new generation he sees them looking more like vector or stream co-processors.

"GPUs were actually capable of doing all this stuff a year ago but it wasn't until the X1900 and the 7000 series GPUs, from ATI and NVIDIA respectively, that there was enough of a performance leap to make it worthwhile," explained McCool. "You needed that order of magnitude. Also, it took a year for the tools and for the applications to be written at the commercial level."

The evolution of the GPU over the past five years has been dramatic and should continue to be so for the foreseeable future. Not only will greater performance be available, but new capabilities as well. The addition of double precision floating point hardware to the GPU (recently announced by NVIDIA for a 2007 device) will be especially important for HPC applications that require 64-bit FP accuracy, which should further accelerate industry adoption. It's still unclear how quickly the commodity markets will drive GPUs into the double precision realm. So far, game developers have been very resourceful with single precision.

"But there are other limitations in the GPU," noted McCool. "For example, you have floating point but no integers, which turns out to be a real pain in the neck. So in RTT's ray tracer we had to worry about floating point round-off error in our pointers. The next generation of GPUs will make those kind of weird problems go away."
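
The pointer problem McCool describes can be illustrated with a small Python sketch. It assumes the index is held in an IEEE 754 single-precision float, which has a 24-bit significand; the GPUs of the time were not necessarily fully IEEE-compliant, so treat this as an illustration of the principle rather than a model of any particular chip or of RapidMind's code. The helper names `to_float32` and `index_survives` are made up for this example.

```python
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def index_survives(i):
    """True if integer index i is unchanged after being stored as a float32."""
    return int(to_float32(float(i))) == i


limit = 2 ** 24  # 16,777,216: every integer up to here is exactly representable

print(index_survives(limit))      # True
print(index_survives(limit + 1))  # False: rounds to a neighboring value
print(int(to_float32(float(limit + 1))))  # 16777216
```

Past that limit, an index stored as a single-precision float can silently land on a neighboring element, which is why true integer support in the hardware removes a whole class of bugs.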

Compared to a GPU, which is more akin to a co-processor, the Cell processor represents a more complex architecture, consisting of a PowerPC core with eight synergistic processing elements (SPEs) and a local memory store. The Cell design lends itself to more complex computations than might be feasible with a GPU.

This week, Gianni De Fabritiis, a researcher with the Computational Biochemistry and Biophysics Lab (GRIB-IMIM/UPF) in the Barcelona Biomedical Research Park, published a white paper (http://arxiv.org/PS_cache/physics/pdf/0611/0611201.pdf) describing a molecular dynamics simulation application that achieved 30 gigaflops of sustained performance on a Cell BE, representing an order of magnitude improvement when compared to a standard scalar CPU. The only notable downside was the effort required to change the application's software model. De Fabritiis concludes:

"The cost of this effort cannot be underestimated, but the performance obtainable compared to a traditional processor is about 20 times faster for the realistic case of molecular dynamics of biomolecules. Similar results are also possible for other computing intensive scientific and technological problems, such as computational fluid dynamics, systems biology and Monte Carlo methods for finance."

He continues:

"New multi-core standard processors will need to show that they can reach similar performance levels at the same cost. The implications of this technology for science are also important. Without a doubt it expands the frontier of scientific computing while lowering the cost of entry in terms of the computational infrastructure required to run molecular based software."

There's a notion that GPUs, the Cell and x86 architectures are actually converging. PeakStream's Papakipos thinks the Cell BE and AMD's future "Fusion" (CPU-GPU) processor are part of a larger phenomenon that will transform general-purpose computing. He envisions CPUs becoming more GPU-like, and processors evolving into architectures that include a large number of cores, distributed memory, NUMA (Non-Uniform Memory Access) and SIMD (Single Instruction Multiple Data) hardware. Even the 80-core prototype Intel talked up at the Intel Developer Forum this September follows this same general pattern.

"There's a convergence starting to happen between multi-core x86 processors, GPUs and the Cell processor," said Papakipos. "If you look at those three processors today, they all look pretty different. But if you look forward a few years, they're all going to the same place."

-----

As always, comments about HPCwire are welcomed and encouraged. Write to me, Michael Feldman, at editor@hpcwire.com.

Posted by Michael Feldman - December 1 @ 12:00AM

Michael Feldman

Michael Feldman is the editor of HPCwire.
