HPCwire

The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing

HPCwire >> Special Features >> HPWS08 >> HPWS08 Blogs

Blog: High Performance on Wall Street Commentary

Commentary from the 2008 High Performance on Wall Street Conference and Exhibition in New York City

High Performance on Wall Street Commentary | Main Blog Index

Intel: CPUs Will Prevail Over Accelerators in HPC


HPC hardware accelerators -- GPUs, FPGAs, the Cell processor, and custom ASICs like the ClearSpeed floating point device -- have captured the imagination of HPC users in search of higher performance and lower power consumption. While these offload engines continue to show impressive performance results for supercomputing workloads, Intel is sticking to its CPU guns to deliver HPC to the broader market. According to Richard Dracott, Intel's general manager of the company's High Performance Computing business unit, CPU multicore processors, and eventually manycore processors, will prevail over accelerator solutions in the financial services industry, as well as for HPC applications in general.

Dracott says he's seen the pattern before where people get attracted to specialized hardware for particular applications. But in the end, he says, general-purpose CPUs turn out to deliver the best ROI. Dracott claims that to exploit acceleration in HPC, developers need to modify the software anyway, so they might as well modify it for multicore. "What we're finding is that if someone is going to go to the effort of optimizing an application to take advantage of an offload engine, whatever it may be, the first thing they have to do is parallelize their code," he told me.

To Intel's credit, the company has developed a full-featured set of tools and libraries to help mainstream developers parallelize their codes for x86 hardware. With the six-core Dunnington in the field today and eight-core Nehalem processors just around the corner, developers will need all the help they can get to fully utilize the additional processing power.

In fact though, adding CPU-based multithreading parallelism to your app tends to be more difficult than adding data parallelism. The latter is the only type of parallelism accelerators are any good at. And if your workload can exploit data parallelism, this can be done rather straightforwardly. With the advent of NVIDIA's CUDA, AMD's Brook+, RapidMind's development platform, FPGA C-based frameworks, and SDKs from ClearSpeed and other vendors, the programming of these devices has become simpler.

And it may get simpler yet. PGI compiler developer Michael Wolfe thinks there is no reason why high-level language compilers can't take advantage of these offload engines. "We believe we can produce compilers that allow evolutionary migration from today's processors to accelerators, and that accelerators provide the most promising path to high performance in the future," he wrote recently in his HPCwire column.

Of course, CPUs are not standing still performance-wise. According to Dracott, when financial customers were asked how long a 10x performance advantage over a CPU-based solution would have to be maintained to make it worth their while, they told him anywhere from 2-3 years up to as much as 7 years. For production environments, the software investment required to bring accelerators into the mix needs to account for re-testing and re-certification. In the case of the financial services industry (because of regulatory and other legal requirements), this can be a significant part of the effort. "And by the time they actually make the investment in the software, the general-purpose [CPU] hardware has caught up," says Dracott.

Maybe. A lot of applications are already realizing much better than a 10x improvements in performance with hardware acceleration. SciComp, a company that offers derivatives pricing software, recently announced a "20-100X execution speed increase" for its pricing models. Other HPC workloads have done even better. And while the CPU hardware will eventually catch up to current accelerators, all silicon is moving up the performance ladder, roughly according to Moore's Law. So the CPU-accelerator performance gap will in all likelihood remain.

Accelerators do have a steeper hill to climb in certain areas though. Except for the Cell processor, where a PowerPC core is built-in, all accelerators require a connection to a CPU host. Depending upon the nature of the connection (PCI, HyperTransport, QuickPath, etc.) the offload engine can become starved for data because of bandwidth limitations. In fact, the time spent talking to the host can eat up any performance gains realized through faster execution. More local store on the accelerator and careful programming can often mitigate this, but the general-purpose CPU has a built-in advantage here.

Dracott points out that the lack of double precision floating point capabilities and error correction code (ECC) memory limits accelerator deployment in many HPC production environments. This is especially true in the financial space, where predictability and reliability of results are paramount. But the latest generation of offload engines all support DP to some degree, and only GPUs have an ECC problem. ClearSpeed ASICs, in particular, have full-throttle 64-bit support plus enterprise-level ECC protection. GPUs, on the other hand, will have to deal with soft error protection in some systematic way to become a more widely deployed solution for technical computing. I've got to believe that NVIDIA and AMD will eventually add this capability to their GPU computing offerings.
 
The shortcomings of accelerator solutions have prevented much real-world deployment in production situations, according to Dracott. He thinks users will continue to experiment with offload engines for several more years, but with the exception of certain application niches, most will eventually end up back at the CPU. But interest in these more exotic solutions remains high in the HPC community. HPCwire's Dennis Barker, at this week's High Performance on Wall Street conference, reports that the hardware accelerator companies were drawing quite a crowd and a number of FPGA-accelerated products are already on the market. "Sellers of these products were all over the place, their booths were busy, and several sessions on the subject were standing-room only," he writes.

And despite Intel's commitment to the x86 CPU and Dracott's take on the future of accelerators, the company has been evolving its position on co-processor acceleration. Intel's (and IBM's) Geneseo initiative to extend PCI Express for offload engines and its plans to license the new QuickPath interconnect technology would seem to indicate that the company hasn't completely discounted acceleration. AMD, of course, has Torenzza, its own co-processor integration technology. Whether Intel is just hedging its bets to counter its rival or is genuinely committed to sharing the computing world with other architectures remains to be seen.

Posted by Michael Feldman - September 24 @ 7:50PM

(Digg, Technorati, more)

Discussion

There are 0 discussion items posted.  

Michael Feldman

Michael Feldman is the editor of HPCwire.

More Michael Feldman



Recent Comments

HPC? not so much by ewahl

Re: Podcast: A Trio of HPC Apps by sibat0705

Re: Podcast: A Trio of HPC Apps by sibat0705

Re: Cray Corrals Big Defense Deal by watchesuk

We think by watchesuk

Re: IBM and HPC by truly64

HPC = servers but a lot more by lawries

Lena by Nastyanna

Lena by Nastyanna

Multi core deployment becomes a memory game by truly64

Re: Venture Capital Drought? Not So Much. by Ron Van Holst

Re: AMD Confirms 12-Core Opteron Production by Nastyanna

Re: Cray Corrals Big Defense Deal by Nastyanna

Re: Podcast: Cray Awarded Defense Deal; SGI Makes Storage Buy; IBM Invents New Algorithm by Nastyanna

Painful Truth by jeffrey.mcallister

SGI = graphics + HPC by johnbarr

HPC = servers but a lot more by truly64

Oracle SPARC != Fujitsu SPARC by Alan M. Feldstein

Sun & HPC != Oracle & HPC by Merblich

a third vendor for lossless low latency 10GbE fabric by lee.fisher@hp.com

Response to GAH by KevinButerbaugh

Response to KevinButerbaugh by GAH

Response to KevinButerbaugh by GAH

Response to GAH by KevinButerbaugh

Response to bdrupp by KevinButerbaugh

Climate Crisis and Exaflops by bdrupp

Climate Crisis and Exaflops by John Hules

Climate Crisis and Exaflops by GAH

Climate Crisis by KevinButerbaugh

IBM "Brain Simulation" article is not properly presented. by Merritt

563 out of 1206 by vvolkov

Little Iron by gadunk

At least it's not "cloud" by KevinButerbaugh

Native QPI Interface? by commike

Mmmmmm by hellcats

New transistorized IC chip scales. by symmecon

Itanium at IDF by Alan M. Feldstein

Communication time by jnapper

"The financial meltdown and computing" by donpellegrino

Human Models by mdgabriel

High-End SPARC Chip for Scientific Applications by Alan M. Feldstein

RapidMind by Mr LolO

Rapidmind by dminor

Longer run times by JohnWest

re: Algo trading Angst by jshore

Results of Testing by in_the_crease

Feature Articles

Intel Ups Performance Ante with Westmere Server Chips

Right on schedule, Intel has launched its Xeon 5600 processors, codenamed "Westmere EP." The 5600 represents the 32nm sequel to the Xeon 5500 (Nehalem EP) for dual-socket servers. Intel is touting better performance and energy efficiency, along with new security features, as the big selling points of the new Xeons.
Read More...

The Week in Review

The ACM Turing Award goes to the creator of the modern personal computer; and Voltaire announces a mid-range InfiniBand switch and new technology that accelerates distributed applications. We recap those stories and more in our weekly wrapup.
Read More...

Florida State Gives Virtual SMPs a Spin

The prospects for virtual SMP technology got another boost last month when Florida State University announced it had installed a new HPC system from 3Leaf Systems. The servers are being housed at the university's HPC facility and will be used across a range of scientific disciplines.
Read More...

Top Headlines

Tailoring Medicine with Supercomputers

Mar 16 | Bio-IT World | Biotech firm builds genetic models from patient data. Read more...

Gelsinger Stuns Analysts and Colleagues with Storage Pool Plan

Mar 15 | The Register | EMC's grand vision for unified global storage. Read more...

Cisco Containers Target Federal Market

Mar 15 | Data Center Knowledge | Company delivers UCS-container solution to NASA. Read more...

GP-GPUs: OpenCL Is Ready For The Heavy Lifting

Mar 11 | Linux Magazine | CUDA may be the rage, but OpenCL is a standard that has some features you may need. Read more...

Can Free Software Drive the Fourth Paradigm?

Mar 09 | Free Software Magazine | Data-driven computing will need open software. Read more...

Featured Whitepapers

Virtualization for Aggregation And The vSMP Architecture™

Jan 12 | | In-depth look at vSMP Foundation server virtualization technology, technical implementation, use cases and capabilities. The technical whitepaper provides an architectural overview and details on the three vSMP Foundation products: vSMP Foundation for SMP, vSMP Foundation for Cluster and vSMP Foundation for Cloud.

Copper Cable Technologies for High Performance Computing

Jan 18 | | This white paper discusses Gore’s copper cable assemblies, and how they continue to exceed the standards for providing reliable, cost-effective solutions for high-performance computer applications.

Multimedia

Webcast: Virtualized Data Center Roundtable

Join this online panel discussion for live Q&A with leading industry experts, analysts, and end-users to discuss the latest innovations, best practices, barriers to implementation, and measurable benefits of server virtualization with a particular focus on today's real world solutions.

Webcast: Watch SC09 Birds of a Feather Video: Scalable Fault-Tolerant HPC Supercomputers

Learn about scalable fault-tolerant architectures and examples of energy efficient and scalable supercomputing clusters using dual QDR InfiniBand to combine capacity computing with network failover capabilities with the help of programming languages such as MPI and a robust Linux cluster management package.

Webcast: High Performance Computing for a Smarter Planet

LIVE@SCO9: The IBM team discusses new innovations in hardware, software and services that help clients better understand their workloads and get insight from their R&D efforts. Technology demonstrations include the soon-to-be-released Power7 HPC processor, the DCS990 system with 2.4 petabytes of storage, the xCAT management tool, secure HPC cloud computing and more. Winners of two HPCwire Readers' and Editors’ Choice Awards! Take the IBM virtual tour at SC09 or more information go online to: http://www-03.ibm.com/systems/deepcomputing/sc09.html

Blogs by Topics

Blogs by Author

HPC Blogroll



Featured Events

HPC User Forum DICE
2010 High Performance Computing Linux Financial Markets
Cloud Computing Expo
Cloud Slam
ESC
DEISA PRACE Symposium