HPCwire

The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing


Understanding the Different Acceleration Technologies


Page:  1  of  5

The Coprocessor Revival

The idea of using specialized coprocessors to accelerate general-purpose computers for specific applications is at least as old as the attached processors of the late 1970s and early 1980s. Back then, a DEC or IBM minicomputer with a peak speed of less than a megaflop could become a "poor man's supercomputer" by adding a cabinet full of hardware designed for floating-point operations. And don't forget that Intel's original foray into serious FLOPS came when John Palmer convinced the company to build a "coprocessor," the 8087 chip, to kick up the speed of the 8086 on technical applications. As transistor sizes shrank, vendors found it less of a burden to integrate into the main processor what once had been prohibitively bulky and expensive hardware for fast arithmetic, and coprocessors faded from view.

Mark Twain is often credited with saying, "History doesn't repeat itself, but it does rhyme." It's two decades later, and coprocessors are back with a vengeance. This time, the reasons are different: computing is increasingly limited by power consumption, cooling, space and weight. If you know how your workload differs from the general one, you can exploit that difference to get far more computing done within those limits by applying the right accelerator technology.

The Questions to Ask

All accelerators are good... for the purpose for which they were designed. The old saying "if you give a five-year-old a hammer, everything starts to look like a nail" comes to mind when we see attempts to use accelerators outside their intended range. Some of the things to ask in considering the fitness of an accelerator for a particular purpose are:

  • Is my main data type floating-point or integer, and what precision do I need?

  • How much data needs to be local to the accelerator?

  • Does existing software meet my needs or will I have to write my own?

  • If I have to write my own software, are the tools mature and complete enough for my tastes (or the abilities of my programmers)?

  • Am I trying to improve performance, or do I mainly want to improve the ratio of performance to something else (like power consumption, price, or footprint)?

  • Does my system (the one I own, or the one I plan to buy) have spare sockets or spare PCI slots that might accommodate accelerators, and will the accelerators fit them?

  • Is the accelerator compatible with the "big-endian" or "little-endian" native byte ordering of the host?

  • Will the performance still be higher once I include the time to move data from the host to the accelerator and back?

The last question is so fundamental to the use of accelerators that it deserves its own section.

Bandwidth: Is This Trip Necessary?

Adding an accelerator is much like adding another node to a computing cluster, in that you weigh the cost of sending data to it against the benefit of the extra processing power. Whether an accelerator fits in a socket on the motherboard or a PCI slot or even an extension chassis, you have to ask: Will it really pay to use this, including the time to move the data there and back? And is an accelerator really better than simply adding another node to the cluster?

It's elementary computer architecture to figure out how much computation per data point you need to amortize (or overlap) the time to move the data to and from a processor resource. I like to think about this "grain size" issue in terms of a simple dot product of two vectors of length N. If I have two processors, and can divide the vectors to reside on separate processors and then communicate to get the final sum, how big does N have to be for two processors to be faster than a single one? On a typical cluster with about 2 microseconds of latency between nodes, the break-even N can easily be in the thousands of elements. It's the same way with accelerators. Accelerators are for substantial tasks, not small-grain stuff like computing the cosine of a single number. Even if you have an infinitely fast accelerator in a low-latency socket on the motherboard, 10 nanoseconds away, a modern general-purpose chip can do 200 floating-point operations in the time it takes to get operands to the socket and the result back.
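The grain-size arithmetic above can be sketched in a few lines of Python. This is a deliberately crude, latency-only model, not a measurement: it assumes a dot product of length N costs 2N floating-point operations, that two processors each do half the work and then exchange one message, and it ignores memory bandwidth and the final addition. The function names and the sustained rate of 1 GFLOPS per processor are illustrative assumptions; only the 2 microsecond and 10 nanosecond latencies and the 200-operation figure come from the text.

```python
def flops_during(latency_ns, gflops):
    """Floating-point operations a processor could finish in latency_ns.

    1 GFLOPS is one operation per nanosecond, so the product is a flop count.
    """
    return latency_ns * gflops


def time_one_processor_ns(n, gflops):
    """Dot product of length n on one processor: n multiplies plus n adds."""
    return 2 * n / gflops


def time_two_processors_ns(n, gflops, latency_ns):
    """Each processor handles n/2 elements (n flops), then one message is sent.

    Assumption: the single final addition is negligible and is ignored.
    """
    return n / gflops + latency_ns


def break_even_length(latency_ns, gflops):
    """Vector length at which both times are equal.

    Setting 2N/F = N/F + L gives N = L * F.
    """
    return latency_ns * gflops


# Cluster case: 2 microseconds of latency, assumed 1 GFLOPS sustained per node.
print(break_even_length(2000, 1))   # 2000

# Socket case: 10 ns away, host assumed to run at 20 GFLOPS.
print(flops_during(10, 20))         # 200
```

Under these assumptions the break-even length scales with both latency and processor speed, so a faster host pushes the threshold higher, not lower.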

Too often I see benchmark specifications for accelerators that clearly don't take into account the time to get the data in and out. So be careful. The omission matters most for Fast Fourier Transforms (FFTs), which perform relatively few floating-point operations per data point. And remember what you're comparing against: The current crop of mainstream microprocessors can produce over 20 GFLOPS of 64-bit arithmetic per socket, so if you don't do your homework (understanding the match between what you are doing and the specific capabilities of an accelerator), you can easily find that your accelerator solution is more like, well, a decelerator.
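To make the FFT warning concrete, here is a minimal sketch of the comparison a benchmark ought to include. It assumes the conventional count of 5 N log2 N operations for a complex FFT of length N, 16 bytes per double-precision complex value, and no overlap between transfer and computation. The accelerator speed (100 GFLOPS) and link bandwidth (4 GB/s) are made-up illustrative numbers, and crediting the host with its full 20 GFLOPS on an FFT is generous; all function names are hypothetical.

```python
import math


def fft_flops(n):
    """Conventional operation count for a complex FFT of length n (a power of two)."""
    return 5 * n * math.log2(n)


def fft_bytes_moved(n, bytes_per_value=16):
    """Bytes sent to the accelerator plus bytes returned (n complex doubles each way)."""
    return 2 * n * bytes_per_value


def offload_speedup(flops, bytes_moved, host_gflops, accel_gflops, link_gb_per_s):
    """Host time divided by accelerator time, with data movement included.

    A result below 1.0 means the accelerator is slower than staying on the host.
    Assumption: transfer and computation do not overlap.
    """
    host_s = flops / (host_gflops * 1e9)
    accel_s = flops / (accel_gflops * 1e9)
    transfer_s = bytes_moved / (link_gb_per_s * 1e9)
    return host_s / (accel_s + transfer_s)


n = 2 ** 20
speedup = offload_speedup(
    fft_flops(n),
    fft_bytes_moved(n),
    host_gflops=20,     # from the text: over 20 GFLOPS per socket
    accel_gflops=100,   # hypothetical accelerator
    link_gb_per_s=4,    # hypothetical host-to-accelerator link
)
print(round(speedup, 2))
```

With these assumed numbers the accelerator is five times faster than the host on paper, yet the ratio comes out below 1 (roughly 0.56) once the transfer is counted: the decelerator case described above.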

