HPCwire

Understanding the Different Acceleration Technologies


Page:  1  of  5

The Coprocessor Revival

The idea of using specialized coprocessors to accelerate general-purpose computers for specific applications is at least as old as the attached processors of the late 1970s and early 1980s. Back then, a DEC or IBM minicomputer with a peak speed of less than a megaflop could become a "poor man's supercomputer" by adding a cabinet full of hardware designed for floating-point operations. And don't forget that Intel's original foray into serious FLOPS came when John Palmer convinced the company to build a "coprocessor," the 8087 chip, to kick up the speed of the 8086 on technical applications. As transistor sizes shrank, vendors found it less of a burden to integrate what once had been prohibitively bulky and expensive hardware for fast arithmetic, and coprocessors faded from view.

A saying often attributed to Mark Twain goes, "History doesn't repeat itself, but it does rhyme." It's two decades later, and coprocessors are back with a vengeance. This time, the reasons are different: computing is increasingly limited by power consumption, cooling, space and weight. If you know how your workload differs from the general one, you can exploit that difference with the right accelerator technology and get far more computing done within those limits.

The Questions to Ask

All accelerators are good... for the purpose for which they were designed. The old saying "if you give a five-year-old a hammer, everything starts to look like a nail" comes to mind when we see attempts to use accelerators outside their intended range. Some of the things to ask in considering the fitness of an accelerator for a particular purpose are:

  • Is my main data type floating-point or integer, and what precision do I need?

  • How much data needs to be local to the accelerator?

  • Does existing software meet my needs or will I have to write my own?

  • If I have to write my own software, are the tools mature and complete enough for my tastes (or the abilities of my programmers)?

  • Am I trying to improve performance, or do I mainly want to improve the ratio of performance to something else (like power consumption, price, or footprint)?

  • Does my system (the one I own, or the one I plan to buy) have spare sockets or spare PCI slots that might accommodate accelerators, and will the accelerators fit them?

  • Is the accelerator compatible with the "big-endian" or "little-endian" native byte ordering of the host?

  • Will the performance still be higher once I include the time to move data from the host to the accelerator and back?

The last one is so fundamental to the use of accelerators that it deserves its own section.

Bandwidth: Is This Trip Necessary?

Adding an accelerator is much like adding another node to a computing cluster, in that you weigh the cost of sending data to it against the benefit of the extra processing power. Whether an accelerator fits in a socket on the motherboard or a PCI slot or even an extension chassis, you have to ask: Will it really pay to use this, including the time to move the data there and back? And is an accelerator really better than simply adding another node to the cluster?

It's elementary computer architecture to figure out how much computation per data point you have to have to amortize (or overlap) the time to move the data to and from a processor resource. I like to think about this "grain size" issue in terms of a simple dot product of two vectors of length N. If I have two processors, and can divide the vectors to reside on separate processors and then communicate to get the final sum, how big does N have to be for two processors to be faster than a single one? On a typical cluster with about 2 microseconds of latency between nodes, N can easily be in the thousands of elements. It's the same way with accelerators. Accelerators are for substantial tasks, not small-grain stuff like computing the cosine of a single number. Even if you have an infinitely fast accelerator in a low-latency socket on the motherboard, 10 nanoseconds away, a modern general-purpose chip can do 200 floating-point operations in the time it takes to get operands to the socket and the result back.
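The grain-size argument above can be made concrete with a very simple timing model. The Python sketch below is only an illustration under stated assumptions: each processor sustains a fixed rate on the dot product (the 2 GFLOPS used in the example is an assumed figure, not one from this article), the only communication cost is one message latency, and the function names are hypothetical.

```python
import math

def time_one_ns(n, gflops):
    """Time for one processor to compute a length-n dot product, in ns.

    Counts n multiplies plus (n - 1) adds; 1 GFLOPS is 1 flop per nanosecond.
    """
    return (2 * n - 1) / gflops

def time_two_ns(n, gflops, latency_ns):
    """Time for two processors, each holding half of both vectors (n even).

    Each computes a partial sum over n/2 elements, then one partial sum is
    sent across (latency only, since the payload is a single number) and
    added to the other.
    """
    partial = (n - 1) / gflops      # 2 * (n / 2) - 1 flops per processor
    final_add = 1 / gflops
    return partial + latency_ns + final_add

def break_even_length(gflops, latency_ns):
    """Smallest n for which two processors are strictly faster than one."""
    # time_two < time_one  <=>  n > latency_ns * gflops + 1
    return math.floor(latency_ns * gflops + 1) + 1

def flops_forgone(gflops, delay_ns):
    """Floating-point operations the host could have done during a delay."""
    return gflops * delay_ns

if __name__ == "__main__":
    # Assumed: 2 GFLOPS sustained per node, 2 microseconds of latency.
    print(break_even_length(gflops=2, latency_ns=2000))
    # The article's figures: a 20 GFLOPS socket and a 10 ns trip.
    print(flops_forgone(gflops=20, delay_ns=10))
```

With these inputs the model puts the break-even length at 4002 elements, in line with "thousands of elements," and reproduces the 200 operations lost to a 10-nanosecond trip. A faster processor or a slower interconnect pushes the break-even length higher in direct proportion.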

Too often I see benchmark specifications for accelerators that clearly don't take into account the time to get the data in and out. So be careful. This is especially true for Fast Fourier Transforms (FFTs), which really don't perform very many floating-point operations per data point. And remember what you're comparing against: The current crop of mainstream microprocessors can produce over 20 64-bit GFLOPS per socket, so if you don't do your homework (understanding the match between what you are doing and the specific capabilities of an accelerator), you can easily find that your accelerator solution is more like, well, a decelerator.
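A rough way to check a benchmark claim is to fold the transfer time into the rate yourself. The sketch below does this for a complex FFT, using the conventional estimate of 5 N log2 N floating-point operations for a length-N transform. The accelerator speed (200 GFLOPS) and link bandwidth (4 GB/s) are hypothetical numbers chosen for illustration, and the model assumes transfers are not overlapped with computation.

```python
import math

def fft_flops(n):
    """Conventional flop estimate for a complex FFT of length n."""
    return 5 * n * math.log2(n)

def fft_bytes_moved(n, bytes_per_point=16):
    """Bytes sent to the accelerator plus bytes returned.

    Assumes double-precision complex data (16 bytes per point) and that
    the whole array travels in each direction once.
    """
    return 2 * n * bytes_per_point

def effective_gflops(flops, bytes_moved, accel_gflops, link_gb_per_s):
    """Delivered rate once transfer time is included (no overlap assumed).

    1 GFLOPS is 1 flop per ns and 1 GB/s is 1 byte per ns, so both
    times below are in nanoseconds.
    """
    compute_ns = flops / accel_gflops
    transfer_ns = bytes_moved / link_gb_per_s
    return flops / (compute_ns + transfer_ns)

if __name__ == "__main__":
    n = 2 ** 20
    rate = effective_gflops(fft_flops(n), fft_bytes_moved(n),
                            accel_gflops=200, link_gb_per_s=4)
    print(f"effective rate: {rate:.1f} GFLOPS")
```

Under these assumptions the nominal 200 GFLOPS device delivers a bit under 12 GFLOPS on a million-point transform, below the 20 GFLOPS peak of a mainstream socket cited above. Overlapping transfers with computation, or keeping data resident on the accelerator across several operations, would change the picture, which is why the match between workload and accelerator matters.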
