The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
November 14, 2006
The Coprocessor Revival
The idea of using specialized coprocessors to accelerate general-purpose computers for specific applications is at least as old as the attached processors of the late 1970s and early 1980s. Back then, a DEC or IBM minicomputer with a peak speed of less than a megaflop could become a "poor man's supercomputer" by adding a cabinet full of hardware designed for floating-point operations. And don't forget that Intel's original foray into serious FLOPS was when John Palmer convinced Intel to build a "coprocessor," the 8087 chip, to kick up the speed of the 8086 on technical applications. As transistor sizes shrank, vendors found it less of a burden to integrate what once had been prohibitively bulky and expensive hardware for fast arithmetic, and coprocessors faded from view.
Mark Twain is often credited with saying, "History doesn't repeat itself, but it does rhyme." It's two decades later, and coprocessors are back with a vengeance. This time, the reasons are different. Computing is increasingly limited by power consumption, cooling, space and weight. If you know how your workload differs from the general one, you can exploit that difference by applying the right accelerator technology, and get far more computing done within those limits.
The Questions to Ask
All accelerators are good... for the purpose for which they were designed. The old saying "if you give a five-year-old a hammer, everything starts to look like a nail" comes to mind when we see attempts to use accelerators outside their intended range. Some of the things to ask in considering the fitness of an accelerator for a particular purpose are:
The last one is so fundamental to the use of accelerators, it deserves its own section.
Bandwidth: Is This Trip Necessary?
Adding an accelerator is much like adding another node to a computing cluster, in that you weigh the cost of sending data to it against the benefit of the extra processing power. Whether an accelerator fits in a socket on the motherboard or a PCI slot or even an extension chassis, you have to ask: Will it really pay to use this, including the time to move the data there and back? And is an accelerator really better than simply adding another node to the cluster?
It's elementary computer architecture to figure out how much computation per data point you need in order to amortize (or overlap) the time to move the data to and from a processor resource. I like to think about this "grain size" issue in terms of a simple dot product of two vectors of length N. If I have two processors, and can divide the vectors so that each half resides on a separate processor and then communicate to get the final sum, how big does N have to be for two processors to be faster than one? On a typical cluster with about 2 microseconds of latency between nodes, N can easily be in the thousands of elements. It's the same with accelerators. Accelerators are for substantial tasks, not small-grain work like computing the cosine of a single number. Even if you have an infinitely fast accelerator in a low-latency socket on the motherboard, 10 nanoseconds away, a modern general-purpose chip can do 200 floating-point operations in the time it takes to get operands to the socket and the result back.
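The dot-product grain-size argument can be put into a rough timing model. The Python sketch below is an illustration, not a measurement. It assumes every multiply or add costs the same fixed time, that the two-processor version pays exactly one message latency to combine the partial sums, and that the sustained rate is 1 flop per nanosecond (an assumed figure for a memory-bound dot product). Only the 2-microsecond latency, the 10-nanosecond socket distance, and the 20 GFLOPS per-socket rate come from the article; the function names are hypothetical.

```python
# Rough timing model for the dot-product grain-size question.
# All times are in nanoseconds; rates are in flops per nanosecond (= GFLOPS).

def one_processor_ns(n, flop_ns):
    """One processor does n multiplies and n adds."""
    return 2 * n * flop_ns

def two_processor_ns(n, flop_ns, latency_ns):
    """Each processor handles n/2 elements, then one message combines the partial sums."""
    return 2 * (n / 2) * flop_ns + latency_ns

def break_even_length(flop_ns, latency_ns):
    """Vector length at which the two-processor time equals the one-processor time.

    Solving 2*n*flop_ns = n*flop_ns + latency_ns gives n = latency_ns / flop_ns.
    """
    return latency_ns / flop_ns

def flops_forgone(delay_ns, host_flops_per_ns):
    """Floating-point operations the host could have done while waiting delay_ns."""
    return delay_ns * host_flops_per_ns

if __name__ == "__main__":
    # Assumed sustained rate: 1 flop per ns. Latency from the article: 2 microseconds.
    print(break_even_length(flop_ns=1.0, latency_ns=2000.0))     # 2000.0
    # 10 ns of delay against a 20 GFLOPS socket.
    print(flops_forgone(delay_ns=10.0, host_flops_per_ns=20.0))  # 200.0
```

Under these assumptions the break-even length is 2,000 elements. If the processors instead sustained the full 20 GFLOPS, the same model would put it at 40,000 elements, so the faster the host, the larger the task has to be before splitting it pays.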
Too often I see benchmark specifications for accelerators that clearly don't take into account the time to get the data in and out. So be careful. This is especially true for Fast Fourier Transforms (FFTs), which perform relatively few floating-point operations per data point. And remember what you're comparing against: the current crop of mainstream microprocessors can deliver over 20 GFLOPS of 64-bit arithmetic per socket, so if you don't do your homework (understanding the match between what you are doing and the specific capabilities of an accelerator), you can easily find that your accelerator solution is more like, well, a decelerator.
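To see why FFTs are a hard case, it helps to count operations per data point. The sketch below uses the conventional estimate of 5 N log2 N floating-point operations for a complex FFT of length N. It then asks how long the transform must be before the host's own compute time even matches the time to ship the data out and back. The 16-byte point size (complex double precision), the 4 GB/s link rate, the absence of any overlap between transfer and computation, and the assumption that the host runs at the 20 GFLOPS peak are all illustrative assumptions, not figures for any particular accelerator. The function names are hypothetical.

```python
import math

def fft_flops_per_point(n):
    """Conventional estimate: a length-n complex FFT costs about 5*n*log2(n) flops."""
    return 5 * math.log2(n)

def break_even_log2_n(host_flops_per_ns, link_bytes_per_ns, bytes_per_point=16):
    """log2 of the FFT length at which host compute time equals round-trip transfer time.

    Per point, the host spends 5*log2(n) / host_flops_per_ns nanoseconds computing,
    while the transfer costs 2*bytes_per_point / link_bytes_per_ns (there and back).
    Setting the two equal and solving for log2(n) gives the expression below.
    Without overlap, shorter transforms finish sooner on the host than the
    data could make the round trip, however fast the accelerator is.
    """
    return (2 * bytes_per_point * host_flops_per_ns) / (5 * link_bytes_per_ns)

print(fft_flops_per_point(1024))     # 50.0 flops per point
print(break_even_log2_n(20.0, 4.0))  # 32.0, i.e. a transform of about 2**32 points
```

Real FFTs seldom run at peak on the host, so the practical break-even is lower than this model suggests. The shape of the result is the point: the work per data point grows only logarithmically with N, while the transfer cost per point stays constant.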