The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
May 07, 2008
My prediction: High performance computing will soon be dominated by accelerator-based systems.
You may ask: Why will accelerators be better than multicore processors? Why now and not ten years ago? Why accelerators and not exciting new processors? Who will produce them? And how will we migrate to accelerators and how will we program them? I'll answer these questions in order.
Why accelerators? The market for accelerators has always been, and likely always will be, much smaller than the market for commodity processors. This means that accelerator vendors don't have a cost model that will support legions of designers and billion-dollar fab plants using the very latest technology; accelerators will always be, technologically, one or two generations behind the big chip vendors. The big vendors must go after the high volume market to generate the revenue to pay for the most aggressive chip technology. Face it: HPC is not a high volume business.
Given that the clock race has ended, we move to multicore. Multicore designs aimed at the general-purpose market look a lot like shared memory multiprocessors, except with less total memory bandwidth. I've heard vendors and researchers point out that we could increase the core count dramatically if we're willing to simplify the cores by eliminating superscalar instruction issue, speculative and out-of-order instruction execution, register renaming, and so on. This is true, but look at what we're accomplishing. Today's aggressive processors manage many levels of low-level parallelism: multiple instructions issued simultaneously, dozens of (perhaps a hundred) instructions in flight, speculative memory loads and branch prediction, all managed and synchronized by hardware at clock granularity. We can remove all that and move to software-managed parallelism: software thread creation, software speculation (and mis-speculation, including squashing), and software synchronization, all because the multiple cores have no hardware support for parallelism.
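To make "software-managed parallelism" concrete, here is a minimal sketch in Python. It is only an illustration, not anything taken from a vendor's toolchain: the function name `software_managed_sum` and the choice of a sum-of-squares workload are assumptions made for this example. The point is that the program itself must partition the work, create the threads, and synchronize them, all of which an aggressive superscalar core does implicitly in hardware at instruction granularity.

```python
import threading


def software_managed_sum(values, num_threads=4):
    """Sum the squares of `values` using explicitly managed threads.

    Hypothetical helper for illustration: the work partitioning, thread
    creation, and synchronization are all spelled out in software.
    """
    partial = [0] * num_threads
    # Ceiling division so that every element lands in exactly one chunk.
    chunk = (len(values) + num_threads - 1) // num_threads

    def worker(tid):
        start = tid * chunk
        end = min(start + chunk, len(values))
        acc = 0
        for v in values[start:end]:
            acc += v * v
        # Each thread writes only its own slot, so no lock is required.
        partial[tid] = acc

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(num_threads)]
    for t in threads:
        t.start()  # software thread creation
    for t in threads:
        t.join()   # software synchronization
    return sum(partial)


if __name__ == "__main__":
    print(software_managed_sum(list(range(10))))
```

For a loop this small, the cost of creating and joining the threads dwarfs the arithmetic, which is why fine-grain parallelism managed purely in software is so hard to make pay off.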
But accelerators can use proven chip technology with lower cost. This allows them to attack smaller markets, even niche markets. Where a major chip vendor makes its bread and butter with binary compatibility, an accelerator can (indeed, must) make up what it lacks in technology using architecture. Today's ClearSpeed processors and programmable GPUs use multiple SIMD cores with high bandwidth memory. Any number of possible architectures may be replayed in the accelerator arena. These designs embrace parallelism in a way that multicore designs don't -- or won't. Hardware support for thread creation, synchronization, and so on makes small-grain parallelism feasible.
Why now? Previously, the general-purpose chip vendors could always stay a step ahead of, or only slightly behind, the accelerators just by raising clock rates. Who would make the considerable investment in an accelerator when the next generation processor would be just as fast?
Today's equations lean toward accelerators. Chips aren't getting faster, just fatter. We're going to have to invest in parallelism to get any performance increase, with multicore or with accelerators. We should invest in a strategy with the best support for parallelism and with the biggest upside. Accelerators depend on parallelism and have integral support for it; multicore processors are aimed at a much broader market, and only incidentally address HPC issues. Moore's law still works, for now, and on-chip density will increase predictably. Since accelerators are farther behind on that curve, they can enjoy more of that benefit.
Why not just new processors? We're back to economics on this question. Trying to develop and market a new processor means migrating a whole software ecosystem. This was done successfully in the RISC revolution of the 1980s, producing today's SPARC and Power processors, among others. More recently, Intel and HP developed Itanium, which has achieved more limited success. Only a few vendors have the resources to develop a new processor with the full support necessary to make it viable, and those vendors have vested interests in their current processor strategy.
However, a processor with an accelerator can still run standard system software and tools. The migration to such a system can be limited to the HPC applications. Most of the cost of the whole system will be in components that would be necessary anyway; the additional cost of the accelerator is relatively low, but the performance boost is compelling.
Whose accelerators? Accelerators come and go. Some focus on particular applications. The CNAPS chip developed by Adaptive Solutions (founded by a former colleague at the Oregon Graduate Institute) was intended for neural network simulations, and was quite successful at accelerating Photoshop functions. GPUs are, and have always been, accelerators for pixel processing. In HPC today, we have ClearSpeed and NVIDIA GPUs. I'm going to declare myself neutral on this question, though I envision a growing industry here, as the cost of entry is relatively low. It will be interesting to see what develops with the open HyperTransport and QuickPath interfaces, and what the chip vendors may put on their own silicon.