The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
April 06, 2007
Imagine working as a clerk in an electronics store, and a customer comes up to you and says, "I'm very interested in one of your new 8.1 surround sound systems, but it has to be able to play eight-track tapes. I have a lot invested in those tapes, all stacked up in my garage. I'm not buying any of these new audio systems until they play my eight-track tapes in full surround sound."
It sounds too ridiculous to actually happen, but I hear the equivalent almost every day in high performance computing: some customer demands that the most current and advanced hardware runs algorithms that date from the Jimmy Carter administration, and runs them well.
Discarding Dated Algorithms
There have been some amazing developments in computer design (like low-cost vector floating point hardware), but they come with a catch: you'll have to change your idea about "best" algorithm if you really want to stay ahead in this game. In all the talk about how supercomputing helps us compete in the world economy, we often ignore one area: We cling to programming ideas that are decades out of date, and that intellectual inertia holds us back far more than the hardware does. It's not just that we keep legacy software a long time; it's that we hold onto legacy algorithms long past the time when they were a good match for their hardware. Discarding dated algorithms may be one of the most promising ways to advance our productivity.
The thing that makes algorithms go out-of-date is when the relative costs of computing resources changes dramatically. Imagine that the year is 1967. Single processors are very expensive, costing hundreds of dollars per hour. Programmers and users are relatively cheap; they cost a few dollars per hour. Memory is very expensive; you can spend a million dollars for a megabyte of RAM. Memory access is cheap... so cheap that the time it takes is hidden by the time the computer grinds away doing arithmetic. In that era, programming sophisticated algorithms to save memory and FLOPS made a lot of sense. Compare this situation to today, four decades later:
We can point to some specific places where it may be time to change our ideas about those classic "best" algorithms we learned in our computer science courses.
Sparse Equations: Go Ahead, Multiply by Zero
I recently saw a presentation by a researcher struggling to solve a sparse system of linear equations. He assumed that the many weeks he was spending to operate only on the nonzero entries in his system would lead to higher performance. Like all sparse system programmers, his intent was to spare the computer from having to multiply by zero values and add that zero result, an obvious waste of a perfectly good arithmetic unit. To his chagrin and frustration, he discovered he had made his program slower, because now he was spending much more time "pointer chasing"... and he complained bitterly that computer vendors did not know how to build computers correctly because it should not take long to get an index out of memory and use that index to look up the value.
Evidence is mounting that sparse solvers deserve reexamination. A paper published just this month by a prominent HPC researcher studied the use of FPGAs to reduce the time to solve sparse systems. A system of 4000 symmetric equations, 2 percent nonzero elements, took 67.3 seconds to solve on a dual-Xeon server using a sparse solver. After programming the FPGAs in VHDL and using other tools, the author was able to reduce the time to 33.8 seconds. Clever... but I noticed he did not mention how long it would have taken to simply solve it with the Linpack solver on that server, ignoring the sparsity and simply treating it as a dense, nonsymmetric system: Less than three seconds!
Let that soak in for a minute. Even with the unnecessary arithmetic on all those zero elements, even ignoring the symmetry, the "wasteful" dense method is over an order of magnitude faster. The researcher spent thousands of dollars of his time to force-fit an old algorithm into a modern architecture, and wound up with a solution that is over ten times slower and hundreds of times more costly to implement. We are simply optimizing the wrong things. So go ahead, multiply by zero. Complicating the control flow to save scalar operations no longer pays.
Page: 1 of 4(Digg, Technorati, more)
PGI Accelerator™ Fortran 95/03 and C99 compilers for x64+NVIDIA
Accelerate applications on x64+GPU platforms by adding OpenMP-like compiler directives to existing Fortran and C programs. Available now for Linux, MacOS and Windows. Download a free 15 day trial.
Platform HPC Workgroup Manager
Platform HPC Workgroup Manager integrates all the cluster productivity tools you need to deploy, run and manage your HPC environment.
Mar 18 | ChannelWeb | Westmere parts already showing up in HPC machines. Read more...
Mar 17 | The Register | But what about the tier ones? Read more...
Mar 17 | Cadalyst Magazine | A new generation of workstations is changing the nature of technical computing. Read more...
Mar 17 | Linux Magazine | Latest iteration of Sun Grid Engine able to tap into Cloud. Read more...
Mar 16 | Bio-IT World | Biotech firm builds genetic models from patient data. Read more...
Jan 12 | | In-depth look at vSMP Foundation server virtualization technology, technical implementation, use cases and capabilities. The technical whitepaper provides an architectural overview and details on the three vSMP Foundation products: vSMP Foundation for SMP, vSMP Foundation for Cluster and vSMP Foundation for Cloud.
Jan 18 | | This white paper discusses Gore’s copper cable assemblies, and how they continue to exceed the standards for providing reliable, cost-effective solutions for high-performance computer applications.
Join this online panel discussion for live Q&A with leading industry experts, analysts, and end-users to discuss the latest innovations, best practices, barriers to implementation, and measurable benefits of server virtualization with a particular focus on today's real world solutions.
Learn about scalable fault-tolerant architectures and examples of energy efficient and scalable supercomputing clusters using dual QDR InfiniBand to combine capacity computing with network failover capabilities with the help of programming languages such as MPI and a robust Linux cluster management package.
LIVE@SCO9: The IBM team discusses new innovations in hardware, software and services that help clients better understand their workloads and get insight from their R&D efforts. Technology demonstrations include the soon-to-be-released Power7 HPC processor, the DCS990 system with 2.4 petabytes of storage, the xCAT management tool, secure HPC cloud computing and more. Winners of two HPCwire Readers' and Editors’ Choice Awards! Take the IBM virtual tour at SC09 or more information go online to: http://www-03.ibm.com/systems/deepcomputing/sc09.html