The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
HPC Matters is a joint blog consisting of contributors from the Tabor Communications team on their observations and insights into HPC matters.
June 05, 2008
Wile E. Coyote is doomed. Hanging in space, he is about to fall, and everyone knows it but him. We all saw it coming. Poor Coyote.
Yet strangely, he doesn't fall right away. According to the alternate-reality rules of cartoon physics, the Coyote must first look down and realize he is standing in thin air. He then has time to gather his thoughts, issue a final desperate wave, and then finally -- poof! -- he plummets body first, leaving his head in the frame for the viewers to witness a comical last-second grimace before that too disappears.
Know what else we saw coming? The crash in HPC application performance that is being brought about by the transition to multicore processors. We've been watching the race, as applications (Codus productivus) desperately chased processors (Waferii siliconium) up the performance mountain. Suddenly multicore came and -- meep! meep! -- the CPUs put on a burst of speed and zoomed around a bend, leaving application software headed for a cliff. HPC users were doomed. Everyone knew it. Poor users.
What's this? Application performance hasn't dramatically suffered? Users are satisfied with the performance they're getting? How is this possible? The answer: cartoon physics.
According to our most recent research, the reason performance hasn't plummeted is that users haven't been forced to deal with the problem yet. Rather than introducing a new level of parallelism at the socket level, most users have responded by running separate jobs on each core. Sure, they're buying a lot more memory to do that -- configured memory per core is staying relatively stable, and therefore configured memory per socket is skyrocketing -- but at least the application is scaling. For now.
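To put the memory arithmetic in concrete terms, here is a minimal sketch in Python. The 2 GB per core figure and the core counts are illustrative assumptions, not numbers from our research, and the function name is made up for this example. It only shows that if memory per core holds steady, memory per socket grows in step with the core count.

```python
# Illustrative only: the figures below are assumptions, not survey data.

def memory_per_socket_gb(cores_per_socket: int, memory_per_core_gb: float) -> float:
    """Memory needed per socket when every core runs its own separate job."""
    return cores_per_socket * memory_per_core_gb


ASSUMED_MEMORY_PER_CORE_GB = 2.0  # hypothetical, held constant across generations

for cores in (1, 2, 4, 8):
    total = memory_per_socket_gb(cores, ASSUMED_MEMORY_PER_CORE_GB)
    print(f"{cores} core(s) per socket -> {total:.0f} GB per socket")
```

Under these assumptions, each doubling of cores per socket doubles the memory bill for that socket, which is the trade users are making to keep one job per core.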
We've gone off a cliff; we just don't know it yet. Because those cores aren't getting any faster, we're soon going to come to grips with the reality that new tools or programming models are needed in order to keep up the race. Look down, everybody. The ground isn't there. Now is the time to hold up a little "Oh, no!" sign and wave to the camera.
This is going to hurt, but fear not. The Coyote is resilient, and he always comes up with a new scheme. Soon he'll be back in the race and chasing right behind the Road Runner again.
The ISC conference in Dresden is coming up, and the new things I'll most want to see are tools for improving application performance yield in large-scale, multicore systems. Acme Application Optimizers, anyone?
Posted by Addison Snell - June 5 @ 8:38PM
There are 4 discussion items posted.
Submitted on 06/09/2008 - 6:33AM
Multicore poses fundamental challenges for industrial HPC, but what you're observing highlights the importance of real physics, not cartoon physics. I'm expected to deliver performance, but above all I have to ensure the physical realism that 10 or more years of development put in the code.
Even Wile E. Coyote couldn't devise a way to make the software life cycle keep pace with hardware. What we can do is take strategic action to be sure we're somewhere else by the time the falling anvil hits.
Post #1
Submitted on 06/10/2008 - 9:09AM
Addison - good post, and none too soon.
The current workaround your research describes isn't sustainable because the added memory per processor is eating into the all-too-constrained power/flop ratio.
Unfortunately, the harsh reality is that the applications themselves will have to get more efficient.
Eeeeegaaads! Not that! Perhaps we need to introduce mini the mooch into this cartoon special.
Post #2
Submitted by tuccillo on 06/11/2008 - 9:43AM
While the clock speed of the cores may not be increasing (much) with the continued rise in the number of cores per chip, the availability of more cores can result in increased performance for parallel applications. Of course, if your favorite application isn't parallelized using a message-passing or threads approach, or its scalability is extremely limited, then you are out of luck. I would argue that there are a substantial number of message-passing codes that do exhibit good scalability, and their problem sizes are only getting larger (which should allow for the application of more cores). Clearly the trend is toward more available cores, so the pressure is on to improve the scalability of your code if you need increased performance. The upcoming Intel Nehalem should address the rather disturbing trend toward less memory bandwidth per core as the number of cores per chip has increased.
Post #3
Addison Snell is the Vice President and General Manager of Tabor Research, Inc.