The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
June 10, 2008
This week's achievement of the Linpack petaflop milestone by the IBM Roadrunner was widely predicted but impressive nonetheless. Last year at this time, the number one system was Lawrence Livermore's Blue Gene/L at 280 teraflops, and only two other systems -- the Cray XT4/XT3 supercomputer at Oak Ridge and the Cray Red Storm system at Sandia -- made it past 100 teraflops. In fact, the raw computational power of the Roadrunner exceeds the aggregate performance of the top 10 systems in June 2007.
The nearly insatiable demand for supercomputing power has driven a remarkable increase in HPC capability over the last decade and a half. During this time, the computational performance of the top systems has increased at a rate of 1000x every 10 years. As I mentioned in Monday's Roadrunner coverage, that pace of increase is an order of magnitude greater than that reflected by Moore's Law. Today, Moore's Law is contributing relatively little to processor speed increases; it's being used to add more cores. But even if the chip real estate dedicated to cores scales proportionally as transistors shrink (which is probably not the case, since the memory bandwidth bottleneck encourages larger on-chip caches), that would only yield about a 100x increase in raw performance every 10 years.
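To make the comparison of growth rates concrete, here is a minimal sketch of the arithmetic. It assumes Moore's Law means a doubling every 18 months; the column does not state a period, and a 24-month doubling would give a noticeably smaller figure. The function names are illustrative only.

```python
import math


def growth_factor(doubling_months: float, years: float) -> float:
    """Total growth after `years` if capability doubles every `doubling_months`."""
    return 2.0 ** (years * 12.0 / doubling_months)


def doubling_months_for(factor: float, years: float) -> float:
    """Doubling period (in months) implied by growing `factor`-fold over `years`."""
    return years * 12.0 / math.log2(factor)


# Assumption: an 18-month doubling period for Moore's Law.
moore_10yr = growth_factor(18, 10)

# The 1000x-per-decade pace of top systems implies a doubling roughly once a year.
top_doubling = doubling_months_for(1000, 10)

print(f"Moore's Law over 10 years (18-month doubling): about {moore_10yr:.0f}x")
print(f"Doubling period implied by 1000x per decade: about {top_doubling:.1f} months")
```

Under that assumption, the gap between the two curves is roughly the order of magnitude described above, which has to come from somewhere other than the transistor budget.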
Which explains why clusters and supercomputers are scaling both up (more processors and cores) and out (more nodes). But, even ignoring the software challenges of distributing applications over more and more CPUs, just jamming additional commodity processors into a system runs up against physical constraints like power and space, not to mention system cost. It is significant that the first petaflop system was not an x86 cluster.
All of this explains the HPC community's current obsession with hardware accelerators -- FPGA, GPU, Cell, ClearSpeed and vector processors. While not general-purpose in nature, these accelerators offer a lot of computational power in a small, cheap, and energy-efficient package.
In the Roadrunner, each AMD Opteron core is paired with a PowerXCell 8i (Cell) processor, which acts as a high-performance floating point accelerator. But the 12,240 Cell processors can barely be characterized as accelerators since they account for the vast majority of the system's performance. The 6,120 dual-core Opterons contribute only around 3 percent to the total performance. The PowerXCell 8i offers over 100 double precision gigaflops for a modest 92 watts, which is about an order of magnitude better performance and performance/watt than the dual-core Opterons in Roadrunner. So minimizing the Opteron parts was the key to maximizing FLOPS.
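The figures in the paragraph above can be cross-checked with a little arithmetic. This is a rough sketch that uses only the numbers quoted here: it takes "over 100 gigaflops" as exactly 100 (a lower bound) and "around 3 percent" as exactly 3 percent, so the results are approximations rather than official specifications.

```python
CELL_COUNT = 12_240        # PowerXCell 8i processors in Roadrunner
OPTERON_CHIPS = 6_120      # dual-core Opteron chips
CELL_GFLOPS = 100.0        # "over 100" double precision gigaflops; lower bound assumed
OPTERON_SHARE = 0.03       # "around 3 percent" of total performance; taken as exact


def roadrunner_breakdown():
    """Return (cell_tflops, opteron_tflops, gflops_per_opteron_chip, cell_to_opteron_ratio)."""
    cell_tflops = CELL_COUNT * CELL_GFLOPS / 1000.0
    # If the Cells supply 97 percent of the total, scale up to find the whole.
    total_tflops = cell_tflops / (1.0 - OPTERON_SHARE)
    opteron_tflops = total_tflops - cell_tflops
    per_opteron_gflops = opteron_tflops * 1000.0 / OPTERON_CHIPS
    ratio = CELL_GFLOPS / per_opteron_gflops
    return cell_tflops, opteron_tflops, per_opteron_gflops, ratio


cell_tf, opteron_tf, per_opteron_gf, ratio = roadrunner_breakdown()
print(f"Cell contribution:    {cell_tf:.0f} teraflops")
print(f"Opteron contribution: {opteron_tf:.1f} teraflops")
print(f"Per Opteron chip:     {per_opteron_gf:.1f} gigaflops")
print(f"Cell vs. Opteron:     {ratio:.0f}x per chip")
```

The per-chip ratio that falls out is in the low double digits, which squares with "about an order of magnitude" for raw performance.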
But there are other ways to get to a petaflop. In fact, it's not immediately apparent to me why the DOE, who bought the Roadrunner system for Los Alamos and the NNSA, didn't go the Blue Gene/P route. The latter machine represents IBM's other petaflop-capable system, which was introduced a year ago. A handful are in the field, but no one has purchased a petaflop-sized system to date.
The price tag for a petaflop Blue Gene/P would probably be just north of $100 million, in the same general vicinity as the $120 million that the DOE paid for Roadrunner. And the DOE certainly has plenty of experience with Blue Gene technology, so no red flags there. Finally, compared to Roadrunner, Blue Gene comes with a simpler and more mature software environment.
From the application point of view, the biggest difference between the two architectures is that Blue Gene needs more than twice as many processing cores to get to a petaflop as Roadrunner -- about 300K cores for Blue Gene/P versus 120K for Roadrunner (each Cell processor has 9 cores). That means your application needs to be divided into more pieces to run on the Blue Gene than on the more computationally dense Roadrunner. More parallelism might be fine for some apps, but not for others.
Energy efficiencies of the two architectures are comparable. At 376 megaflops/watt, Roadrunner is tops in this regard. But Blue Gene/P comes in at a very respectable 350 megaflops/watt. The energy efficiency of Blue Gene is the result of using low-power ASICs, based on the PowerPC, a type of processor that is more at home in embedded systems.
In general, processors for embedded applications are designed for low power rather than speed, but they offer HPC vendors an alternative way to build large-scale energy-efficient systems. SiCortex, for example, is using MIPS processors to create a low-power line of HPC clusters.
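The Roadrunner core count can be reconstructed from the chip counts given earlier. This is a back-of-the-envelope sketch; whether the 120K figure includes the Opteron cores is my assumption, and the 300K for Blue Gene/P is simply the rounded number quoted above.

```python
CELL_COUNT = 12_240
CORES_PER_CELL = 9          # as stated above
OPTERON_CHIPS = 6_120
CORES_PER_OPTERON = 2       # dual-core parts
BLUE_GENE_P_CORES = 300_000 # rounded figure from the text


def roadrunner_cores(include_opterons: bool = True) -> int:
    """Total Roadrunner cores, optionally counting the Opteron cores too."""
    cores = CELL_COUNT * CORES_PER_CELL
    if include_opterons:
        cores += OPTERON_CHIPS * CORES_PER_OPTERON
    return cores


total = roadrunner_cores()
print(f"Roadrunner cores: {total:,}")
print(f"Blue Gene/P needs about {BLUE_GENE_P_CORES / total:.2f}x as many")
```

Either way of counting lands near 120K, so a petaflop Blue Gene/P would need well over twice as many cores.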
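Reading both efficiency figures in megaflops per watt, it is easy to estimate what each architecture would draw at a given performance level. The sketch below assumes efficiency stays constant as the system scales, which is a simplification, and the helper name is illustrative.

```python
def megawatts_needed(petaflops: float, mflops_per_watt: float) -> float:
    """Power draw in megawatts for a given performance and efficiency."""
    mflops = petaflops * 1e9          # 1 petaflop = 1e9 megaflops
    watts = mflops / mflops_per_watt
    return watts / 1e6


for name, efficiency in (("Roadrunner", 376.0), ("Blue Gene/P", 350.0)):
    one_pf = megawatts_needed(1.0, efficiency)
    print(f"{name}: about {one_pf:.2f} MW per petaflop")
```

At these efficiencies, a single petaflop costs somewhat under 3 megawatts on either machine, and the difference between the two is less than a tenth.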
But as systems get into the tens of petaflops range, even commodity embedded chips won't be practical. Researchers at LBNL estimate that a Blue Gene-like system capable of running an application at 10 petaflops of sustained performance will cost over a billion dollars and require tens of megawatts to operate, even taking into account future price/performance advances. The Berkeley researchers are looking at using ultra-low-power custom processors to make these kinds of systems practical.
As energy costs and hardware costs really start to limit the kind of machines vendors can offer in a post-petaflop world, commodity processors may yield to either accelerators or low-power, homogeneous processors. Over the next ten years, a battle between these two approaches may take place on the path from petaflops to exaflops. But this week, the accelerators won the first round.
Posted by Michael Feldman - June 10 @ 8:23PM
Michael Feldman is the editor of HPCwire.