HPCwire

The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing

Blog: From the Editor

Welcome to the Post-Petaflop Era


This week's achievement of the Linpack petaflop milestone by the IBM Roadrunner was widely predicted but nonetheless impressive. Last year at this time, the number one system was Lawrence Livermore's Blue Gene/L at 280 teraflops, and only two other systems -- the Cray XT4/XT3 supercomputer at Oak Ridge and the Cray Red Storm system at Sandia -- made it past 100 teraflops. In fact, the raw computational power of the Roadrunner exceeds the aggregate performance of the top 10 systems in June 2007.

The nearly insatiable demand for supercomputing power has driven a remarkable increase in HPC capability over the last decade and a half. During this time the computational performance of the top systems has increased at a rate of 1000x every 10 years. As I mentioned in Monday's Roadrunner coverage, that pace of increase is an order of magnitude greater than that reflected by Moore's Law. Today, Moore's Law is contributing relatively little to processor speed increases; it's being used to add more cores. But even if the chip real estate dedicated to cores scales proportionally as transistors shrink (which is probably not the case, since the memory bandwidth bottleneck encourages larger on-chip caches), that would only yield about a 100x increase in raw performance every 10 years.
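To make the comparison concrete, the two growth rates can be converted into doubling times. This is a back-of-the-envelope sketch that uses only the 1000x and 100x per-decade figures quoted above; the function name `doubling_time_years` is illustrative.

```python
import math


def doubling_time_years(factor: float, period_years: float) -> float:
    """Years needed to double, given growth of `factor` over `period_years`."""
    return period_years * math.log(2) / math.log(factor)


# Growth figures quoted in the text above.
top_systems = doubling_time_years(1000, 10)   # top supercomputers: 1000x per decade
core_scaling = doubling_time_years(100, 10)   # core-count scaling: ~100x per decade

print(f"Top systems double roughly every {top_systems * 12:.0f} months")
print(f"Core scaling doubles roughly every {core_scaling * 12:.0f} months")
```

Under these assumptions, the top systems double in performance about every 12 months, against about 18 months for core scaling alone.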

Which explains why clusters and supercomputers are scaling both up (more processors and cores) and out (more nodes). But, even ignoring the software challenges of distributing applications over more and more CPUs, just jamming additional commodity processors into a system runs up against physical constraints like power and space, not to mention system cost. It is significant that the first petaflop system was not an x86 cluster.

All of this explains the HPC community's current obsession with hardware accelerators -- FPGA, GPU, Cell, ClearSpeed and vector processors. While not general-purpose in nature, these accelerators offer a lot of computational power in a small, cheap, and energy-efficient package.

In the Roadrunner, each AMD Opteron core is paired with a PowerXCell 8i (Cell) processor, which acts as a high-performance floating-point accelerator. But the 12,240 Cell processors can barely be characterized as accelerators, since they account for the vast majority of the system's performance. The 6,120 dual-core Opterons contribute only around 3 percent of the total performance. The PowerXCell 8i offers over 100 double-precision gigaflops for a modest 92 watts, which is about an order of magnitude better performance and performance/watt than the dual-core Opterons in Roadrunner. So minimizing the Opteron parts was the key to maximizing FLOPS.
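A rough consistency check on these figures, using only the numbers quoted above. It assumes 100 gigaflops per Cell as a floor (the text says "over 100") and treats the 3 percent Opteron share as exact, so the results are approximations, not official specifications.

```python
CELL_COUNT = 12_240
OPTERON_COUNT = 6_120
CELL_GFLOPS = 100.0     # assumed floor: the text says "over 100" gigaflops
OPTERON_SHARE = 0.03    # assumed exact: the text says "around 3 percent"

cell_total_tflops = CELL_COUNT * CELL_GFLOPS / 1_000
# If the Opterons supply 3% of the total, the Cells supply the other 97%.
system_total_tflops = cell_total_tflops / (1 - OPTERON_SHARE)
opteron_total_tflops = system_total_tflops - cell_total_tflops
gflops_per_opteron = opteron_total_tflops * 1_000 / OPTERON_COUNT
cell_to_opteron_ratio = CELL_GFLOPS / gflops_per_opteron

print(f"Cell aggregate:           {cell_total_tflops:.0f} teraflops")
print(f"Implied system total:     {system_total_tflops:.0f} teraflops")
print(f"Implied per-chip Opteron: {gflops_per_opteron:.1f} gigaflops")
print(f"Cell-to-Opteron ratio:    {cell_to_opteron_ratio:.0f}x")
```

The implied per-chip ratio lands in the mid-teens, which squares with the "about an order of magnitude" claim.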

But there are other ways to get to a petaflop. In fact, it's not immediately apparent to me why the DOE, which bought the Roadrunner system for Los Alamos and the NNSA, didn't go the Blue Gene/P route. The latter machine represents IBM's other petaflop-capable system, which was introduced a year ago. A handful are in the field, but no one has purchased a petaflop-sized system to date.

The price tag for a petaflop Blue Gene/P would probably be just north of $100 million, in the same general vicinity as the $120 million that the DOE paid for Roadrunner. And the DOE certainly has plenty of experience with Blue Gene technology, so no red flags there. Finally, compared to Roadrunner, Blue Gene comes with a simpler and more mature software environment.

From the application point of view, the biggest difference between the two architectures is that Blue Gene needs more than twice as many processing cores as Roadrunner to get to a petaflop -- about 300K cores for Blue Gene/P versus 120K for Roadrunner (each Cell processor has 9 cores). That means your application needs to be divided into more pieces to run on the Blue Gene than on the more computationally dense Roadrunner. More parallelism might be fine for some apps, but not for others.
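The Roadrunner core count can be reproduced from the processor counts given earlier in this post. A small sketch follows; the 300K figure for Blue Gene/P is the approximate number quoted above, not an exact configuration.

```python
CELL_COUNT = 12_240
CORES_PER_CELL = 9        # per the text: each Cell processor has 9 cores
OPTERON_COUNT = 6_120
CORES_PER_OPTERON = 2     # dual-core Opterons

roadrunner_cores = CELL_COUNT * CORES_PER_CELL + OPTERON_COUNT * CORES_PER_OPTERON
blue_gene_p_cores = 300_000   # approximate figure quoted in the text

core_ratio = blue_gene_p_cores / roadrunner_cores

print(f"Roadrunner cores: {roadrunner_cores:,}")
print(f"Blue Gene/P needs about {core_ratio:.2f}x as many cores")
```

The total comes to 122,400 cores, which rounds to the 120K cited above, and the ratio works out to roughly 2.5.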

Energy efficiencies of the two architectures are comparable. At 376 megaflops/watt, Roadrunner is tops in this regard, but Blue Gene/P comes in at a very respectable 350 megaflops/watt. The energy efficiency of Blue Gene is the result of using low-power ASICs, based on the PowerPC, a type of processor that is more at home in embedded systems.
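Efficiency in megaflops per watt translates directly into a power budget. The sketch below converts the two figures, 376 for Roadrunner and 350 for Blue Gene/P, into the megawatts needed per petaflop. The helper `power_megawatts` is illustrative, and the calculation assumes the efficiency holds constant at full scale.

```python
def power_megawatts(flops: float, megaflops_per_watt: float) -> float:
    """Power in megawatts needed to deliver `flops` at the given efficiency."""
    watts = flops / (megaflops_per_watt * 1e6)
    return watts / 1e6


PETAFLOP = 1e15

roadrunner_mw = power_megawatts(PETAFLOP, 376)
blue_gene_p_mw = power_megawatts(PETAFLOP, 350)

print(f"Roadrunner:  {roadrunner_mw:.2f} MW per petaflop")
print(f"Blue Gene/P: {blue_gene_p_mw:.2f} MW per petaflop")
```

Both systems come out a little under 3 megawatts per petaflop, with Roadrunner slightly lower.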

In general, processors for embedded applications are designed for low power rather than speed, but they offer HPC vendors an alternative way to build large-scale energy-efficient systems. SiCortex, for example, is using MIPS processors to create a low-power line of HPC clusters.

But as systems get into the tens of petaflops range, even commodity embedded chips won't be practical. Researchers at LBNL estimate that a Blue Gene-like system capable of running an application at 10 petaflops of sustained performance will cost over a billion dollars and require tens of megawatts to operate, even taking into account future price/performance advances. The Berkeley researchers are looking at using ultra-low-power custom processors to make these kinds of systems practical.

As energy costs and hardware costs really start to limit the kind of machines vendors can offer in a post-petaflop world, commodity processors may yield to either accelerators or low-power, homogeneous processors. Over the next ten years, a battle between these two approaches may take place on the path from petaflops to exaflops. But this week, the accelerators won the first round.

Posted by Michael Feldman - June 10 @ 8:23PM

Michael Feldman

Michael Feldman is the editor of HPCwire.
