HPCwire

Since 1986 - Covering the Fastest Computers
in the World and the People Who Run Them

Run-Up to Petaflops


There is no other way to characterize this time in high performance computing: 2008 will be remembered as "the year" -- the year that one petaflops was achieved in Linpack performance. It is a milestone that has been anticipated for almost a decade and a half, and one that was accomplished through the synthesis of two big trends that have emerged as the driving forces for HPC in the last few years -- multicore and heterogeneous computing.

But there is much more to the events, technical advances, and new initiatives in HPC internationally throughout the last year than a single number, no matter how dramatic the milestone. The theme for this year, "Run-Up to Petaflops," has involved a series of interrelated advances in technology, component architecture, and planning for large scale systems that has inaugurated the Petaflops Era. Some of these contributing events are briefly considered here.

This year has marked the next stage in the transition to "multicore, the new Moore's Law," which was last year's theme. Four-core sockets are replacing dual-core as we enter the second generation of the multicore technology base. AMD's Barcelona quad-core chips are now available, with new systems being configured to support them and some early generation systems being upgraded to exploit them for a mid-life kicker. The Intel Clovertown chip, also a quad-core Xeon processor, is now being incorporated as well. From IBM, the new Power6 architecture on 65 nanometer technology is designed to be configured with up to 16 cores and is establishing new industry clock rates from 3.5 GHz to 4.7 GHz.

The move to 45 nanometer technology has been a hallmark of 2008, with major vendor offerings being announced and prepared for delivery in the second half of this year. Intel's new fabrication line in Chandler, Ariz., will provide high-volume manufacturing of 45 nanometer components. Intel introduced its hafnium-based high-k metal gate silicon technology for unprecedentedly low leakage current. The Dunnington Intel processor will be produced by this process, and will be available in the second half of this year with six cores per socket. AMD's 45 nanometer fab in Dresden, Germany, which uses immersion lithography, will produce the quad-core "Shanghai" by the second half of 2008. This is to be followed by the six-core Istanbul processor in 2009. IBM is projected to release the Power7 processor in 2010, which has been developed in part with DARPA HPCS funding.

Heterogeneous computing in its various forms has captured the imagination of the supercomputing community with the excitement of outstanding raw performance, tempered only by a realistic concern about programming methodologies. ClearSpeed has introduced its second generation SIMD attached array processor, significantly improving its interconnect bandwidth and optimizing the average power dissipation. The ClearSpeed accelerators are an important component in the Japanese TSUBAME 100 teraflops system. NVIDIA is moving toward a GPU in every PC with its GeForce series delivering 10x or better speed-ups on some application kernels. IBM has introduced its important upgrade to the original Cell architecture used in the Sony PlayStation 3 game product. The new PowerXCell 8i processor chip combines both heterogeneity and multicore to provide a tour de force in processor technology. But most important to the supercomputing community and market is its upgraded SPE core that includes full 64-bit floating point arithmetic units at 12.8 gigaflops peak performance. That works out to 102.4 gigaflops, just over 100, across the eight SPE cores, which are integrated with a separate PowerPC core for general services.
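The aggregate PowerXCell 8i figure follows from simple multiplication of the two numbers quoted above. The short Python sketch below reproduces that arithmetic; the function and constant names are illustrative only, and the calculation deliberately ignores any contribution from the PowerPC core.

```python
# Figures quoted in the article for the PowerXCell 8i.
SPE_PEAK_GFLOPS = 12.8  # 64-bit peak per SPE core, in gigaflops
SPE_COUNT = 8           # SPE cores per chip


def chip_peak_gflops(per_core_gflops: float, cores: int) -> float:
    """Aggregate peak as per-core peak times core count.

    Illustrative helper (not from the article); the PowerPC core is ignored.
    """
    return per_core_gflops * cores


if __name__ == "__main__":
    total = chip_peak_gflops(SPE_PEAK_GFLOPS, SPE_COUNT)
    print(f"SPE aggregate peak: {total:.1f} gigaflops")
```

The result is a peak figure only; sustained performance on real kernels depends on how well the SPEs can be kept fed.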

Over the last year, the international community has established a multi-initiative, world-wide set of programs to harness the power of these technologies and deliver petaflops capability into the hands of real-world users in science, technology, commerce, and defense applications. In the same period, the fastest general-purpose machine, Blue Gene/L at LLNL, was upgraded by IBM to exceed half a petaflops peak performance, delivering 478 teraflops of sustained Linpack performance. The fastest machine in Europe is the next generation of this family of systems, Blue Gene/P at the Julich Research Centre in Germany. Called "JUGENE," this system of almost a quarter of a petaflops peak capability has delivered 167 teraflops sustained with 32 terabytes of main memory. This new Blue Gene generation system incorporates the new 850 MHz quad-core PowerPC 450.

The trend of upgrading existing systems has proved to be an important path to extending the useful lifetime of major systems, providing superior capability at a fraction of the cost to end users and agencies. The 124 teraflops Red Storm system at Sandia National Laboratories, which was the prototype for the major line of Cray XT systems, is scheduled to be augmented to a peak capability of between 250 and 284 teraflops, using quad-core AMD Opterons. And the Earth Simulator, one of the most important systems on the TOP500 list, is to be upgraded by NEC to a full capability of 131 teraflops by early next year.

In Japan, the new Keisoku program will be managed by Riken and will involve the collaboration of Hitachi, NEC, and Fujitsu. The goal is to build a 10 petaflops machine to be deployed in Kobe in 2012.

The U.S. National Science Foundation has selected IBM to provide its leadership-class "Blue Waters" system to be deployed at UIUC in 2011. That system is to be based on technology developed under the IBM PERCS project, which is sponsored by the DARPA HPCS Program. NSF will also install a second mid-range HPC system in Tennessee based on advanced Cray architecture.

In 2007, India deployed its first top 10 system, named "Eka," at the Computational Research Laboratories, Tata Sons. That machine uses the HP Blade Cluster Platform 3000 BL460c and delivers a peak performance of 170 teraflops. China continues its steady advance in the HPC arena with the installation of a series of significant terascale systems, including a 38 teraflops Intel Woodcrest-based IBM BladeCenter. Equally interesting is the development of their Loongson-2E CPU chip on 90 nanometer process technology.

But the big news -- well timed for ISC -- is Roadrunner, the fastest machine in the world and the first system to achieve one petaflops Linpack performance. Roadrunner, which will be deployed at Los Alamos National Laboratory, was developed under DOE contract by IBM and marks the first major system to rely principally on a heterogeneous architecture to achieve its performance. Based on the IBM PowerXCell 8i described above, and the AMD Opteron, this breakthrough machine delivers 1.3 petaflops peak performance.

Even as the achievement of a petaflops is being heralded as the entry into a new era of high performance computing, the challenges of exascale computing are being explored by the community. As reported last year, both DOE and DARPA undertook to study the application, technology, system requirements, and implications of sustained exaflops computer implementation and operation. The studies demonstrated the importance of such capability to many applications critical to science, technology, and society. But these early investigations also exposed the daunting technological challenges confronting any such endeavor.

While numbers can vary significantly depending on underlying assumptions, representative estimates from a number of sources suggest power consumption in the range of 120 megawatts (+/- 50 percent), concurrency at the multi-billion-way level of parallelism, number of cores between 100 million and 500 million, and system-wide latencies in the tens of thousands of cycles.
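To make these estimates more concrete, the following Python sketch derives a few implied quantities from the figures quoted above: the power envelope, the performance each core would need to deliver, and the resulting energy efficiency. It is a back-of-the-envelope calculation under the assumption that "exaflops" means exactly 10^18 floating point operations per second; the function names are illustrative and do not come from the DOE or DARPA studies.

```python
EXAFLOPS = 1e18            # assumed definition: operations per second

# Representative estimates quoted in the article.
POWER_MW_NOMINAL = 120.0   # megawatts
POWER_TOLERANCE = 0.50     # +/- 50 percent
CORES_LOW = 100e6          # 100 million cores
CORES_HIGH = 500e6         # 500 million cores


def power_range_mw(nominal_mw: float, tolerance: float) -> tuple[float, float]:
    """Lower and upper bound of the power envelope in megawatts."""
    return nominal_mw * (1 - tolerance), nominal_mw * (1 + tolerance)


def per_core_gflops(total_flops: float, cores: float) -> float:
    """Performance each core must deliver, in gigaflops."""
    return total_flops / cores / 1e9


def gflops_per_watt(total_flops: float, power_mw: float) -> float:
    """System-wide energy efficiency in gigaflops per watt."""
    return (total_flops / 1e9) / (power_mw * 1e6)


if __name__ == "__main__":
    low_mw, high_mw = power_range_mw(POWER_MW_NOMINAL, POWER_TOLERANCE)
    print(f"Power envelope: {low_mw:.0f} to {high_mw:.0f} MW")
    print(f"Per core at 100M cores: {per_core_gflops(EXAFLOPS, CORES_LOW):.1f} gigaflops")
    print(f"Per core at 500M cores: {per_core_gflops(EXAFLOPS, CORES_HIGH):.1f} gigaflops")
    print(f"Efficiency at nominal power: {gflops_per_watt(EXAFLOPS, POWER_MW_NOMINAL):.2f} gigaflops/W")
```

Under these assumptions each core would need to deliver only a few gigaflops, which suggests that the difficulty lies less in individual core speed than in power, concurrency, and latency at scale.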

The expected dates for such systems are as aggressive as the middle of the next decade. Extrapolation of the TOP500 list suggests a deployment at the end of that decade. With concerted effort, an ambitious but not unrealistic deployment could occur in 2018. But this will require real research investment programs to be initiated within the next year and a half. It is hard to believe, but it may be possible that the authors will be writing an HPCwire article a decade from now about the year that was the "Run-Up to Exaflops."
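As a rough check on what such a schedule implies, the Python sketch below computes the sustained growth rate needed to go from one petaflops in 2008 to one exaflops in 2018, the two dates mentioned in this article. It assumes smooth exponential growth, which is a simplification and not the method used for the TOP500 extrapolation; the function names are illustrative.

```python
import math


def annual_growth_factor(start_flops: float, end_flops: float, years: float) -> float:
    """Constant yearly multiplier that turns start_flops into end_flops."""
    return (end_flops / start_flops) ** (1.0 / years)


def doubling_time_years(growth_factor: float) -> float:
    """Years needed to double performance at a constant yearly multiplier."""
    return math.log(2.0) / math.log(growth_factor)


if __name__ == "__main__":
    PETAFLOPS = 1e15
    EXAFLOPS = 1e18
    # Dates taken from the article: petaflops in 2008, ambitious exaflops in 2018.
    factor = annual_growth_factor(PETAFLOPS, EXAFLOPS, 2018 - 2008)
    print(f"Required growth: {factor:.2f}x per year")
    print(f"Implied doubling time: {doubling_time_years(factor):.2f} years")
```

A thousandfold gain in ten years corresponds to performance roughly doubling every year, which gives a sense of why the 2018 date is described as ambitious.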

