HPCwire

Horst Simon Talks Petascale


At the HPC User Forum meeting in Denver this week, Horst Simon was one of two industry experts asked to provide a larger perspective as representatives of organizations in China, Japan and the U.S. discussed their petascale initiatives. In this Q&A, Simon offers his views about the challenges of achieving usable petascale systems within the next five years.

HPCwire: The U.S. and several other countries have petascale initiatives in play. How realistic is the dual goal of achieving sustained petaflops speed and substantially boosting productivity by 2010?

Simon: There have been multiple announcements of plans for petascale machines in that timeframe, so I'm fairly confident that the sustained petaflop goal will be attained by 2010, meaning that by then there will be a Gordon Bell Prize for a sustained petaflop on a real application on a real platform. I'd be even more comfortable predicting that this will happen by 2011.

As for part two of your question, it depends on your definition of productivity. If productivity means the economic output of a country, then I think 2010 is too early for petascale computing to affect that. If productivity means making efficient use of petascale systems in industry, for example to routinely create better products at lower costs through the use of simulation, this implies that the petascale systems would have good scalability, reliable application and system software, and so on, and I think 2010 will also be too soon for this. There will be many technical challenges, because we're entering a completely new arena of scalability. Getting to productive petaflop performance will be as difficult as it has been getting to productive teraflop performance.

But if productivity means running codes faster and at higher resolution, which can enable scientific breakthroughs, then I'm confident this will happen because of the continued dramatic advances in computational technology based on commodity clusters. Ten years ago, few people thought PC clusters would have the big impact they have had. The same thing will happen with petascale systems in the future. They will become common.

HPCwire: Can anyone really afford a general-purpose sustained petaflop system in the 2009-11 timeframe that several countries are targeting? By general purpose, I mean a system that can sustain petaflop performance on a reasonably broad spectrum of codes.

Simon: Yes, to the part about affording petaflop systems in that timeframe. Petaflop systems are expensive, but not super-expensive if you look at them in relation to other large-scale scientific projects, like particle accelerators or the next-generation space telescope. It costs about $200 million to fund a petascale system today, which is not outrageously high. The bigger question is, what is the optimal time to make that investment? The Earth Simulator was a huge investment of about $400 million and made a big, immediate impact when it went live in 2002. Now, four years later, a 40-teraflop machine is much less expensive. It's important to produce significant results in the first one to two years, so Moore's Law doesn't catch up with the machine. It must be productive quickly.

As for general-purpose, if we look at the ratios of memory and disk that would be needed, and the I/O rates, then a general-purpose petascale system could become much more expensive. But we're approaching an era when the whole notion of general-purpose HPC systems may no longer apply. Instead, there will be commodity clusters for most things, along with opportunities to leverage special-purpose technologies like Blue Gene, which can run specific applications very successfully, or MDGRAPE3 from RIKEN, which arguably is the first petascale system and is highly specialized. In 2010-11, I would expect an increasing trend toward more highly specialized systems.

HPCwire: Can anything be done to alleviate the costs of petascale systems while maintaining their usefulness?

Simon: One big issue for the future is that operational costs have been increasing significantly. We're approaching the point where computers can cost more to operate than to acquire. One big potential area for cost reduction is using components that consume less power, which lowers the overall cost of ownership. Another one is facility construction. Construction costs have also gone up substantially, and you can save a lot of money if you don't have to build a new facility or heavily modify an existing one.

HPCwire: You mentioned power consumption, and you've stressed the importance of this before in designing petascale systems. Can processors have both leading performance and low power consumption? Are there ways to conserve power that don't involve the processors?

Simon: There are different approaches to this, and some vendors are exploring ways to sharply reduce power consumption, even for entry-level systems. But we as a community have not really looked at power consumption enough. Flops-per-watt is still a relatively new term. There's room for improvement in many areas of design. For example, everyone seems to agree that HPC is moving more toward liquid cooling again.

HPCwire: Partly to address the power consumption issue, there's a growing trend to scale up by using a larger number of slower processors. How does this affect the system's breadth-of-applicability?

Simon: This is exactly the challenge we face. We had a decade of stagnation in looking at parallelism, because high-end systems stayed at the same size -- not more than 10,000 processors. ASCI Red was the first system with about 10,000 processors, and this didn't change until the arrival of Blue Gene. Many applications scale to at best 100 or 1,000 processors, so there's a challenge for applications and system software to make a big leap to hundreds of thousands of processors.

The good news is that this scaling issue is solvable, because there are no physical limitations. The limitations here have to do with creativity, our ability to scale in our thinking. We need to solve this in the next few years, but the HPC community is capable of this. Scaling up also requires significant investments in system software. Just as applications have been stagnating, so has the scalability of system software. But if we think correctly, it doesn't matter whether we use 50 or 50,000 processors. We can conquer this realm of parallelism.

HPCwire: Some engineers have complained about the trend toward slower processors. They say that because their applications don't scale beyond a handful of processors, this trend is actually setting them back instead of moving them forward. Comment?

Simon: This is probably legitimate. There's a strong sense that we are repeating the cycle we started in the late 1980s, when the first massively parallel systems arrived and many applications didn't lend themselves to these systems and had to continue running on vectors or SMPs. In my community, climate modelers were reluctant to go parallel. Now, 15 years later, nearly all are massively parallel. Climate modelers have found no inherent limitations to scalability. Another example is the SciDAC program. In that program, five years of investment in application development by the DOE made the DOE community much more ready for higher scaling.

HPCwire: What's your take on the importance of heterogeneous processing?

Simon: This comes back to what I said initially. Heterogeneous processing falls under the general heading of customized architectures, where you customize an architecture for the set of applications you want to run. Heterogeneous processing will be very important and useful, because it lets you build systems that are better optimized, less power-consuming and less expensive for solving a given set of problems at high efficiency. Roadrunner is a good example, and ClearSpeed is having success selling their accelerators. We will see more heterogeneous architectures in the future.

HPCwire: What are your summary thoughts about the current "petascale movement"?

Simon: I'll repeat what I've said elsewhere. I'm concerned that the term "petascale" covers such a wide territory. There are multiple stages. There will probably be a peak petaflop system in the next 18 months, then a Linpack petaflop, a sustained petaflop, and later on a system that sustains petaflop performance on a wide variety of applications. As HPC insiders, we all understand this, but many politicians and government funders and the general public do not. If we as a community hang our hats too much on the first system with peak or Linpack petaflop performance, people outside the HPC community may conclude that we've conquered the petaflop challenge and it's time to move on to something else. The truth is, it will probably take six to eight more years to sustain a petaflop on a large number of applications, and in the meantime funding might fade away. It's good to have this petascale movement and initiatives in multiple countries to generate enthusiasm and to set a goal to move toward, but we need to keep moving after the first milestone.

-----

Horst Simon is the founding director of Berkeley Lab's Computational Research Division, which conducts applied research and development in computer science, computational science, and applied mathematics. In 1988, he was awarded the Gordon Bell Prize for his parallel processing research. He was also a co-developer of the NAS Parallel Benchmarks, a standard for evaluating the performance of massively parallel systems. Currently, Simon is the associate laboratory director for Computing Sciences at Berkeley Lab and the director of NERSC.

