HPCwire

Since 1986 - Covering the Fastest Computers
in the World and the People Who Run Them

Sometimes Accomplishment Is Starting Something New Rather than Finishing Something Old


So perhaps it was of this last year of the first decade of the first century of the new millennium in the field of high performance computing. This is not to minimize the continued progression of petaflops computing as we enter Year 3 AP (After Petaflops). With the addition of new machines both deployed and planned, petaflops-scale applications as acknowledged by the Gordon Bell Prize, a steady increase in the number of cores per socket, and the uncomfortable marriage of GPUs in heterogeneous structures, the last year has been marked by continued and demonstrable advances. As petaflops computing has become truly international in scope and application, this emerging system class is no longer an ethereal fringe, but rather has gained firm traction at such powerhouses (yes, meant in more than one way) as Oak Ridge National Laboratory, where these systems now serve humanity as the heavy lifters in computational methods addressing the challenges of the modern world.

But one potentially important accomplishment in the last twelve months is not something that has been completed; instead, it is something that has been just initiated. Even as we gain a footing in the era of petaflops computing, we have set in motion the exploration of the undiscovered domain of exaflops computing. This year has seen the launching of multiple programs to develop the concepts, architectures, software stack, programming models, and new families of parallel algorithms necessary to enable the practical realization of exaflops capability prior to the end of this decade. These have involved unprecedented cooperation and coordination within government agencies and laboratories, industry, academia, and internationally. At the dawn of the petaflops era, the emerging focus on the performance regime three orders of magnitude beyond is unlike anything before it and in stark contrast to the grass-roots workshops towards petaflops back in the relaxed days of the run-up to teraflops in the 1990s.

There are good reasons for this. The challenges to continued growth in delivered sustained performance across a broad range of application domains are dramatic and reflect a turning of the corner on the trends that have driven us forward, trends ultimately due to Moore's Law and the semiconductor revolution. These challenges, somewhat oversimplistically, can be summarized as concurrency, power, reliability, and productivity.

In the past, the double whammy of increases in clock rate and increases in processor core complexity delivered two decades of sustained exponential growth in processor core performance, which, when integrated in clusters of SMP nodes, has given us the iconic images of straight lines on semi-log graphs with respect to the passage of time. Now the S-curve is bending for a second time, and not in a good way. Power has hit the threshold of pain, and the architecture tricks have been largely exhausted. Increased resources have been dedicated to confronting the egregious impact of the memory wall and the latencies and blocking it incurs. Ever-decreasing efficiencies (single-digit percentages are not uncommon) under several normalization measures (e.g., FLOPS, utilization, per transistor, per joule, per hectare) have exposed the soft underbelly of an ultimately unsustainable golden age: exponentials cannot go on forever.

Indeed, the authors have projected that "we will never achieve sustained zettaflops computing" using the hardware paradigm of Boolean logic gates and binary data storage. Due to the speed of light, Boltzmann's constant, and atomic granularity, it is predicted that the wall, which is more like a very steep hill, will occur at about 32 exaflops. But we are not there yet; indeed, there are a good four orders of magnitude to go. And that will be hard.

Three major activities created during the last year can be cited that engage the talents of the international community, including experts in hardware, software, algorithms, and domain science. These have resulted from at least two years of preliminary workshops and studies sponsored by diverse entities, as well as internal industry planning. They are IESP, DOE X-Stack, and DARPA UHPC. There are many smaller activities as well.

The International Exascale Software Project (IESP) has brought together the interests, talents, and resources of the international community to cooperate and coordinate long-term development of the software infrastructure required to enable effective exaflops-scale performance before the end of this decade. Learning from past experiences where software always appeared to lag behind the hardware, this world-straddling endeavor is driven by the recognition that to succeed, the software needs to be there when the hardware is. More importantly, the hardware designs must be informed by the needs of the software so that there is minimum mismatch and no ensuing generations of unsatisfactory patches. But there is an even more critical imperative: the realization that without the right software, exaflops may not be achievable at all (except in special cases) and that no one nation can go it alone; the HPC community is just too small for multiple conflicting paths of a top-to-bottom software refactoring. In the last year, four multi-day meetings in France, Japan, and the UK among representatives of all of the major HPC nations have provided an emerging roadmap to inform future planning of the joint development of the full supporting software infrastructure for the operation and programming of exascale systems.

The US DOE has also begun a new program of research with the release of its recent RFP to develop the components of the "X-Stack," the software required to enable a new generation of science and technology applications with the advent of future exaflops-capable systems. These elements include operating systems, runtime systems, programming models and tools, and methods for reliability, mass storage, and I/O. The winners, not yet announced, will represent a new wave of research in the US combining partners in the national laboratories, industry, and academia, driven by the requirements of major mission-critical applications. This and other related DOE programs were developed in part from an extensive series of community workshops through the preceding year on application domains, hardware and software systems, and mathematical algorithms. This research will join other programs around the world in the first concerted effort to turn the corner and set a new trajectory for future HPC system software architecture, design, and implementation.

Perhaps the most dramatic, and at the same time the riskiest, undertaking is the new DARPA Ubiquitous High Performance Computing (UHPC) research program. UHPC is intended to attack the above challenges through nothing less than revolutionizing HPC system design. Through a lengthy program development process that involved three separate studies in technology, software, and resiliency, engaging the talents of experts throughout the US, UHPC evolved an energetic research charter to reinvent computing prior to the end of this decade. The program was not explicitly targeted at exascale but rather at the mid-range of one or some unspecified number of interconnected and interoperable racks, each capable of approximately 1 petaflops sustained performance with a power budget of less than 60 kilowatts.

At the foundation of this program is the call for a new model of parallel computation to replace the venerable and highly successful message-passing model that has dominated for the last two decades. A major emphasis is on power reduction, with an average energy of 25 picojoules per floating point operation. A thousand such racks, if sufficiently efficient, would deliver 1 exaflops for 20 megawatts.
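The power figures quoted above can be related with some back-of-the-envelope arithmetic. The following is only a rough sketch: the constant and function names are illustrative and are not taken from any UHPC program document, and the numbers are simply the ones quoted in the text (25 picojoules per operation, about 1 petaflops and under 60 kilowatts per rack, 1 exaflops for 20 megawatts).

```python
# Back-of-the-envelope power arithmetic using the figures quoted in the text.
# All names here are illustrative assumptions, not part of any UHPC specification.

PICO = 1e-12
PETA = 1e15
EXA = 1e18

ENERGY_PER_FLOP_J = 25 * PICO   # 25 picojoules per floating point operation
RACK_FLOPS = 1 * PETA           # roughly 1 petaflops sustained per rack
RACK_BUDGET_W = 60e3            # rack power budget (stated as an upper bound)
TARGET_SYSTEM_W = 20e6          # 20 megawatts for a 1 exaflops system


def power_watts(flops_per_second, joules_per_flop):
    """Power drawn when every operation costs joules_per_flop."""
    return flops_per_second * joules_per_flop


def joules_per_flop(power_w, flops_per_second):
    """Average energy available per operation at a given power and rate."""
    return power_w / flops_per_second


# Number of 1-petaflops racks needed to reach 1 exaflops.
racks = EXA / RACK_FLOPS

# Power implied per rack by the 25 pJ/operation figure alone.
rack_arithmetic_w = power_watts(RACK_FLOPS, ENERGY_PER_FLOP_J)

# Total draw if every rack used its full 60 kW budget.
system_at_budget_w = racks * RACK_BUDGET_W

# All-in energy per operation implied by 1 exaflops in 20 MW.
required_j_per_flop = joules_per_flop(TARGET_SYSTEM_W, EXA)

if __name__ == "__main__":
    print(f"racks for 1 exaflops:          {racks:.0f}")
    print(f"per-rack power at 25 pJ/flop:  {rack_arithmetic_w / 1e3:.1f} kW")
    print(f"system power at full budget:   {system_at_budget_w / 1e6:.1f} MW")
    print(f"pJ/flop needed for 20 MW:      {required_j_per_flop / PICO:.1f}")
```

Under these assumptions, 25 picojoules per operation at 1 petaflops corresponds to about 25 kilowatts per rack, a thousand racks at their full 60-kilowatt budget would draw about 60 megawatts, and 1 exaflops within 20 megawatts works out to roughly 20 picojoules per operation all-in. This is one way to read the "if sufficiently efficient" qualifier: the 60-kilowatt rack budget is a ceiling, and the exaflops target requires operating well below it.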

Emphasis is placed on the co-design of both hardware and software components in response to challenge problems that will span application domains from some of the largest STEM problems to heavy real-time I/O streaming to knowledge management graph problems. For UHPC, scaling down is as important as scaling up, with single modules capable of multiple teraflops (an important operating point for mobile modules).

The program may run eight or nine years and result in one or more prototypes of fully operational systems. The first half of the program, Phases 1 and 2, spanning four years, will begin this summer, with the winning teams to be announced in a month's time. Atypical of such programs is the expectation of strong cooperation among competing teams and the delivery of many of the techniques and technologies to the research community throughout the four phases of the program.

This year has indeed been a very productive year, both for its accomplishments in the deployment and application of petaflops-scale systems and for its forward-looking inauguration of the exaflops era.




