Exascale Expectations

By Michael Feldman

November 20, 2009

During Al Gore’s SC09 keynote speech on Thursday, he correctly observed that “Moore’s Law is not a law of physics, it’s a law of self-fulfilling expectations.” So it is with the unwritten law of supercomputer power, which tells us that system performance will increase about 1,000-fold per decade. If that pace continues, we should see the first exaflop systems show up in the 2018-2019 timeframe.
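
The arithmetic behind that timeframe can be sketched in a few lines of Python. This is a rough illustration, not a forecast: it assumes a 2008 baseline for the first Linpack petaflop system (a date not given in this article) and treats the 1,000-fold-per-decade pace as exact. The function names are made up for the example.

```python
import math

GROWTH_PER_DECADE = 1000.0  # pace quoted in the article
BASE_YEAR = 2008            # assumption: first Linpack petaflop system
BASE_FLOPS = 1e15           # one petaflop


def annual_growth(per_decade=GROWTH_PER_DECADE):
    """Yearly multiplier implied by a given per-decade multiplier."""
    return per_decade ** (1.0 / 10.0)


def year_reaching(target_flops, base_year=BASE_YEAR,
                  base_flops=BASE_FLOPS, per_decade=GROWTH_PER_DECADE):
    """Year in which the top system reaches target_flops at a steady pace."""
    decades = math.log(target_flops / base_flops) / math.log(per_decade)
    return base_year + 10.0 * decades


print(f"annual growth factor: {annual_growth():.3f}")
print(f"first exaflop system: {year_reaching(1e18):.1f}")
```

Under those assumptions, a 1,000-fold gain per decade is very close to a doubling every year, and three more orders of magnitude on top of a 2008 petaflop lands in 2018.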

Because that expectation is so ingrained in the HPC mindset, a subset of the community has already coalesced to make sure we hit the mark. In fact, one of the last sessions of SC09 on Friday morning was a discussion about the road to exascale. Some of the heavy hitters in HPC were on the panel, including Jack Dongarra, Peter Kogge, Marc Snir, and Steve Scott. Intel’s Bill Camp was the moderator.

As you might expect, there was general agreement about the big exascale challenges: software scalability and models; memory and storage bandwidth; system resiliency; and power and cooling. All of these issues stem from the fact that processing horsepower is outrunning the capabilities of the surrounding technologies: the core count per processor and the number of processors per HPC system are going to keep increasing faster than the software and other system components can match. As a result, much of the panel discussion tended to drift into a sort of “the sky is falling” narrative.

Cray CTO Steve Scott, representing the only vendor on the panel, had a somewhat different take. From his perspective, there aren’t any real showstoppers on the way to exaflop computing; there just aren’t any ideal solutions. He predicted that the first machine will arrive in 2017 and will be based on 16nm process technology. Scott estimated that each socket will deliver about 8 teraflops, which works out to 125,000 sockets for one exaflop. The entire system will draw 31 MW and span about 10,000 square feet. All doable.
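
Scott’s socket count is easy to check, and the same figures imply a rough energy-efficiency target. The Python sketch below uses only the numbers quoted above. The per-socket power value is an assumption made here for illustration: it simply divides the full 31 MW evenly across sockets, whereas a real system would spend part of that budget on memory, interconnect, and cooling. Scott gave no per-socket power figure.

```python
EXAFLOP = 1e18            # target performance, FLOPS
FLOPS_PER_SOCKET = 8e12   # Scott's estimate: about 8 teraflops per socket
SYSTEM_POWER_W = 31e6     # Scott's estimate: 31 MW for the whole system

# Number of sockets needed to reach one exaflop.
sockets = EXAFLOP / FLOPS_PER_SOCKET

# Whole-system power budget spread evenly over sockets (illustrative only).
watts_per_socket = SYSTEM_POWER_W / sockets

# System-level efficiency implied by the two estimates.
gflops_per_watt = EXAFLOP / SYSTEM_POWER_W / 1e9

print(f"sockets:          {sockets:,.0f}")
print(f"watts per socket: {watts_per_socket:.0f}")
print(f"GFLOPS per watt:  {gflops_per_watt:.1f}")
```

The result is 125,000 sockets, a budget of roughly 250 watts per socket including everything around it, and a system-level efficiency of a little over 32 gigaflops per watt.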

What Cray is counting on though is a shift to heterogeneous processing. “Most of the FLOPS are going to be in the so-called accelerators, whether they’re GPUs or SIMD vector units,” said Scott. “We’ll have some fast threads for performance on serial codes, coupled to large numbers of efficient, low control overhead, more efficient FLOPS. This is absolutely necessary to get both good performance and energy efficiency.”

But, he said, the memory bandwidth per FLOP is going to have to be a lot lower than it is today. The result is that some apps will be left behind performance-wise. Codes that do big matrix multiplies will be fine, but ones that need to do lots of memory references will be “SOL,” according to Scott. In fact, this has already occurred. Many applications are able to use only a small fraction of the potential performance on supercomputers, and that’s been going on for years.

Scott also conceded that some problems, such as reliability, will have to be accounted for in new ways. The commodity parts upon which these systems are based won’t have built-in fault tolerance, since the volume markets for these components (i.e., client devices and large datacenter servers) don’t require it. But good system design should take care of the problem at the hardware level.

“We can make the systems resilient,” claimed Scott. “I’m not too worried about that. It’s the applications that are the hard part.” He said checkpoint-restart can be used as a temporary solution, but as the mean time between failures (MTBF) approaches the time it takes to write a checkpoint, that model breaks down. What will be needed is application-side help that can deal with frequent failures. Scott thinks there’s some potential for automatic application resiliency, via the compiler and runtime, but it’s likely that the user application model will have to change to handle full resiliency. Again though, there are no showstoppers.
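
Why checkpoint-restart breaks down can be illustrated with a standard first-order model that was not part of the panel discussion: Young’s approximation for the optimal checkpoint interval, sqrt(2 × checkpoint time × MTBF). The Python sketch below uses it to estimate the fraction of machine time left for useful work. The 10-minute checkpoint cost and the MTBF values are hypothetical, and the approximation is only reasonable while the checkpoint time is much smaller than the MTBF.

```python
import math


def young_interval(checkpoint_s, mtbf_s):
    """Young's first-order optimal time between checkpoints, in seconds."""
    return math.sqrt(2.0 * checkpoint_s * mtbf_s)


def useful_fraction(checkpoint_s, mtbf_s):
    """Approximate share of time spent on useful work at the optimal interval.

    Waste = time writing checkpoints + expected rework after a failure
    (on average half an interval is lost per failure). Clamped at zero,
    because the model is meaningless once waste exceeds 100 percent.
    """
    tau = young_interval(checkpoint_s, mtbf_s)
    waste = checkpoint_s / tau + tau / (2.0 * mtbf_s)
    return max(0.0, 1.0 - waste)


CHECKPOINT_S = 600.0  # hypothetical: 10 minutes to write one checkpoint

for mtbf_hours in (24.0, 4.0, 1.0, 0.5):
    frac = useful_fraction(CHECKPOINT_S, mtbf_hours * 3600.0)
    print(f"MTBF {mtbf_hours:5.1f} h -> useful work {frac:6.1%}")
```

In this model the useful fraction falls steadily as the MTBF shrinks toward the checkpoint time and reaches zero before the two are equal, which is the breakdown Scott described.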

It’s worth noting that average HPC systems follow the same relative performance pace as the top machines. For example, the bottom-ranked supercomputer on the TOP500 list has also increased its Linpack performance 1,000-fold each decade. That means when the first exaflop system appears, the 500th-fastest computer in the world will deliver 10 petaflops or so. Thus, when the exascale era is inaugurated in eight or nine years, most HPC users will be booting up their first petaflop machines, and will be thrilled to do so.
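
The gap between the top and bottom of the list can also be expressed in years. The Python sketch below uses only the figures above: a top system at one exaflop, a 500th-ranked system at about 10 petaflops, and the same 1,000-fold-per-decade pace for both. The function name is illustrative, and the result is only as good as the assumption that both ends of the list keep that pace.

```python
import math


def lag_years(top_flops, bottom_flops, per_decade=1000.0):
    """Years for the bottom system to reach the top system's current level."""
    ratio = top_flops / bottom_flops
    return 10.0 * math.log10(ratio) / math.log10(per_decade)


# Article's figures: 1 exaflop at the top, about 10 petaflops at rank 500.
lag = lag_years(1e18, 1e16)
print(f"rank 500 trails rank 1 by about {lag:.1f} years")
```

A 100-fold gap at this pace corresponds to a lag of between six and seven years.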



HPCwire