The post-petascale era is marked by systems with far greater parallelism and architectural complexity. Barring some game-changing innovation, crossing the next 1000x performance barrier will be harder than previous efforts were. At the 2014 Argonne National Laboratory Training Program on Extreme-Scale Computing (ATPESC), held in August, Professor Pete Beckman delivered a talk on “Exascale Architecture Trends” and their impact on how computational science and engineering applications are programmed and executed.
We are at a unique point in time, says Beckman, director of the Exascale Technology and Computing Institute. While code can't be completely future-proofed, there are clear trends that will shape programming best practices.
When it comes to the current state of HPC, Beckman shares a chart from Peter Kogge of Notre Dame detailing three major trends, which can be traced back to 2004.
- The power ceiling.
- The clock ceiling.
- Growing socket and core counts.
As Kogge illustrates, there was a fundamental shift in 2004. Computing reached a point where chips couldn't get any hotter, clock rates stopped scaling, and the free performance lunch was over.
“Now the parallelism in your application is increasing dramatically with every generation,” says Beckman. “We have this problem, we can’t make things take much more power per package, we’ve hit the clock ceiling, we’re now scaling by adding parallelism, and there’s a power problem at the heart of this, which translates into all sorts of other problems, with memory and so on.”
To illustrate the power issue, Beckman compares the IBM Blue Gene/Q system to its predecessor, Blue Gene/P. Blue Gene/Q is about 20 times faster and uses four times more power, making it five times more power efficient. That looks like very good progress. But extrapolating further, an exascale system built on this 5x trajectory would consume 64MW of power. For perspective, a megawatt costs about $1 million a year in electricity, which puts the annual bill for such a system at $64 million.
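The arithmetic behind these figures can be checked with a few lines of Python. This is a minimal sketch that uses only the numbers quoted above (20x speedup, 4x power, 64MW, roughly $1 million per MW per year); the function names are hypothetical, the 365-day year is an assumption, and the code does not attempt to reproduce the 64MW extrapolation itself, since the talk's baseline figures are not given here.

```python
def efficiency_gain(speedup: float, power_ratio: float) -> float:
    """Performance-per-watt improvement: how much faster divided by how much more power."""
    return speedup / power_ratio


def annual_electricity_cost(megawatts: float, dollars_per_mw_year: float = 1_000_000.0) -> float:
    """Yearly electricity bill using the rule of thumb of ~$1M per MW per year."""
    return megawatts * dollars_per_mw_year


def implied_price_per_kwh(dollars_per_mw_year: float = 1_000_000.0) -> float:
    """Electricity price implied by the rule of thumb (assumes a 365-day year).

    1 MW running for 24 * 365 = 8,760 hours delivers 8,760,000 kWh.
    """
    kwh_per_mw_year = 1000 * 24 * 365
    return dollars_per_mw_year / kwh_per_mw_year


if __name__ == "__main__":
    # Blue Gene/Q vs. Blue Gene/P: ~20x faster at ~4x the power.
    print("efficiency gain:", efficiency_gain(20, 4))
    # Cost of a 64MW exascale system per year.
    print(f"annual cost: ${annual_electricity_cost(64):,.0f}")
    # Price per kWh hidden inside the $1M/MW-year rule of thumb.
    print(f"implied price: ${implied_price_per_kwh():.3f}/kWh")
```

The last function makes the rule of thumb concrete: $1 million per megawatt-year corresponds to roughly 11 cents per kilowatt-hour.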
Beckman emphasizes the international nature of this problem. Japan, for example, has set an ambitious target of 2020 for its exascale computing strategy, which is being led by the RIKEN Advanced Institute for Computational Science. Although not all of the necessary funding has been secured, the project cost is estimated at nearly $1.3 billion.
Regions around the world have concluded that the exascale finish line is unlike previous 1000x efforts and will require international collaboration. Beckman points to the stagnation of the TOP500 list as an indication of how difficult this challenge is. In light of this, Japan and the US have signed a formal agreement to collaborate on HPC system software development. The agreement, signed at ISC, calls for significant collaboration.
Europe is pursuing similar agreements with the US and Japan. As part of its Horizon 2020 program, Europe plans to invest 700 million euros between 2014 and 2020 to fund next-generation systems. Part of this initiative is a particular interest in establishing a European HPC vendor base.
No discussion of the global exascale race would be complete without mentioning China, whose Tianhe-2 has been the fastest computer in the world for the last three editions of the TOP500 list. Tianhe-2 is energy-efficient for its size, drawing 24MW including cooling; however, the cost of running it means it is not turned on all the time.
Principally an Intel-powered system, Tianhe-2 also contains homegrown elements developed by China’s National University of Defense Technology (NUDT), including SPARC-derived CPUs, a high-speed interconnect, and its operating system, which is a Linux variant. China continues to invest heavily in HPC technology. Beckman says we can expect one of China's next machines, likely a top-10 system, to be built entirely from native technology.
Can the exponential progress continue?
The classic History of Supercomputing chart suggests that systems will continue to hit their performance marks, provided their massive power footprints can be tolerated. At the device level, feature sizes are approaching fundamental physical limits. “Unless there is a revolution of some sort, we really can’t get off the curve that is heading towards a 64MW supercomputer,” says Beckman. “It’s about power, both in the number of chips and the total dissipation of each of the chips.”
Beckman cites several forces driving change in software: memory, threads, messaging, resilience and power. At the level of the programming model and the OS interface, he suggests a need for coherence islands as well as persistence.
With increased parallelism, the notion that equal work takes equal time is going away, and variability (noise, jitter) is the new norm. “The architecture will begin to show even more variability between components and your algorithms and your approaches, whether it’s tasks or threads, will address that in the future,” Beckman tells his audience, “and as we look toward exascale, the programmer who can master this feature well, will do well.”
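Why equal work no longer means equal time can be illustrated with a toy model. This is a hypothetical sketch, not anything presented in the talk: it assumes each worker runs at a fixed slowdown factor (standing in for noise or jitter) and compares a static, equal-block split of tasks against a simple greedy dynamic scheduler, which stands in for task-based runtimes in general. All names, task counts and slowdown values are illustrative assumptions.

```python
import heapq
from typing import Sequence


def static_makespan(work: Sequence[float], slowdown: Sequence[float]) -> float:
    """Bulk-synchronous style: tasks are split into equal contiguous blocks up front.

    The step finishes only when the slowest worker finishes its block.
    """
    n_workers = len(slowdown)
    block = -(-len(work) // n_workers)  # ceiling division
    finish_times = []
    for w in range(n_workers):
        chunk = work[w * block:(w + 1) * block]
        finish_times.append(sum(chunk) * slowdown[w])
    return max(finish_times)


def dynamic_makespan(work: Sequence[float], slowdown: Sequence[float]) -> float:
    """Task style: each task goes to whichever worker becomes free first."""
    # Min-heap of (time the worker becomes free, worker index).
    free_at = [(0.0, w) for w in range(len(slowdown))]
    heapq.heapify(free_at)
    for task in work:
        t, w = heapq.heappop(free_at)
        heapq.heappush(free_at, (t + task * slowdown[w], w))
    return max(t for t, _ in free_at)


if __name__ == "__main__":
    work = [1.0] * 16                # 16 equal-sized tasks (hypothetical)
    slowdown = [1.0, 1.0, 1.0, 2.0]  # one worker runs 2x slower (hypothetical)
    print("static :", static_makespan(work, slowdown))   # 8.0
    print("dynamic:", dynamic_makespan(work, slowdown))  # 5.0
```

With no slowdown both strategies finish in 4.0 time units. With one slowed worker, the static split waits 8.0 units for that straggler, while the dynamic scheduler routes more tasks to the faster workers and finishes in 5.0.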
Attracting and training the next generation of HPC users is a top priority for premier HPC centers like Argonne National Laboratory. One way Argonne tackles this challenge is by holding an intensive summer school in extreme-scale computing. The program traces its roots back to the 1980s, and its presentations are worthwhile not just for the target audience (a select group of mainly PhD students and postdocs) but for anyone keenly interested in the state of HPC, where it has come from and where it is going.