One of the leading contenders in the race to field an exascale supercomputer has published the results of a feasibility study, begun in 2012, that explores ways to achieve memory bandwidth sufficient for coming memory-intensive applications.
Japan’s Feasibility Study of Future HPCI (High Performance Computing Infrastructure) systems, launched by MEXT (Ministry of Education, Culture, Sports, Science and Technology), set out with these primary objectives:
+ Discuss future high-end systems capable of satisfying Japan's social and scientific demands for HPC over the next five to ten years.
+ Investigate hardware and software technologies for developing future high-end systems available around the year 2018 that satisfy these demands.
The project spans three systems teams and one application team:
+ The team comprising Tohoku University, NEC, and JAMSTEC is investigating the feasibility of a multi-vector-core architecture with high memory bandwidth.
+ The University of Tokyo and Fujitsu team is exploring the feasibility of a K computer-compatible many-core architecture.
+ The University of Tsukuba and Hitachi team is studying the feasibility of an accelerator-based architecture.
+ The RIKEN-Tokyo Tech application team is analyzing the direction of social and scientific demands and designing an R&D roadmap for target applications in the 2020 timeframe.
With power as the primary challenge, the project is focusing not only on peak computational performance but also on sustained performance per watt. The study also zeroes in on the need for mid-range HPC systems.
The recently published overview paper offers clarified design specifications to support the creation of a high-end computing system in the 2018 timeframe, slightly ahead of the target exascale date of 2020 or later being bandied about by other race contenders, including the US and the EU. The current reigning TOP500 champ for four iterations running, China, has a head start, but the distance is not insurmountable if the will and funding are there.
Figure 1 of the paper looks at the memory requirements of a small but important cross-section of applications in terms of Bytes per FLOP (B/F), defined by the team as the ratio of memory throughput in bytes/s to computing performance in flop/s.
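To make the metric concrete, here is a minimal sketch of the B/F calculation, using illustrative numbers of my own choosing rather than figures from the paper:

```python
def bytes_per_flop(mem_bandwidth: float, peak_flops: float) -> float:
    """B/F as defined by the team: memory throughput in bytes/s
    divided by computing performance in flop/s."""
    return mem_bandwidth / peak_flops

# Illustrative (assumed) numbers: a node with 1 TB/s of memory
# bandwidth and 10 Tflop/s of peak compute yields 0.1 B/F.
print(bytes_per_flop(1e12, 10e12))  # -> 0.1
```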
Writes the team: “If we continue to develop high-end computing systems by concentrating on increasing flop/s rates, simply targeting toward exa-flop/s in 2020, rather than memory bandwidth, their applicable areas are becoming narrowed, i.e., there will be a high probability that only a few percentage points of peak performance of exa-flop/s would be effective in the execution of practical applications because lots of arithmetic units are stalled due to waiting for the arrival of data and end up wasted.”
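A simple roofline-style bound, sketched below with assumed numbers (an illustration of the team's point, not a calculation from the paper), shows how quickly the effective fraction of peak collapses once an application demands more bytes per flop than the machine can deliver:

```python
def sustained_fraction(app_bf: float, machine_bf: float) -> float:
    """Upper bound on the fraction of peak flop/s a memory-bound
    application can sustain: the machine's B/F divided by the
    application's required B/F, capped at 1.0 (roofline model)."""
    return min(1.0, machine_bf / app_bf)

# Illustrative: an application needing 4 bytes of traffic per flop
# on a 0.1 B/F machine sustains at most 2.5% of peak -- the "few
# percentage points" the team warns about.
print(sustained_fraction(app_bf=4.0, machine_bf=0.1))  # -> 0.025
```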
The answer, according to the team, is a design that emphasizes “the quality of parallel processing rather than the quantity.” Specifically, they are aiming for 100x more sustained performance from only 10x more peak performance.
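Read at face value (my back-of-the-envelope arithmetic, not the paper's), that target implies a tenfold gain in sustained-to-peak efficiency:

```python
# Assumed reading of the stated target: 100x sustained performance
# delivered from only 10x peak performance means the sustained
# fraction of peak must improve by 100 / 10 = 10x -- for example,
# from 2.5% of peak to 25%.
sustained_gain, peak_gain = 100, 10
print(f"Required efficiency gain: {sustained_gain // peak_gain}x")
```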