While there is a universal desire in the HPC community to build the world’s first exascale system, the achievement will require major breakthroughs not only in chip design and power utilization but also in programming methods, NVIDIA chief scientist Bill Dally said in a keynote address at ISC 2013 last week in Leipzig, Germany.
In last Monday’s speech, titled “Future Challenges of Large-scale Computing,” Dally outlined what needs to happen to achieve an exascale system in the next 10 years. According to Dally, who is also a senior vice president of research at NVIDIA and a professor at Stanford University, it boils down to two issues: power and programming.
Power may present the biggest dilemma to building an exascale system, which is defined as delivering 1 exaflop, or 1,000 petaflops (a quintillion floating point operations per second). The world’s top-ranked supercomputer is the new Tianhe-2, which recorded 33.8 petaflops of computing capacity in the latest Top 500 list of the world’s largest supercomputers, while consuming nearly 18 megawatts of electricity. It has a theoretical peak of nearly 55 petaflops.
Theoretically, an exascale system could be built using only x86 processors, Dally said, but it would require as much as 2 gigawatts of power. That’s equivalent to the entire output of the Hoover Dam, according to an NVIDIA blog post on the keynote.
Using GPUs in addition to x86 processors is a better approach to exascale, but it only gets you part of the way. According to Dally, an exascale system built with NVIDIA Kepler K20 co-processors would consume about 150 megawatts. That’s more than eight times the amount consumed by Tianhe-2, which is composed of 32,000 Intel Ivy Bridge sockets and 48,000 Xeon Phi boards.
Instead, HPC system developers need to take an entirely new approach to get around the power crunch, Dally said. The NVIDIA chief scientist said reaching exascale will require a 25x improvement in energy efficiency. So the 2 gigaflops per watt that can be squeezed from today’s systems needs to improve to about 50 gigaflops per watt in the future exascale system.
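The relationship between efficiency and total power can be checked with a few lines of arithmetic. The following is a minimal sketch, not anything Dally presented; the function names are hypothetical, and the only inputs are figures quoted in this article (1 exaflop, 2 and 50 gigaflops per watt, and Tianhe-2's 33.8 petaflops at nearly 18 megawatts).

```python
# Illustrative arithmetic only; function names are hypothetical and the
# inputs are the figures quoted in the article.

EXAFLOP = 1e18  # floating point operations per second


def power_megawatts(flops, gflops_per_watt):
    """Power in megawatts needed to sustain `flops` at a given efficiency."""
    watts = flops / (gflops_per_watt * 1e9)
    return watts / 1e6


def efficiency_gflops_per_watt(flops, megawatts):
    """Efficiency in gigaflops per watt implied by a performance and power draw."""
    return flops / (megawatts * 1e6) / 1e9


if __name__ == "__main__":
    # An exaflop at today's ~2 gigaflops per watt vs. the 50 gigaflops per watt target.
    print(power_megawatts(EXAFLOP, 2))
    print(power_megawatts(EXAFLOP, 50))
    # Efficiency implied by Tianhe-2's reported 33.8 petaflops at ~18 MW.
    print(efficiency_gflops_per_watt(33.8e15, 18))
```

Under these assumptions, an exaflop at 2 gigaflops per watt works out to about 500 megawatts, while 50 gigaflops per watt brings it down to about 20 megawatts. Tianhe-2's reported numbers imply roughly 1.9 gigaflops per watt, in line with the 2 gigaflops per watt cited for today's systems.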
Relying on Moore’s Law to get that 25x improvement is probably not the best approach either. According to Dally, advances in manufacturing processes will deliver about a 2.2x improvement in performance per watt. That leaves an energy efficiency gap of roughly 12x that needs to be filled in by other means.
Dally sees a combination of better circuit design and better processor architectures closing the gap. If done correctly, these advances could deliver 3x and 4x improvements in performance per watt, respectively.
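The way these factors multiply can be sketched as follows. This is an illustrative calculation under the assumption that the gains are independent and compound multiplicatively, which the article implies but does not state; the constant and function names are hypothetical.

```python
# Illustrative arithmetic only; assumes the quoted gains are independent
# and multiply together. All names are hypothetical.

REQUIRED_GAIN = 25.0      # overall energy-efficiency improvement cited for exascale
PROCESS_GAIN = 2.2        # expected from manufacturing process advances
CIRCUIT_GAIN = 3.0        # potential gain from better circuit design
ARCHITECTURE_GAIN = 4.0   # potential gain from better processor architectures


def remaining_gap(required, already_covered):
    """Factor still needed after part of the required gain is covered."""
    return required / already_covered


def combined_gain(*factors):
    """Product of independent improvement factors."""
    total = 1.0
    for factor in factors:
        total *= factor
    return total


if __name__ == "__main__":
    print(remaining_gap(REQUIRED_GAIN, PROCESS_GAIN))
    print(combined_gain(CIRCUIT_GAIN, ARCHITECTURE_GAIN))
    print(combined_gain(PROCESS_GAIN, CIRCUIT_GAIN, ARCHITECTURE_GAIN))
```

With these inputs, the gap left after process improvements is about 11.4x, which the article rounds to 12x. Circuit and architecture gains of 3x and 4x multiply to 12x, and all three factors together give about 26x, slightly more than the 25x target.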
According to NVIDIA’s blog, Dally is overseeing several programs in the engineering department that could deliver energy improvements, including hierarchical register files, two-level scheduling, and temporal SIMT optimization.
Improving the arithmetic capabilities of processors will only get you so far in solving the power crunch, he said. “We’ve been so fixated on counting flops that we think they matter in terms of power, but communication inside the system takes more energy than arithmetic,” Dally said. “Power goes into moving data around. Power limits all computing and communication dominates power.”
Power is not the only obstacle. The way that supercomputers are programmed today also serves as an impediment to exascale systems.
Programmers today are overburdened and try to do too much with a limited array of tools, Dally said. A strict division of labor should be instituted among the triumvirate of programmers, tools, and the architecture to drive efficiency into HPC systems.
The best result is delivered when each group “plays their positions,” he said. Programmers ought to spend their time writing better algorithms and implementing parallelism instead of worrying about optimization or mapping, which are better off handled by programming tools. The architecture should just provide the underlying compute power, and otherwise “stay out of the way,” Dally said, according to the NVIDIA blog.
Dally and his team are investigating the potential for items such as collection-oriented programming methods to make programming supercomputers easier. Exascale-sized HPC systems are possible in the next decade if these limitations are addressed, he said.