While discussions of HPC architectures have long centered on performance gains, that is not the only measure of success, according to Petteri Laakso of Vector Fabrics. Spurred by ever-proliferating core counts, programmability is taking on new prominence. Vector Fabrics is a Netherlands-based company that specializes in multicore software parallelization tools, so programmability is high on their list of priorities.
In a recent blog post, the first article in a three-part series, Laakso contends that in the current paradigm, writing the software is the programmer’s problem, not the silicon maker’s. He argues that this approach is losing ground.
“The question is,” writes Laakso, “does peak performance and performance/power ratios alone determine the success of an architecture, or does programmability impact the initial adoption or success of a silicon architecture?”
He turns to the field of HPC as a test case of sorts and sets out the following hypothesis:
“If programmability does not impact the success of a silicon architecture, we should [see] the best-performing architectures win.”
To examine the issue in more detail, the Vector Fabrics team looked at the following software-programmable accelerator technologies: CUDA GPGPUs, OpenCL GPGPUs, FPGAs, and Xeon Phis. For sample data, they turned to the TOP500 list of the world’s fastest supercomputers. They identified which systems used these accelerators, and then mapped the adoption rate of each technology from its debut.
The gap in adoption between the two GPU camps stood out. “The difference is remarkable, since the performance figures are not very different between ATI/AMD and NVIDIA GP-GPUs,” writes Laakso.
“One clear difference between ATI/AMD and NVIDIA can be found in their investment in tooling and the programming paradigm. NVIDIA spent a considerable amount in developing the CUDA programming paradigm and accompanying tools. AMD’s investment in OpenCL and development tooling has been much more limited and leaning more towards the community to provide the improvements.”
To recap: Laakso traces NVIDIA’s significantly higher adoption rates directly to a well-supported programming ecosystem.
He continues:
“Neither of GP-GPUs’ programming paradigms can be called simple due to architectural limitations of GP-GPUs. But NVIDIA’s CUDA programming environment is much more developed than OpenCL’s. Looking at the relative adoption rates of the products, it’s hard to ignore the sentiment that the lack of good tooling has really hurt the chances of AMD GP-GPUs and OpenCL, regardless of its benefits over CUDA of openness and portability.”
So where does Intel’s accelerator play, the Xeon Phi, fit in?
Laakso: “When comparing to recent GP-GPUs, Xeon Phi offers comparable if slightly inferior performance and power characteristics. The main selling point of Xeon Phi is that you can use the same programming paradigm and tooling as you are using for normal node programming. While the reality does not carry quite as far as the marketing claims go, you can execute your existing applications on the Xeon Phi using MPI or OpenMP. You don’t have to port your code to an accelerator-specific programming paradigm.”
As for FPGAs, there are no FPGA systems on the TOP500, a data point that Laakso maintains further strengthens his conclusion that “[as] coding gets easier, adoption in the TOP500 is faster.”
The blog covers a lot of ground and makes a lot of claims. It also provides a nice counterpoint to our analysis of accelerator trends on the TOP500 list. What do you think?