Heterogeneous Computing in Firing Range
There is nothing static on the horizon for scientific and technical computing developers as the industry pushes an ever-expanding bevy of specialized co-processors, GPUs, FPGAs and other elements into the mix. The problem, however, is how far users are willing to push themselves to wrap their heads around new code for the sake of performance gains.
Despite the developer hassle, this is a great problem from the perspective of companies that are finding ways to tailor clean layers around complex heterogeneous computing code.
Take, for example, Atlanta-based AccelerEyes, which is seeing booming business on the strength of demand for GPU acceleration and interest in kicking the tires on Intel’s Xeon Phi co-processor. The company’s emphasis falls right in line with the needs of technical computing folks, with acceleration focused on C, C++ and Fortran codes, and it has a firm foothold for a company its size in the “big, small world” of HPC.
The company was founded in 2007 to give MATLAB a performance boost with GPUs via its Jacket product, which is still a core offering that smooths over some of the development complexity. It recently worked with MathWorks again on bringing the Parallel Computing Toolbox to life, which targets heterogeneous computing for scientific and technical users.
The real emphasis going forward is likely to be the multi-GPU and co-processor ArrayFire offering, which the company unveiled in 2011 to extend a much longer arm to technical computing users interested in dipping their toes into GPU waters. As the company’s CEO, John Melonakos, detailed for us in a recent conversation, ArrayFire aims to put CUDA, OpenCL and now Phi within closer reach of developers. ArrayFire has a broad library of functions for CUDA and, to a lesser extent, OpenCL, and several users are now kicking the tires on Phi.
So far ArrayFire has delivered some significant speedups in research at NASA, which uses AccelerEyes to boost Mars Rover image compression via GPUs and genetic algorithms to the tune of 5x. The company has also worked with financial services firms on the quant side for 37x speedups, on geolocation for government agencies at a 17x boost, and with a number of oil and gas companies, which are seeing big gains in everything from groundwater simulations to 3D mantle geodynamics applications.
The use cases cited all relied on GPU acceleration, but innovations in heterogeneous computing will unleash new examples of high-gear performance. In particular, Melonakos says Phi is a promising technology, even if Intel has a lot of work to do to catch up with the multi-year development lead of other vendors, especially NVIDIA. “There will be some advantages in terms of Intel’s software stack, which already has a solid user base…I think the Intel tools will be a big advantage for Phi but NVIDIA has taken a strong lead.”
Melonakos strongly believes that heterogeneous computing will be one of the biggest trends in the next decade of computing. “The parallelism that exists in workloads today can’t be ignored and GPUs really are a great immediate architecture to attack data parallel workloads in an energy-efficient manner. However, other heterogeneous options are on the rise and will play out over the coming decade. For instance, Phi holds great promise in terms of usability and technology roadmap, but Intel won’t be the immediate leader by any means—there are serious investments they need to make.”
At this point, however, CUDA is the real star of the GPU computing show because of NVIDIA’s commitment to seeing it through. Melonakos says that his company is still working on an OpenCL release for ArrayFire as they “wait for that software stack to mature a bit.” He notes that AMD has done a decent job of pushing OpenCL but is “tapped out on their push and have become just one of the players.” Even so, OpenCL has been getting fresh attention because of the buzz around Phi.
The AccelerEyes CEO says that discussions about OpenCL around the ecosystem have definitely picked up in the wake of Phi. He expects this to continue since OpenCL lives something of a double life between the worlds of HPC and mobile, consumer computing. Both areas are committed to solving the same core problems of heterogeneous computing, differing mainly in their scale of computation and power considerations. In the end, whether for HPC or mobile users, the goal is to make use of every micrometer of hardware available.
Melonakos says that the industry is very fluid and kept healthy through active competition. However, he notes that for a small company like his, there is no way to influence what is currently a dominant proprietary approach to GPU computing, one that forces developers to commit to one vendor or another. Open, portable standards are needed for developers to fully tap into the power of heterogeneous computing.
On that note, if you have a little time and don’t need a lot of eye candy with your media, there’s a pretty meaty presentation below on some of the tradeoffs between OpenCL and CUDA from AccelerEyes’ view. It also provides some solid points of comparison when it comes to using an abstraction layer over the two.
Although small now, AccelerEyes is set to grow in the wake of the rise of GPU- and co-processor-driven projects. Melonakos said that they’re only a 20-person shop for now, but they’re adding new people weekly to work on the ArrayFire offering and are actively seeking new developers to join their ranks.