The mainstream adoption of accelerator-based computing in HPC is driving the most significant change to software since the arrival of MPI almost twenty years ago. Faced with competing “similar but different” approaches to heterogeneous computing, developers and computational scientists need to tackle their software challenges quickly. They are rapidly discovering that a single unified development toolkit able to both debug and profile is the key to results – whichever platform they choose.
Whether developers choose to co-process or to accelerate (the vocabulary and implementation differ between Intel Xeon Phi and NVIDIA CUDA), there is agreement on the key changes to make to existing codes, or to design into new ones: more parallelism, more threads, and more vectorization. Codes must often support more than one platform, something ISVs and community code authors are very familiar with, so today’s HPC developer needs a breadth of knowledge to deliver the right solution now while keeping an eye on the future.
This is a hard ask – but a critical piece has fallen into place with the arrival of unified software development tools that provide for the workflows needed to adopt these key platforms.
Allinea Software’s version 4.0 release of its development suite provides debugging and profiling tools that can be used to port and maintain code on both Intel Xeon Phi and NVIDIA CUDA, through one unified interface and workflow, moving fluidly from a pure x86 solution to the new architectures:
- Allinea MAP quickly identifies loops suitable for offloading to a coprocessor or accelerator and highlights any other bottlenecks that would outweigh porting benefits
- Allinea DDT helps follow through the execution of the modified program both on the host and the coprocessor / accelerator hardware, keeping bug-fixing and development close to current productivity levels
- Once the code is running correctly the focus moves back to performance: Allinea MAP shows the speedup and communication overheads of the new version and identifies opportunities to improve asynchronous communication, so that the typically powerful host CPU does useful work in parallel with the coprocessor or accelerator
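Whether a loop's speedup survives the other bottlenecks, the question raised in the first step, can be sanity-checked with a back-of-envelope model before any porting starts. The sketch below is an Amdahl-style estimate and not anything Allinea MAP itself computes; the function name and all three inputs (loop fraction, kernel speedup, transfer overhead) are illustrative assumptions to be filled in from your own profile.

```python
def estimated_offload_speedup(loop_fraction, kernel_speedup, transfer_fraction=0.0):
    """Amdahl-style estimate of whole-program speedup from offloading one loop.

    loop_fraction     -- share of the original runtime spent in the loop (0..1)
    kernel_speedup    -- how much faster the loop runs on the accelerator
    transfer_fraction -- added host<->device copy time, expressed as a share
                         of the original runtime (hypothetical input: measure it)
    """
    if not 0.0 <= loop_fraction <= 1.0:
        raise ValueError("loop_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    if transfer_fraction < 0.0:
        raise ValueError("transfer_fraction must not be negative")

    # New runtime, normalised so the original program takes 1.0:
    # untouched code + accelerated loop + extra data movement.
    new_runtime = (
        (1.0 - loop_fraction)
        + loop_fraction / kernel_speedup
        + transfer_fraction
    )
    return 1.0 / new_runtime
```

Under these assumptions, a loop that accounts for 80% of the runtime and runs 10 times faster on the accelerator, with transfers costing 12% of the original runtime, gives roughly a 2.5x overall speedup. A loop that accounts for only 20% of the runtime, with transfers costing another 20%, comes out slower than the original code.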
To make this workflow even smoother, Allinea announced a new combined license, the Allinea Unified Supercomputing License. With a 1024-process license, for example, the 1024 tokens can be freely shared between debugging and profiling across all nodes and users on the system, so owners get the most use out of the license at all times.
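The token sharing can be pictured as a single pool that debugging and profiling jobs both draw from. The toy model below only illustrates that idea: it is not Allinea's license server or API, and the class name, method names, and job labels are all hypothetical. It assumes one token per process, as the 1024-process example suggests.

```python
class SharedTokenPool:
    """Toy model of one license whose tokens cover debugging and profiling alike."""

    def __init__(self, total_tokens):
        self.total_tokens = total_tokens
        self.in_use = {}  # job label -> number of tokens currently held

    def available(self):
        """Tokens not currently held by any job."""
        return self.total_tokens - sum(self.in_use.values())

    def acquire(self, job, processes):
        """Reserve one token per process; return False if too few are free."""
        if job in self.in_use:
            raise ValueError(f"job {job!r} already holds tokens")
        if processes > self.available():
            return False
        self.in_use[job] = processes
        return True

    def release(self, job):
        """Return a job's tokens to the pool (no-op for unknown jobs)."""
        self.in_use.pop(job, None)


# Hypothetical usage: one user debugs while another profiles.
pool = SharedTokenPool(1024)
pool.acquire("debug-user-a", 256)    # debugging session on 256 processes
pool.acquire("profile-user-b", 768)  # profiling run takes the remaining 768
pool.release("debug-user-a")         # freed tokens are usable by either tool
```

In this model the pool does not care which tool a job uses, so a finished debugging session immediately frees capacity for a profiling run and vice versa.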
Two powerful systems built on two different technologies exemplify today’s diversity and convergence alike: NICS Beacon II, the Intel Xeon Phi system ranked #1 on the Green500, and Oak Ridge’s Titan, the Cray XK7 with NVIDIA CUDA ranked #1 on the Top500. Allinea’s tools are at the center of the software environment on both.
“We could identify and resolve issues that I don’t think we would have been able to without Allinea DDT” – Josh Ladd, Tools Project Technical Officer at Oak Ridge National Laboratory during the OLCF3 project.
So, whichever adventure you choose, at least there’s one thing you don’t need to worry about: providing a productive, well-supported development environment. Allinea’s unified development tools are ready to go from the very first moment your system powers up.
If you haven’t already, take the new, unified tools for a spin.