Intel set a new standard for automatic optimization on computing accelerators when it released the Intel Xeon Phi coprocessor in November 2012, the first product of what is now known as the Many Integrated Core (MIC) architecture.
The MIC architecture can be programmed in the standard Fortran, C, and C++ languages, and it understands common HPC parallel frameworks such as OpenMP and MPI. Most importantly, the Intel compiler suite knows how to compile Fortran, C, or C++ code written by a “mere mortal” to run on the coprocessor as if it had been optimized by a “ninja”.
This automatic optimization capability was highlighted in this paper published by Colfax Research. The authors demonstrate step by step how to construct a library of special functions and make it offloadable to an Intel Xeon Phi coprocessor. Using a C++ language extension, they inform the compiler that certain functions are candidates for automatic vectorization in user applications.
Finally, they polish the high-level code of the function so that the compiler can do its best with optimization. As a result, their implementation of the Gauss error function performs on par with the highly optimized vendor implementation.
The demonstrated automatic optimization capabilities open the door to scientists and engineers wishing to boost the performance of their general-purpose functions on the MIC architecture. Be it a special mathematical function, an empirical functional relationship, or a solution of a differential equation, one can express it in a high-level language and trust the compiler to do the optimization. Additionally, a library function implemented in a high-level language will scale forward to future computing architectures in the blink of an eye. That is, in a swing of the compiler’s “ninjato”.