Strip-Mining for Vectorization is the focus of the second installment of a 3-part educational series from Colfax International introducing select topics on optimization of applications for Intel’s multi-core and manycore architectures (Intel® Xeon® processors and Intel® Xeon Phi™ coprocessors).
This paper discusses data parallelism with a focus on automatic vectorization and exposing vectorization opportunities to the compiler.
For a practical illustration, the paper shows how to construct and optimize a micro-kernel for particle binning. Similar workloads occur in Monte Carlo simulations, particle physics software, and statistical analysis.
The optimization technique discussed in this paper allows the compiler to vectorize the code, which yields an order-of-magnitude performance improvement on an Intel Xeon processor. Relative to a high-end Intel Xeon processor, the Intel Xeon Phi coprocessor delivers 1.4x greater performance in single precision and 1.6x greater in double precision.
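As a rough illustration of the general idea, the sketch below applies strip-mining to a simplified binning loop. It is not the kernel from the paper: the function name `bin_particles`, the constants `STRIP` and `NBINS`, and the assumption that each particle is described by a single coordinate in [0, 1) are all hypothetical choices made for this example. The loop over particles is split into strips, so that the index computation (no dependence between iterations) sits in its own short inner loop that a compiler may vectorize, while the histogram update, which can write to the same bin from different iterations, stays in a separate scalar loop.

```c
#include <assert.h>

#define STRIP 16  /* hypothetical strip width; would be tuned to the vector length */
#define NBINS 8   /* hypothetical number of equal-width bins */

/* Count how many of the n values in x fall into each of NBINS bins.
 * Assumption for this sketch: every x[i] lies in [0, 1).
 * counts must hold NBINS entries, zeroed by the caller. */
void bin_particles(const float *x, int n, int *counts)
{
    int idx[STRIP];  /* per-strip buffer of computed bin indices */

    assert(n >= 0);

    /* Outer loop walks over the particles one strip at a time. */
    for (int ii = 0; ii < n; ii += STRIP) {
        /* The last strip may be shorter than STRIP. */
        int len = (n - ii < STRIP) ? (n - ii) : STRIP;

        /* Independent iterations: a candidate for automatic vectorization. */
        for (int i = 0; i < len; i++) {
            idx[i] = (int)(x[ii + i] * (float)NBINS);
        }

        /* Different iterations may hit the same bin, so this update
         * is kept as a separate scalar loop. */
        for (int i = 0; i < len; i++) {
            counts[idx[i]]++;
        }
    }
}
```

Calling `bin_particles(x, n, counts)` with a zero-initialized `counts` array of `NBINS` entries fills the histogram. Whether the first inner loop is actually vectorized depends on the compiler and its flags, so the compiler's optimization report is the place to confirm it.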
Access the Colfax white paper "Optimization Techniques for the Intel MIC Architecture. Part 2 of 3: Strip-Mining for Vectorization" here.