On Tuesday Intel announced that it has updated its compiler and cluster suites ahead of the introduction of the Core i7 (Nehalem) processors. The Fortran and C/C++ compilers have been revved to version 11, and this release includes version 3.2 of the Intel Cluster Toolkit. The releases are part of Intel’s 18-24 month update strategy in its workhorse programmer tools suites, and are aimed at the wide spectrum of developers who need parallelism on scales ranging from the single socket to thousands of processors. HPCwire spoke with James Reinders, chief product evangelist and director of marketing for Intel’s Developer Products Division, to find out what’s new this time around.
This week’s announcement follows Intel’s August announcement of Parallel Studio, a massive investment aimed at enabling Windows software developers to bring the power of multiple cores to mainstream desktop applications. This week Intel returns the focus to tools for programmers in HPC, with many features that will resonate with parallel application developers on the traditional high end of the performance spectrum as well as the growing audience of developers aiming at smaller scale clusters.
Intel’s C++ Compiler 11.0 and Fortran Compiler 11.0 have been released, as well as a new Math Kernel Library (10.1), Integrated Performance Primitives (6.0), and version 2.1 of Threading Building Blocks. Each of these is available immediately for Windows, Mac, and Linux. Version 3.2 of the Cluster Toolkit (Compiler Edition) for Windows and Linux includes the compilers and the math library (including ScaLAPACK), as well as Intel’s MPI Library (3.2), Intel Trace Analyzer and Collector (7.2) with the new MPI correctness checker, MPI Benchmarks, and the cluster installer.
The MPI correctness checker is particularly interesting, and Reinders indicated beta users were enthusiastic about the product. Developers turn on performance analysis at compile time and link with an instrumented version of Intel’s MPI library. Post-run they can analyze what’s going on inside MPI calls using the Trace Analyzer, with automated support for catching errors such as mismatched send and receive buffers, and other insidious bugs that can be difficult to ferret out in development.
This version of Intel’s developer tool suite also includes support for lambda functions and other features of the C++0x draft standard, parallel lint for static analysis of parallel applications, and features from Fortran 2003. OpenMP 3.0 is included with function-level parallelism for both data and task parallel models.
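For readers who have not yet seen the lambda syntax from the C++0x draft, the following is a minimal sketch of what it looks like in practice. The helper names count_above and clamp_all are hypothetical and chosen only for illustration; they are not part of any Intel product, and the snippet uses only the standard library.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: count the values strictly greater than a threshold.
// The lambda captures `threshold` by value via the [threshold] capture list.
int count_above(const std::vector<double>& values, double threshold) {
    return static_cast<int>(std::count_if(
        values.begin(), values.end(),
        [threshold](double v) { return v > threshold; }));
}

// Hypothetical helper: clamp every element into the range [lo, hi] in place.
// The lambda takes each element by reference so it can modify it.
void clamp_all(std::vector<double>& values, double lo, double hi) {
    assert(lo <= hi);  // precondition: the range must not be inverted
    std::for_each(values.begin(), values.end(), [lo, hi](double& v) {
        if (v < lo) v = lo;
        if (v > hi) v = hi;
    });
}
```

The point of the feature is that the small predicate or update lives next to the algorithm call that uses it, instead of in a separately declared function object.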
The diversity of Intel’s investment in developer tools reflects two key concepts: incremental investment and forward scaling. The breadth of Intel’s tools allows developers to introduce parallelism into their applications gradually, using TBB or OpenMP before perhaps going “all the way” with an MPI implementation. “Forward scaling” is the conceptual handle for the idea that an investment made in an application today shouldn’t turn into a dead end when the company doubles the number of cores in its hardware tomorrow. For example, the latest version of the C++ compiler takes advantage of the semantic properties of valarrays, which already existed in the language standard, to automatically parallelize operations on them for higher performance. Developers who take advantage of valarrays today can expect future versions of Intel’s compilers to “do the right thing” when it comes to scaling that performance on Intel’s future, more highly parallel, hardware.
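To make the valarray point concrete, here is a minimal sketch of the kind of whole-array expression involved. The function name axpy is a hypothetical choice for this illustration. The code is plain standard C++; whether and how a given compiler parallelizes or vectorizes it is up to that compiler, and nothing in the language standard guarantees it.

```cpp
#include <cassert>
#include <valarray>

// Hypothetical helper: compute a * x + y element by element.
// The expression is written on whole arrays, with no explicit loop, so each
// output element depends only on the matching input elements. That absence of
// cross-element dependencies is what leaves a compiler free to split the work.
std::valarray<double> axpy(double a,
                           const std::valarray<double>& x,
                           const std::valarray<double>& y) {
    assert(x.size() == y.size());  // element-wise operations need equal lengths
    std::valarray<double> result = a * x + y;
    return result;
}
```

Code written this way states what is computed rather than the order in which it is computed, which is the property the forward scaling argument relies on.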
There are differences between the interfaces and options that developers on different ends of the HPC spectrum need, a challenge that Intel has been dealing with for some time. “Multicore parallelism more strongly demands programmer productivity and more loosely demands efficiency from the implementation than traditional HPC,” says Reinders. The practical implication of this is a more polished interface with less exposed complexity for developers in tools meant for the low-end software developer. At the very high end, however, developers demand access to every flag and option to control what the compiler is doing and how it does it. Intel provides this access through its compiler “black belt” guides that detail hundreds of switches and flags for very fine-grained control. Happily for Intel, the core technology is the same, and small teams focus on targeting the technology with interfaces and APIs or by exposing (and documenting) the arcane options in the compilers for the two audiences.
Intel’s product presentation includes about 60 slides on the performance of its compilers compared to the previous product release and to competing compilers. While admittedly very few companies present information that makes their products look bad, the sheer volume and technical detail in Intel’s performance studies reveal the confidence the company has in its new products. Performance is nearly uniformly better for this release on a very wide range of benchmarks.
You can find more details of these results on Intel’s developer software pages.
Reinders says that the final key piece of Intel’s release this week is stability: “There simply isn’t wiggle room to have anything about these products be unstable” for Intel’s customers, the companies that build the software we use every day.