NVIDIA today announced availability of its newest PGI Accelerator Fortran, C and C++ compilers (version 15.10), now with support for the OpenACC directives-based parallel programming standard on x86 architecture multicore microprocessors. The new compilers allow OpenACC-enabled source code to be compiled for parallel execution on a multicore CPU or a GPU accelerator.
“Our goal is to enable HPC developers to easily port applications across all major CPU and accelerator platforms with uniformly high performance using a common source code base,” said Douglas Miles, director of PGI Compilers & Tools at NVIDIA (NASDAQ: NVDA). “This capability will be particularly important in the race towards exascale computing in which there will be a variety of system architectures requiring a more flexible application programming approach.”
This newest PGI feature compiles OpenACC compute regions for parallel execution across all of the cores in an x86 processor or multi-socket server. The cores are treated in aggregate as a shared-memory accelerator, eliminating all data movement overhead in the resulting OpenACC programs. By default the compiler generates code that uses all the available cores in the system, and several methods exist for programmers to control and fine-tune this behavior.
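To make the idea of an OpenACC compute region concrete, here is a minimal sketch in C. The routine name `saxpy` and its parameters are hypothetical and do not come from the announcement. The same source can be built for a GPU or for multicore execution. PGI's documentation for this release describes a `-ta=multicore` target option and an `ACC_NUM_CORES` environment variable for limiting the core count; treat both as details to verify against your compiler version.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example routine (not from the announcement): y = a*x + y. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0 && x != NULL && y != NULL);

    /* A single OpenACC compute region. An OpenACC compiler may run the
     * iterations in parallel on a GPU or across CPU cores. A compiler
     * without OpenACC support ignores the pragma and runs the loop
     * serially, so the result is the same either way. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The data clauses (`copyin`, `copy`) describe transfers that matter on a GPU. On a shared-memory multicore target they should require no actual copies, which is the data movement saving described above.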
While this release targets current x86 machines, NVIDIA also outlined the timing for PGI OpenACC compiler support for IBM POWER, Intel’s Knights Landing, and ARM architectures.
“We’re already shipping preproduction POWER compilers to DOE customers. You’ll see a beta release of those compilers early in 2016 and a production release in mid-to-late 2016,” said Miles. “For Knights Landing, we don’t have hardware yet and need it to work with it first. We’re running today on dual socket 36-core Haswell servers and seeing very good performance, but there are aspects [to Xeon Phi] like AVX-512 support that we expect will take some adaptations.” NVIDIA’s roadmap indicates Xeon Phi support sometime in 2016 and ARM in 2017.
As part of the product rollout NVIDIA provided two substantial HPC community customer testimonials.
“We were extremely impressed that we can run OpenACC on a CPU with no code change and get equivalent performance to our OpenMP/MPI implementation, and get 4x faster performance when running on a GPU,” said Wayne Gaudin of the U.K.’s Atomic Weapons Establishment. “From the perspective of performance portability and code future proofing, this is an excellent result.”
“Porting HPC applications from one platform to another is one of the most significant costs in the adoption of breakthrough hardware technologies,” said Buddy Bland, project director at Oak Ridge National Laboratory. “OpenACC for multicore x86 CPUs provides continuity and code portability from existing CPU-only and GPU-enabled applications from machines like Titan to all of DOE’s upcoming major systems as well as portability among those systems.”
Clearly, program and performance portability are critical in the march towards exascale, and compilers able to support multiple processor architectures are essential. Key benefits of running OpenACC on multicore CPUs include:
- Effective utilization of all cores of a multicore CPU or multi-socket server for parallel execution
- Common programming model across CPUs and GPUs in Fortran, C and C++
- Rapid exploitation of existing multicore parallelism in a program using the KERNELS directive, which enables incremental optimization for parallel execution
- Scalable performance across multicore CPUs and GPUs
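As a rough illustration of the incremental approach the KERNELS directive allows, the sketch below wraps two existing loops in one `kernels` region and leaves the compiler to decide what can run in parallel. The function name `scale_and_sum` and its arguments are hypothetical. The explicit `reduction` clause on the second loop is the kind of hint a developer might add in a later tuning pass; it is an assumption here, not something the announcement prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example routine: writes factor*in[i] to out[i] and
 * returns the sum of the scaled values. */
double scale_and_sum(int n, double factor,
                     const double *restrict in, double *restrict out)
{
    assert(n >= 0 && in != NULL && out != NULL);
    double sum = 0.0;

    /* Step 1 of an incremental port: enclose existing loops in a kernels
     * region and let the compiler analyze them for parallelism. */
    #pragma acc kernels copyin(in[0:n]) copyout(out[0:n])
    {
        for (int i = 0; i < n; ++i) {
            out[i] = factor * in[i];
        }

        /* Step 2 (optional tuning): state the reduction explicitly
         * instead of relying on the compiler to detect it. */
        #pragma acc loop reduction(+:sum)
        for (int i = 0; i < n; ++i) {
            sum += out[i];
        }
    }
    return sum;
}
```

The `restrict` qualifiers tell the compiler that the two arrays do not overlap. Without that information a compiler generally has to assume possible aliasing in C and may decline to parallelize the loops.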
Growing Momentum for OpenACC
There are now more than 10,000 developers using OpenACC, according to NVIDIA. Miles cited recent hackathons across a variety of scientific domains, where applications have been accelerated with OpenACC in such diverse fields as MRI image reconstruction (PowerGrid), computational fluid dynamics (INCOMP3D, HiPSTAR and Numeca), cosmology and astrophysics (RAMSES, CASTRO and MAESTRO), quantum chemistry (LSDALTON), computational physics (NekCEM) and more.
In addition, Gaussian, Inc. has announced that it is using OpenACC to port the GAUSSIAN computational chemistry application to accelerators. At the recent iCAS2 conference on climate and weather in Annecy, France, Meteosuisse, the Swiss Federal Office of Meteorology and Climatology, announced the deployment of a GPU-accelerated version of COSMO, the world’s first production weather forecasting application running on GPU accelerators.
Miles cited results from a recent poll of 150 OpenACC developers in which 94 percent of the respondents reported getting a speedup when running on an accelerator, and over 90 percent of the users would recommend OpenACC. PGI version 15.10 is available for download starting today, said Miles.
NVIDIA also released a few performance metrics for accelerating popular applications using the new PGI compilers.