Data locality plays a critical role in the energy efficiency and performance of parallel programs. For data-parallel algorithms where locality is abundant, mapping and optimizing for architectures with user-programmable local caches is relatively straightforward. However, for irregular algorithms such as breadth-first search (BFS), exploiting locality is a non-trivial task. Guang Gao, a …
To capitalize on the computational potential of parallel processors, programmers must identify the bottlenecks that limit their applications. These bottlenecks typically constrain performance, preventing an application from reaching its full potential. Performance analysis provides the data and insight necessary to identify opportunities for program optimization. Researchers at the Inderprastha Engineering College identify general bottlenecks for …
The top research stories of the week include an evaluation of sparse matrix multiplication performance on Xeon Phi versus four other architectures; a survey of HPC energy efficiency; performance modeling of OpenMP, MPI and hybrid scientific applications using weak scaling; an exploration of anywhere, anytime cluster monitoring; and a framework for data-intensive cloud storage.
OpenMP, the popular parallel programming standard for high performance computing, is about to come out with a new version incorporating a number of enhancements, the most significant one being support for HPC accelerators. Version 4.0 will include the functionality that was implemented in OpenACC, the accelerator API that splintered off from the OpenMP work, as well as offer additional support beyond that. The new standard is expected to become the law of the land sometime in early 2013.
PGI, Cray, and CAPS enterprise are moving quickly to get their new OpenACC-supported compilers into the hands of GPGPU developers. At NVIDIA’s GPU Technology Conference this week, there was plenty of discussion around the new HPC accelerator framework, and all three OpenACC compiler makers, as well as NVIDIA, were talking up the technology.
As NVIDIA’s upcoming Kepler-grade Tesla GPU prepares to do battle with Intel’s Knights Corner, the companies are busy formulating their respective HPC accelerator stories. While NVIDIA has enjoyed the advantage of actually having products in the field to talk about, Intel has managed to capture the attention of some fence-sitters with assurances of high programmability, simple recompiles, and transparent scalability for its Many Integrated Core (MIC) coprocessors. But according to NVIDIA’s Steve Scott, such promises ignore certain hard truths about how accelerator-based computing really works.
This week Intel unveiled an upmarket version of its Cluster Studio offering, aimed at performance-minded MPI application developers. Called Cluster Studio XE, the jazzed-up developer suite adds Intel analysis tools to make it easier for programmers to optimize and tune codes for maximum performance. It also includes the latest compilers, runtimes, and MPI library to keep pace with new developments in parallel programming.
NERSC director Kathy Yelick shares insights on programming petascale systems like the Hopper system.
CUDA versus OpenMP for GPUs. What’s a developer to do?
Cray has released the details of its GPU-equipped supercomputer: the XK6. The machine is a derivative of the XE6, an AMD Opteron-based machine that the company announced a year ago. Although Cray is calling this week’s announcement the XK6 launch, systems will not be available until the second half of the year.