December 07, 2011
Lost in the flotilla of vendor news at the Supercomputing Conference (SC11) in Seattle last month was the announcement of a new directives-based parallel programming standard for accelerators. Called OpenACC, the open standard is intended to bring GPU computing into the realm of the average programmer, while making the resulting code portable across other accelerators and even multicore CPUs.
For obvious reasons, OpenACC is being heavily promoted and supported by NVIDIA, but it is The Portland Group (PGI) and Cray who are driving the early effort to commercialize the technology. PGI has already implemented a very similar set of accelerator directives, which became part of the foundation for the OpenACC standard. Cray is developing its own OpenACC compiler, and its XK6 customers, like Oak Ridge National Lab and the Swiss National Supercomputing Centre, are expected to be among the first supercomputer users of the technology.
In a nutshell, OpenACC directives work much the same as OpenMP directives, but are specifically applicable to highly data-parallel codes. They can be inserted into standard C, C++ and Fortran programs to direct the compiler to parallelize certain code sections. The compiler takes care of the logistics of moving data back and forth between the CPU and the GPU (or other accelerator) and mapping the computation onto the appropriate processor.
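To make the directive model concrete, here is a minimal sketch in C, not taken from PGI's or Cray's material. The function name `saxpy` and its parameters are illustrative assumptions; the `parallel loop` construct and the `copyin`/`copy` data clauses are standard OpenACC. The single pragma is the only change to an otherwise ordinary loop.

```c
/* SAXPY: y = a*x + y over n elements (illustrative name and signature).
 *
 * The pragma asks an OpenACC compiler to parallelize the loop on the
 * accelerator. copyin(x[0:n]) says x only needs to be sent to the device;
 * copy(y[0:n]) says y is sent to the device and copied back afterwards.
 * The compiler generates the actual transfers and the device code.
 *
 * A compiler without OpenACC support ignores the pragma, and the loop
 * runs serially on the CPU with the same result. */
void saxpy(int n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `saxpy(4, 2.0f, x, y)` on four-element arrays updates `y` in place whether or not the loop was offloaded; only the build options decide where it runs.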
The idea is to enable developers to make relatively small modifications to existing (or new) code in order to expose parallel regions for acceleration. Since the directives are designed to apply to a generic parallel processor, the same code can run on a multicore CPU, GPU, or any other type of parallel hardware that is supported by the compiler. This hardware independence is especially important to the HPC community, which is loath to adopt vendor-specific, non-portable programming environments.
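The portability argument can be sketched as follows. This is an illustration under stated assumptions rather than code from any vendor: the function names `built_with_openacc` and `dot` are hypothetical, while the `_OPENACC` macro and the `reduction` clause come from the OpenACC specification. The same source file builds with or without OpenACC support.

```c
/* Returns 1 if this file was compiled with OpenACC enabled, 0 otherwise.
 * _OPENACC is the macro the OpenACC specification reserves for this check. */
int built_with_openacc(void)
{
#ifdef _OPENACC
    return 1;
#else
    return 0;
#endif
}

/* Dot product of two n-element vectors (illustrative name and signature).
 * reduction(+:sum) tells an OpenACC compiler to combine the per-thread
 * partial sums into one result. Without OpenACC support the pragma is
 * ignored and the loop is an ordinary serial accumulation. */
double dot(int n, const double *a, const double *b)
{
    double sum = 0.0;
    #pragma acc parallel loop reduction(+:sum) copyin(a[0:n], b[0:n])
    for (int i = 0; i < n; ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}
```

Nothing in the source names a particular device, so retargeting is a matter of compiler and build options rather than code changes.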
From NVIDIA's perspective, the overriding goal is to bring GPU computing into the post-CUDA age. CUDA C and Fortran are the most widely used programming languages for GPU programming today, but the underlying technology is proprietary to NVIDIA and offers a relatively low-level software model of GPU computing. As a result, the use of CUDA today tends to be restricted to computer science types, rather than the average programmer or researcher.
OpenCL, which is supported by NVIDIA, AMD and many others, also provides a parallel programming framework for GPUs and other accelerators, and unlike CUDA, is a bona fide open standard (under the direction of the Khronos Group -- the same organization that brought us OpenGL). But like CUDA, OpenCL is relatively low-level, requiring a fairly intimate knowledge of the inner workings of the target processor. Therefore, like CUDA, use of OpenCL is mostly confined to computer scientists.
NVIDIA estimates there are over 100,000 CUDA programmers on the planet and a substantially smaller number of OpenCL developers, but the company sees a much larger potential audience if it can make GPU programming more open and developer-friendly. Essentially, NVIDIA believes OpenACC will be able to make GPU technology accessible to the millions of scientists and researchers who don't care to dabble in the low-level intricacies of processor architectures and chip-to-chip communications.
Steve Scott, CTO of NVIDIA's Tesla business unit, sums up the goal of OpenACC thusly: "What we'd like to do at this point is to substantially increase the breadth of applicability and the number of people using GPUs."
According to Scott, the high-level nature of OpenACC is not going to impact execution performance significantly. In his previous role as CTO at Cray, he encountered accelerator directives-based codes that were getting within 5 or 10 percent of the performance of hand-coded CUDA. According to him, that was fairly typical. Some applications, Scott says, were even doing better than their CUDA counterparts, thanks to the ability of the compiler to optimize certain codes beyond what mere mortals could achieve. In any case, OpenACC is designed to be interoperable with CUDA, so hand-tuned kernels can work seamlessly with directives-based code if need be.
Besides PGI and Cray, CAPS enterprise, a French developer of multicore software tools, has also signed up to support the new directives. All three vendors are expected to have compilers with OpenACC support ready in the first half of 2012. Notably missing from the list of OpenACC supporters are Intel and AMD, although both have processors (multicore x86, AMD APUs and GPUs, and the Intel MIC) that would certainly be capable targets. That wouldn't necessarily stop PGI, CAPS, or Cray from building OpenACC-enabled compilers for Intel and AMD hardware, however.
PGI and NVIDIA are running a free 30-day trial for developers interested in kicking the tires on PGI's current accelerator directive compiler. The claim is that the technology will at least double application performance with less than four weeks of developer effort. Hundreds of researchers have already registered for the trial, and this week NVIDIA reported some initial results. At least one developer was able to get a 5X performance boost on his application after just a single day of tweaking the code.
But the real end game for OpenACC supporters is for the directives to be incorporated into the OpenMP standard. Since OpenACC was derived from work done within the OpenMP Working Group on Accelerators, it stands to reason that this will indeed happen. Although there is no timeline for when the technology will be folded into OpenMP, it is most likely to occur in conjunction with the release of OpenMP 4.0, which is expected to be launched sometime in 2012.