OpenACC Reviews Latest Developments and Future Plans

By Tiffany Trader

November 11, 2015

This week, during the lead-up to SC15, the OpenACC standards group announced several new developments, including the ratification and release of the 2.5 version of the OpenACC API specification, member support for multiple new OpenACC targets, and other progress with the standard.

“The 2.5 specification addresses an essential challenge of profiling code where a few simple directives transform serial instructions and spread the work across thousands of cores,” said Duncan Poole, president of OpenACC-Standard.org and director of platform alliances, accelerated computing, at NVIDIA. “Using tools that support OpenACC, developers have an important lead in creating code that performs well across a variety of multi-core host devices and accelerators, including Titan, and the upcoming DOE Coral systems.”

OpenACC simplifies the programming of accelerated computing systems through the use of directives, which identify compute-intensive code to a compiler for acceleration or offload while preserving a single code base. The key aim of OpenACC is to enable performance portability across a growing number of HPC processor types, including GPUs, manycore coprocessors and multicore CPUs. In addition to being available from different compiler vendors, the standard is supported by an expanding range of debuggers, profilers and other programming tools.
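For readers who have not seen the directive style, here is a minimal sketch in C. The function name `saxpy` and its parameters are illustrative assumptions, not taken from the briefing. The single `#pragma acc` line is the only OpenACC-specific part; a compiler without OpenACC support simply ignores it and runs the loop serially, which is what "preserving a single code base" means in practice.

```c
/* Illustrative sketch: y = a*x + y over n elements.
 * The directive asks an OpenACC compiler to parallelize the loop and
 * states which array sections move to and from the device. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    /* copyin: x is only read on the device; copy: y is read and written back. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `saxpy(4, 2.0f, x, y)` gives the same numerical result whether the loop runs on a GPU, across multicore CPU threads, or serially; only the compiler flags change.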

This year brought the addition of ARM CPU and x86 CPU code generation using OpenACC, noted Poole. The PGI compiler can now accelerate applications across multiple x86 cores, while the PathScale compiler now supports acceleration across the 48 cores of a Cavium ThunderX ARM processor. In 2016, the standards body and its partners will be focusing on OpenACC running on POWER with GPUs, ARM with GPUs and Xeon Phi. “Basically we’re fulfilling the mission that we set out,” said Poole, “which was to be able to build portable code and work with all of the relevant architectures that are either here or emerging.”

Poole also talked up the OpenACC Toolkit, which was introduced in July to give scientists and researchers the tools and documentation they need to be successful with OpenACC. “If you want to see some pickup by the academic and research community, you need a free, but robust compiler,” he said. “NVIDIA put together a combination of the PGI compiler coupled with some key profiling tools and other information that would help a new academic get started and created a toolkit that is free for academic and student use.”

This is one of the ways that OpenACC is extending its developer base. Adoption has grown to more than 10,000 OpenACC developers, a conservative estimate according to the team. Training courses are well received too, with Cal Poly, for example, having added the OpenACC curriculum as a four-credit course. Further, around 1,850 participants have registered for the OpenACC online course, and OpenACC Hackathons are oversubscribed.

“We’re seeing a mix of direct site enthusiasm coupled with support coming from labs and compiler developers,” Poole said.

Key codes being ported during the 2015 Hackathons span a variety of disciplines, including computational fluid dynamics (INCOM3D, HiPSTAR and Numeca), cosmology and astrophysics (CASTRO and MAESTRO), quantum chemistry (LSDALTON), computational physics (Nek-CEM) and many more.

At the NCSA Hackathon, a team successfully accelerated an advanced MRI reconstruction model using NVIDIA GPUs. The challenge for the team was to take serial code and get it running on Blue Waters. Naturally, performance dipped at first, but speedups ensued as directives were added. In just a few days, the team managed to reduce reconstruction time for a single high-resolution MRI scan from 40 days to a couple of hours.

“Now that we’ve seen how easy it is to program the GPU using OpenACC and the PGI compiler, we’re looking forward to translating more of our projects,” said Brad Sutton, associate professor of bioengineering and technical director of the Biomedical Imaging Center at the University of Illinois at Urbana-Champaign. The implementation may even be suitable for powering clinical work, an exciting idea for the staff at Blue Waters.

OpenACC 2.5 and Beyond

Michael Wolfe, PGI compiler engineer and OpenACC technical committee chair, characterized the just-ratified 2.5 spec as somewhat of an interim release. The group has been working on both highly significant features and a number of minor ones, he said, and the aim of 2.5 was to release all the features that could be completed by this deadline. Beyond making some clarifications and fixing some spelling errors, the new release adds the following features:

• Asynchronous Data Movement
• Queue Management Routines
• Kernel Construct Size Clauses
• Profile and Trace Interface

The biggest feature in OpenACC 2.5, according to Wolfe, is the final bullet point, the profile and trace interface, which will allow third-party tools vendors to tie into the OpenACC runtime so they can access and present performance-related information. The functionality was initially part of a prototype implementation in the PGI OpenACC compiler. Now, with some minor changes based on feedback from that effort, the standards body has added this capability to the specification with the expectation that it will start being supported this year. TU Dresden has been using the PGI interface for profiling work and will be running a demonstration in their booth (#1351) at SC next week.
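The asynchronous items in the feature list above build on OpenACC's notion of numbered activity queues. The sketch below is only an illustration of that idea using the long-standing `async` clause and `wait` directive; it does not demonstrate the routines added in 2.5. The function name `scale_two_arrays` and the queue numbers 1 and 2 are arbitrary assumptions.

```c
/* Illustrative sketch: two independent loops placed on separate async queues.
 * Each region's data transfers and compute are enqueued on its own queue,
 * so an implementation is free to overlap them. */
void scale_two_arrays(int n, float *restrict a, float *restrict b, float s)
{
    /* Queue 1: copy a to the device, scale it, copy it back. */
    #pragma acc parallel loop async(1) copy(a[0:n])
    for (int i = 0; i < n; ++i) {
        a[i] *= s;
    }

    /* Queue 2: the same for b, without waiting for queue 1 to finish. */
    #pragma acc parallel loop async(2) copy(b[0:n])
    for (int i = 0; i < n; ++i) {
        b[i] *= s;
    }

    /* Block the host until all outstanding queues have drained. */
    #pragma acc wait
}
```

Without the final `wait`, the host could read `a` or `b` before the device results have been copied back.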

The following major features, however, are still going to take some extra effort, said Wolfe:

• Deep Copy – Nested Dynamic Data Structures
  + Substantial User Feedback
  + Builds on 2014 Tech Report
• Exposed Memory Hierarchy Management
• Multiple Device Support

According to Wolfe, deep copy is the signature feature that OpenACC is pushing to have ready for the 3.0 release. He said users have been asking for this for years and there are current machines that could really benefit, such as Piz Daint and Titan. “It’s a way to handle nested dynamic data structures,” Wolfe explained. “If you have a large array and you’re going to compute this on a device like a GPU, you’d want to move the array over to the device. But what if the array is actually an array of struct and each element of that array is a struct that has a sub-member that’s another allocatable array and each one of those sub-members is a different size and you want to move this whole deep data structure and maybe each of those sub-members is an array that is another struct that has another allocatable sub-member? And yes, we’ve seen this at least three levels deep in real scientific applications, weather applications in particular.

“So we need a way to manage the data traffic between host and device for basically pointer following on these deep data structures. It’s the number one most-requested features we’ve had from users from the past couple of years and in particular at the Hackathons,” said Wolfe. “What we’ve been working on is a way to express this in a way that provides all the functionality that we need, so we capture the cases that we know about in a way that is general enough to capture those cases but simple enough that it’s relatively easy to express and to use.”
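To make the problem concrete, here is a sketch of the manual workaround for a single level of nesting. The type `column_t` and the function `sum_columns` are hypothetical names invented for this illustration, and the sketch assumes an implementation that fixes up each device-side `values` pointer when the nested array is copied in; whether that happens automatically depends on the OpenACC version and compiler. It is not the deep copy design the committee is working on, only the kind of hand-written pointer following that such a feature would replace.

```c
/* Hypothetical nested structure: an array of structs, each owning a
 * separately allocated array of a different length. */
typedef struct {
    int len;
    double *values;
} column_t;

/* Sum every element of every column, moving the data by hand. */
double sum_columns(column_t *cols, int ncols)
{
    double total = 0.0;

    /* Step 1: shallow copy of the outer array. The structs arrive on the
     * device, but their values pointers still hold host addresses. */
    #pragma acc enter data copyin(cols[0:ncols])

    /* Step 2: copy each nested array, one element at a time. */
    for (int i = 0; i < ncols; ++i) {
        #pragma acc enter data copyin(cols[i].values[0:cols[i].len])
    }

    #pragma acc parallel loop reduction(+:total) present(cols[0:ncols])
    for (int i = 0; i < ncols; ++i) {
        double s = 0.0;
        for (int j = 0; j < cols[i].len; ++j) {
            s += cols[i].values[j];
        }
        total += s;
    }

    /* Tear down in reverse order: nested arrays first, then the outer array. */
    for (int i = 0; i < ncols; ++i) {
        #pragma acc exit data delete(cols[i].values[0:cols[i].len])
    }
    #pragma acc exit data delete(cols[0:ncols])

    return total;
}
```

Every additional level of nesting adds another pair of copy-in and delete loops, which is why users with structures three levels deep have been asking for a declarative alternative.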

Moving on to the next bullet point, Wolfe said OpenACC is getting ready for the exposed memory hierarchies associated with upcoming systems. “Think Knights Landing, Xeon Phi with near and far memory, or AMD high-bandwidth memory systems, or NVIDIA in the Pascal Volta timeframe where you’ve got a true unified address space but separate physical memories,” Wolfe prodded. “There is a need to manage data movement across what is basically an exposed memory hierarchy in a way that respects performance but is also as natural and portable as possible across all the different various systems.”

Another upcoming feature slated for a future release is improved multiple device support. While OpenACC already supports multiple devices, it’s not as convenient to use as people would like, said Wolfe. “We’ve had success with people doing multiple MPI ranks and multiple OpenMP threads and having each process, or each thread attach to a different device, and that works, but maybe there’s a better way to make it within a single thread use multiple devices,” Wolfe stated. “Compiling for an x86 multicore or an ARM multicore — if we treat that like a device and you have GPUs, now you have heterogeneous devices. Can we spread the work across all the devices counting the host multicore as a device in itself?”

“The challenge there is more about the data than it is about the compute,” he clarified. “It’s relatively easy to spread the computation across resources, it’s more of a challenge to make sure the data’s in the right place so we get the performance we want. That’s coming up in the next release or following releases.”

Comparisons to OpenMP

We also had a chance to discuss the relationship between OpenMP and OpenACC, and the potential for merging, given that the two organizations share members. “If you think about the act of parallelizing your code, just figuring out where those directives should go, is the hard part, and there is overlap in terms of location and placement of these directives,” said Poole. “Because of the cross-over in membership, there’s some value in having developers getting real-world experience, giving their feedback and having real production compilers implement a standard. I think all of that is goodness flowing back into OpenMP. In some ways OpenACC may be the best thing that ever happened to OpenMP in terms of giving that real feedback ahead of time.”

“From a technical perspective, there are certainly ways that OpenMP, MPI, OpenACC, even CUDA can interoperate, so these are not insurmountable challenges; it doesn’t have to be only a single way of programming as I would call the Highlander approach,” he added.

We’ll be diving deeper into this topic in a future piece, but as a teaser, here is a slide that was shared during the briefing:

[Slide: OpenACC and OpenMP 4, January 2015]

For those of you headed to Austin for SC15 next week, OpenACC members will be participating in a number of presentations, talks and discussions.

 
