OpenCL Update

By Michael D. McCool and Stefanus Du Toit

November 21, 2008

OpenCL (the Open Computing Language) is under development by the Khronos Group as an open, royalty-free standard for parallel programming of heterogeneous systems. It provides a common hardware abstraction layer to expose the computational capabilities of systems that include a diverse mix of multicore CPUs, GPUs and other parallel processors such as DSPs and the Cell, for use in accelerating a variety of compute-intensive applications. The intent of the OpenCL initiative is to provide a common foundational layer for other technologies to build upon. The OpenCL standard will also have the effect of coordinating the basic capabilities of target processors. In particular, in order to be conformant with OpenCL, processors will have to meet minimum capability, resource and precision requirements. This article reviews the organizations and process behind the OpenCL standard proposal, gives a brief overview of the nature of the proposal itself, and then discusses the implications of OpenCL for the high-performance software development community.

The Khronos organization supports the collaborative development and maintenance of several royalty-free open standards, including OpenGL, OpenGL ES, COLLADA, and OpenMAX. OpenCL is not yet ratified, but the member companies involved have already arrived at a draft specification of version 1.0, which is currently under review. The OpenCL effort was initiated by Apple, and the development of the draft specification has included the active involvement of AMD, ARM, Barco, Codeplay, Electronic Arts, Ericsson, Freescale, Imagination Technologies, IBM, Intel, Motorola, Movidia, Nokia, NVIDIA, RapidMind, and Texas Instruments.

The OpenCL specification consists of three main components: a platform API, a language for specifying computational kernels, and a runtime API. The platform API allows a developer to query a given OpenCL implementation to determine the capabilities of the devices that particular implementation supports. Once a device has been selected and a context created, the runtime API can be used to queue and manage computational and memory operations for that device. OpenCL manages and coordinates such operations using an asynchronous command queue. OpenCL command queues can include computational kernels as well as memory transfer and map/unmap operations. Asynchronous memory operations are included in order to efficiently support the separate address spaces and DMA engines used by many accelerators.

The parallel execution model of OpenCL is based on the execution of an array of functions over an abstract index space. The abstract index spaces driving parallel execution consists of n-tuples of integers with each element starting at 0. For instance, 16 parallel units of work could be associated with an index space from 0 to 15. Alternatively, using 2-tuples, those 16 units of work could be associated with (0,0) to (3,3). Three-dimensional index spaces are also supported. Computational kernels invoked over these index spaces are based on functions drawn from programs specified in OpenCL C. OpenCL C is a subset of C99 with extensions for parallelism. These extensions include support for vector types, images and built-in functions to read and write images, and memory hierarchy qualifiers for local, global, constant, and private memory spaces. The OpenCL C language also currently includes some restrictions relative to C99, particularly with regards to dynamic memory allocation, function pointers, writes to byte addresses, irreducible control flow, and recursion. Programs written in OpenCL C can either be compiled at runtime or in advance. However, OpenCL C programs compiled in advance may only work on specific hardware devices.

Each instance of a kernel is able to query its index, and then do different work and access different data based on that index. The index space defines the “parallel shape” of the work, but it is up to the kernel to decide how the abstract index will translate into data access and computation. For example, to add two arrays and place the sum in an a third output array, a kernel might access its global index, from this index compute an address in each of two input arrays, read from these arrays, perform the addition, compute the address of its result in an output array, and write the result.

A hierarchical memory model is also supported. In this model, the index space is divided into work groups. Each work-item in a work-group, in addition to accessing its own private memory, can share a local memory during the execution of the work-group. This can be used to support one additional level of hierarchical data parallelism, which is useful to capture data locality in applications such as video/image compression and matrix multiplication. However, different work-groups cannot communicate or synchronize with one another, although work items within a work-group can synchronize using barriers and communicate using local memory (if supported on a particular device). There is an extension for atomic memory operations but it is optional (for now).

OpenCL uses a relaxed memory consistency model where the local view of memory from each kernel is only guaranteed to be consistent after specified synchronization points. Synchronization points include barriers within kernels (which can only be used to synchronize the view of local memory between elements of a work-group), and queue “events.” Event dependencies can be used to synchronize commands on the work queue. Dependencies between commands come in two forms: implicit and explicit. Command queues in OpenCL can run in two modes: in-order and out-of-order. In an in-order queue, commands are implicitly ordered by their position in the queue, and the result of execution must be consistent with this order. In the out-of-order mode, OpenCL is free to run some of the commands in the queue in parallel. However, the order can be constrained explicitly by specifying event lists for each command when it is enqueued. This will cause some commands to wait until the specified events have completed. Events can be based on the completion of memory transfer operations and explicit barriers as well as kernel invocations. All commands return an event handle which can be added to a list of dependencies for commands enqueued later.

In addition to encouraging standardization between the basic capabilities of different high-performance processors, OpenCL will have a few other interesting effects. One of these will be to open up the embedded and handheld spaces to accelerated computing. OpenCL supports an embedded profile that differs primarily from the full OpenCL profile in resource limits and precision requirements. This means that it will be possible to use OpenCL to access the computational power of embedded multicore processors, including embedded GPUs, in mobile phones and set-top boxes in order to enable high-performance imaging, vision, game physics, and other applications. Applications, libraries, middleware and high-level languages based on OpenCL will be able to access the computational power of these devices.

In summary, OpenCL is an open, royalty-free standard that will enable portable, parallel programming of heterogeneous CPUs, GPUs and other processors. OpenCL is designed as a foundational layer for low-level access to hardware and also establishes a level of consistency between high-performance processors. This will give high-performance application and library writers, as well as high-level language, platform, and middleware developers, the ability to focus on higher-level concerns rather than dealing with variant semantics and syntax for the same concepts from different vendors. OpenCL will allow library, application and middleware developers to focus their efforts on providing greater functionality, rather than redeveloping code or lower-level interfaces to each new processor and accelerator.

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

GTC 2019: Chief Scientist Bill Dally Provides Glimpse into Nvidia Research Engine

March 22, 2019

Amid the frenzy of GTC this week – Nvidia’s annual conference showcasing all things GPU (and now AI) – William Dally, chief scientist and SVP of research, provided a brief but insightful portrait of Nvidia’s rese Read more…

By John Russell

ORNL Helps Identify Challenges of Extremely Heterogeneous Architectures

March 21, 2019

Exponential growth in classical computing over the last two decades has produced hardware and software that support lightning-fast processing speeds, but advancements are topping out as computing architectures reach thei Read more…

By Laurie Varma

Interview with 2019 Person to Watch Jim Keller

March 21, 2019

On the heels of Intel's reaffirmation that it will deliver the first U.S. exascale computer in 2021, which will feature the company's new Intel Xe architecture, we bring you our interview with our 2019 Person to Watch Jim Keller, head of the Silicon Engineering Group at Intel. Read more…

By HPCwire Editorial Team

HPE Extreme Performance Solutions

HPE and Intel® Omni-Path Architecture: How to Power a Cloud

Learn how HPE and Intel® Omni-Path Architecture provide critical infrastructure for leading Nordic HPC provider’s HPCFLOW cloud service.

powercloud_blog.jpgFor decades, HPE has been at the forefront of high-performance computing, and we’ve powered some of the fastest and most robust supercomputers in the world. Read more…

IBM Accelerated Insights

Insurance: Where’s the Risk?

Insurers are facing extreme competitive challenges in their core businesses. Property and Casualty (P&C) and Life and Health (L&H) firms alike are highly impacted by the ongoing globalization, increasing regulation, and digital transformation of their client bases. Read more…

What’s New in HPC Research: TensorFlow, Buddy Compression, Intel Optane & More

March 20, 2019

In this bimonthly feature, HPCwire highlights newly published research in the high-performance computing community and related domains. From parallel programming to exascale to quantum computing, the details are here. Read more…

By Oliver Peckham

GTC 2019: Chief Scientist Bill Dally Provides Glimpse into Nvidia Research Engine

March 22, 2019

Amid the frenzy of GTC this week – Nvidia’s annual conference showcasing all things GPU (and now AI) – William Dally, chief scientist and SVP of research, Read more…

By John Russell

At GTC: Nvidia Expands Scope of Its AI and Datacenter Ecosystem

March 19, 2019

In the high-stakes race to provide the AI life-cycle solution of choice, three of the biggest horses in the field are IBM, Intel and Nvidia. While the latter is only a fraction of the size of its two bigger rivals, and has been in business for only a fraction of the time, Nvidia continues to impress with an expanding array of new GPU-based hardware, software, robotics, partnerships and... Read more…

By Doug Black

Nvidia Debuts Clara AI Toolkit with Pre-Trained Models for Radiology Use

March 19, 2019

AI’s push into healthcare got a boost yesterday with Nvidia’s release of the Clara Deploy AI toolkit which includes 13 pre-trained models for use in radiolo Read more…

By John Russell

It’s Official: Aurora on Track to Be First US Exascale Computer in 2021

March 18, 2019

The U.S. Department of Energy along with Intel and Cray confirmed today that an Intel/Cray supercomputer, "Aurora," capable of sustained performance of one exaf Read more…

By Tiffany Trader

Why Nvidia Bought Mellanox: ‘Future Datacenters Will Be…Like High Performance Computers’

March 14, 2019

“Future datacenters of all kinds will be built like high performance computers,” said Nvidia CEO Jensen Huang during a phone briefing on Monday after Nvidia revealed scooping up the high performance networking company Mellanox for $6.9 billion. Read more…

By Tiffany Trader

Oil and Gas Supercloud Clears Out Remaining Knights Landing Inventory: All 38,000 Wafers

March 13, 2019

The McCloud HPC service being built by Australia’s DownUnder GeoSolutions (DUG) outside Houston is set to become the largest oil and gas cloud in the world th Read more…

By Tiffany Trader

Quick Take: Trump’s 2020 Budget Spares DoE-funded HPC but Slams NSF and NIH

March 12, 2019

U.S. President Donald Trump’s 2020 budget request, released yesterday, proposes deep cuts in many science programs but seems to spare HPC funding by the Depar Read more…

By John Russell

Nvidia Wins Mellanox Stakes for $6.9 Billion

March 11, 2019

The long-rumored acquisition of Mellanox came to fruition this morning with GPU chipmaker Nvidia’s announcement that it has purchased the high-performance net Read more…

By Doug Black

Quantum Computing Will Never Work

November 27, 2018

Amid the gush of money and enthusiastic predictions being thrown at quantum computing comes a proposed cold shower in the form of an essay by physicist Mikhail Read more…

By John Russell

The Case Against ‘The Case Against Quantum Computing’

January 9, 2019

It’s not easy to be a physicist. Richard Feynman (basically the Jimi Hendrix of physicists) once said: “The first principle is that you must not fool yourse Read more…

By Ben Criger

ClusterVision in Bankruptcy, Fate Uncertain

February 13, 2019

ClusterVision, European HPC specialists that have built and installed over 20 Top500-ranked systems in their nearly 17-year history, appear to be in the midst o Read more…

By Tiffany Trader

Intel Reportedly in $6B Bid for Mellanox

January 30, 2019

The latest rumors and reports around an acquisition of Mellanox focus on Intel, which has reportedly offered a $6 billion bid for the high performance interconn Read more…

By Doug Black

Why Nvidia Bought Mellanox: ‘Future Datacenters Will Be…Like High Performance Computers’

March 14, 2019

“Future datacenters of all kinds will be built like high performance computers,” said Nvidia CEO Jensen Huang during a phone briefing on Monday after Nvidia revealed scooping up the high performance networking company Mellanox for $6.9 billion. Read more…

By Tiffany Trader

Looking for Light Reading? NSF-backed ‘Comic Books’ Tackle Quantum Computing

January 28, 2019

Still baffled by quantum computing? How about turning to comic books (graphic novels for the well-read among you) for some clarity and a little humor on QC. The Read more…

By John Russell

Contract Signed for New Finnish Supercomputer

December 13, 2018

After the official contract signing yesterday, configuration details were made public for the new BullSequana system that the Finnish IT Center for Science (CSC Read more…

By Tiffany Trader

Deep500: ETH Researchers Introduce New Deep Learning Benchmark for HPC

February 5, 2019

ETH researchers have developed a new deep learning benchmarking environment – Deep500 – they say is “the first distributed and reproducible benchmarking s Read more…

By John Russell

Leading Solution Providers

SC 18 Virtual Booth Video Tour

Advania @ SC18 AMD @ SC18
ASRock Rack @ SC18
DDN Storage @ SC18
HPE @ SC18
IBM @ SC18
Lenovo @ SC18 Mellanox Technologies @ SC18
NVIDIA @ SC18
One Stop Systems @ SC18
Oracle @ SC18 Panasas @ SC18
Supermicro @ SC18 SUSE @ SC18 TYAN @ SC18
Verne Global @ SC18

It’s Official: Aurora on Track to Be First US Exascale Computer in 2021

March 18, 2019

The U.S. Department of Energy along with Intel and Cray confirmed today that an Intel/Cray supercomputer, "Aurora," capable of sustained performance of one exaf Read more…

By Tiffany Trader

IBM Quantum Update: Q System One Launch, New Collaborators, and QC Center Plans

January 10, 2019

IBM made three significant quantum computing announcements at CES this week. One was introduction of IBM Q System One; it’s really the integration of IBM’s Read more…

By John Russell

IBM Bets $2B Seeking 1000X AI Hardware Performance Boost

February 7, 2019

For now, AI systems are mostly machine learning-based and “narrow” – powerful as they are by today's standards, they're limited to performing a few, narro Read more…

By Doug Black

The Deep500 – Researchers Tackle an HPC Benchmark for Deep Learning

January 7, 2019

How do you know if an HPC system, particularly a larger-scale system, is well-suited for deep learning workloads? Today, that’s not an easy question to answer Read more…

By John Russell

HPC Reflections and (Mostly Hopeful) Predictions

December 19, 2018

So much ‘spaghetti’ gets tossed on walls by the technology community (vendors and researchers) to see what sticks that it is often difficult to peer through Read more…

By John Russell

Arm Unveils Neoverse N1 Platform with up to 128-Cores

February 20, 2019

Following on its Neoverse roadmap announcement last October, Arm today revealed its next-gen Neoverse microarchitecture with compute and throughput-optimized si Read more…

By Tiffany Trader

Move Over Lustre & Spectrum Scale – Here Comes BeeGFS?

November 26, 2018

Is BeeGFS – the parallel file system with European roots – on a path to compete with Lustre and Spectrum Scale worldwide in HPC environments? Frank Herold Read more…

By John Russell

France to Deploy AI-Focused Supercomputer: Jean Zay

January 22, 2019

HPE announced today that it won the contract to build a supercomputer that will drive France’s AI and HPC efforts. The computer will be part of GENCI, the Fre Read more…

By Tiffany Trader

  • arrow
  • Click Here for More Headlines
  • arrow
Do NOT follow this link or you will be banned from the site!
Share This