OpenCL Update

By Michael D. McCool and Stefanus Du Toit

November 21, 2008

OpenCL (the Open Computing Language) is under development by the Khronos Group as an open, royalty-free standard for parallel programming of heterogeneous systems. It provides a common hardware abstraction layer to expose the computational capabilities of systems that include a diverse mix of multicore CPUs, GPUs and other parallel processors such as DSPs and the Cell, for use in accelerating a variety of compute-intensive applications. The intent of the OpenCL initiative is to provide a common foundational layer for other technologies to build upon. The OpenCL standard will also have the effect of coordinating the basic capabilities of target processors. In particular, in order to be conformant with OpenCL, processors will have to meet minimum capability, resource and precision requirements. This article reviews the organizations and process behind the OpenCL standard proposal, gives a brief overview of the nature of the proposal itself, and then discusses the implications of OpenCL for the high-performance software development community.

The Khronos organization supports the collaborative development and maintenance of several royalty-free open standards, including OpenGL, OpenGL ES, COLLADA, and OpenMAX. OpenCL is not yet ratified, but the member companies involved have already arrived at a draft specification of version 1.0, which is currently under review. The OpenCL effort was initiated by Apple, and the development of the draft specification has included the active involvement of AMD, ARM, Barco, Codeplay, Electronic Arts, Ericsson, Freescale, Imagination Technologies, IBM, Intel, Motorola, Movidia, Nokia, NVIDIA, RapidMind, and Texas Instruments.

The OpenCL specification consists of three main components: a platform API, a language for specifying computational kernels, and a runtime API. The platform API allows a developer to query a given OpenCL implementation to determine the capabilities of the devices that particular implementation supports. Once a device has been selected and a context created, the runtime API can be used to queue and manage computational and memory operations for that device. OpenCL manages and coordinates such operations using an asynchronous command queue. OpenCL command queues can include computational kernels as well as memory transfer and map/unmap operations. Asynchronous memory operations are included in order to efficiently support the separate address spaces and DMA engines used by many accelerators.

The parallel execution model of OpenCL is based on the execution of an array of functions over an abstract index space. The abstract index space driving parallel execution consists of n-tuples of integers, with each element starting at 0. For instance, 16 parallel units of work could be associated with an index space from 0 to 15. Alternatively, using 2-tuples, those 16 units of work could be associated with (0,0) to (3,3). Three-dimensional index spaces are also supported. Computational kernels invoked over these index spaces are based on functions drawn from programs specified in OpenCL C. OpenCL C is a subset of C99 with extensions for parallelism. These extensions include support for vector types, images and built-in functions to read and write images, and memory hierarchy qualifiers for local, global, constant, and private memory spaces. The OpenCL C language also currently includes some restrictions relative to C99, particularly with regard to dynamic memory allocation, function pointers, writes to byte addresses, irreducible control flow, and recursion. Programs written in OpenCL C can be compiled either at runtime or in advance. However, OpenCL C programs compiled in advance may only work on specific hardware devices.
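To make the index-space idea concrete, here is a minimal sketch in plain Python (not OpenCL itself). It only enumerates the index tuples described above; the helper name index_space is hypothetical and does not come from the OpenCL draft.

```python
from itertools import product

def index_space(*extents):
    """Return every index tuple in an N-dimensional space.

    Each dimension starts at 0, as in the article's description.
    """
    return list(product(*(range(n) for n in extents)))

one_d = index_space(16)     # 16 units of work as a 1-D space
two_d = index_space(4, 4)   # the same 16 units as a 2-D space

print(len(one_d), one_d[0], one_d[-1])   # 16 (0,) (15,)
print(len(two_d), two_d[0], two_d[-1])   # 16 (0, 0) (3, 3)
```

Both spaces describe the same amount of work; only the shape of the indices handed to each kernel instance differs.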

Each instance of a kernel is able to query its index, and then do different work and access different data based on that index. The index space defines the “parallel shape” of the work, but it is up to the kernel to decide how the abstract index will translate into data access and computation. For example, to add two arrays and place the sum in a third output array, a kernel might access its global index, from this index compute an address in each of two input arrays, read from these arrays, perform the addition, compute the address of its result in an output array, and write the result.
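The array-addition example can be sketched as follows. This is a plain-Python stand-in rather than OpenCL C: the names add_kernel and run_over_index_space are hypothetical, and the loop runs the kernel instances one after another where a real device would run them in parallel.

```python
def add_kernel(global_id, a, b, out):
    """One kernel instance: read one element from each input, write one sum."""
    i = global_id            # here the index doubles as the array offset
    out[i] = a[i] + b[i]

def run_over_index_space(kernel, size, *args):
    """Stand-in for a parallel launch: invoke the kernel once per index."""
    for global_id in range(size):
        kernel(global_id, *args)

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * len(a)

run_over_index_space(add_kernel, len(a), a, b, out)
print(out)   # [11.0, 22.0, 33.0, 44.0]
```

Because each instance touches only the elements at its own index, the instances do not depend on one another and could execute in any order.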

A hierarchical memory model is also supported. In this model, the index space is divided into work-groups. Each work-item in a work-group, in addition to accessing its own private memory, can share a local memory with the other work-items in that work-group during its execution. This can be used to support one additional level of hierarchical data parallelism, which is useful to capture data locality in applications such as video/image compression and matrix multiplication. However, different work-groups cannot communicate or synchronize with one another, although work-items within a work-group can synchronize using barriers and communicate using local memory (if supported on a particular device). There is an extension for atomic memory operations but it is optional (for now).
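A rough illustration of the work-group idea, again as a sequential plain-Python simulation rather than OpenCL code. The function name group_sums and the two-phase structure are assumptions chosen for the example. Each group fills its own scratch list (standing in for local memory), and the point where a barrier would be needed is marked in a comment.

```python
def group_sums(data, group_size):
    """Sum each work-group's slice of data using a per-group scratch list."""
    if len(data) % group_size != 0:
        raise ValueError("data length must be a multiple of group_size")
    partial = []
    for group_id in range(len(data) // group_size):
        local = [0] * group_size   # local memory: visible to this group only
        # Phase 1: each work-item copies one element from global to local memory.
        for local_id in range(group_size):
            global_id = group_id * group_size + local_id
            local[local_id] = data[global_id]
        # A barrier belongs here: no work-item may continue until all have written.
        # Phase 2: one work-item reads what the others wrote and emits the result.
        partial.append(sum(local))
    return partial

data = list(range(16))
partial = group_sums(data, 4)
print(partial)        # [6, 22, 38, 54]
print(sum(partial))   # 120
```

The final sum over the partial results is done outside the groups, which mirrors the restriction that different work-groups cannot communicate with one another.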

OpenCL uses a relaxed memory consistency model where the local view of memory from each kernel is only guaranteed to be consistent after specified synchronization points. Synchronization points include barriers within kernels (which can only be used to synchronize the view of local memory between elements of a work-group), and queue “events.” Event dependencies can be used to synchronize commands on the command queue. Dependencies between commands come in two forms: implicit and explicit. Command queues in OpenCL can run in two modes: in-order and out-of-order. In an in-order queue, commands are implicitly ordered by their position in the queue, and the result of execution must be consistent with this order. In the out-of-order mode, OpenCL is free to run some of the commands in the queue in parallel. However, the order can be constrained explicitly by specifying an event list for each command when it is enqueued. This will cause a command to wait until the events in its list have completed. Events can be based on the completion of memory transfer operations and explicit barriers as well as kernel invocations. All commands return an event handle which can be added to a list of dependencies for commands enqueued later.
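The interaction between an out-of-order queue and event lists can be sketched with a toy scheduler. This is a plain-Python model under stated assumptions, not the OpenCL runtime API: the class CommandQueue, its methods enqueue and finish, and the wait_for parameter are all hypothetical names. The scheduler deliberately runs the most recently enqueued ready command first, to show that only the event lists constrain the order.

```python
class CommandQueue:
    """Toy out-of-order queue: a command may run once its wait list is complete."""

    def __init__(self):
        self._pending = []     # entries are (event, action, wait_list)
        self._next_event = 0

    def enqueue(self, action, wait_for=()):
        """Queue a callable; return an event handle later commands can wait on."""
        event = self._next_event
        self._next_event += 1
        self._pending.append((event, action, tuple(wait_for)))
        return event

    def finish(self):
        """Run every command; return event handles in execution order."""
        done, order = set(), []
        while self._pending:
            ready = [c for c in self._pending if all(e in done for e in c[2])]
            if not ready:
                raise RuntimeError("a wait list names an event that never completes")
            cmd = ready[-1]    # newest ready command first, to show reordering
            self._pending.remove(cmd)
            cmd[1]()
            done.add(cmd[0])
            order.append(cmd[0])
        return order

log = []
q = CommandQueue()
write_a = q.enqueue(lambda: log.append("write a"))
write_b = q.enqueue(lambda: log.append("write b"))
kernel = q.enqueue(lambda: log.append("kernel"), wait_for=[write_a, write_b])
read = q.enqueue(lambda: log.append("read result"), wait_for=[kernel])
q.finish()
print(log)   # ['write b', 'write a', 'kernel', 'read result']
```

The two independent writes run in the opposite order from how they were enqueued, while the kernel and the read-back still wait for their dependencies. An in-order queue would behave as if every command waited on the one enqueued before it.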

In addition to encouraging standardization between the basic capabilities of different high-performance processors, OpenCL will have a few other interesting effects. One of these will be to open up the embedded and handheld spaces to accelerated computing. OpenCL supports an embedded profile that differs from the full OpenCL profile primarily in resource limits and precision requirements. This means that it will be possible to use OpenCL to access the computational power of embedded multicore processors, including embedded GPUs, in mobile phones and set-top boxes in order to enable high-performance imaging, vision, game physics, and other applications. Applications, libraries, middleware and high-level languages based on OpenCL will be able to access the computational power of these devices.

In summary, OpenCL is an open, royalty-free standard that will enable portable, parallel programming of heterogeneous CPUs, GPUs and other processors. OpenCL is designed as a foundational layer for low-level access to hardware and also establishes a level of consistency between high-performance processors. This will give high-performance application and library writers, as well as high-level language, platform, and middleware developers, the ability to focus on higher-level concerns rather than dealing with variant semantics and syntax for the same concepts from different vendors. OpenCL will allow library, application and middleware developers to focus their efforts on providing greater functionality, rather than redeveloping code or lower-level interfaces to each new processor and accelerator.
