HPCwire

The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing


Programming Models for Scalable Multicore Programming



Multicore devices will quickly evolve in both architecture and core count. This will motivate software developers to decouple the code from the hardware, in order to enable applications to move between different architectures and automatically scale as new processor generations are introduced. An appropriate programming model can enable this decoupling while maintaining -- and even enhancing -- performance.

Moore's Law is a statement about transistor density increasing over time. However, it has become increasingly hard to squeeze extra performance out of a single core by using more transistors. In addition, power consumption rises rapidly and nonlinearly with clock rate, which blocks further performance gains from scaling to higher gigahertz ratings. Therefore, all major processor vendors have now switched to an explicitly parallel, multicore processor strategy. By combining multiple small, efficient cores onto a single chip, it is possible to get much higher overall performance and simultaneously improve power efficiency.

Unfortunately, only parallelized applications can exploit this additional performance. Since the individual cores on a multicore processor are often slower than the large single-core processors of the past, non-parallelized applications may in fact run slower on multicore processors. Also, since the number of cores will grow exponentially over time (under the new interpretation of Moore's Law), any application whose performance is to keep growing must be written to use any number of cores in a scalable fashion.

Autoparallelization tools are unlikely to help. Modern processors already exploit much of the implicit parallelism in an application internally, in the form of low-level instruction-level parallelism (ILP). It has been shown that most applications contain relatively small amounts of such implicit parallelism, and that modern processors already use nearly all of it.

However, there are further complications. The memory system is actually the chief bottleneck in many applications. In order to take advantage of the increased computational performance of a processor, the data must be moved onto the chip and off again as efficiently as possible. If the data rate cannot keep pace with the computational performance, then any increase in on-chip computational performance is useless.

In a multicore processor, all cores on a processor must share a finite off-chip bandwidth, making memory access even more of a bottleneck. Also, accessing main memory from the processor, for data that is not in cache, can take hundreds of processor clock cycles to complete. This latency can severely degrade performance since in the worst case the processor must stall while waiting for the memory access to complete.

There is a solution to this: even more parallelism! If the processor has extra, independent work to do while waiting for long-latency operations to complete, then it can run more efficiently. Single-core simultaneous multithreading, also called hyperthreading, is really a mechanism to hide latency. By having multiple concurrent tasks on a single core, it is possible to switch from one to another when one task encounters a long-latency operation, such as a memory access.

Little's Law states that for efficient execution, the number of concurrent tasks "in flight" at any point in time should be equal to the latency times the parallelism. A modern four-core processor with the ability to issue four floating-point operations at once on each core (using SSE instructions or some other form of instruction-level parallelism) has a total parallelism of 16, since it can issue 16 operations per clock. Suppose in general that we access main memory once for every 8 numerical operations, which is an optimistic value. At 16 operations per clock, that is two memory accesses started per clock. With a main memory latency of 128 cycles, again optimistic, 2 x 128 = 256 accesses are in flight at any time, so we need 256 separate, independent tasks in order to fully utilize the processor.
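The arithmetic above can be checked with a short calculation. The following is only a sketch: the function name and parameter names are illustrative, and it assumes, as the estimate in the text implicitly does, that each task has at most one memory access outstanding at a time.

```python
def tasks_in_flight(cores, ops_per_core_per_cycle,
                    ops_per_memory_access, memory_latency_cycles):
    """Estimate the independent tasks needed to hide memory latency.

    Assumes each task has at most one memory access outstanding
    (an illustrative simplification, not a hardware model).
    """
    # Peak operations issued per clock across the whole chip.
    ops_per_cycle = cores * ops_per_core_per_cycle
    # Memory accesses started per clock when one access is needed
    # for every `ops_per_memory_access` operations.
    accesses_per_cycle = ops_per_cycle / ops_per_memory_access
    # Little's Law: concurrency = throughput x latency.
    return accesses_per_cycle * memory_latency_cycles


# The figures used in the text: 4 cores, 4 operations per core per clock,
# one memory access per 8 operations, 128 cycles of memory latency.
required = tasks_in_flight(cores=4, ops_per_core_per_cycle=4,
                           ops_per_memory_access=8,
                           memory_latency_cycles=128)
print(required)  # 256.0
```

Doubling the core count or the memory latency in this model doubles the number of independent tasks required, which is why the need for parallelism grows with each processor generation.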

In other words, multicore processing is only exacerbating an already challenging problem. Most software today is grossly inefficient, because it is not written with sufficient parallelism in mind. Breaking up an application into a few tasks is not a long-term solution. First, efficient execution needs a great deal of parallelism: much more than the number of cores. Second, with the number of cores increasing exponentially, more and more parallelism will be needed over time.

The solution to this dilemma is data parallelism. In data parallelism, the structure of the data is used to drive the creation of more and more parallel tasks as needed. Since larger problems with more data naturally result in more parallel tasks, a data-parallel approach results in a scalable solution that can automatically take advantage of more and more cores. Because data-parallel programming models focus on the data and its movement, they also result in predictable memory access patterns, which can be used to improve the efficiency of memory access.
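As a rough illustration of this idea, the sketch below splits an array into fixed-size chunks and processes each chunk as an independent task, so the number of tasks follows the size of the data rather than the number of cores. The chunk size, function names and the sum-of-squares workload are all hypothetical choices made for the example. Python's standard thread pool is used only to show the structure; in standard CPython, threads do not run pure-Python arithmetic in parallel, so this is not a performance demonstration.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024  # hypothetical grain size; a real program would tune this


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Split the data into contiguous chunks; more data yields more chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def sum_of_squares(chunk):
    """Per-chunk work. No chunk depends on any other chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, max_workers=None):
    """Run one task per chunk and combine the partial results."""
    chunks = split_into_chunks(data)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Each chunk reads one contiguous slice, so access is predictable.
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials), len(chunks)


small_total, small_tasks = parallel_sum_of_squares(list(range(10_000)))
large_total, large_tasks = parallel_sum_of_squares(list(range(100_000)))
print(small_tasks, large_tasks)  # 10 98
```

The code never mentions the core count: a problem ten times larger simply produces about ten times as many tasks, and the pool maps them onto whatever workers are available.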

