HPCwire

The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing

HPCwire >> Special Features >> HPWS08 >> HPWS08 Features

Accelerating Financial Computations on Multicore and Manycore Processors


Page:  1  of  2
1 | 2   All  »  

High-performance computation is a necessity in modern finance. In general, the current value of a financial instrument, such as a stock option, can only be estimated through a complex mathematical simulation that weighs the probability of a range of future possible scenarios. Computing the value at risk in a portfolio of such instruments requires running a large number of such simulations, and optimizing a portfolio to maximize return or minimize risk requires even more computation. Finally, these computations need to be run continuously to keep up with constantly changing market data.

Although a large amount of computation is a necessity, doing it efficiently is crucial since financial datacenters are under severe power and cooling constraints. Multicore processors promise improved computational efficiency within a fixed power and cooling budget. However, achieving high efficiency execution on these processors is non-trivial. In the case of finance, new algorithms are constantly being developed by application specialists called quantitative analysts (or "quants"). Time is literally money in finance, and so high-productivity software development is just as important as efficient execution.

In this article, we will discuss high-productivity strategies for developing efficient financial algorithms that can take advantage of multicore processors, including standard x86 processors but also manycore processors such as GPUs and the Cell BE processor. These strategies can lead to one and even two orders of magnitude improvement in performance per processor.

Multicore processors allow for higher performance at the same power level by supporting multiple lightweight processing elements or "cores" per processor chip. Scaling performance by increasing the clock speed of a single processor is inefficient since the power consumed is proportional to (at least) the square of the clock rate. At some point, it is not practical to increase the clock rate further, as the power consumption and cooling requirements would be excessive. The air-cooling limit in particular was reached several years ago, and clock rates are now on a plateau. In fact, clock rates on individual cores have been decreasing slightly as processor vendors have backed away from the ragged edge in order to improve power efficiency. However, achievable transistor density is still increasing exponentially, following Moore's Law. This is now translating into an exponentially growing number of cores on each processor chip.

Processors from Intel and AMD supporting the x86 instruction set are now available with four cores, but six and eight core processors are expected soon. Manycore processors such as GPUs and the Cell BE can support significantly more cores, from eight to more than sixteen. In addition, in modern multicore processors each core also supports vector processing, where one instruction can operate on a short array (vector) of data. This is another efficient way to increase performance via parallelism. Vector lengths can vary significantly, with current x86 processors and the Cell BE supporting four-way vectors and GPUs supporting anywhere from five to thirty-two. Vector lengths are also set to increase significantly on x86 processors, with the upcoming Intel AVX instruction set supporting 8-way vectors and the Intel Larrabee architecture supporting 16-way vectors.

Developing software for multicore vectorized processors requires fine-grained parallel programming. A fine-grained approach is needed because the product of the number of cores and the vector length in each core, which defines the number of numerical computations that can be performed in each clock cycle, can easily be in the hundreds. The other difference between modern multicore processors and past multi-processor parallel computers is that all the cores on a multicore processor must share a finite off-chip bandwidth. In order to achieve significant scalability on multicore processors, optimizing the use of this limited resource is absolutely necessary. In fact, in order to hide the latency of memory access it may be necessary to expose and exploit even more algorithmic parallelism, so one part of a computation can proceed while another is waiting for data.

The financial community has significant experience with parallel computing in the form of MPI and other cluster workload distribution frameworks. However, MPI in particular is too heavyweight for the lightweight processing elements in multicore processors (not to mention manycore processors) and cannot, by itself, optimize memory usage or take advantage of the performance opportunities made available through vectorization. Some alternative strategies are needed to get the maximum performance out of multicore processors.

We will now discuss financial workloads. Option pricing is one of the most fundamental operations in financial analytics workloads. More generally, the current value of an "instrument," of which an option is one example, needs to be evaluated through probabilistic forecasting.

Monte Carlo methods are often used to estimate the current value of such instruments in the face of uncertainty. In a Monte Carlo simulation, random numbers are used to generate a large set of future scenarios. Each instrument can then be priced under each given future scenario, the value discounted back to the current time using an interest calculation (made complicated by the fact that interest rates can also vary with time), and the results averaged (weighted by the probability of the scenario) to estimate the current value.

Simple versions of Monte Carlo seem to be trivially parallelizable, since each simulation can run independently of any other. However, even "simple" Monte Carlo simulations have complications. First, high-quality random numbers need to be generated and we must ensure that each batch of parallel work gets a unique set of independent, high-quality random numbers. This is harder than it sounds. The currently accepted pseudo-random number generators such as Mersenne Twister are intrinsically sequential algorithms, and may involve hundreds of bytes of state.

Page:  1  of  2
1 | 2   All  »  

HPCwire on Twitter

Article Tools

  • Print This Page
  • Bookmark This Article

Share Options

(Digg, Technorati, more)


Subscribe

Discussion

There are 0 discussion items posted.  

HPC in the Cloud Part 2
People to Watch 2010


Feature Articles

The Week in Review

C-DAC announces plans for a petaflop system; IBM researchers are working on vertical integration techniques to extend Moore's Law another 15 years. We recap those stories and more in our weekly wrapup.
Read More...

Moscow State University Supercomputer Has Petaflop Aspirations

The Moscow State University supercomputer, Lomonosov, has been selected for a high-performance makeover, with the goal of tripling its processing power to achieve petaflop-level performance in 2010. T-Platforms, who developed and manufactured the supercomputer, is the odds-on favorite to lead the project.
Read More...

Intel Ups Performance Ante with Westmere Server Chips

Right on schedule, Intel has launched its Xeon 5600 processors, codenamed "Westmere EP." The 5600 represents the 32nm sequel to the Xeon 5500 (Nehalem EP) for dual-socket servers. Intel is touting better performance and energy efficiency, along with new security features, as the big selling points of the new Xeons.
Read More...

Top Headlines

Australia Commissions Cray Supercomputer

Mar 19 | OfficialWire | New super to support intelligence work Down Under. Read more...

Intel Partners See 'Easy' Upgrade Path With Xeon 5600 Chips

Mar 18 | ChannelWeb | Westmere parts already showing up in HPC machines. Read more...

AMD: OEMs primed for Opteron 6100s

Mar 17 | The Register | But what about the tier ones? Read more...

Arrival of the Desktop Supercomputer

Mar 17 | Cadalyst Magazine | A new generation of workstations is changing the nature of technical computing. Read more...

Scheduling HPC In The Cloud

Mar 17 | Linux Magazine | Latest iteration of Sun Grid Engine able to tap into Cloud. Read more...

Featured Whitepapers

Virtualization for Aggregation And The vSMP Architecture™

Jan 12 | | In-depth look at vSMP Foundation server virtualization technology, technical implementation, use cases and capabilities. The technical whitepaper provides an architectural overview and details on the three vSMP Foundation products: vSMP Foundation for SMP, vSMP Foundation for Cluster and vSMP Foundation for Cloud.

Copper Cable Technologies for High Performance Computing

Jan 18 | | This white paper discusses Gore’s copper cable assemblies, and how they continue to exceed the standards for providing reliable, cost-effective solutions for high-performance computer applications.

Multimedia

Webcast: Virtualized Data Center Roundtable

Join this online panel discussion for live Q&A with leading industry experts, analysts, and end-users to discuss the latest innovations, best practices, barriers to implementation, and measurable benefits of server virtualization with a particular focus on today's real world solutions.

Webcast: Watch SC09 Birds of a Feather Video: Scalable Fault-Tolerant HPC Supercomputers

Learn about scalable fault-tolerant architectures and examples of energy efficient and scalable supercomputing clusters using dual QDR InfiniBand to combine capacity computing with network failover capabilities with the help of programming languages such as MPI and a robust Linux cluster management package.

Webcast: High Performance Computing for a Smarter Planet

LIVE@SCO9: The IBM team discusses new innovations in hardware, software and services that help clients better understand their workloads and get insight from their R&D efforts. Technology demonstrations include the soon-to-be-released Power7 HPC processor, the DCS990 system with 2.4 petabytes of storage, the xCAT management tool, secure HPC cloud computing and more. Winners of two HPCwire Readers' and Editors’ Choice Awards! Take the IBM virtual tour at SC09 or more information go online to: http://www-03.ibm.com/systems/deepcomputing/sc09.html

SC09 HPC in the Cloud

Newsletters

Stay informed! Subscribe to HPCwire email Newsletters.






HPC Job Bank


Featured Events

HPC User Forum DICE
2010 High Performance Computing Linux Financial Markets
Cloud Computing Expo
Cloud Lab
ESC
DEISA PRACE Symposium