HPCwire


A New Engine for Financial Analysis


Ten years ago, symmetric multiprocessing (SMP) and massively parallel processing (MPP) systems were the most common architectures for high performance computing. The popularity of these architectures has decreased with the emergence of a more cost-effective approach: cluster computing. According to the Top500 Supercomputer Sites project, cluster systems are now the most common type of architecture for the world's highest performing computer systems.

Cluster computing has achieved this prominence because it is cost-effective. Rather than relying on the custom processor elements and proprietary data pathways of SMP and MPP architectures, cluster computing employs commodity processors, such as those from Intel and AMD, and uses industry-standard interconnects such as Gigabit Ethernet and InfiniBand.

Applications for this cluster architecture are those that can be "parallelized," or broken into sections and independently handled by one or more program threads running in parallel on multiple processors. Such applications are widespread in many areas, including the finance sector, where application software is now routinely delivered in "cluster aware" forms that can take advantage of high performance computing (HPC) architectures.

The Rise of HPC Cluster Computing

Clusters are becoming the preferred HPC architecture because of their cost effectiveness; however, these systems are starting to face challenges. The single-core processors used in these systems are becoming denser and faster, but they are running into memory bottlenecks and dissipating ever-increasing amounts of power. The memory bottleneck has two causes: a limited number of I/O pins on the processor package that can be dedicated to memory access, and increasing latency due to the use of multi-layered memory caches. Higher power consumption is a direct result of increasing clock speeds, and it forces extensive cooling of the processors.

The increasing power demand of processors is of particular concern in financial datacenters, where calculations per watt are increasingly important. System cooling requirements are limiting cluster sizes, and therefore limiting achievable performance levels. Existing datacenter cooling systems are simply running out of capacity, and increasing that capacity carries a high price.

The industry's current solution to these growing problems is a move towards multicore processors. Increasing the number of processor cores in a single package offers increased node performance at somewhat lower power dissipation than that of an equivalent number of single-core processors. But the multicore approach does not address the memory bottlenecks inherent in the packaging.

An alternative approach has arisen: using FPGA-based coprocessors to accelerate the execution of key steps in the application software. This approach is similar to coding an inner loop of a C++ application in assembly language to increase overall execution speed.

FPGAs typically run at slower clock speeds than the latest CPUs, yet they can make up for this with superior memory bandwidth, a high degree of parallelization, and the customization that is possible. An FPGA coprocessor programmed to hardware-execute key application tasks can typically provide a 2X to 3X system performance boost while simultaneously reducing power requirements by 40 percent as compared to adding a second processor or even a second core. Fine-tuning the FPGA to application needs can achieve performance increases greater than 10X.

FPGA Acceleration Opportunities

Any solution that boosts HPC applications must cover a wide spectrum of workloads with widely differing computational needs. Each application demands a particular combination of mathematical and logical operators, coupled with efficient memory access.

It is difficult for general-purpose CPUs or specialized processor solutions such as graphics processing units (GPUs) or network processors to provide an optimal solution for the broad spectrum of HPC applications. FPGAs, however, are reconfigurable engines. They can be optimized under software control to meet the particular requirements of an HPC application. This allows a single hardware solution to address many HPC applications with equal efficiency.

FPGAs accelerate HPC applications by exploiting the parallelism inherent in the algorithms employed. There are several levels of parallelism to address. A starting point is to structure the HPC application for multi-threaded execution suitable for parallel execution across a grid of processors. This is task-level parallelism, exploited by cluster computing. There are software packages available that can take legacy applications and transform them into a structure suitable for parallel execution.

A second level of parallelism lies at the instruction level. Conventional processors support the simultaneous execution of a limited number of instructions. FPGAs offer deeper pipelining, and therefore can support a larger number of simultaneously executing "in-flight" instructions.

Data parallelism is a third level that FPGAs can exploit. The devices have a fine-grained architecture designed for parallel execution, and thus can be configured to perform a set of operations on a large number of data sets simultaneously. This parallel execution performs the equivalent work of numerous conventional processors, all in a single device.

By exploiting all three levels of parallelism, an FPGA operating at 200 MHz can outperform a 3 GHz processor by an order of magnitude or more, while requiring only a quarter of the power. Commonly used signal processing algorithms, such as FFTs (Fast Fourier Transforms), show performance increases of 10X over the fastest CPUs.
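The clock-speed gap gives a rough sense of how much parallelism that claim implies. The sketch below is only back-of-the-envelope arithmetic: the function name and the assumption that the CPU completes one operation per cycle are illustrative choices, not figures from the article.

```python
def required_parallelism(fpga_hz, cpu_hz, speedup, cpu_ops_per_cycle=1.0):
    """Operations the FPGA must complete per clock cycle to reach `speedup`.

    cpu_ops_per_cycle is a hypothetical normalization (one operation per
    CPU cycle); real processors and workloads will differ.
    """
    cpu_ops_per_second = cpu_hz * cpu_ops_per_cycle
    return speedup * cpu_ops_per_second / fpga_hz


# 200 MHz FPGA versus 3 GHz CPU, targeting a 10X throughput advantage.
print(required_parallelism(200e6, 3e9, 10))
```

Under these assumptions the FPGA has to sustain roughly 150 operations per cycle, which is why the deep pipelining and data parallelism described above matter more than raw clock rate.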

Opportunities in Financial Analytics

One of the major markets where computing speed is an extremely important asset is financial analytics. A key application within this market space is the analysis of "derivatives": financial instruments such as options, futures, forwards, and interest-rate swaps. Derivatives analysis is a critical ongoing activity for financial institutions, supporting pricing, risk hedging, and the identification of arbitrage opportunities. The worldwide derivatives market has tripled in size in the last five years.


The numerical method for derivatives analysis uses Monte Carlo simulations in a Black-Scholes world. The algorithm makes heavy use of floating-point math operations such as logarithm, exponent, square root, and division. In addition, these computations must be repeated over millions of iterations. The numerical Black-Scholes solution is typically used within a Monte Carlo simulation, where the value of a derivative is estimated by computing the expected value, or average, of the values from a large number of different scenarios, each representing a different market condition.
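As an illustration of the kind of computation involved, here is a minimal sketch of Monte Carlo pricing for a plain European call option under Black-Scholes assumptions, checked against the closed-form price. The function names, the single-step terminal-price simulation, and the parameter values are assumptions chosen for this example; production models used by quants are considerably more elaborate.

```python
import math
import random


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Closed-form Black-Scholes price of a European call (reference value)."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol ** 2) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t

    def norm_cdf(x):
        # Standard normal CDF expressed through the error function.
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return spot * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


def monte_carlo_call(spot, strike, rate, vol, maturity, n_paths, seed=0):
    """Average the discounted payoff over n_paths simulated market scenarios."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol ** 2) * maturity
    diffusion = vol * math.sqrt(maturity)
    total_payoff = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)                        # one random market scenario
        terminal = spot * math.exp(drift + diffusion * z)
        total_payoff += max(terminal - strike, 0.0)    # call payoff at expiry
    return math.exp(-rate * maturity) * total_payoff / n_paths


# Hypothetical inputs: spot 100, strike 100, 5% rate, 20% volatility, 1 year.
print("closed form:", black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0))
print("monte carlo:", monte_carlo_call(100.0, 100.0, 0.05, 0.2, 1.0, 100_000))
```

Every simulated scenario costs an exponential, a square root, a random draw, and several multiplications, and the scenarios are independent of one another, which is what makes the inner loop a natural target for parallel hardware.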

The key point is the need for "a large number." Because Monte Carlo simulation is based on the generation of a finite number of realizations using a series of random numbers (to model the movement of key market variables), the value of an option derived in this way will vary each time the simulations are run. The error between the Monte Carlo estimate and the correct option price is of the order of the inverse square root of the number of simulations. To improve the accuracy by a factor of 10, 100 times as many simulations must be performed.
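The inverse-square-root behavior can be checked empirically. The following sketch estimates the standard error of a Monte Carlo call-option price at two sample sizes that differ by a factor of 100; the function name and the hard-coded option parameters are illustrative assumptions, not values from the article.

```python
import math
import random


def estimate_with_stderr(n_paths, seed=0):
    """Return (price estimate, standard error) for a hypothetical call option."""
    # Assumed example parameters: spot, strike, rate, volatility, maturity.
    spot, strike, rate, vol, maturity = 100.0, 100.0, 0.05, 0.2, 1.0
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol ** 2) * maturity
    diffusion = vol * math.sqrt(maturity)
    discount = math.exp(-rate * maturity)

    total = 0.0
    total_sq = 0.0
    for _ in range(n_paths):
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        payoff = discount * max(terminal - strike, 0.0)
        total += payoff
        total_sq += payoff * payoff

    mean = total / n_paths
    variance = (total_sq - n_paths * mean * mean) / (n_paths - 1)  # sample variance
    return mean, math.sqrt(variance / n_paths)


_, stderr_small = estimate_with_stderr(1_000, seed=1)
_, stderr_large = estimate_with_stderr(100_000, seed=2)
print("standard error with 1,000 paths:  ", stderr_small)
print("standard error with 100,000 paths:", stderr_large)
print("ratio:", stderr_small / stderr_large)
```

The ratio should come out close to 10: running 100 times as many scenarios buys only one additional decimal digit of accuracy, which is why iteration counts climb into the millions.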

With so many iterations of intense computation required, derivatives analysis is clearly a prime candidate for acceleration of HPC performance. Performance is not the only concern for this application, however. The ideal solution must also address some critical operational issues.

Accurate price estimates are critical for financial institutions. Inaccurate or compromised models create arbitrage opportunities for other players in the market. The algorithms and parameters used by the analysts thus can vary widely for different financial instruments and are constantly being tweaked and refined. For this reason and for reasons of maintainability, financial analysts ("quants") typically develop their algorithms in a high-level language, such as C, Java, or MATLAB.

Because the accuracy of the analysis represents an edge in the market, a high degree of secrecy shrouds the exact algorithms employed by the quants. Disclosure of the algorithm details could expose billions of dollars to arbitrage risk. In addition, there are regulatory (SEC) requirements for verification and validation of risk-return claims made on financial instruments. It is therefore often not practical or advisable to modify, transform, re-factor, or optimize the application codes in order to speed execution.

Requirements for financial analytics are notably stringent: high-precision, intense math computation with millions of iterations, programmed in a high-level language that should not be re-factored or altered. Can FPGA coprocessors meet these challenges? Absolutely, but only if a complete solution is provided! An FPGA hardware platform, a high-level programming environment, and a library of key FPGA functions are the keys. These are now available as a new tool for accelerating and improving financial analysis.

About the Author

Bryce Mackin is strategic marketing manager for Altera's computer and storage business unit, focusing on FPGA co-processing for the high performance computing market. In that role he has investigated and developed methods for accelerating HPC applications. Bryce has been in the computer and storage market for over 10 years, responsible for both product and technical marketing. Previously he worked for three years in a similar capacity with Xilinx. Prior to that, he held several product marketing roles at Adaptec Inc. and was also marketing chairman for the Storage Networking Industry Association (SNIA) IP Storage forum, an industry association. Bryce has spent his career focusing on achieving optimum performance for computing and storage applications.
