The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
May 25, 2007
Ten years ago, symmetric multiprocessing (SMP) and massively parallel processing (MPP) systems were the most common architectures for high performance computing. Their popularity has declined with the emergence of a more cost-effective approach: cluster computing. According to the Top500 Supercomputer Sites project, cluster systems are now the most common architecture among the world's highest performing computer systems.
Cluster computing has achieved this prominence because it is highly cost-effective. Rather than relying on the custom processor elements and proprietary data pathways of SMP and MPP architectures, cluster computing employs commodity processors, such as those from Intel and AMD, and industry-standard interconnects such as Gigabit Ethernet and InfiniBand.
Applications suited to this cluster architecture are those that can be "parallelized," or broken into sections that are handled independently by one or more program threads running in parallel on multiple processors. Such applications are widespread in many areas, including the finance sector, where application software is now routinely delivered in "cluster aware" forms that can take advantage of high performance computing (HPC) architectures.
The Rise of HPC Cluster Computing
Clusters are becoming the preferred HPC architecture because of their cost effectiveness; however, these systems are starting to face challenges. The single-core processors used in these systems are becoming denser and faster, but they are running into memory bottlenecks and dissipating ever-increasing amounts of power. The memory bottleneck has two causes: the limited number of I/O pins on the processor package that can be dedicated to memory access, and the increasing latency introduced by multi-layered memory caches. Higher power consumption is a direct result of increasing clock speeds, and it forces extensive cooling of the processors.

The increasing power demand of processors is of particular concern in financial datacenters, where calculations per watt are increasingly important. System cooling requirements are limiting cluster sizes, and therefore limiting achievable performance levels. Existing datacenter cooling systems are simply running out of capacity, and increasing that capacity carries a high price.
The industry's current solution to these growing problems is a move towards multicore processors. Increasing the number of processor cores in a single package offers increased node performance at somewhat lower power dissipation than that of an equivalent number of single-core processors. But the multicore approach does not address the memory bottlenecks inherent in the packaging.
An alternative approach has arisen: using FPGA-based coprocessors to accelerate the execution of key steps in the application software. This approach is similar to coding an inner loop of a C++ application in assembly language to increase overall execution speed.
FPGAs typically run at slower clock speeds than the latest CPUs, yet they can make up for this with superior memory bandwidth, a high degree of parallelization, and the customization that is possible. An FPGA coprocessor programmed to execute key application tasks in hardware can typically provide a 2X to 3X system performance boost while simultaneously reducing power requirements by 40 percent, as compared to adding a second processor or even a second core. Fine-tuning the FPGA to application needs can achieve performance increases greater than 10X.
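The relationship between accelerating a few key steps and the resulting system-level boost can be sketched with Amdahl's law. The following is a minimal illustration, not a measurement: the function name `overall_speedup` and the runtime fractions and kernel speedups in the loop are hypothetical values chosen only to show how the formula behaves.

```python
def overall_speedup(offloaded_fraction: float, kernel_speedup: float) -> float:
    """Amdahl's law: whole-application speedup when only part of the runtime is accelerated."""
    if not 0.0 <= offloaded_fraction <= 1.0:
        raise ValueError("offloaded_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    serial_part = 1.0 - offloaded_fraction               # runtime left on the CPU
    accelerated_part = offloaded_fraction / kernel_speedup  # runtime moved to the coprocessor
    return 1.0 / (serial_part + accelerated_part)


# Hypothetical profiles: (fraction of runtime in the offloaded kernel, kernel speedup).
for fraction, kernel in [(0.50, 10.0), (0.70, 10.0), (0.95, 20.0)]:
    result = overall_speedup(fraction, kernel)
    print(f"{fraction:.0%} offloaded at {kernel:.0f}X -> {result:.2f}X overall")
```

Under these assumptions, the overall gain is bounded by the portion of the application that stays on the CPU, which is one reason a large kernel speedup can translate into a more modest system-level figure.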
FPGA Acceleration Opportunities
Any solution that accelerates HPC applications must cover a wide spectrum of applications with widely differing computational needs. Each application demands a particular combination of mathematical and logical operators, coupled with efficient memory access.
It is difficult for general-purpose CPUs or specialized processor solutions such as graphics processing units (GPUs) or network processors to provide an optimal solution for the broad spectrum of HPC applications. FPGAs, however, are reconfigurable engines. They can be optimized under software control to meet the particular requirements of an HPC application. This allows a single hardware solution to address many HPC applications with equal efficiency.
FPGAs accelerate HPC applications by exploiting the parallelism inherent in the algorithms employed. There are several levels of parallelism to address. A starting point is to structure the HPC application for multi-threaded execution suitable for parallel execution across a grid of processors. This is task-level parallelism, exploited by cluster computing. There are software packages available that can take legacy applications and transform them into a structure suitable for parallel execution.
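Task-level parallelism of the kind described above can be sketched in a few lines. This is only an illustration under stated assumptions: `process_chunk` is a hypothetical stand-in for an independent unit of application work, and a thread pool on one machine stands in for the processors of a cluster, where a real deployment would distribute the chunks across nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_count):
    """Split items into at most chunk_count contiguous, near-equal chunks."""
    chunk_size = max(1, -(-len(items) // chunk_count))  # ceiling division
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


def process_chunk(chunk):
    """Hypothetical independent unit of work: sum of squares of the chunk."""
    return sum(value * value for value in chunk)


def run_in_parallel(items, worker_count=4):
    """Run process_chunk on each chunk concurrently, then combine the partial results."""
    chunks = split_into_chunks(items, worker_count)
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)


data = list(range(1_000))
# The parallel result should match the single-pass result over the same data.
print(run_in_parallel(data) == process_chunk(data))
```

The structure works because no chunk depends on another chunk's result; only the final combination step is serial.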
A second level of parallelism lies at the instruction level. Conventional processors support the simultaneous execution of a limited number of instructions. FPGAs offer deeper pipelining, and therefore can support a larger number of simultaneously executing "in-flight" instructions.
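The benefit of deeper pipelining can be illustrated with a simple cycle-count model. The sketch below assumes an idealized pipeline in which every stage takes one clock cycle and there are no stalls; the function names and the stage and operation counts are hypothetical and are not taken from any particular device.

```python
def unpipelined_cycles(operation_count: int, stage_count: int) -> int:
    """Each operation passes through every stage before the next one starts."""
    return operation_count * stage_count


def pipelined_cycles(operation_count: int, stage_count: int) -> int:
    """Fill the pipeline once, then complete one operation per cycle."""
    if operation_count == 0:
        return 0
    return stage_count + operation_count - 1


# Hypothetical workload: the same stream of operations at several pipeline depths.
operations = 10_000
for stages in (5, 20, 100):
    serial = unpipelined_cycles(operations, stages)
    overlapped = pipelined_cycles(operations, stages)
    print(f"{stages:>3} stages: {serial} cycles unpipelined, {overlapped} pipelined")
```

In this model a pipeline with more stages keeps more operations in flight at once, and for a long stream of operations the cycle count approaches one cycle per operation regardless of depth.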
Data parallelism is a third level that FPGAs can exploit. The devices have a fine-grained architecture designed for parallel execution, and can therefore be configured to perform a set of operations on a large number of data sets simultaneously. This parallel execution performs the equivalent work of numerous conventional processors, all in a single device.
By exploiting all three levels of parallelism, an FPGA operating at 200 MHz can outperform a 3 GHz processor by an order of magnitude or more, while requiring only a quarter of the power. Commonly used signal processing algorithms, such as FFTs (Fast Fourier Transforms), show performance increases of 10X over the fastest CPUs.
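The arithmetic behind the 200 MHz versus 3 GHz comparison can be worked through directly. The sketch below uses only the figures quoted above and a deliberately simple model in which throughput equals clock rate times useful work completed per cycle; the function names are hypothetical, and "an order of magnitude" is taken as exactly 10X for the purpose of the calculation.

```python
FPGA_CLOCK_HZ = 200e6        # 200 MHz, from the text
CPU_CLOCK_HZ = 3e9           # 3 GHz, from the text
TARGET_SPEEDUP = 10.0        # "an order of magnitude", taken as exactly 10X
FPGA_POWER_FRACTION = 0.25   # "a quarter of the power"


def required_work_per_cycle_ratio(cpu_clock_hz, fpga_clock_hz, target_speedup):
    """How much more work per clock cycle the slower device must complete."""
    return target_speedup * cpu_clock_hz / fpga_clock_hz


def performance_per_watt_gain(target_speedup, power_fraction):
    """Speedup divided by relative power draw."""
    return target_speedup / power_fraction


clock_ratio = CPU_CLOCK_HZ / FPGA_CLOCK_HZ
work_ratio = required_work_per_cycle_ratio(CPU_CLOCK_HZ, FPGA_CLOCK_HZ, TARGET_SPEEDUP)
efficiency_gain = performance_per_watt_gain(TARGET_SPEEDUP, FPGA_POWER_FRACTION)

print(f"CPU clock advantage: {clock_ratio:.0f}X")
print(f"Work per cycle the FPGA must complete relative to the CPU: {work_ratio:.0f}X")
print(f"Implied performance-per-watt gain: {efficiency_gain:.0f}X")
```

Under this model the CPU has a 15X clock advantage, so a 10X overall win implies that the FPGA completes about 150 times as much work per cycle, which is where pipelining and data parallelism come in. The same figures imply roughly a 40X gain in performance per watt.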
Opportunities in Financial Analytics
One of the major markets in which computing speed is an extremely important asset is financial analytics. A key application within this market is the analysis of "derivatives": financial instruments such as options, futures, forwards, and interest-rate swaps. Derivatives analysis is a critical ongoing activity for financial institutions, allowing them to manage pricing, risk hedging, and the identification of arbitrage opportunities. The worldwide derivatives market has tripled in size in the last five years.
