The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing
May 25, 2007
Ten years ago, symmetric multiprocessing (SMP) and massively parallel processing (MPP) systems were the most common architectures for high performance computing. Their popularity has declined with the emergence of a more cost-effective approach: cluster computing. According to the Top500 Supercomputer Sites project, clusters are now the most common architecture among the world's highest performing computer systems.
Cluster computing has achieved this prominence partly because it is being widely applied in financial analysis worldwide. Rather than relying on the custom processor elements and proprietary data pathways of SMP and MPP architectures, cluster computing employs standard commodity processors, such as those from Intel and AMD, and industry-standard interconnects such as Gigabit Ethernet and InfiniBand.
Applications suited to this cluster architecture are those that can be "parallelized": broken into sections that are handled independently by one or more program threads running in parallel on multiple processors. Such applications are widespread in many areas, including the finance sector, where application software is now routinely delivered in "cluster aware" forms that can take advantage of high performance computing (HPC) architectures.
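As a minimal sketch of this kind of decomposition, the code below prices a European call option by Monte Carlo simulation and splits the simulated paths into independent chunks. The workload, the function names (`price_chunk`, `price_parallel`), and all parameters are illustrative assumptions, not software described in this article. Each chunk seeds its own random generator, so chunks share no state and can run in any order.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def price_chunk(seed, n_paths, s0, strike, rate, sigma, t):
    """Return the summed, discounted call payoffs for one independent chunk of paths."""
    rng = random.Random(seed)  # private RNG: no shared state between chunks
    drift = (rate - 0.5 * sigma * sigma) * t
    vol = sigma * math.sqrt(t)
    total = 0.0
    for _ in range(n_paths):
        # Terminal price under geometric Brownian motion
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        total += max(s_t - strike, 0.0)
    return total * math.exp(-rate * t)


def price_parallel(n_chunks, paths_per_chunk, s0, strike, rate, sigma, t, workers=4):
    """Farm the chunks out to a worker pool, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(price_chunk, seed, paths_per_chunk, s0, strike, rate, sigma, t)
            for seed in range(n_chunks)
        ]
        partial_sums = [f.result() for f in futures]  # gathered in chunk order
    return sum(partial_sums) / (n_chunks * paths_per_chunk)


if __name__ == "__main__":
    estimate = price_parallel(16, 10_000, s0=100.0, strike=100.0, rate=0.05, sigma=0.2, t=1.0)
    print(f"Estimated call price: {estimate:.3f}")
```

Threads are used here only to show the structure; in CPython the global interpreter lock keeps pure-Python threads from running this loop on several cores at once, so a real deployment would swap in `ProcessPoolExecutor` or distribute the chunks across cluster nodes. Because results are gathered in chunk order, the answer is the same however the chunks are scheduled.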
The Rise of HPC Cluster Computing
Clusters are becoming the preferred HPC architecture because of their cost effectiveness; however, these systems are starting to face challenges. The single-core processors used in these systems are becoming denser and faster, but they are running into memory bottlenecks and dissipating ever-increasing amounts of power. The memory bottleneck has two components: the limited number of I/O pins on the processor package that can be dedicated to memory access, and the increasing latency that comes with multi-layered memory caches. Higher power consumption is a direct result of increasing clock speeds and is forcing the need for extensive processor cooling.
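The link between clock speed and power can be made concrete with the standard first-order model of dynamic (switching) power in CMOS logic, added here as background rather than taken from the article:

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} \, f
```

Here \alpha is the fraction of the circuit that switches each cycle, C the switched capacitance, V the supply voltage, and f the clock frequency. At a fixed voltage, power grows linearly with f; because higher clock rates have typically required higher supply voltages, the V^2 term makes power rise faster than linearly as clocks are pushed up.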

The increasing power demand of processors is of particular concern in financial datacenters, where calculations per watt are increasingly important. Cooling requirements are limiting cluster sizes and therefore limiting achievable performance levels. Existing datacenter cooling systems are simply running out of capacity, and adding capacity carries a high price.
The industry's current solution to these growing problems is a move toward multicore processors. Increasing the number of processor cores in a single package offers increased node performance at somewhat lower power dissipation than an equivalent number of single-core processors. But the multicore approach does not address the memory bottleneck inherent in the packaging, since all of the cores still share the same package pins for memory access.
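One way to see why extra cores do not relieve a memory bottleneck is a roofline-style estimate (a standard performance model, not one used in the article): attainable throughput is the lesser of the package's peak compute rate and its memory bandwidth multiplied by the kernel's arithmetic intensity (operations per byte moved). The per-core rate, bandwidth, and intensity figures below are hypothetical placeholders, not measurements.

```python
def attainable_gflops(cores, gflops_per_core, package_bw_gbs, flops_per_byte):
    """Roofline estimate: min(compute peak, memory bandwidth x arithmetic intensity)."""
    compute_peak = cores * gflops_per_core           # scales with core count
    memory_limit = package_bw_gbs * flops_per_byte   # fixed by the shared package pins
    return min(compute_peak, memory_limit)


# Hypothetical package: 10 GFLOP/s per core, 10 GB/s of memory bandwidth shared by all cores.
for cores in (1, 2, 4, 8):
    low = attainable_gflops(cores, 10.0, 10.0, 0.5)   # memory-bound kernel
    high = attainable_gflops(cores, 10.0, 10.0, 4.0)  # compute-heavier kernel
    print(f"{cores} cores: memory-bound {low:5.1f} GFLOP/s, compute-heavy {high:5.1f} GFLOP/s")
```

With these assumed numbers, the memory-bound kernel stays at 5 GFLOP/s from one core to eight, and even the compute-heavier kernel stops scaling once it reaches the 40 GFLOP/s memory limit.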
An alternative approach has arisen: using FPGA-based coprocessors to accelerate the execution of key steps in the application software. This approach is similar to coding an inner loop of a C++ application in assembly language to increase overall execution speed.
FPGAs typically run at slower clock speeds than the latest CPUs, yet they can make up for this with superior memory bandwidth, a high degree of parallelism, and application-specific customization. An FPGA coprocessor programmed to execute key application tasks in hardware can typically provide a 2X to 3X system performance boost while simultaneously reducing power requirements by 40 percent, compared with adding a second processor or even a second core. Fine tuning the FPGA to application needs can achieve performance increases greater than 10X.
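Amdahl's law (a standard result, brought in here as a plausible explanation rather than the article's own analysis) shows why a kernel that runs 10X faster on an FPGA may yield only a 2X to 3X system-level gain: the part of the run that stays on the host CPU is not accelerated. The runtime fraction and kernel speedup below are illustrative assumptions.

```python
def amdahl_speedup(accelerated_fraction, kernel_speedup):
    """Overall speedup when `accelerated_fraction` of the original runtime
    is sped up by `kernel_speedup` and the remainder runs unchanged."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0:
        raise ValueError("kernel_speedup must be positive")
    remaining = (1.0 - accelerated_fraction) + accelerated_fraction / kernel_speedup
    return 1.0 / remaining


# Hypothetical case: 70% of runtime is in the offloaded kernel, which runs 10X faster.
print(f"{amdahl_speedup(0.7, 10.0):.2f}X overall")  # 2.70X, i.e. 1 / (0.3 + 0.07)
```

Pushing the overall gain past 10X requires both a large kernel speedup and moving nearly all of the runtime onto the FPGA: even an infinitely fast kernel cannot exceed 1 / (1 - accelerated_fraction).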
FPGA Acceleration Opportunities
Any solution that accelerates HPC applications must cover a wide spectrum of workloads with widely differing computational needs. Each application demands a particular combination of mathematical and logical operators, coupled with efficient memory access.
It is difficult for general-purpose CPUs, or for specialized processors such as graphics processing units (GPUs) and network processors, to provide an optimal solution across the broad spectrum of HPC applications. FPGAs, however, are reconfigurable engines. They can be optimized under software control to meet the particular requirements of an HPC application, which allows a single hardware solution to address many HPC applications with equal efficiency.
FPGAs accelerate HPC applications by exploiting the parallelism inherent in the algorithms employed, and there are several levels of parallelism to address. A starting point is to structure the HPC application as multiple threads that can execute in parallel across a grid of processors. This is task-level parallelism, which cluster computing exploits. Software packages are available that can take legacy applications and transform them into a structure suitable for parallel execution.
A second level of parallelism lies at the instruction level. Conventional processors support the simultaneous execution of a limited number of instructions. FPGAs offer deeper pipelining, and therefore can support a larger number of simultaneously executing "in-flight" instructions.
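A simple cycle-count model, under the idealized assumption of one new operation issued per clock and no stalls, shows what deeper pipelining buys: once the pipeline fills, every stage holds an in-flight operation, so throughput approaches one result per clock regardless of depth. The function names and the pipeline depths used below are hypothetical.

```python
def pipelined_cycles(n_ops, depth):
    """Clock cycles to finish n_ops on a `depth`-stage pipeline issuing one op per cycle."""
    return 0 if n_ops == 0 else depth + n_ops - 1


def unpipelined_cycles(n_ops, depth):
    """Clock cycles if each op must clear all `depth` stages before the next one starts."""
    return depth * n_ops


def in_flight_per_cycle(n_ops, depth):
    """Number of operations occupying the pipeline on each cycle (op i enters on cycle i)."""
    return [
        sum(1 for i in range(n_ops) if i <= cycle < i + depth)
        for cycle in range(pipelined_cycles(n_ops, depth))
    ]


print(pipelined_cycles(1000, 20), unpipelined_cycles(1000, 20))  # 1019 20000
print(max(in_flight_per_cycle(50, 8)))  # 8: the pipeline is full in steady state
```

A deeper pipeline costs a few extra cycles to fill but keeps more operations in flight at once, which is the sense in which FPGAs support more simultaneously executing instructions.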
Data parallelism is a third level that FPGAs can exploit. The devices have a fine-grained architecture designed for parallel execution and thus can be configured to perform a set of operations on a large number of data sets simultaneously. This parallel execution performs the equivalent work of numerous conventional processors, all in a single device.
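Data parallelism can be sketched by replicating one pipeline into several identical lanes that apply the same operation to different data in lockstep. The model below rests on idealized assumptions (no memory stalls, work divides freely among lanes), and `run_lockstep` and `lane_cycles` are hypothetical names for illustration only.

```python
import math


def run_lockstep(data, lanes, kernel):
    """Apply the same kernel to up to `lanes` items per step, as parallel lanes would."""
    results = []
    steps = 0
    for start in range(0, len(data), lanes):
        wave = data[start:start + lanes]         # one item per lane this step
        results.extend(kernel(x) for x in wave)  # every lane runs the same operation
        steps += 1
    return results, steps


def lane_cycles(n_items, lanes, depth):
    """Cycles for `lanes` copies of a `depth`-stage pipeline sharing n_items of work."""
    if n_items == 0:
        return 0
    per_lane = math.ceil(n_items / lanes)
    return depth + per_lane - 1


squares, steps = run_lockstep(list(range(10)), 4, lambda x: x * x)
print(steps)                      # 3 steps for 10 items on 4 lanes
print(lane_cycles(1000, 1, 20))   # 1019 cycles with a single lane
print(lane_cycles(1000, 16, 20))  # 82 cycles with 16 lanes
```

In this model, each added lane divides the steady-state work, so 16 lanes do roughly the work of 16 single-pipeline processors in the same device.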
By exploiting all three levels of parallelism, an FPGA operating at 200 MHz can outperform a 3 GHz processor by an order of magnitude or more while requiring only a quarter of the power. Commonly used signal processing algorithms, such as Fast Fourier Transforms (FFTs), show performance increases of 10X over the fastest CPUs.
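The figures in this paragraph imply a large amount of work per clock. Treating "an order of magnitude" as exactly 10X and "a quarter of the power" as 0.25 (a simplifying reading), the arithmetic below shows that the 200 MHz FPGA must complete about 150 times as much work per cycle as the 3 GHz processor, and that performance per watt improves by about 40X. The helper names are hypothetical.

```python
def required_work_per_cycle_ratio(cpu_hz, fpga_hz, target_speedup):
    """How much more work per clock the FPGA must do to beat the CPU by target_speedup."""
    clock_penalty = cpu_hz / fpga_hz  # the CPU completes this many times more cycles per second
    return clock_penalty * target_speedup


def perf_per_watt_ratio(speedup, power_fraction):
    """Relative performance per watt, given speedup and power as a fraction of the CPU's."""
    return speedup / power_fraction


print(required_work_per_cycle_ratio(3e9, 200e6, 10))  # 150.0
print(perf_per_watt_ratio(10, 0.25))                  # 40.0
```

The 150X per-cycle figure is what the combination of deep pipelines and many parallel lanes has to supply.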
Opportunities in Financial Analytics
One of the major markets where computing speed is an extremely important asset is financial analytics. A key application within this market space is the analysis of "derivatives": financial instruments such as options, futures, forwards, and interest-rate swaps. Derivatives analysis is a critical on-going activity for financial institutions, allowing them to manage pricing, risk hedging, and the identification of arbitrage opportunities. The worldwide derivatives market has tripled in size in the last five years.
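As an example of the kind of pricing kernel such analysis evaluates many times over, the sketch below computes the Black-Scholes price of a European call option. The model choice is an assumption made for illustration; the article does not name the specific pricing models its users accelerate.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, sigma, t):
    """Black-Scholes price of a European call (no dividends, continuous compounding)."""
    if t <= 0 or sigma <= 0:
        raise ValueError("t and sigma must be positive")
    sqrt_t = math.sqrt(t)
    d1 = (math.log(spot / strike) + (rate + 0.5 * sigma * sigma) * t) / (sigma * sqrt_t)
    d2 = d1 - sigma * sqrt_t
    return spot * normal_cdf(d1) - strike * math.exp(-rate * t) * normal_cdf(d2)


# At-the-money one-year call, 5% rate, 20% volatility
print(f"{black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0):.4f}")  # 10.4506
```

Pricing and risk runs repeat a kernel like this across many instruments and market scenarios, each evaluation independent of the others, which is the kind of work that maps well onto clusters and FPGA pipelines.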
