Since 1986 - Covering the Fastest Computers in the World and the People Who Run Them

June 8, 2007

High Frequency Traders Get Boost From FPGA Acceleration

by Michael Feldman

Greed may or may not be good, but it is certainly latency-sensitive. In today's capital markets, whoever can collect and process real-time market data the fastest has a tremendous advantage. The ability to execute financial transactions ahead of the competition gives firms an extremely powerful incentive to acquire systems with the lowest possible latency.

For electronic traders, the machines that do this are called ticker plants. Ticker plants are computer systems that capture real-time market messages — quotes and trades — for financial institutions. The messages are collected from different types of financial exchanges, normalized (exchanges use different message protocols), and then stored in what's called a last-value cache. The cache represents the most recent market data available from the feeds. Traditionally, ticker plants consist of racks of servers or blades that run software that performs the market data processing. Ticker plants are usually located in close proximity to the major exchanges (New York, Tokyo, London, etc.) so that message latencies due to geographic location are minimized.
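The collect, normalize, and cache steps described above can be sketched in a few lines. This is a minimal illustration, not how any production ticker plant is built: the two feed formats (feed `"A"` as pipe-delimited text, feed `"B"` as key=value pairs), the `Quote` fields, and the class and function names are all hypothetical. The essential property is that the cache keeps only the most recent value per symbol and venue, overwriting older entries as new messages arrive.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

@dataclass(frozen=True)
class Quote:
    """Normalized quote record (hypothetical common format)."""
    symbol: str
    bid: float
    ask: float
    venue: str

def parse_pipe(raw: str) -> Quote:
    # Hypothetical feed A: "SYMBOL|BID|ASK"
    symbol, bid, ask = raw.split("|")
    return Quote(symbol, float(bid), float(ask), "A")

def parse_kv(raw: str) -> Quote:
    # Hypothetical feed B: "sym=SYMBOL;b=BID;a=ASK"
    fields = dict(part.split("=", 1) for part in raw.split(";"))
    return Quote(fields["sym"], float(fields["b"]), float(fields["a"]), "B")

# One parser per exchange protocol; each emits the same normalized Quote.
PARSERS: Dict[str, Callable[[str], Quote]] = {"A": parse_pipe, "B": parse_kv}

class LastValueCache:
    """Keeps only the most recent quote per (symbol, venue)."""

    def __init__(self) -> None:
        self._latest: Dict[Tuple[str, str], Quote] = {}

    def on_message(self, feed: str, raw: str) -> Quote:
        quote = PARSERS[feed](raw)                          # normalize
        self._latest[(quote.symbol, quote.venue)] = quote   # overwrite older value
        return quote

    def get(self, symbol: str, venue: str) -> Quote:
        return self._latest[(symbol, venue)]

cache = LastValueCache()
cache.on_message("A", "IBM|105.10|105.12")
cache.on_message("B", "sym=IBM;b=105.11;a=105.13")
cache.on_message("A", "IBM|105.09|105.11")   # replaces the first feed-A quote
print(cache.get("IBM", "A"))
```

A dictionary keyed by instrument is the simplest form of a last-value cache; the later feed-A message replaces the earlier one, so queries always see the freshest data.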

Using the collected data, traders run automated trading applications for various types of financial instruments. The clients most dependent on low message latencies are high frequency traders. They typically execute arbitrage trades, which take advantage of a difference in an asset's price between two or more markets: the trader makes money by buying the asset where it is cheaper and selling it where it is more expensive. These price differentials are usually quite small, and because everyone has access to the same market data, they are transient. The key to success is getting the market data as quickly as possible, using the smartest algorithms, and running those algorithms on the fastest platform you can find.
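A toy version of the arbitrage check shows why speed matters: the opportunity exists only while one venue's best bid is above another venue's best ask. The sketch below assumes a hypothetical snapshot of best bid and ask prices per venue (the venue names and prices are invented) and ignores fees, order sizes, and execution risk, all of which a real strategy must account for.

```python
from typing import Dict, Optional, Tuple

# Best (bid, ask) per venue for a single asset; names and prices are hypothetical.
Book = Dict[str, Tuple[float, float]]

def find_arbitrage(book: Book) -> Optional[Tuple[str, str, float]]:
    """Return (buy_venue, sell_venue, spread) if some venue's best bid exceeds
    another venue's best ask. Ignores fees, sizes, and latency."""
    buy_venue = min(book, key=lambda v: book[v][1])   # lowest ask: where to buy
    sell_venue = max(book, key=lambda v: book[v][0])  # highest bid: where to sell
    spread = book[sell_venue][0] - book[buy_venue][1]
    if spread > 0 and buy_venue != sell_venue:
        return buy_venue, sell_venue, round(spread, 4)
    return None

crossed = {"VenueX": (50.00, 50.02), "VenueY": (50.04, 50.06)}
normal = {"VenueX": (50.00, 50.02), "VenueY": (50.01, 50.03)}
print(find_arbitrage(crossed))  # buy on VenueX, sell on VenueY
print(find_arbitrage(normal))   # no opportunity
```

Because every participant sees the same quotes, the first system to react to a crossed snapshot like `crossed` captures the spread; by the time a slower system runs the same check, the prices have typically moved back into line.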

The volumes of data are such that even racks of servers have trouble getting message latencies under a millisecond. And because market data volumes are increasing even faster than Moore's Law, larger and larger racks of servers must be deployed to keep pace. Dealing with increasing data throughput while trying to reduce latencies is causing traders to look for alternative computing platforms.

That is the motivation behind Exegy's ticker plant appliance. Exegy introduced its ticker plant in October 2006, claiming 150-microsecond latencies and a throughput of 1 million exchange messages per second. The company says its newest boards can achieve 100-microsecond latencies and 2 million messages per second.
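One way to relate these latency and throughput figures is Little's law, which states that the average number of items in a stable system, $N$, equals the average arrival rate $\lambda$ times the average time each item spends in the system, $W$. Applying it to Exegy's quoted numbers (our own back-of-the-envelope arithmetic, not a figure from the company, and treating the quoted latencies as averages):

```latex
N = \lambda W

N_{\text{2006}} = \left(1 \times 10^{6}\ \tfrac{\text{msg}}{\text{s}}\right)\left(150 \times 10^{-6}\ \text{s}\right) = 150\ \text{messages in flight}

N_{\text{new}} = \left(2 \times 10^{6}\ \tfrac{\text{msg}}{\text{s}}\right)\left(100 \times 10^{-6}\ \text{s}\right) = 200\ \text{messages in flight}
```

Cutting latency while doubling throughput therefore means keeping roughly a third more messages in process at any instant, the kind of concurrency that parallel hardware provides.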

“It's the case where fast isn't good enough; you have to be the fastest,” says Rod Arbaugh, Exegy's Chief Operating Officer. “If you get that data even a millisecond before everyone else, you can execute a trade before your competition even gets a look at it.”
 
By essentially condensing a multi-server ticker plant into a single 3U appliance, Exegy is able to reduce latency while increasing throughput. The system includes an 8-socket Opteron board connected to a reconfigurable hardware board containing three Xilinx FPGAs. Instead of relying on software for message processing, Exegy implements this capability on the FPGAs; the Opterons perform some of the message processing for exchange feeds that don't require additional acceleration. The company claims that because it exploits the parallel hardware nature of the FPGAs, the latency for processing an individual market data message is independent of the message rate. So, according to Exegy, average latency will remain constant even as message rates continue to climb.
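The claim that per-message latency can stay flat as message rates rise is characteristic of pipelined hardware. The toy model below is a generic illustration, not Exegy's design: it compares a fully pipelined unit that can accept a new message every cycle against a sequential unit that must finish one message before starting the next. Both take 4 cycles to process a message; the cycle counts, arrival gaps, and function names are arbitrary assumptions.

```python
from typing import List

def latencies(arrivals: List[int], interval: int, depth: int) -> List[int]:
    """Per-message latency (in cycles) for a unit that can accept a new message
    every `interval` cycles and needs `depth` cycles to finish one.
    A fully pipelined unit has interval=1; a sequential one has interval=depth."""
    result = []
    next_accept = 0
    for t in arrivals:
        start = max(t, next_accept)       # wait if the unit cannot accept yet
        next_accept = start + interval    # earliest cycle the next message may enter
        result.append(start + depth - t)  # queueing delay plus processing time
    return result

def average(xs: List[int]) -> float:
    return sum(xs) / len(xs)

for gap in (8, 4, 2, 1):                  # cycles between arriving messages
    arrivals = [i * gap for i in range(1000)]
    piped = average(latencies(arrivals, interval=1, depth=4))
    serial = average(latencies(arrivals, interval=4, depth=4))
    print(f"gap={gap}: pipelined={piped:.1f} cycles, sequential={serial:.1f} cycles")
```

With these parameters the pipelined unit averages 4.0 cycles at every arrival gap, while the sequential unit stays at 4.0 cycles only until messages arrive faster than one every 4 cycles; after that, queueing delay dominates (1003.0 cycles at a gap of 2 and 1502.5 at a gap of 1 over 1,000 messages). A pipeline has a ceiling too: once arrivals exceed one per cycle, it would start queueing in the same way.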

“To us, FPGAs appear to be the best hardware architecture for building ticker plant algorithms in hardware and reducing the latency and increasing the throughput,” says Arbaugh. “For instance, we can handle 2 million messages a second; a software solution based on a rack of servers will be able to handle maybe 300 to 400 thousand.”

The second aspect of the Exegy appliance is that the company has also incorporated some downstream computations into the hardware: index calculations, options calculations, and exchange-traded fund (ETF) computations. These tend to be highly parallel, dataflow-type algorithms that lend themselves very well to coprocessor acceleration. Since they are implemented on an FPGA coprocessor, the calculations take place at hardware speeds. For users who want their own trading algorithms accelerated, Exegy is willing to provide customized solutions.
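Part of what makes such calculations a good fit for streaming hardware is that each incoming tick touches only a small piece of state. As an illustration, a capitalization-weighted index can be maintained incrementally: when one constituent's price changes, only that constituent's contribution is adjusted, instead of re-summing every member. This is a generic technique chosen for the example, not a description of how Exegy computes indexes; the constituents, share counts, and divisor are hypothetical, and real indexes also handle corporate actions and divisor adjustments that this sketch omits.

```python
from typing import Dict

class IncrementalIndex:
    """Capitalization-weighted index updated in O(1) per price tick.
    Constituents, share counts, and divisor are hypothetical inputs."""

    def __init__(self, shares: Dict[str, float], prices: Dict[str, float],
                 divisor: float) -> None:
        self.shares = dict(shares)
        self.prices = dict(prices)
        self.divisor = divisor
        # Full computation once at startup: sum of shares * price over all members.
        self.total_cap = sum(self.shares[s] * self.prices[s] for s in self.shares)

    @property
    def value(self) -> float:
        return self.total_cap / self.divisor

    def on_tick(self, symbol: str, new_price: float) -> float:
        # Adjust only the changed member's contribution instead of re-summing.
        self.total_cap += self.shares[symbol] * (new_price - self.prices[symbol])
        self.prices[symbol] = new_price
        return self.value

idx = IncrementalIndex(
    shares={"AAA": 100.0, "BBB": 50.0},
    prices={"AAA": 10.0, "BBB": 20.0},
    divisor=10.0,
)
print(idx.value)                 # 200.0
print(idx.on_tick("AAA", 11.0))  # 210.0
```

Each tick costs one multiply-add regardless of how many members the index has, which is the property that lets updates keep pace with the incoming message stream.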

This week Exegy announced it would develop a version of its ticker plant that hosts a DRC reconfigurable computing module in one or more of the Opteron sockets. Although trading computations can be mapped to Exegy's own FPGA board, the DRC integration will allow for additional scalability and computational density. The company is also investigating other acceleration platforms, such as graphics processing units (GPUs) and multi-chip modules (MCMs), to try to match different automated trading algorithms with the most appropriate hardware.

ACTIV Financial is another company that provides an accelerated ticker appliance, using FPGA technology to achieve ultra-low-latency data feeds. The company is reportedly also looking into incorporating trader applications into its platform.

Some analysts in the financial industry believe hardware acceleration is a budding trend for these latency-sensitive, high-throughput market data systems. But just as in technical computing, few vendors in the financial sector have accumulated the expertise to exploit FPGAs or other hardware accelerators. Exegy believes its hardware acceleration approach represents the initial stages of a technology shift that will permeate this type of market trading.

“I think the whole high frequency trading sector is going to become more and more high performance computing-centric,” concludes Arbaugh. “There's going to be a real trend to get these algorithms implemented in hardware to continue to reduce the latency. So you're going to see some pretty exciting technology being developed here.”