A recent benchmark conducted by the Securities Technology Analysis Center (STAC) is helping to dispel the myth that GPUs are the primary drivers of higher computational speeds. In fact, the study indicates that CPU-based systems, with or without a Xeon Phi coprocessor, can outperform GPU-accelerated systems on this workload.
Intel recently asked STAC to use its STAC-A2 Benchmark suite to test two dual-socket white-box servers, both built on Intel Xeon Processor E5-2600 v3 series processors (code-named Haswell EP): one with an Intel Xeon Phi coprocessor card and one without.
STAC-A2 is a benchmark standard developed by quants and technologists from some of the world’s largest banks. Based on financial risk analysis, the benchmark reports the performance, scaling, quality, and resource efficiency of any technology stack able to handle the massive workloads these calculations involve. That includes calculating the “Greeks,” a set of risk sensitivities designated by Greek letters that measure how the price of an option responds to changes in market parameters such as the underlying price, interest rates, or volatility.
The largest banks have HPC grids continually running these calculations. The calculations have traditionally run in batch mode overnight, but financial institutions are pushing toward real-time processing, with response times of a second or less.
A combination of advanced CPU technology and parallelized software is the answer.
The dual-socket Haswell (HSW) system used in this STAC benchmark is one generation newer than a previously tested system that ran STAC-A2 on dual-socket Xeon EP processors (Ivy Bridge, or IVB). Running the same source code, the new HSW-based system completed the benchmark 30% faster than the IVB-based system. Both the HSW-based and IVB-based systems, neither of which used an accelerator card, ran STAC-A2 faster than a system with a dual-socket CPU and one K20 GPU. The HSW-based system was only 12% slower than a dual-socket system with two GPUs, while offering 46% higher asset capacity.
The results published by STAC included a white box with two Intel Xeon E5-2699 v3 (Haswell EP) CPUs at 2.30 GHz and one Intel Xeon Phi 7120A coprocessor (Knights Corner) card. This was the fastest system published to date on the end-to-end Greeks benchmark. On the same benchmark, it ran 22% faster than a system with a dual-socket CPU plus two GPUs, and it had 46% higher asset capacity and 53% higher paths capacity than that GPU-based system.
“The performance improvements made possible by running the Haswell CPU with or without the Xeon Phi coprocessor are very impressive,” says Robert Geva, Intel Principal Engineer and manager of Intel’s “Wall Street Lab.” “The secret is to parallelize the software. In some cases, we have realized performance gains of up to 180X. This means that a computation that used to take three hours can now be done in one minute or less, meeting the real-time criteria demanded by the financial markets industry.”
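A quick check of the quoted figures: a 180X speedup turns three hours into exactly one minute. If, purely as an assumption, the entire gain came from spreading work across N parallel workers (the quote does not say how much is due to threading versus vectorization or algorithmic changes), Amdahl's law bounds how much of the code must run in parallel:

```latex
% Quoted figure: 180x speedup applied to a 3-hour run
T_{\text{new}} = \frac{T_{\text{old}}}{S}
  = \frac{3\,\text{h} \times 60\,\text{min/h}}{180} = 1\,\text{min}

% Amdahl's law (assumption: all gain from parallel fraction p on N workers)
S(N) = \frac{1}{(1-p) + p/N} < \frac{1}{1-p}
\quad\Longrightarrow\quad
S = 180 \text{ requires } 1 - p < \frac{1}{180} \approx 0.56\%
```

Under that assumption, well over 99% of the runtime must be parallelized before a 180X gain is even possible, which is consistent with Geva's emphasis on parallelizing the software.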
“The fact that a system with a CPU and a single Xeon Phi coprocessor can outperform a system with a CPU and two GPUs is a testimonial to the power of the programming model – in particular, to the commonality of programming between the Xeon and Xeon Phi,” he continues. “The STAC-A2 benchmark is a nontrivial application, designed to perform several algorithms in parallel, with parallel computation within each of them. This complexity provides the CPU with an opportunity to shine. The Intel developers chose Intel Threading Building Blocks (TBB) as the substrate for parallelization. TBB is designed for nested parallelism, and its availability on both Xeon and Xeon Phi allowed the developers flexibility in coding.”
Geva points out that parallelizing arcane code is difficult and expensive, but well worth the effort for computational gains that can make all the difference in the highly competitive financial industry.
But the benefits of code parallelization combined with the power of the Haswell CPU extend beyond the financial applications explored in the STAC-A2 benchmark. This approach can be applied to many other financial applications, as well as to HPC applications in other domains such as the life sciences and digital content creation.
The myth that GPUs are required to accelerate computation is being debunked by actual data. Says Geva, “The reality is that a CPU built to run parallelized software delivers tremendous performance. And the use of the Xeon Phi coprocessor makes the computation even faster.”
For more information:
- STAC Report: https://stacresearch.com/intel/haswell
- Intel performance on a variety of financial applications
- Intel processor performance on applications in various industry segments