Continuing our coverage of the new Intel “Haswell” Xeon E5 v3 series chips, we turn our attention to a recent blog post from software acceleration specialists Xcelerit to see how Haswell fares on a popular Monte-Carlo financial application.
The Haswell E5 family offers significant performance gains over the previous Ivy Bridge processors, but seeing how those gains affect real-world applications requires benchmark testing. The Xcelerit team chose the popular Monte-Carlo LIBOR Swaption Portfolio pricer for its tests.
Jörg Lotze, Xcelerit technical lead and co-founder, writes that “from a hardware perspective, the main changes between the two processor generations are the new AVX-2 instructions (including a fused multiply-add instruction), higher memory and cache bandwidths, and more processor cores.”
The table below gives a run-down on the two SKUs used in this experiment: Xeon E5-2697 v2 and the Xeon E5-2697 v3.
As Lotze explains, a Monte-Carlo simulation is used to price a portfolio of LIBOR swaptions, a very common type of financial derivative.
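For readers unfamiliar with the approach, the idea of Monte-Carlo swaption pricing can be sketched in a few lines of Python. This is a deliberately simplified illustration and not Xcelerit's benchmark code: it assumes each swaption's forward swap rate is lognormal (Black's model) rather than simulating a full LIBOR market model, and the portfolio entries, function names, and parameter values are all hypothetical. The closed-form Black price is included only as a sanity check on the simulated value.

```python
import math
import random


def _norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_payer_swaption(annuity, forward, strike, vol, expiry):
    """Closed-form Black (1976) payer swaption price, used as a reference."""
    std = vol * math.sqrt(expiry)
    d1 = (math.log(forward / strike) + 0.5 * std * std) / std
    d2 = d1 - std
    return annuity * (forward * _norm_cdf(d1) - strike * _norm_cdf(d2))


def mc_payer_swaption(annuity, forward, strike, vol, expiry, n_paths, seed):
    """Monte-Carlo estimate: average the payoff over simulated swap rates."""
    rng = random.Random(seed)
    std = vol * math.sqrt(expiry)
    drift = -0.5 * std * std  # keeps the simulated rate a martingale
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)
        rate = forward * math.exp(drift + std * z)
        total += max(rate - strike, 0.0)  # payer swaption payoff
    return annuity * total / n_paths


# Hypothetical portfolio: (annuity, forward rate, strike, volatility, expiry in years)
PORTFOLIO = [
    (4.5, 0.030, 0.028, 0.20, 1.0),
    (8.0, 0.035, 0.035, 0.18, 2.0),
    (2.8, 0.025, 0.026, 0.25, 0.5),
]


def price_portfolio_mc(n_paths=100_000, seed=0):
    """Sum of Monte-Carlo prices, one independent random stream per swaption."""
    return sum(
        mc_payer_swaption(*swaption, n_paths=n_paths, seed=seed + i)
        for i, swaption in enumerate(PORTFOLIO)
    )


def price_portfolio_black():
    """Sum of closed-form reference prices for the same portfolio."""
    return sum(black_payer_swaption(*swaption) for swaption in PORTFOLIO)


if __name__ == "__main__":
    print("Monte-Carlo:", price_portfolio_mc())
    print("Black ref. :", price_portfolio_black())
```

The inner loop is dominated by independent multiply-and-add arithmetic on each path, which is why workloads of this kind tend to respond well to more cores and wider vector units.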
Xcelerit ran the same pricer on both the Intel Ivy Bridge and Haswell server CPUs, configured as follows:
+ CPU: 2 sockets, Ivy Bridge (Xeon E5-2697 v2) and Haswell (Xeon E5-2697 v3)
+ HT: Hyper-threading enabled
+ OS: Red Hat Enterprise Linux 6 (64-bit)
+ RAM: 64 GB
+ Development Tools: Xcelerit SDK 3.0.0a / ICC 15.0
Xcelerit recorded the full application time for each run; the results are detailed in the following chart:
As illustrated above, Haswell delivered significant speedups, as high as 2.42x in single precision and 1.63x in double precision.
Lotze concludes:
“This is in line with the increase in the number of cores from 12 to 14 per chip, and the new fused multiply-add instruction. Many financial applications can benefit significantly from this instruction as multiplications followed by additions are very common. In single precision, a key Ivy-Bridge performance limiter of the test application is the cache bandwidth. The higher bandwidth of Haswell overcomes this limitation, explaining the high single-precision speedups.”
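As a rough back-of-the-envelope check on that explanation, the reported speedups can be split into the part attributable to core count and the remainder. The sketch below assumes throughput scales linearly with the number of cores, which is our simplification rather than something stated in Lotze's post, and it ignores any clock-speed difference between the two chips. The function and constant names are illustrative.

```python
IVY_BRIDGE_CORES = 12  # per chip, from Lotze's conclusion
HASWELL_CORES = 14     # per chip, from Lotze's conclusion

# Peak speedups reported in the article
REPORTED_SPEEDUP = {"single": 2.42, "double": 1.63}


def per_core_gain(speedup, old_cores=IVY_BRIDGE_CORES, new_cores=HASWELL_CORES):
    """Speedup left over after dividing out the core-count ratio.

    Assumes perfectly linear scaling with core count (a simplification).
    """
    core_ratio = new_cores / old_cores  # 14 / 12, roughly 1.17x
    return speedup / core_ratio


for precision, speedup in REPORTED_SPEEDUP.items():
    print(f"{precision}: {per_core_gain(speedup):.2f}x beyond core count")
# prints:
# single: 2.07x beyond core count
# double: 1.40x beyond core count
```

Under that assumption, the extra cores account for only about 1.17x, leaving roughly 1.40x in double precision and 2.07x in single precision to come from per-core improvements such as the fused multiply-add instruction and the higher cache bandwidth that Lotze cites.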