Ever-faster financial analysis is a much-sought competitive edge throughout financial services. Last week a new STAC report showed a Cray XC40 Knights Landing solution outperforming a host of alternatives on the STAC-A2 benchmark. STAC-A2 is intended to test the technology stacks used for compute-intensive analytic workloads in pricing and risk management.
According to the report, the “Cray machine completed warm runs of the baseline Greeks benchmark (STAC-A2.β2.GREEKS.TIME.WARM) in just 0.212 seconds on average. This is the best score on this benchmark to date.” The testing, performed in mid-September, used the Intel STAC-A2 Pack for Composer XE (Rev H) on a Cray XC40 compute node with an Intel Xeon Phi 7250 processor, 192GB of DRAM and 16GB of Multi-Channel DRAM (MCDRAM), running Cray Linux Environment 6.0 UP01.
At Cray’s request, the report lists how the Cray XC40 KNL system performed against other published STAC reports. The Cray system was:
- 2% faster than the dual-Haswell/dual-Knights Corner system (STAC report INTC151028, Intel Composer XE with 2 x Intel Xeon Phi 7120P Co-Processor (Knights Corner) and 2 x Intel Xeon E5-2697 v3 (Haswell EP) @ 2.60GHz CPUs on a Supermicro Superserver SYS-1028GR-TR)
- 4% faster than the KNL timing reported by Intel (INTC160428, Intel Composer XE with 1 x Intel Xeon Phi 7250 processor (Knights Landing) @ 1.4GHz on an Intel White Box)
- 29% faster than a 4-socket Haswell system (INTC150811, Intel Composer XE on Intel White Box with 4 x Intel Xeon E7-8890 v3 (Haswell EX) @ 2.50GHz)
- 36% faster than the fastest reported GPU-based system (NVDA141116, NVIDIA CUDA 6.5 / NVIDIA Tesla K80 dual-GPU Accelerator card)
- 50% faster than the fastest reported non-Intel CPU-based system (IBM150305, IBM XL C/C++ on IBM Power System S824 Server using RHEL 7 with IBM Power8 processors)
The report also includes a new STAC metric, effective volume. The Cray system’s reported effective volume is 1,080 cubic inches; for comparison, a 1U server in an Open Compute Project (OCP) rack has an effective volume of 2,055 cubic inches. Presumably, smaller means more efficient here.
Also, in terms of pure performance, and without modifying Intel’s Rev. H STAC-A2 implementation, Cray’s XC40 KNL Component Benchmark “warm” times were better than the KNL times reported by Intel (INTC160428, noted above) in 12 of 15 cases, according to the report.
Because the benchmark was “run on a single node in an integrated 768-node system, it wasn’t feasible to apply the STAC-A2 power-measurement methodology to this rack-scale SUT (i.e., direct measurement at the wall) so official energy efficiency values weren’t provided in the STAC Report Card,” according to the report. That said, Cray promotes the machine’s power efficiency and density as advantages.
Cray says this KNL-based system is well suited for firms that are about to modernize or have modernized their code and want the benefits of parallel acceleration without needing to learn a new language.
Altogether, the report contains nearly 200 STAC-A2 test results.
The widely used STAC-A2 benchmark is intended to evaluate new technologies, including the “latest CPUs, GPUs, and FPGAs, server architectures, programming languages and deployment environments including public and private cloud.” (For a fuller description of the STAC-A2 benchmark, see End-User Driven Technology Benchmarks Based on Market-Risk Workloads.)
Link to STAC report: Cray XC40 Using One Compute Node with 1 x Intel Xeon Phi 7250 processor (Knights Landing) @ 1.4GHz and Intel Composer XE