Intel Haswell-EX Server Sets STAC-A2 Performance Record

By Tiffany Trader

September 2, 2015

Intel has reasserted its prominence on a subset of financial benchmarks designed to evaluate platforms for pricing and market risk analytics. More powerful Xeons (“Haswell-EX” E7-8890 v3 processors), combined with changes to the software stack, enabled Intel to set a new speed record on the STAC-A2 benchmark for both warm and cold runs of the baseline “Greeks” benchmark.

The STAC-A2, which debuted in 2012, is an architecture-agnostic benchmark suite that represents a class of financial risk analytics workloads characterized by Monte Carlo simulation and “Greeks” computations. Greeks measure how changes in various parameters, such as the price of one particular asset, affect the price of an overall derivative.
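To make the idea of a Greek concrete, here is a minimal sketch of one of them, delta, estimated by Monte Carlo simulation with a bump-and-revalue finite difference. It is not the STAC-A2 specification: the model (independent geometric Brownian motion per asset), the payoff (a European call on the basket average), the function names `basket_call_price` and `delta`, and all parameter values are assumptions chosen for illustration, and the demo uses far fewer paths and timesteps than the benchmark so that it runs quickly in pure Python.

```python
import math
import random


def basket_call_price(spots, strike, rate, vol, years, steps, paths, seed):
    """Monte Carlo price of a European call on the average of several assets.

    Toy model (an assumption, not the STAC-A2 model): each asset follows an
    independent geometric Brownian motion with the same rate and volatility.
    """
    rng = random.Random(seed)  # fixed seed, so repeated calls reuse the same draws
    dt = years / steps
    drift = (rate - 0.5 * vol * vol) * dt
    diffusion = vol * math.sqrt(dt)
    total_payoff = 0.0
    for _path in range(paths):
        basket = 0.0
        for s0 in spots:
            log_s = math.log(s0)
            for _step in range(steps):
                log_s += drift + diffusion * rng.gauss(0.0, 1.0)
            basket += math.exp(log_s)
        basket /= len(spots)
        total_payoff += max(basket - strike, 0.0)
    return math.exp(-rate * years) * total_payoff / paths


def delta(spots, asset_index, bump=0.01, **kwargs):
    """Central finite-difference sensitivity of the price to one asset's spot."""
    up = list(spots)
    down = list(spots)
    up[asset_index] += bump
    down[asset_index] -= bump
    # Both revaluations use the same seed, so the difference is not swamped by noise.
    return (basket_call_price(up, **kwargs) - basket_call_price(down, **kwargs)) / (2 * bump)


# Small illustrative run: five assets, but only 200 paths and 12 timesteps.
demo = dict(strike=100.0, rate=0.02, vol=0.2, years=1.0, steps=12, paths=200, seed=42)
print("price:", basket_call_price([100.0] * 5, **demo))
print("delta w.r.t. asset 0:", delta([100.0] * 5, 0, **demo))
```

Each Greek requires its own revaluations of the full simulation, which is one reason the workload is so computationally demanding.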

The baseline benchmark computes Greeks with five assets, 25,000 paths, and 252 timesteps — that’s one for each trading day over the course of one year. The test is executed five times, resulting in one cold run and four warm runs. As STAC literature explains, “a cold run simulates a deployment situation in which a risk engine starts up in response to a request [while, a] warm run simulates a case in which an engine is already running, with sufficient memory allocated to handle the request.” Another way to look at this is that the cold run stresses the entire application (including initialization and memory allocation) while a warm run relates to the computationally intensive portion of the application.
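The cold/warm distinction can be sketched with a small timing harness. Everything here is hypothetical: the `RiskEngine` class, its lazily allocated buffer, and the `time_runs` helper are stand-ins invented for illustration, not part of any STAC tooling. The only detail taken from the description above is the shape of the test, five executions of which the first is cold and the remaining four are warm.

```python
import time


class RiskEngine:
    """Hypothetical engine that allocates its work buffer on first use."""

    def __init__(self, paths, steps):
        self.paths = paths
        self.steps = steps
        self.buffer = None
        self.allocations = 0

    def run(self):
        if self.buffer is None:
            # Cold path: one-time initialization and memory allocation.
            self.buffer = [0.0] * (self.paths * self.steps)
            self.allocations += 1
        acc = 0.0
        for i in range(len(self.buffer)):
            # Stand-in for the computationally intensive kernel.
            self.buffer[i] = i * 1e-6
            acc += self.buffer[i]
        return acc


def time_runs(engine, repeats=5):
    """Time `repeats` executions; return (cold_time, list_of_warm_times)."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        engine.run()
        timings.append(time.perf_counter() - start)
    return timings[0], timings[1:]


engine = RiskEngine(paths=250, steps=252)
cold, warm = time_runs(engine)
print(f"cold: {cold:.4f} s, warm runs: {[round(t, 4) for t in warm]}")
```

In a toy like this the gap between cold and warm timings is small and noisy; in a real risk engine the startup and allocation costs can be a substantial share of the cold time.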

In addition to the standard end-to-end run, STAC also assesses key component algorithms, such as random number generation and special math functions. All told, STAC-A2 specifications deliver nearly 200 test results related to performance, scaling, efficiency, and quality.

STAC carried out the testing on a four-socket Intel white-box server with 72 Intel Xeon E7 v3 (Haswell EX) cores @ 2.50GHz, 1 TB DRAM and Red Hat Enterprise Linux 7.1. The software stack, the STAC-A2 Pack, was coded by Intel using its Composer XE (revision F), its Math Kernel Library (MKL) and its C++ compiler. Vector programming was done using the OpenMP 4.0 standard and parallelization relied on the Intel Threading Building Blocks library.

Intel explained that this implementation of the STAC-A2 Pack is based on key elements of the Intel Architecture (IA) parallel programming model: parallelization, vectorization, blocking algorithms and data layout/memory alignment. Following principles of code modernization, the algorithmic design principle is to parallelize outer loops and vectorize inner loops. As Intel’s parallel programming evangelist James Reinders shared with HPCwire, algorithmic optimizations also played a part, as did closer attention to cache efficiency, and VTune Amplifier helped detect and remedy bottlenecks.

Indirectly referencing IBM, Reinders stated that comparisons published by “a competitor” earlier this year prompted Intel to take another look at the benchmark. He went on to say that while Intel doesn’t have unlimited resources to devote to codes that won’t actually be used by customers, the company wanted the STAC record to accurately reflect the capabilities of its current hardware. Haswell-EX is a newer, higher-end machine and the upgrade had a big effect on performance, he said.

Reinders acknowledged that the Haswell-EX E7-based system is pricier than previous submissions to STAC, but with the IBM machine being “much more expensive,” he said that Intel felt justified using the top-of-the-line Xeons.

“It’s a very balanced machine that does its job extremely well,” Reinders stated, “so we weren’t surprised that our numbers came out on top across the board in terms of the benchmarks that matter.”

“I’m quite confident that our hardware is the best in terms of performance and price-performance, and it’s a well-implemented stack,” he added.

On to the numbers…

Intel did nudge out the competition, setting a new speed record for any architecture in both warm and cold runs of the baseline performance test (STAC-A2.β2.GREEKS.TIME). Results, in seconds, from Intel and its next highest-scoring competitors are shown below. Intel’s previous four-socket entrant is also included for the sake of comparison.

Intel:

4 x Intel Xeon E7-8890 v3 Haswell EX processors (published August 13, 2015)
Warm: 0.274
Cold: 0.343

4 x Intel Xeon E7-4890 v2 Ivy Bridge EX processors (published May 15, 2014)
Warm: 0.556
Cold: 0.651

NVIDIA:

Tesla K80 GPU accelerator and 2 x Intel Xeon E5-2690 v2 Ivy Bridge processors (published November 18, 2014)
Warm: 0.287
Cold: 0.395

IBM:

2 x POWER8 processor cards (published March 16, 2015)
Warm: 0.317
Cold: 0.589
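The margins between these results are easier to compare as ratios. The short sketch below only does arithmetic on the figures listed above; the dictionary keys and the `speedup` helper are names made up for this illustration.

```python
# Baseline GREEKS.TIME results from the article, in seconds.
results = {
    "Intel Haswell-EX": {"warm": 0.274, "cold": 0.343},
    "Intel Ivy Bridge-EX": {"warm": 0.556, "cold": 0.651},
    "NVIDIA K80": {"warm": 0.287, "cold": 0.395},
    "IBM POWER8": {"warm": 0.317, "cold": 0.589},
}


def speedup(contender, baseline, run):
    """How many times faster `contender` is than `baseline` for a given run type."""
    return results[baseline][run] / results[contender][run]


for other in ("Intel Ivy Bridge-EX", "NVIDIA K80", "IBM POWER8"):
    warm = speedup("Intel Haswell-EX", other, "warm")
    cold = speedup("Intel Haswell-EX", other, "cold")
    print(f"Haswell-EX vs {other}: {warm:.2f}x warm, {cold:.2f}x cold")
```

By this arithmetic the Haswell-EX system is roughly 2.0x faster than its Ivy Bridge-EX predecessor on the warm run and 1.9x on the cold run, while its warm-run lead over the K80-based system is about 5 percent.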

IBM’s two-socket Power System S824 server (with 24 POWER8 cores) still holds the record for path scaling (STAC-A2.β2.GREEKS.MAX_PATHS), which denotes paths completed in 10 minutes with five assets and 252 timesteps (using cold test runs), and asset capacity (STAC-A2.β2.GREEKS.MAX_ASSETS), which denotes assets completed in 10 minutes with 25,000 paths and 252 timesteps (using cold test runs).

Intel:

4 x Intel Xeon E7-8890 v3 Haswell EX processors
STAC-A2.β2.GREEKS.MAX_ASSETS: 72
STAC-A2.β2.GREEKS.MAX_PATHS: 21,000,000

4 x Intel Xeon E7-4890 v2 Ivy Bridge EX processors
STAC-A2.β2.GREEKS.MAX_ASSETS: 67
STAC-A2.β2.GREEKS.MAX_PATHS: 13,500,000

NVIDIA:

Tesla K80 GPU accelerator and 2 x Intel Xeon E5-2690 v2 Ivy Bridge processors
STAC-A2.β2.GREEKS.MAX_ASSETS: 55
STAC-A2.β2.GREEKS.MAX_PATHS: 8,300,000

IBM:

2 x POWER8 processor cards
STAC-A2.β2.GREEKS.MAX_ASSETS: 78
STAC-A2.β2.GREEKS.MAX_PATHS: 28,000,000

And NVIDIA can still claim the highest energy-efficiency (STAC-A2.β2.GREEKS.ENERGY_EFFICIENCY) for its Supermicro server powered with an NVIDIA Tesla K80 GPU accelerator card plus 2 x Intel Xeon E5-2690 v2 “Ivy Bridge” CPUs. Note that energy efficiency = GREEKS.MAX_ASSETS / Energy at Capacity.

Intel:

4 x Intel Xeon E7-8890 v3 Haswell EX processors
403 assets/kWh

4 x Intel Xeon E7-4890 v2 Ivy Bridge EX processors
343 assets/kWh

NVIDIA:

Tesla K80 GPU accelerator and 2 x Intel Xeon E5-2690 v2 Ivy Bridge processors
1,650 assets/kWh

IBM:

2 x POWER8 processor cards
459 assets/kWh
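Because energy efficiency is defined as GREEKS.MAX_ASSETS divided by Energy at Capacity, the energy figure can be backed out from the two published numbers. The values computed below are implied by that formula, not results published by STAC, and the variable names are illustrative only.

```python
# (MAX_ASSETS, efficiency in assets/kWh) as listed in the article.
published = {
    "Intel Haswell-EX": (72, 403),
    "Intel Ivy Bridge-EX": (67, 343),
    "NVIDIA K80": (55, 1650),
    "IBM POWER8": (78, 459),
}

# efficiency = MAX_ASSETS / energy  =>  energy = MAX_ASSETS / efficiency
implied_energy_kwh = {
    name: assets / efficiency for name, (assets, efficiency) in published.items()
}

for name, energy in implied_energy_kwh.items():
    assets = published[name][0]
    print(f"{name}: {assets} assets at capacity, about {energy:.3f} kWh implied")
```

The implied figures show that each system's efficiency score is computed at a different workload: the K80-based system reached 55 assets on roughly 0.033 kWh, while the Haswell-EX system reached 72 assets on roughly 0.179 kWh.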

Reinders insisted that the Haswell-EX and Tesla-based machines are closer on energy-efficiency than these numbers would suggest and he expressed confidence that Intel’s real efficiency numbers are competitive. He credited the STAC benchmarking team with doing a really great job given the difficulty of representing reality under all sorts of conditions but he suggested a revision may be in order on this one.

“If you don’t do similar amounts of work, the benchmark is misleading,” Reinders clarified further. “The GPU wasn’t able to do as much work and so it posts a different efficiency, and it’s not a linear relationship, which is not obvious at all looking at the benchmark. While the results depict a multiple difference in power efficiency, I can promise you it is at most a two digit number difference in performance efficiency.”

Delving further into exactly what constitutes equivalency of machines, Reinders said it’s important to look at factors like the cost of the machine, the cost to deploy it, maintain it and flexibility.

“I think it’s a mistake to count cores or number of threads and so forth, because at the end of the day, it’s about how much work did you get done and at what cost and how difficult is it to deploy,” he added.

Having performance per watt and performance per dollar calculations would be conducive to the evaluation process. While STAC does not provide guidance on pricing at this time, a new benchmark is under review that would indicate “total theoretical price to complete 1 million jobs” for both the standard run and a more involved problem set, which STAC recently introduced.

The new benchmark measures a second, larger problem size beyond the baseline workload. STAC-A2.β2.GREEKS.10-100k-1260.TIME, as it’s called, measures the seconds needed to compute all Greeks with 10 assets, 100,000 paths, and 1,260 timesteps. NVIDIA’s results predate the inclusion of this benchmark, but according to a March report, IBM’s Power system finished a warm run in 28.9 seconds and a cold run in 34.5 seconds, besting Intel’s Haswell-EX stack, which executed a warm run in 38.6 seconds and a cold run in 42.6 seconds.
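For a rough sense of how much larger this problem is, the sketch below compares the two problem sizes and the corresponding warm-run times. Using the product of assets, paths and timesteps as a measure of work is a crude assumption made for illustration; the real cost of computing all Greeks need not scale linearly with it.

```python
def simulated_points(assets, paths, timesteps):
    """Crude size proxy (an assumption): number of asset-path-timestep samples."""
    return assets * paths * timesteps


baseline = simulated_points(5, 25_000, 252)
large = simulated_points(10, 100_000, 1_260)
size_ratio = large / baseline

# Warm-run seconds from the article: (baseline problem, large problem).
warm_times = {"Intel Haswell-EX": (0.274, 38.6), "IBM POWER8": (0.317, 28.9)}
time_ratio = {name: big / small for name, (small, big) in warm_times.items()}

print(f"problem size ratio: {size_ratio:.0f}x")
for name, ratio in time_ratio.items():
    print(f"{name}: warm time grows about {ratio:.1f}x")
```

By this proxy the larger problem is 40 times the size of the baseline, while the warm-run times grow by roughly 141x on the Haswell-EX system and 91x on the POWER8 system.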

In summary, the new tests show that with the right software, a four-socket Xeon E7 server can outperform the competition on the baseline warm and cold runs. Given the difficulty of lining up exact apples-to-apples comparisons, results always need to be analyzed carefully with respect to system size, system cost, performance-per-watt and any other specs that are important to the end user.

And in case you are wondering, Intel hasn’t run the new Intel Knights Landing through the STAC-A2 testing yet, but Reinders said it was safe to expect Intel would refresh its Phi numbers given the processor’s relevance to the financial services space.

“We actually have an advantage on Xeon Phi,” Reinders asserted. “It’s very power-efficient and very well-suited for number-crunching this particular problem. We’ve got more memory, bigger vectors, and so I’m expecting to have very good numbers there when we have time to refresh those numbers.”
