As part of the run-up to SC18, taking place in Dallas next week (Nov. 11-16), Intel is doling out info on its next-gen Cascade Lake family of Xeon processors, specifically the “Advanced Performance” version (Cascade Lake-AP), architected for high-performance computing, artificial intelligence and infrastructure-as-a-service workloads.
Cascade Lake-AP, like the rest of the Cascade Lake line, is based on 14nm technology and includes a new AI extension called Intel Deep Learning Boost (DL Boost), which extends Intel AVX-512 with new Vector Neural Network Instructions (VNNI) to drive inference performance. Cascade Lake also debuts support for Intel Optane DC persistent memory and implements hardware security mitigations to address the Spectre and Meltdown vulnerabilities.
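For readers who want a concrete picture of what a VNNI-style instruction does, here is a minimal Python sketch; it is an illustration, not Intel's implementation. It assumes the behavior Intel documents for the 8-bit VNNI multiply-accumulate (commonly written VPDPBUSD): within each 32-bit lane, four unsigned 8-bit values are multiplied by four signed 8-bit values, and the four products are added to a 32-bit accumulator. The function names and sample values are hypothetical.

```python
def wrap_int32(x):
    """Wrap an integer to the signed 32-bit range (non-saturating, as assumed here)."""
    return ((x + 2**31) % 2**32) - 2**31


def vnni_dot_u8s8(acc, a_u8, b_s8):
    """Emulate one VNNI-style step over a list of 32-bit lanes.

    acc  : list of signed 32-bit accumulators, one per lane
    a_u8 : unsigned 8-bit inputs, four per lane (e.g. activations)
    b_s8 : signed 8-bit inputs, four per lane (e.g. weights)
    """
    if not (len(a_u8) == len(b_s8) == 4 * len(acc)):
        raise ValueError("need exactly four 8-bit pairs per 32-bit lane")
    out = []
    for lane, total in enumerate(acc):
        for k in range(4):
            a = a_u8[4 * lane + k]
            b = b_s8[4 * lane + k]
            if not 0 <= a <= 255:
                raise ValueError("a_u8 values must fit in unsigned 8 bits")
            if not -128 <= b <= 127:
                raise ValueError("b_s8 values must fit in signed 8 bits")
            total += a * b  # 8-bit x 8-bit product, accumulated at 32 bits
        out.append(wrap_int32(total))
    return out


# Two lanes: the first carries an existing partial sum of 10.
result = vnni_dot_u8s8(
    acc=[10, 0],
    a_u8=[1, 2, 3, 4, 255, 255, 255, 255],
    b_s8=[1, -1, 2, -2, 127, 127, 127, 127],
)
print(result)  # [7, 129540]
```

The point of the sketch is the data flow: low-precision (8-bit) multiplies feeding a wider (32-bit) running sum in a single step, which is the pattern quantized inference kernels lean on.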
As rumored, Cascade Lake-AP will employ a multichip package (MCP) design, joining two 24-core dies in a package with Ultra Path Interconnect (UPI) links. The result is a 48-core server chip that supports 12 channels of DDR4 memory per socket. That’s a doubling over Skylake, and 50 percent more than what we’ve seen so far from AMD with Epyc or Marvell with its Cavium ThunderX Arm offering.
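The channel comparison works out as follows. This is a back-of-the-envelope sketch: the six- and eight-channel figures are the per-socket counts implied by the "doubling" and "50 percent" comparisons above, not numbers taken from Intel's briefing, and the dictionary keys are just labels.

```python
# Memory channels per socket. Only the 12-channel figure is stated outright;
# the others are the counts implied by the article's comparisons.
channels_per_socket = {
    "Cascade Lake-AP": 12,
    "Skylake": 6,    # implied by "a doubling over Skylake"
    "Epyc": 8,       # implied by "50 percent more"
    "ThunderX": 8,   # implied by "50 percent more"
}


def percent_more(new, old):
    """Return how many percent larger `new` is than `old`."""
    return (new / old - 1) * 100


ap = channels_per_socket["Cascade Lake-AP"]
for name, count in channels_per_socket.items():
    if name != "Cascade Lake-AP":
        print(f"vs {name}: {percent_more(ap, count):.0f}% more channels")
```

Channel count is only one input to memory bandwidth; DIMM speed and population matter too, and Intel has not detailed those for Cascade Lake-AP here.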
Multi-chip packaging is part of Intel’s strategy to fend off competitive advances from AMD, which is disrupting the datacenter with more cores and higher I/O, but it is a turnabout for a company that has championed the benefits of the monolithic big-die approach. In a pre-briefing last Wednesday (Oct. 31), Lisa Spelman, vice president and general manager of Xeon products, held that it’s not MCPing that Intel has objected to. “We’ve done [MCP] for several generations — our Xeon D products, our Atom products, our SoC product lines in general. We think there’s a lot of value in integrating capabilities into the package for increased performance or better cost structures or a variety of different reasons.”
Spelman said one of the primary architectural differences in Cascade Lake-AP is its ability to use UPI to deliver better interconnection between the dies and between the processor sockets. “Having the UPI connection leads to performance consistency versus performance variability,” she said.
Intel is claiming that the new 48-core Cascade Lake-AP offers performance leadership over competitor AMD’s 32-core Epyc 7601. According to Intel projections (configuration details at end of article), the company expects a dual-socket Cascade Lake-AP server to deliver a 3.4x boost on Linpack and achieve 1.3x better results on Stream Triad versus a dual-socket AMD Epyc 7601 server.
“When it launches, we expect Cascade Lake Advanced Performance to be the world’s fastest CPU, based on our current understanding of the Linpack performance of general processors commercially available in 2019,” Intel stated in press materials.
The standard disclaimers about the perils of vendor-led benchmarking apply; note, too, that the Intel numbers are modeling-based projections for a future product.
While Intel focuses on setting expectations around performance leadership, we will have to wait for future disclosures over the coming months to find out about the power profile of Cascade Lake-AP, a key factor in TCO equations. Spelman said Intel is looking to make those power efficiency assessments as the market develops, but for now fell back on emphasizing the Xeon portfolio’s myriad options for satisfying both power-constrained and performance-bound users.
With its new DL Boost feature, Intel is promising a step-function improvement for deep learning inference moving to Cascade Lake-AP. Modeling by Intel (disclosed earlier this year) showed that Cascade Lake with DL Boost achieves an average speedup of about 11x in images per second over Skylake, running Caffe ResNet-50. Cascade Lake-AP takes that to 17x, according to Intel.
The Cascade Lake-AP product is targeted for two-socket servers, while other Cascade Lake family products will support four-socket and eight-socket glueless architectures as well as the ability to scale to sixteen and thirty-two sockets.
Intel reports it is already shipping Cascade Lake product for revenue this quarter, but it will launch the entire Cascade Lake family, including the AP version, “in the first part of 2019.” We’ve been told that more details about the Cascade Lake Advanced Performance class will be announced during SC.
Intel’s configuration details in support of its “performance leadership” claims:
LINPACK: AMD EPYC 7601: Supermicro AS-2023US-TR4 with 2 AMD EPYC 7601 (2.2GHz, 32 core) processors, SMT OFF, Turbo ON, BIOS ver 1.1a, 4/26/2018, microcode: 0x8001227, 16x32GB DDR4-2666, 1 SSD, Ubuntu 18.04.1 LTS (4.17.0-041700-generic Retpoline), High Performance Linpack v2.2, compiled with Intel(R) Parallel Studio XE 2018 for Linux, Intel MPI version 18.0.0.128, AMD BLIS ver 0.4.0, Benchmark Config: Nb=232, N=168960, P=4, Q=4, Score = 1095 GFs, tested by Intel as of July 31, 2018, compared to 1-node, 2-socket 48-core Cascade Lake Advanced Performance processor projections by Intel as of 10/3/2018.
Stream Triad: 1-node, 2-socket AMD EPYC 7601, http://www.amd.com/system/files/2017-06/AMD-EPYC-SoC-Delivers-Exceptional-Results.pdf tested by AMD as of June 2017 compared to 1-node, 2-socket 48-core Cascade Lake Advanced Performance processor projections by Intel as of 10/3/2018.
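For a rough sense of what these figures imply, the arithmetic can be sketched as below. This is our own calculation from the disclosed numbers, not an Intel figure: it reads the "3.4x boost" as a straight multiple of the 1095 GFs AMD score, and it assumes the usual High Performance Linpack setup of a dense N-by-N matrix of 8-byte double-precision values. The variable and function names are hypothetical. No equivalent calculation is possible for Stream Triad, since the AMD baseline score is not listed above.

```python
# Figures taken from the LINPACK configuration line above.
AMD_HPL_GFLOPS = 1095        # measured dual-socket Epyc 7601 score
CLAIMED_SPEEDUP = 3.4        # Intel's projected Linpack advantage
HPL_N = 168960               # problem size (matrix order)
HPL_P, HPL_Q = 4, 4          # process grid dimensions
BYTES_PER_DOUBLE = 8         # assumption: standard double-precision HPL


def projected_gflops(baseline_gflops, speedup):
    """Implied score if the speedup is a straight multiple of the baseline."""
    return baseline_gflops * speedup


def hpl_matrix_bytes(n, bytes_per_element=BYTES_PER_DOUBLE):
    """Memory footprint of the dense n-by-n HPL matrix."""
    return n * n * bytes_per_element


implied = projected_gflops(AMD_HPL_GFLOPS, CLAIMED_SPEEDUP)
matrix_gib = hpl_matrix_bytes(HPL_N) / 2**30

print(f"Implied Cascade Lake-AP projection: about {implied:.0f} GFLOPS per node")
print(f"HPL matrix size at N={HPL_N}: about {matrix_gib:.1f} GiB")
print(f"MPI processes in a {HPL_P}x{HPL_Q} grid: {HPL_P * HPL_Q}")
```

Run as written, the first line comes out at roughly 3.7 TFLOPS for the two-socket node, which is the number to keep in mind when actual Cascade Lake-AP results appear.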