Intel today unveiled its third-generation Scalable Xeon processor family (codenamed Cooper Lake) for four- and eight-socket servers, aimed at AI and analytics workloads running in the datacenter. In total, 11 new SKUs were announced with 16 to 28 cores, up to 3.1 GHz base clock (up to 4.3 GHz with Turbo Boost), and support for up to six memory channels.
Intel says Cooper Lake delivers an average 1.92x performance gain on cloud data analytics usage models and up to 1.98x higher database performance versus a standard five-year-old (Haswell) platform. Supermicro and Lenovo are among the system makers announcing servers today that are optimized for the new Intel processors.
The Cooper Lake launch introduces Intel Optane persistent memory 200 series, which delivers an average of 25 percent more memory bandwidth than the previous generation, according to Intel. As with the Optane 100 series, Optane 200 is available in 128 GB, 256 GB, and 512 GB modules that can sit side by side with traditional DDR4 DIMMs on the motherboard. Up to six modules fit on a single socket, providing up to 3 TB of persistent memory and, combined with DRAM, a total memory capacity of 4.5 TB per socket.
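The per-socket capacity figures above are straightforward arithmetic; a quick sketch (note the 1.5 TB DDR4 share is inferred here from the stated 4.5 TB total and 3 TB of persistent memory, not a figure Intel quoted directly):

```python
# Per-socket memory math for a max Cooper Lake config with
# Optane persistent memory 200 series, as described by Intel.
optane_modules = 6          # up to six Optane modules per socket
optane_module_gb = 512      # largest module size
pmem_tb = optane_modules * optane_module_gb / 1024  # persistent memory
total_tb = 4.5              # Intel's stated per-socket total
dram_tb = total_tb - pmem_tb  # DDR4 share implied by the totals

print(pmem_tb, dram_tb)  # 3.0 TB persistent + 1.5 TB DRAM
```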
Fabricated on Intel’s 14nm++ process, Cooper Lake is the first x86 processor to deliver built-in AI training acceleration through new bfloat16 support added to Intel Deep Learning Boost (DL Boost) technology. Intel describes bfloat16 as “a compact numeric format that uses half the bits as today’s FP32 format but achieves comparable model accuracy with minimal (if any) software changes required.”
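Intel's description can be made concrete: bfloat16 is essentially the top 16 bits of an IEEE-754 FP32 value, keeping FP32's sign bit and full 8-bit exponent (and thus its dynamic range) while cutting the mantissa from 23 bits to 7. A minimal Python sketch of the conversion (simple truncation shown for clarity; hardware typically rounds to nearest even):

```python
import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Convert an FP32 value to bfloat16 by keeping the top 16 bits.

    bfloat16 retains FP32's sign bit and full 8-bit exponent, but
    only 7 of the 23 mantissa bits: same range, less precision.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16

def bf16_bits_to_fp32(bf16: int) -> float:
    """Widen bfloat16 bits back to FP32 by zero-filling the mantissa."""
    return struct.unpack("<f", struct.pack("<I", bf16 << 16))[0]

approx = bf16_bits_to_fp32(fp32_to_bf16_bits(3.14159265))
print(approx)  # 3.140625 -- only ~2-3 decimal digits of precision survive
```

The shared exponent layout is why, per Intel's claim, models can typically move from FP32 to BF16 with few software changes: values overflow or underflow at the same magnitudes, so only precision, not range, is reduced.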
With these new AI capabilities, a four-socket Cooper Lake platform delivered 1.93x more AI training performance and 1.87x more AI inference performance for image classification versus a four-socket Cascade Lake reference platform, in Intel benchmarking. In another internal test, Intel demonstrated 1.7x more AI training performance over Cascade Lake on BERT throughput for natural language processing.
Kevin Krewell, principal analyst at TIRIAS Research, sees DL Boost and bfloat16 as critical technologies for Intel, providing an important competitive advantage. “AI acceleration with DL Boost and bfloat16 is a really innovative solution to a problem,” said Krewell. “It allows you to pack in more performance and save more energy to get the same workload done. That’s one way that Intel can stay ahead of AMD, by adding this type of instruction innovation. As much as AMD is doing a great job in terms of getting more cores into the same power envelope and improving performance on two-socket and single-socket, AMD still trails Intel in adding machine learning technologies into the server products.”
Intel reports demand for DL Boost and bfloat16 in the four- and eight-socket market, especially among the large cloud service providers. “Facebook has been the most vocal on talking about their use of our 3rd-Gen Xeon processors in their infrastructure,” a company rep told us, referencing Facebook’s May announcement that the new Xeon server CPU would be the foundation for its refreshed Open Compute Project (OCP) servers. “Alibaba, Tencent and Baidu are also strong advocates for the technology. BF16 provides these customers, and others, with greater performance without the loss of accuracy vs. FP32. Adding BF16 into our DL Boost feature set (which also includes INT8 and FP32) enables us to continue to deliver our customers advanced AI features built into a mainstream server CPU.”
The new processors also introduce enhanced Intel Speed Select technology. Launched with the second-generation Scalable Xeon processors, Intel Speed Select technology gives users control over the base frequencies of specific cores, allowing them to maximize the performance of their highest-priority workloads. “You can think of it as a quality of service type of capability that allows you to prioritize your most important traffic, your most important portion of the workload, in order to guarantee a best response, and then to super efficiently utilize the rest of the compute resources available to you,” said Lisa Spelman, Intel corporate vice president and general manager, Xeon and Memory Group.
Cooper Lake SKUs are differentiated based on supported features, with not all SKUs supporting all features. At the top of the SKU stack are the Intel Xeon Platinum 8380H and 8380HL processors with 28 cores, 2.9 GHz base frequency (up to 4.3 GHz with boost), and 38.5 MB of cache memory within a 250-watt TDP, supporting four- or eight-socket platforms. The -L designation stands for “large memory” and indicates support for up to 4.5 TB of memory per socket via a combination of Optane persistent memory and DRAM. The Platinum 8380H supports Intel DL Boost for AI training. The 18-core Intel Xeon Platinum 8354H processor is the only one to support Intel DL Boost for both training and inferencing. Intel has provided a reference guide that shows the features supported by each SKU, along with a SKU table.
Originally, Cooper Lake was intended for a full range of datacenter platforms, including two-socket servers as well as a socketed 56-core part in a multichip module package, but Intel scaled back the product family to meet an interim need between Cascade Lake Refresh and the coming Ice Lake server CPU.
“We felt like the work that we did with the Cascade Lake Refresh solved and helped meet a bunch of the market needs that we would have been trying to address [with a “top to bottom” Cooper Lake]; the four- and the eight-socket and the need for the second generation of Optane persistent memory was the more pressing or pervasive opportunity,” said Spelman. “As we looked at how that all came together, we felt the Cascade Lake Refresh gave a super quick path to performance and upgradeability, Cooper Lake addresses the four- and eight-socket, and then Ice Lake the more mainstream two-socket. That felt like the right fit and gave us a chance to remove some of that congestion.”
Spelman said that Ice Lake, Intel’s 10nm successor to Cooper Lake, is on track to launch later this year.
Intel also reported that Sapphire Rapids, the next-gen 10nm server chip coming after Ice Lake, has recently completed power-on, and the company is testing out its features, including a next-generation AI acceleration feature called Intel Advanced Matrix Extensions (AMX), which Intel says will further boost training and inference performance. Spelman said the specification for AMX will be published this month to give developers a chance to start preparing for it and optimizing their underlying software.
The third-generation Intel Xeon Scalable processors and Intel Optane persistent memory 200 series modules are shipping to customers today. Facebook, Alibaba, Baidu and Tencent have announced plans to adopt the CPUs. OEM systems are expected to start shipping in the second half of 2020, and Supermicro and Lenovo today announced upgraded servers that leverage the new Xeon processors and Optane 200-series memory modules.