At its AI developer conference in San Francisco yesterday, Intel embraced a holistic approach to AI and showed off a broad AI portfolio that includes Xeon processors, Movidius technologies, FPGAs and Intel’s Nervana Neural Network Processors (NNPs), based on the technology it acquired in 2016.
In his opening keynote, Naveen Rao, general manager of Intel’s artificial intelligence products group and former CEO of Nervana Systems, revealed that the first commercial Nervana product will debut in late 2019 and will be called NNP L-1000 (codename: Spring Crest). Intel anticipates that Spring Crest will offer 3-4x the training performance of its development product Lake Crest. Originally scheduled for availability last year, Lake Crest is being used as a software development vehicle to gather feedback from early partners. This is reminiscent of how Intel handled its first Phi product, Knights Ferry, a development prototype that was never widely available. And speaking of Phi, Knights Mill, the ML-specific Phi product that had a quiet launch late last year, was absent from Intel’s AI roadmap update, indicating it may be the end of the road for that line.
Another product we didn’t hear about was Knights Crest, the originally intended follow-on to Lake Crest that was to integrate Intel Xeon processors with Nervana technology. The branding overlap with the Phi line was always confusing anyway, so retiring that codename is probably for the best, but it will be interesting to see whether Intel revives its plans for a joint Xeon-Nervana chip.
Lake Crest incorporates 12 cores, each equipped with two math core units, delivering 40 peak tera-ops of performance and drawing less than 210 watts per device. Each chip sports 32 GB of high bandwidth memory (HBM2) and offers 2.4 terabits per second off-chip I/O bandwidth at less than 790 nanoseconds of latency.
High compute utilization and robust model parallelism were Intel’s primary goals in building its Nervana platform, and the company believes that Lake Crest accomplishes both. According to Intel’s benchmarking, general matrix-matrix multiplication (GEMM) operations achieved more than 96 percent compute utilization, representing around 38 tera-ops of actual, not theoretical, performance on a single chip. The company further claims Lake Crest achieves 96.2 percent scaling efficiency. See Rao’s blog post for a drill-down on the details.
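The arithmetic behind the utilization claim is straightforward: compute utilization is achieved throughput divided by peak theoretical throughput. A quick sanity check using the figures reported above (the exact benchmark conditions and matrix shapes are Intel's own):

```python
# Compute utilization = achieved throughput / peak theoretical throughput.
# Figures are taken from the Lake Crest numbers reported in this article;
# the benchmark conditions behind them are Intel's own.
PEAK_TOPS = 40.0     # Lake Crest peak tera-ops per device
UTILIZATION = 0.96   # reported GEMM compute utilization (>96 percent)

achieved_tops = PEAK_TOPS * UTILIZATION  # ~38.4 tera-ops
```

This lines up with the "around 38 tera-ops of actual performance" figure, and it is why Rao emphasizes utilization rather than peak TOP/s: the peak number only matters to the extent the memory subsystem can keep the compute units fed.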
In that post, Rao writes: “Our industry talks a lot about maximum theoretical performance or TOP/s numbers; however, the reality is that much of that compute is meaningless unless the architecture has a memory subsystem capable of supporting high utilization of those compute elements. Additionally, much of the industry’s published performance data uses large square matrices that aren’t generally found in real-world neural networks.
“At Intel, we have focused on creating a balanced architecture for neural networks that also includes high chip-to-chip bandwidth at low latency. Initial performance benchmarks on our NNP family show strong competitive results in both utilization and interconnect.”
Rao also announced that Spring Crest will support bfloat16, an emerging numerical format being adopted for neural networks. “Essentially this is taking a 32-bit floating point number and culling it down to 16 bits, which allows you to use I/O more effectively because you’re moving half the bits around and allows you to build smaller multipliers and have more of them on a die and do it at lower power,” said Rao, adding that bfloat16 can achieve the same algorithmic performance and convergence as the larger 32-bit format. Intel will initially introduce bfloat16 to its Nervana platform and says it plans to extend that support across its AI product lines.
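The "culling down" Rao describes is simple in bit terms: bfloat16 keeps float32's sign bit and full 8-bit exponent but only the top 7 of its 23 mantissa bits, so dynamic range is preserved while precision drops. A minimal sketch of the conversion (plain truncation is shown for clarity; real hardware typically applies round-to-nearest-even rather than truncating):

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to a bfloat16 bit pattern by keeping its top 16 bits.

    Those 16 bits hold the sign, the full 8-bit exponent, and the 7 most
    significant mantissa bits, so range matches float32 while precision
    drops to roughly 2-3 decimal digits.
    """
    bits = struct.unpack(">I", struct.pack(">f", x))[0]  # raw float32 bits
    return bits >> 16

def bfloat16_bits_to_float32(b: int) -> float:
    """Re-expand a bfloat16 bit pattern to float32 by zero-padding low bits."""
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

x = 3.14159265
approx = bfloat16_bits_to_float32(float32_to_bfloat16_bits(x))
# approx is close to x but carries only ~8 bits of mantissa precision
```

Because the exponent field is unchanged, no rescaling is needed when mixing bfloat16 and float32 values, which is part of why the format converges like float32 in training while halving memory traffic, as Rao notes.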
As a final point, Rao teased that Intel is also working on a discrete accelerator for inference, and that more details would be forthcoming.
Intel’s animation of Nervana Lake Crest architecture: