This week during SC13, Intel hosted a roundtable session to discuss the future of its upcoming Knights Landing product, touching on where the key benefits are expected for technical computing users and how Knights Landing might influence the shape of next-generation systems and applications.
As Intel turns its Xeon focus to doubling FLOPS, boosting memory bandwidth, and stitching in I/O, technical computing lead Raj Hazra says the long-term goal is to make the full transition from multicore to manycore via the Knights-codenamed family.
Knights Corner, which was introduced last year, has already received a fair degree of press and play, especially with the announcement of its position inside the top-ranked Chinese supercomputer as well as its role powering several other top-ranked systems. But Hazra said the focus goes beyond just the Top 500: the company is seeing significant interest in the broader technical computing space, in part because of the familiar x86 programming underpinnings.
Hazra reiterated the messaging around the eventual reality of Knights Landing, the next-generation Xeon Phi built on 14 nm. The key to Knights Landing, says Hazra, is that it takes the biggest problem of heterogeneous computing, the offloading, out of the picture, allowing users far more memory and cache flexibility and capacity, and ultimately allowing the choice of using Knights Landing in a self-hosted manner or as an accelerator.
“We continue to innovate at the microarchitecture level of the core itself to target per core performance and efficiency,” he noted. “A big challenge when you create these types of architectures that are designed to consume a lot of data quickly is, ‘how to feed the beast?’” The solution to beast-feeding, at least according to Intel, is nestled inside the Knights Landing memory architecture.
Today, whether it’s a GPU or a first-generation Xeon Phi, GDDR memory reigns, offering only relatively limited capacity for the offloaded code it must handle. Knights Landing targets this shortfall through what Hazra describes as an effort to “bring back enough capacity and bandwidth to allow the programmers to use this CPU just like they’d use a Xeon today but with better performance.”
As he described, “In Knights Landing, we provide an in-package high bandwidth memory; enough capacity to actually hold meaningful portions of workloads or the workloads themselves and have it backed with a very large amount of standard DDR memory. The important thing here is the way that applications and systems software can use this memory is flexible.” Whether using it as cache, controlling (or not) the memory placement, or taking advantage of the added cores, Hazra said he’s confident Intel can deliver on the opportunity to let users build workload-targeted, customized systems.
In terms of customizations, Intel isn’t yet saying what the next generation of Xeons will bring to actual systems over the next few years; those directions will, according to Hazra, be driven by a combination of technical, economic, and practical needs. The customization angle, however, will play out by application area, driven forward by the fact that users will be able to extend their Knights Corner systems by swapping out cards to create either self-hosted or accelerated systems.
Hazra speculated that, in addition to standard “upgrading” of the cards, some users might replace Knights Corner cards with Knights Landing cards because the latter will be PCIe compliant and can simply be dropped in. The catch is that many next-generation applications are going to have to be primed for this new generation of possibilities. When asked whether it’s possible to put the two Knights families on the same socket, where they might theoretically share memory, Hazra blinked slowly a few times and said, “at this point it hasn’t been disclosed.” So there’s that.
In sum, according to Hazra, the last few years have seen both quiet and highly visible revolutions inside HPC systems. The rise of heterogeneous computing spans both ends of that spectrum: from the long, quiet development cycles to the widely publicized arrival of the company’s MIC architecture in the top supercomputer rankings, with more installations expected over the coming year.