British hardware designer Graphcore, which emerged from stealth in 2016 to launch its first-generation Intelligence Processing Unit (IPU), has announced its next-generation IPU platform: the IPU-Machine M2000. With the new M2000, Graphcore promises “greater processing power, more memory and built-in scalability for handling extremely large machine intelligence workloads.” The platform – which is available to preorder today – will begin production shipment at the end of 2020.
The M2000 compute blade (which Graphcore describes as “plug-and-play”) delivers one petaflops of “machine intelligence” compute power thanks to four of Graphcore’s new 7nm Colossus Mk2 GC200 IPU processors – each containing 1,472 separate IPU-cores and more than 59.4 billion transistors in an architecture Graphcore is calling “the most complex processor ever made.” The GC200 also contains an “unprecedented” 900 MB of high-speed SRAM inside the processor, a three-fold increase over the in-processor memory of Graphcore’s first-generation IPU.
The system is supported by Graphcore’s Poplar software stack, allowing users to apply their preferred AI framework while Poplar assembles the compute graph and the necessary runtime programs. The second-generation system offers full backwards compatibility with Graphcore’s first-generation Mk1 IPU products – at an eight-fold speedup, of course.
A new floating-point format developed by Graphcore, called AI-Float, tunes energy and performance for machine learning computation. FP32 IEEE floating point arithmetic is supported via FP16.32 (16-bit multiply with 32-bit accumulate) and FP16.16 (16-bit multiply accumulate), but Graphcore notes that by using stochastic rounding, the Colossus Mk2 IPU is able to keep all arithmetic in 16-bit formats, thereby “reducing memory requirements, saving on read and write energy and reducing energy in the arithmetic logic, while delivering full accuracy machine intelligence results.” AI-Float also provides native support for sparse arithmetic floating-point operations, according to Graphcore.
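Graphcore has not published the Mk2’s rounding hardware, but the general idea behind stochastic rounding is straightforward: instead of always snapping to the nearest representable 16-bit value (which silently discards small updates), each value rounds up or down at random, with probabilities chosen so the result is correct on average. The NumPy sketch below is our own illustration of that technique for float32-to-float16 conversion – the function name and implementation are illustrative, not Graphcore’s.

```python
import numpy as np

def stochastic_round_fp16(x, rng):
    """Stochastically round an array of positive finite values to float16.

    Each value lands on one of its two bracketing float16 neighbours;
    the probability of rounding up equals the fraction of the gap already
    covered, so the rounded result is unbiased in expectation – unlike
    round-to-nearest, which can lose small increments entirely.
    """
    x = np.asarray(x, dtype=np.float64)
    cast = x.astype(np.float16)          # round-to-nearest as a starting point
    bits = cast.view(np.uint16)
    # Lower neighbour: step the bit pattern down if the cast rounded up.
    # (For positive finite float16 values, incrementing/decrementing the
    # raw bit pattern moves to the adjacent representable value.)
    down_bits = np.where(cast.astype(np.float64) > x, bits - 1, bits).astype(np.uint16)
    down = down_bits.view(np.float16)
    # Upper neighbour: one step up, unless x is exactly representable.
    exact = down.astype(np.float64) == x
    up = np.where(exact, down_bits, down_bits + 1).astype(np.uint16).view(np.float16)
    d, u = down.astype(np.float64), up.astype(np.float64)
    gap = u - d
    # Probability of rounding up = distance already covered toward `u`.
    p_up = np.where(gap > 0.0, (x - d) / np.where(gap > 0.0, gap, 1.0), 0.0)
    return np.where(rng.random(x.shape) < p_up, up, down)
```

Averaged over many samples, the rounded values recover the original float32 value to well below one float16 spacing – which is why, as Graphcore argues, accumulation can stay in 16-bit formats without the systematic drift that deterministic rounding introduces.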
Graphcore emphasized the scalability of the M2000 with its 1U slim blade design. Configurations scaled beyond eight of the M2000s will use Graphcore’s rack-scale IPU-POD64, which contains 16 M2000s built into a 19-inch rack, providing 16 petaflops of (AI-Float) machine intelligence computing performance per rack.
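The rack-level figure follows directly from the blade-level one. A quick back-of-envelope check, using only numbers stated above (variable names are ours):

```python
# Figures from the announcement: one petaflops per M2000 blade,
# four GC200 IPUs per blade, 16 blades per IPU-POD64 rack.
PFLOPS_PER_M2000 = 1.0
IPUS_PER_M2000 = 4
M2000S_PER_POD64 = 16

# Per-IPU share of the blade's peak, in teraflops.
tflops_per_gc200 = PFLOPS_PER_M2000 * 1000 / IPUS_PER_M2000

# Aggregate peak of a fully populated IPU-POD64 rack, in petaflops.
pod64_pflops = PFLOPS_PER_M2000 * M2000S_PER_POD64

print(tflops_per_gc200, pod64_pflops)  # 250.0 16.0
```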
For connectivity at this scale, Graphcore is using its new, low-latency IPU-Fabric technology, which it says “keeps communication latency close to constant while scaling from 10s of IPUs to 10s of thousands of IPUs.” Users will be able to choose their preferred mix of CPUs and IPUs (connected via Ethernet), and they will be able to dynamically provision those IPUs using Graphcore’s Virtual-IPU tool.
While full production shipments won’t begin until Q4, Graphcore touts a number of early customers, including Microsoft, the University of Oxford, Lawrence Berkeley National Laboratory, Atos and Simula Research Laboratory.
“We are partnering with Graphcore to make their Mk2 IPU systems products, including IPU-Machine M2000 and IPU-POD scale out systems, available to our customers, specifically large European labs and institutions,” said Arnaud Bertrand, SVP, head of strategy and R&D for big data systems at Atos. “We are already planning with European early customers to build out an IPU cluster for their AI research projects. The IPU new architecture can enable a more efficient way to run AI workloads which fits to the Atos decarbonization initiative and we are delighted to be working with a European AI semiconductor company to realize this future together.”
With this second salvo, Graphcore is aiming to disrupt Nvidia’s market leadership in the increasingly competitive AI silicon market — and they may have a good shot. “With this new product, Graphcore may now be first in line to challenge Nvidia for datacenter AI,” said Karl Freund, senior analyst for AI at Moor Insights & Strategy, “at least for large-scale training.”