As Nvidia’s GPU Technology Conference gets underway in San Jose, Calif., Microsoft today revealed plans to add Pascal-generation GPU horsepower to its Azure cloud. Azure, which already offers M60- and K80-backed GPU instances, will add P40- and P100-powered virtual machines to its lineup. The new instance families will not be available until “later in the year,” according to Microsoft.
The P40 accelerators will be rolled out as part of the brand-new ND-series instances, while the (PCIe-based) P100 will be included in the next-generation NC series, NCv2. Missing from the announcement was any mention of the open-source HGX-1 servers, announced in March. We’re still waiting to hear how Azure will utilize the eight-way NVLink-connected P100 boxes developed under Project Olympus. A future NCv3 instance, perhaps?
Since the N+X naming convention can get confusing, here’s a quick cheat sheet:
NV series: M60 GPUs
NC series: K80 GPUs
NCv2 series: P100 GPUs (new, not yet available)
ND series: P40 GPUs (new, not yet available)
Microsoft reports that the new ND series, based on the Pascal-generation P40 GPU, is excellent for training and inference. “These instances provide over 2x the performance over the previous generation for FP32 (single precision floating point operations), for AI workloads utilizing CNTK, TensorFlow, Caffe, and other frameworks,” said Corey Sanders, director of compute, Azure, in a blog post. “The ND-series also offers a much larger GPU memory size (24GB), enabling customers to fit much larger neural net models.”
Microsoft and Nvidia emphasized the performance boost provided by GPUs for AI and deep learning workloads, including image recognition, speech training, and natural language processing, but also identified the benefit to traditional HPC workloads, such as reservoir modeling, DNA sequencing, protein analysis, Monte Carlo simulations and rendering.
Both Pascal-based offerings in Azure provide a VM option with RDMA and InfiniBand connectivity to support HPC workloads and speed large-scale neural net training jobs spanning up to hundreds of GPUs.
ND Instance sizes
NCv2 Instance sizes
Microsoft has set up a sign-up page for those seeking a private preview of the new instance types. The company says it will respond if “additional preview participants are needed.”
Other cloud providers, notably Nimbix, IBM and Cirrascale, have already deployed the Pascal-gen P100s in their clouds. Google says that P100s will be “coming soon” to its cloud. Tencent is in the process of incorporating P100 and P40 accelerators into its datacenters.