Nvidia’s DGX-Ready Data Center Program, announced in January to provide colocation and public cloud-like options for accessing the company’s GPU-powered servers for AI workloads, has expanded beyond the U.S. and doubled its number of datacenter partners to 19.
The company said it has added three new partners in Europe, five in Asia and two in North America, and that the program is now available in 24 countries.
Nvidia touts DGX-Ready as easing adoption of GPU computing, which places energy and cooling demands on compute infrastructures beyond the capabilities of many on-prem datacenters built for conventional IT workloads. The company’s DGX product line includes servers with eight GPUs (DGX-1) and 16 GPUs (DGX-2), while the DGX SuperPOD, ranked the world’s 22nd fastest supercomputer, has 96 DGX-2H nodes.
“Many organizations that want to deploy AI in their infrastructure find they lack the capital budget to design, build and support GPU-accelerated computing within their premises,” Tony Paikeday, Nvidia’s director of product marketing, AI and DL, told us. “This is partly because AI workloads place unique demands on resources in the datacenter, including power and cooling, needed to support these systems. …organizations that want to buy our solutions need proven facilities to host them now, but in a cost-effective, accessible format delivered…as operating expense. For these customers…, the DGX-Ready Program offers a validated network of colocation services providers whose datacenter facilities meet our requirements for hosting DGX compute systems….”
Among Nvidia’s new program partners are Verne Global, with its zero-carbon, hydroelectric- and geothermal-powered datacenters in Iceland, and Fujitsu, with its Yokohama datacenter that hosts more than 60 Nvidia DGX-1 and DGX-2 systems.
Paikeday said the DGX-Ready Program has been expanded to include a try-and-buy offering that lets customers sample pre-configured systems, as well as, from some partners, GPU-as-a-Service, a cloud-like consumption model using DGX servers and Docker containers.
Today’s announcement comes on the heels of yesterday’s news that the latest round of the MLPerf AI benchmarking suite showed Google Cloud and Nvidia each with three wins in the at-scale division, with Nvidia claiming eight new performance records. Given that Google’s Tensor Processing Unit (TPU) AI chips are offered on Google Cloud while DGX systems are typically installed on-prem, is the DGX-Ready Program a response to the TPUs’ cloud consumption model?
“The marketplace is demanding more choice and flexibility in how they consume accelerated compute resources,” Paikeday said. “At one point, it was almost exclusively an on-premises story. Now organizations want to have hybrid approaches… What we find is a lot of customers start in cloud, gain competency there, they’ll build more complex models, they’ll grow their data sets, and at some point, they find their iteration speed goes down because a lot of the workflow is now preoccupied with moving the data physically to where the compute resources are, and… spending more time carefully grooming each training run. And as we know, it’s all about speed of iteration with these models to get the best models out.”