Google continues to add GPU horsepower in tandem with its internally developed deep learning processors to its cloud platform with this week’s announcement that it will soon offer Nvidia’s Tesla P4 accelerators for AI inferencing workloads.
Among many partnership announcements during Google’s annual cloud event, the search giant said it would be the first cloud provider to offer the small form-factor GPUs in Europe and the U.S.
Google announced in May it would join cloud rivals Amazon Web Services and Microsoft Azure in offering Tesla V100 GPUs. The datacenter GPU is aimed at machine learning and high-performance computing workloads.
The partners said Wednesday (July 25) the addition of Tesla P4 responds to customer demand for GPU-accelerated inferencing platforms focused on real-time AI-based services.
Besides the smaller package, Nvidia claimed the P4 graphics card delivers as much as a 40-fold processing boost for inferencing emerging AI workloads such as speech recognition and live translation. Extremely low latency is a key requirement for these apps, added Paresh Kharya, Nvidia’s product marketing chief for accelerated computing.
Nvidia said the P4 handles inference tasks with 1.8 milliseconds of latency. “This unlocks a new wave of AI services previously impossible due to latency limitations,” the GPU specialist claimed.
Along with machine learning inference, the partners said the cloud-based graphics processor also handles tasks such as visualization and video transcoding, the process of converting files from one format to another so that, among other applications, they can be viewed on different devices.
The Tesla P4 inferencing engine is based on Nvidia’s Pascal architecture and is geared specifically to boost the performance of servers running deep learning workloads.
Nvidia first announced the Tesla P4 in September 2016 as the GPU vendor’s neural network inferencing card. The card packs 2,560 CUDA cores and delivers 5.5 teraflops of peak single-precision performance, or 22 tera-operations per second at INT8 precision.
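The quoted peak figures can be sanity-checked with simple arithmetic. A minimal sketch follows; the boost clock of roughly 1.06 GHz is an assumption not stated in the article, as are the bookkeeping conventions that one fused multiply-add counts as two floating-point operations and that Pascal’s dp4a instruction yields four INT8 operations per FP32 lane.

```python
# Back-of-the-envelope check of the Tesla P4's quoted peak numbers.
# Assumptions (not from the article): ~1.06 GHz boost clock; each CUDA
# core retires one fused multiply-add (2 FLOPs) per cycle; Pascal's
# dp4a instruction delivers 4 INT8 ops per FP32 lane.

CUDA_CORES = 2560
BOOST_CLOCK_HZ = 1.06e9        # assumed boost clock
FLOPS_PER_CORE_PER_CYCLE = 2   # one FMA = 2 floating-point ops
INT8_OPS_PER_FP32_OP = 4       # dp4a: 4-way 8-bit dot product

fp32_tflops = CUDA_CORES * BOOST_CLOCK_HZ * FLOPS_PER_CORE_PER_CYCLE / 1e12
int8_tops = fp32_tflops * INT8_OPS_PER_FP32_OP

print(f"peak FP32: {fp32_tflops:.1f} TFLOPS")  # ~5.4, close to the quoted 5.5
print(f"peak INT8: {int8_tops:.1f} TOPS")      # ~21.7, close to the quoted 22
```

The 4x jump from FP32 teraflops to INT8 tera-operations is the reason the card is pitched at inference rather than training: inference can tolerate 8-bit arithmetic, so the same silicon sustains roughly four times the throughput.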
Google was vague about availability, saying only that Nvidia’s P4 GPUs would be “coming soon” to its public cloud.
The Nvidia partnership is the latest in a steady stream of cloud deals announced in recent weeks by Google. Earlier this week, SAP announced it was collaborating with the public cloud vendor to offer cloud virtual machines supporting in-memory workloads running on the SAP HANA database manager.
“Google Cloud’s mission is to organize [customers’] information, and supercharge it,” CEO Diane Greene said during this week’s company event in San Francisco.
Separately, Google released its third-generation cloud Tensor Processing Units, or TPUs, a move aimed squarely at AI developers using its AutoML service designed to improve model training. “We know that many [developers] need more flexibility than our APIs were designed for,” said Fei-Fei Li, Google’s chief cloud scientist. “That’s why we developed AutoML.”