AI Cloud Competition Heats Up: Google’s TPUs, Amazon Building AI Chip

By Doug Black

February 12, 2018

Competition in the white-hot AI (and public cloud) market pits Google against Amazon this week, with Google offering AI hardware on its cloud platform intended to make it easier, faster and cheaper to train and run machine learning/deep learning systems, while Amazon is reportedly developing its own AI chip portfolio. It’s the latest in a series of processor-related moves that the two companies, along with Microsoft Azure, IBM Cloud and other public cloud services providers, have made in recent months to position themselves as AI becomes increasingly integrated into our business and home lives.

Google is making Cloud TPU (Tensor Processing Units) accelerators available starting today on the Google Cloud Platform (GCP), an offering the company said will help get machine learning (ML) models trained and running more quickly.

Cloud TPUs are Google-designed hardware accelerators intended to speed up and scale ML workloads programmed with TensorFlow. Built with four custom ASICs, each Cloud TPU has up to 180 teraflops of floating-point performance and 64 GB of memory on a single board.

“Instead of waiting for a job to schedule on a shared compute cluster, you can have interactive, exclusive access to a network-attached Cloud TPU via a Google Compute Engine VM that you control and can customize,” said John Barrus, product manager for Cloud TPUs, Google Cloud, and Zak Stone, product manager for TensorFlow and Cloud TPUs, Google Brain Team, in a jointly written blog post. “Rather than waiting days or weeks to train a business-critical ML model, you can train several variants of the same model overnight on a fleet of Cloud TPUs and deploy the most accurate trained model in production the next day.”

Meanwhile, Reuters reports that Amazon two months ago paid $90 million for home security camera maker Blink and its energy efficient chip technology, according to unnamed sources.

“The deal’s rationale and price tag, previously unreported, underscore how Amazon aims to do more than sell another popular camera, as analysts had thought,” Reuters reported. “The online retailer is exploring chips exclusive to Blink that could lower production costs and lengthen the battery life of other gadgets, starting with Amazon’s Cloud Cam and potentially extending to its family of Echo speakers, one of the people said.”

According to the report, Amazon seeks to strengthen its ties to consumers via in-house devices. And while Amazon’s Cloud Cam and Echo need a plug-in power source, Blink claims its cameras can last two years on two AA lithium batteries.

Amazon declined to comment on the acquisition’s terms or strategy.

In addition, a published report from The Information states that Amazon is developing its own AI chip designed to work on the Echo and other hardware powered by Amazon’s Alexa virtual assistant. The chip reportedly will help its voice-enabled products handle tasks more efficiently by enabling processing to take place locally at the edge, on the device itself, rather than in AWS.

HPCwire reported last October that the surging demand for HPC and AI compute power has been shrinking the time gap between the introduction of high-end GPUs, primarily developed by Nvidia, and adoption by cloud vendors. “With the Nvidia V100 launch ink still drying and other big cloud vendors still working on Pascal generation rollouts, Amazon Web Services has become the first cloud giant to offer the Tesla Volta GPUs, beating out competitors Google and Microsoft,” HPCwire reported. “Google had been the first of the big three to offer P100 GPUs, but now we learn that Amazon is skipping Pascal entirely and going directly to Volta with the launch of V100-backed P3 instances that include up to eight GPUs connected by NVLink.”

As for Google’s Cloud TPUs, the company said it is simplifying ML training by providing high-level TensorFlow APIs, along with open-sourced reference Cloud TPU model implementations. Using a single Cloud TPU, the authors said, ResNet-50 (and other popular models for image classification) can be trained “to the expected accuracy on the ImageNet benchmark challenge in less than a day” for less than $200.

Barrus and Stone also said customers will be able to use Cloud TPUs either alone or connected via “an ultra-fast, dedicated network to form multi-petaflop ML supercomputers that we call ‘TPU pods.’” Customers who start now with Cloud TPUs, they said, will benefit from time-to-accuracy improvements when TPU pods are introduced later this year. “As we announced at NIPS 2017, both ResNet-50 and Transformer training times drop from the better part of a day to under 30 minutes on a full TPU pod, no code changes required.”

“We made a decision to focus our deep learning research on the cloud for many reasons,” said Alfred Spector, CTO at investment management firm Two Sigma, “but mostly to gain access to the latest machine learning infrastructure. Google Cloud TPUs are an example of innovative, rapidly evolving technology to support deep learning, and we found that moving TensorFlow workloads to TPUs has boosted our productivity by greatly reducing both the complexity of programming new models and the time required to train them. Using Cloud TPUs instead of clusters of other accelerators has allowed us to focus on building our models without being distracted by the need to manage the complexity of cluster communication patterns.”

On-demand transportation company Lyft also said it’s impressed with the speed of Google Cloud TPUs. “What could normally take days can now take hours,” said Anantha Kancherla, head of software, self-driving Level 5, Lyft. “Deep learning is fast becoming the backbone of the software running self-driving cars. The results get better with more data, and there are major breakthroughs coming in algorithms every week. In this world, Cloud TPUs help us move quickly by incorporating the latest navigation-related data from our fleet of vehicles and the latest algorithmic advances from the research community.”

Barrus and Stone also highlighted the usual advantages of public cloud computing that Cloud TPUs offer. “Instead of committing the capital, time and expertise required to design, install and maintain an on-site ML computing cluster with specialized power, cooling, networking and storage requirements,” they said, “you can benefit from large-scale, tightly-integrated ML infrastructure that has been heavily optimized at Google over many years.”

Google said Cloud TPUs are available in limited quantities today and usage is billed by the second at the rate of $6.50 USD / Cloud TPU / hour.
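To put the quoted rate in perspective, here is a rough sketch of what per-second billing at $6.50 per Cloud TPU per hour works out to. It assumes simple pro-rata billing with no minimum charge and ignores the cost of the attached Compute Engine VM; the function name `cloud_tpu_cost` and the rounding to whole cents are illustrative choices, not Google's published billing logic.

```python
from decimal import Decimal, ROUND_HALF_UP

# Rate quoted in the article: USD per Cloud TPU per hour.
HOURLY_RATE_USD = Decimal("6.50")
SECONDS_PER_HOUR = 3600


def cloud_tpu_cost(seconds: int, num_tpus: int = 1) -> Decimal:
    """Estimate the charge in USD for running Cloud TPUs.

    Hypothetical helper: assumes straight pro-rata per-second billing,
    no minimum usage charge, and rounding to the nearest cent.
    """
    if seconds < 0 or num_tpus < 1:
        raise ValueError("seconds must be >= 0 and num_tpus must be >= 1")
    cost = HOURLY_RATE_USD * seconds * num_tpus / SECONDS_PER_HOUR
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    # One Cloud TPU for a full 24 hours.
    print(cloud_tpu_cost(24 * SECONDS_PER_HOUR))  # 156.00
    # A 90-second interactive experiment on one Cloud TPU.
    print(cloud_tpu_cost(90))
```

Under these assumptions, a single Cloud TPU running for a full day costs $156.00, which is consistent with the claim that ResNet-50 can be trained in less than a day for less than $200.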

This article first appeared in our sister publication, EnterpriseTech.
