Microsoft and Google are driving a major computing shift by bringing AI to people via search engines, and one measure of success may come down to the hardware and datacenter infrastructure supporting the applications.
Last week, Microsoft and Google announced next-generation AI-powered search engines that can reason and predict, and provide more comprehensive answers to user questions. The search engines will be able to generate full answers to complex queries, much like how ChatGPT can provide detailed answers or compile essays.
Microsoft is putting AI in Bing to respond to text queries, and Google shared plans to put AI in its text, image and video search tools. The announcements came on back-to-back days.
The companies acknowledged that bringing AI into search engines would not be possible without strong hardware infrastructure, but neither shared details on the actual hardware driving the AI computing.
For years, Microsoft and Google have been nurturing AI hardware designed for primetime announcements like last week’s AI search engines.
The companies have vastly different AI computing infrastructures, and the speed of responses and accuracy of results will be an acid test of the viability of the search engines.
Google’s Bard is powered by its TPU (Tensor Processing Unit) chips in its cloud service, which was confirmed by a source familiar with the company’s plans. Microsoft said its AI supercomputer in Azure – which likely runs on GPUs – can deliver results in the order of milliseconds, or at the speed of search latency.
That sets up a very public battle in AI computing between Google’s TPUs and GPUs from Nvidia, the dominant supplier of AI accelerators.
“Teams were working on powering and building out machines and data centers worldwide. We were carefully orchestrating and configuring a complex set of distributed resources. We built new platform pieces designed to help load balance, optimize performance and scale like never before,” said Dena Saunders, a product leader for Bing at Microsoft, during the launch event.
Microsoft is using a more advanced version of OpenAI’s ChatGPT. At the Microsoft event, OpenAI CEO Sam Altman estimated there were 10 billion search queries every day.
Microsoft’s road to Bing with AI started with making sure it had the computing capacity with its AI supercomputer, which the company claims is among the five fastest supercomputers in the world. The computer isn’t listed in the Top500 rankings.
“We referenced the AI supercomputer, but that work has taken years and it’s taken a lot of investments to build the type of scale, the type of speed, the type of cost that we can bring in every layer of the stack. I think that … is quite differentiated, the scale at which we operate,” said Amy Hood, executive vice president and chief financial officer at Microsoft, during a call with investors last week.
The cost of computing for AI at the supercomputer layer will continue to come down over time as usage scales and optimizations are implemented, Hood said.
“The cost per search transaction tends to come down with scale, of course, I think we’re starting with a pretty robust platform to be able to do that,” Hood said.
Computing costs typically rise as more GPUs are deployed, with cooling and other supporting infrastructure adding to the bills. But companies typically tie the cost of computing to the revenue it generates.
Microsoft’s AI supercomputer was built in partnership with OpenAI, and it has 285,000 CPU cores and 10,000 GPUs. Nvidia in November signed a deal to put tens of thousands of its A100 and H100 GPUs into the Azure infrastructure.
Microsoft’s Bing search share does not come close to Google Search, which had a 93 percent market share in January, according to Statcounter.
Artificial intelligence is fundamentally a different style of computing, predicated on the ability to reason and predict, while conventional computing revolves around logical calculations. AI workloads run on hardware built for matrix multiplication, whereas conventional computing has relied on CPUs, which excel at serial processing of data.
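The distinction can be made concrete with a small sketch (a hypothetical illustration, not code from Bing or Bard): a single dense neural-network layer is, at its core, one matrix multiplication, exactly the operation GPUs and TPUs parallelize across thousands of arithmetic units.

```python
import numpy as np

# A dense neural-network layer boils down to one matrix multiply:
#   outputs = inputs @ weights + bias
# Accelerators such as GPUs and TPUs are built to run these
# multiply-accumulate operations massively in parallel; a CPU
# would step through the same arithmetic far more serially.

rng = np.random.default_rng(0)

batch = rng.standard_normal((32, 512))     # 32 queries, 512 features each
weights = rng.standard_normal((512, 256))  # one layer's parameters
bias = np.zeros(256)

activations = batch @ weights + bias       # the matrix multiplication
outputs = np.maximum(activations, 0)       # ReLU non-linearity

print(outputs.shape)
```

A full large-language model stacks many such layers, which is why the total cost of serving a query is dominated by how fast the hardware can chew through matrix multiplies.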
Google is taking a cautious approach, releasing its Bard conversational AI on a lightweight model version of LaMDA, its homegrown large-language model that competes with OpenAI’s GPT-3, which underpins the ChatGPT conversational AI.
“This much smaller model needs significantly less computing power, which means we’ll be able to scale it to more users and get more feedback,” said Prabhakar Raghavan, a senior vice president at Google who is in charge of the search business, during an event last week.
The infrastructure buildout to handle AI search is still a work in progress and there is a lot that Microsoft and Google need to figure out, said Bob O’Donnell, principal analyst at Technalysis Research.
Microsoft realizes that AI computing is evolving quickly and is open to testing and using new AI hardware, said O’Donnell, who talked to Microsoft’s infrastructure team at the Bing AI launch event last week.
“They also made it clear that ‘we are trying everything, because it’s changing all the time. And even the stuff we are doing now is going to change over time – there will be differences down the road,'” O’Donnell said.
It is more important for Microsoft to have a computing platform that is more flexible “than necessarily 5% faster on one given task,” O’Donnell said.
“They admitted that ‘look, we’re going to learn a lot in the next 30 days as people start to use this and we start to see what the loads are really like.’ It is very much a dynamic, in-motion kind of thing,” O’Donnell said.
For example, Microsoft may learn about the peak times when people are hitting servers with their search requests. During low usage periods, Microsoft could switch from the inferencing part, which is what spits out the results, to the training part, which requires more GPU computing, O’Donnell said.
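That kind of load-based reallocation can be pictured with a toy sketch (purely hypothetical, not Microsoft’s actual scheduler): when query traffic is high, most of a GPU pool serves inference; when traffic drops, spare capacity shifts to training.

```python
# Hypothetical sketch of load-based GPU reallocation, not Microsoft's
# actual scheduler. During busy hours the GPU pool serves inference
# (answering queries); during quiet hours spare GPUs shift to training,
# which requires more sustained GPU computing.

def allocate_gpus(total_gpus: int, queries_per_sec: float,
                  peak_qps: float) -> dict:
    """Split a GPU pool between inference and training based on load."""
    load = min(queries_per_sec / peak_qps, 1.0)
    # Keep a small reserve on inference so query latency stays low
    # even in the quietest periods.
    inference = max(int(total_gpus * load), total_gpus // 10)
    return {"inference": inference, "training": total_gpus - inference}

# Busy period: most GPUs stay on inference.
print(allocate_gpus(10_000, queries_per_sec=90_000, peak_qps=100_000))

# Quiet period: most GPUs are freed up for training.
print(allocate_gpus(10_000, queries_per_sec=5_000, peak_qps=100_000))
```

In practice such decisions involve far more than a single ratio (job preemption, model-sharding constraints, SLA guarantees), which is why O’Donnell describes the buildout as still very much in motion.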
Google’s TPUs, introduced in 2016, have been a key component of the company’s AI strategy. The TPUs famously powered AlphaGo, the system that defeated Go champion Lee Sedol in 2016. The company’s LaMDA LLM was developed to run on TPUs. Google’s sister organization, DeepMind, is also using TPUs for its AI research.
Google’s chip “has significant infrastructure advantages using the in-house TPUv4 pods versus Microsoft/OpenAI using Nvidia-based HGX A100s” in a raw AI implementation with minimal optimizations, said SemiAnalysis founder, Dylan Patel, in a newsletter that lays out the billions of dollars it will cost Google to insert large-language models into its search offerings.
Over time, the costs will decrease as hardware scales and models are optimized to the hardware, Patel wrote.
Facebook is now building datacenters with the capacity for more AI computing. The Facebook clusters will have thousands of accelerators, which include GPUs, and will operate in a power envelope of eight to 64 megawatts. The AI technologies are used to remove objectionable content, and the computing clusters will drive the company’s metaverse future. The company is also building an AI research supercomputer with 16,000 GPUs.
Generally, datacenters are now being built for targeted workloads, which increasingly are around artificial intelligence applications, and feature more GPU and CPU content, said Dean McCarron, principal analyst at Mercury Research.
Cloud providers go through lengthy evaluation cycles of picking the best CPUs, GPUs and other components. The total cost of ownership is another consideration.
“One of the other issues here is how flexible is it? Because some buyers may not want to dedicate, or make too big of a commitment to a particular workload, not knowing if it will be there in the future,” McCarron said.
Datacenters that preferentially support AI workloads will see a little bit more uptake for both GPUs and CPUs from Intel, Nvidia and AMD. Some may choose alternate accelerators for AI workloads, but they could coexist with GPUs and CPUs.
“You’re always going to need faster GPUs. Ten years in the future, in a datacenter, are there going to be CPUs? Yes. Are there going to be GPUs? Yes, as well,” McCarron said.
Header image created using OpenAI’s DALL·E 2.