Since 1987 - Covering the Fastest Computers in the World and the People Who Run Them

July 26, 2013

How Far Can Google Extend into the Cloud?

Ian Armas Foster

Google is bigger and more diverse than ever when it comes to high tech. Their large and environmentally friendly datacenters land them in Green Computing Report, their big data efforts see them covered in Datanami, and their global science initiative with CERN has been mentioned over on HPCwire.

That notion is reinforced by a Deepfield study, which found that Google accounts for approximately 25 percent of all internet traffic in the United States. That makes it more far-reaching than Facebook, Twitter, and Netflix combined.

That same study, as mentioned in a blog post written by Craig Labovitz, the founder of the internet monitoring company, noted that approximately 60 percent of all internet end devices and users connect with Google in some fashion over an average day.

“This analysis,” Labovitz wrote, “includes computers and mobile device as well as hundreds of varieties game consoles, home media appliances, and other embedded devices (Google’s device share is much larger if we look only at computers and mobile devices).”

Deepfield found that Google Global Cache (GGC) is an underrated contributor to Google’s record reach.

“By far the most striking change in Google’s Internet presence has come with the deployment of thousands of Google servers in Internet providers around the world,” Labovitz wrote. “With little press coverage or fanfare, Google has deployed [GGC] servers in the majority of US Internet providers.”

According to the post on their website about Google Global Cache, the system sets high requirements for internet service providers in the United States. The servers were designed for networks of 300 Mbps or higher, with the recommendation that networks under 1 Gbps form or join Internet Exchanges to get more out of the cache system.

The implication here is that more intensive networks deliver more accurate predictive results. This makes a modicum of sense: as the big data space knows, the larger the dataset, the more value one can get from it.

However, big data has also taught us that it is markedly difficult to actually garner insights from those large datasets.

A couple of months ago, HPC in the Cloud looked at a presentation done by Google at their annual I/O event that discussed the results and the future prospects of working with CERN on global scientific efforts.

With that said, a majority of the high performance scientific experimentation and computation happening in the cloud takes place on Amazon’s EC2 high performance instances.

Now, in the context of high performance computing, these GGC servers are unlikely to be able to band together to form the virtualized, low latency, high throughput network that high performance applications would need. However, they could potentially further the building of a global scientific data-sharing network.

In that sense, Google would be less a host of high performance applications, as Amazon is and as Rackspace is starting to be, and more a means of making it easier to transfer the data used in those applications over the internet to anywhere in the world.

As such, the focus would be less on building and maintaining a significant number of data centers and more on the effort to connect the various internet service providers to each other.

“It used to be that the focus of people like Google and Facebook was about building data centers,” Labovitz wrote. “They’re still doing that, but what is equally interesting is watching these edge boxes — these servers being embedded just everywhere.”

Deepfield estimates that its survey covered roughly 20 percent of internet activity in the United States, giving a decent representation of the country’s internet usage, particularly since it includes devices like the Apple TV, Xbox 360, and Roku, almost none of which access Google or go through GGCs.

“Of particular note,” Labovitz wrote, “our study leverages anonymized data from core Internet infrastructure (i.e. backbone routers) so that unlike web bug based measurements (e.g. Alexa / Comscore), the data includes traffic from both browsers as well as all embedded devices. We believe this is the largest ongoing study of its kind covering roughly 1/5 of the US consumer Internet.”

Below is a graphic showing the average number of devices that accessed Google over a given day, as shown by Deepfield.

So how do these Google Global Caches help the cause of high performance computing in the cloud? Put simply, they increase performance and offer bandwidth control across the country’s internet. Along with a direct impact on internet speed, the GGC speeds the delivery of requested search results. This in turn reduces the bandwidth needed across the entire network, freeing it up for larger dataset requests from researchers.
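The bandwidth argument above can be illustrated with a toy simulation. The sketch below is not Google's GGC design, which the article does not describe. It assumes a simple least-recently-used (LRU) cache sitting inside an ISP, and the names `EdgeCache` and `upstream_savings` are hypothetical. It only shows how repeated requests served locally cut the traffic that has to cross the wider network.

```python
from collections import OrderedDict


class EdgeCache:
    """Toy LRU cache standing in for an ISP-hosted cache node (assumption)."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of objects held
        self.store = OrderedDict()    # object id -> size in bytes
        self.upstream_bytes = 0       # bytes fetched from the origin
        self.served_bytes = 0         # bytes delivered to users

    def request(self, obj_id, size):
        """Serve one request; return True on a cache hit."""
        self.served_bytes += size
        if obj_id in self.store:
            # Hit: refresh recency, nothing crosses the wider network.
            self.store.move_to_end(obj_id)
            return True
        # Miss: fetch from the origin and keep a local copy.
        self.upstream_bytes += size
        self.store[obj_id] = size
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)  # evict least recently used
        return False


def upstream_savings(requests, capacity):
    """Fraction of delivered bytes that never had to leave the ISP."""
    cache = EdgeCache(capacity)
    for obj_id, size in requests:
        cache.request(obj_id, size)
    if cache.served_bytes == 0:
        return 0.0
    return 1 - cache.upstream_bytes / cache.served_bytes


# Made-up request trace: the same object requested several times.
trace = [("a", 100), ("b", 100), ("a", 100), ("a", 100)]
print(upstream_savings(trace, capacity=2))
```

In this invented trace, half of the delivered bytes are served from the local cache. Real savings depend on how often content is re-requested and on how much the cache can hold.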

Further, Google’s global cache network lends itself to the establishment of a faster global data-sharing network, similar to the one that Google and CERN are aiming toward, as discussed at I/O this year.

