The Bandwidth Bottleneck
Last week, HPC in the Cloud discussed what types of HPC applications are best suited for cloud technologies. While capabilities offered by cloud providers (minimal upfront costs, high scalability and quick time to deployment) remain attractive to HPC users, the needs of their workloads are sometimes at odds with the technology. One particular hurdle is the amount of bandwidth between the end user and their provider of choice. Earlier this week, a scalability.org blog covered this dilemma, calling it a “non-trivial” issue.
Most public cloud providers are best suited for Web hosting, email services and similar ongoing tasks. Their infrastructures are geared toward these purposes, scaling up capacity relative to end user demand. However, if a single user wants to store and process massive datasets, the lack of high-bandwidth connectivity can severely hinder their research.
NASA is familiar with this problem. The agency recently launched a program called NEX, which houses 40 years of Earth satellite data in a storage cluster next to its Pleiades supercomputer. Ramakrishna Nemani, an Earth scientist at NASA Ames, spoke to us about the project. He described how long it took to migrate a large collection of Landsat images from a datacenter in South Dakota to the Ames facility.
“I’ll give you an example about how difficult this has been. We brought about 400 terabytes of data from the EROS datacenter in Sioux Falls, South Dakota. I was blown away, it took us nearly 6 ½ months.”
With a turnaround time like that, it probably would have been easier to FedEx the dataset on a set of hard drives. The scalability.org post blames this kind of issue on a lack of competition among ISPs in the US.
The blog priced an asymmetric connection delivering 100 Mbit/s down and 10-15 Mbit/s up at roughly $300 per month. That translates to 12.5 MByte/s down and roughly 1.25 to 1.9 MByte/s up.
At that rate, an end user could download roughly one terabyte per day. But since uploads run at about 10 percent of the download speed at the low end of that range, uploading a single terabyte would take approximately 10 days.
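The arithmetic behind those figures can be checked with a short calculation. The following is a minimal sketch, not something taken from the scalability.org post: it assumes decimal units (1 TB = 10^12 bytes, 1 Mbit = 10^6 bits), a link that sustains its full rated speed, and no protocol overhead. The function name `transfer_days` is made up for illustration.

```python
SECONDS_PER_DAY = 86_400
ONE_TB = 1e12  # assumption: decimal terabyte (10^12 bytes)


def transfer_days(size_bytes: float, link_mbit_per_s: float) -> float:
    """Days needed to move size_bytes over a link at its full rated speed.

    Assumes no protocol overhead and no contention, so real transfers
    would take longer than this estimate.
    """
    bytes_per_second = link_mbit_per_s * 1_000_000 / 8
    return size_bytes / bytes_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # 100 Mbit/s down, 10-15 Mbit/s up, as quoted for the priced connection
    print(f"download at 100 Mbit/s: {transfer_days(ONE_TB, 100):.2f} days per TB")
    print(f"upload at 15 Mbit/s:    {transfer_days(ONE_TB, 15):.2f} days per TB")
    print(f"upload at 10 Mbit/s:    {transfer_days(ONE_TB, 10):.2f} days per TB")
```

Under these assumptions the download works out to a little under a day per terabyte, and the upload to somewhere between about six and nine days, which is consistent with the rounded figures quoted above.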
Although standard service providers have struggled to match throughput with demand, they may get an added incentive from Google. The Internet search giant has entered the market, launching its own fiber service in Kansas City. For $70 a month, users can get symmetrical 1,000 Mbit/s (1 Gbit/s) connectivity. At that speed, the 10-day-per-terabyte upload becomes a more practical transfer of roughly two hours.
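To put the gigabit figure next to the NASA example, the sketch below works the numbers in both directions: the average rate implied by moving 400 terabytes in about 6 ½ months, and the time the same data would need on a fully used 1 Gbit/s link. It is a rough, illustrative estimate only. It assumes decimal units, 30-day months (so 6 ½ months is 195 days) and zero protocol overhead, and the function names are made up for this example.

```python
SECONDS_PER_DAY = 86_400
ONE_TB = 1e12  # assumption: decimal terabyte (10^12 bytes)


def effective_mbit_per_s(size_bytes: float, days: float) -> float:
    """Average rate, in Mbit/s, implied by moving size_bytes in the given days."""
    return size_bytes * 8 / (days * SECONDS_PER_DAY) / 1e6


def hours_at_rate(size_bytes: float, link_mbit_per_s: float) -> float:
    """Hours needed to move size_bytes at a sustained link_mbit_per_s."""
    return size_bytes * 8 / (link_mbit_per_s * 1e6) / 3600


if __name__ == "__main__":
    nasa_bytes = 400 * ONE_TB
    nasa_days = 6.5 * 30  # assumption: 30-day months

    print(f"implied average rate: {effective_mbit_per_s(nasa_bytes, nasa_days):.0f} Mbit/s")
    print(f"1 TB at 1 Gbit/s:     {hours_at_rate(ONE_TB, 1000):.1f} hours")
    print(f"400 TB at 1 Gbit/s:   {hours_at_rate(nasa_bytes, 1000) / 24:.0f} days")
```

Under these assumptions, a single terabyte takes a little over two hours at 1 Gbit/s, while a 400-terabyte collection would still occupy such a link for more than a month.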
With the bandwidth bottleneck effectively eliminated, end users could take advantage of a new range of cloud-based services, including high-capacity storage and data-intensive research. Unfortunately, Google’s service is limited to Kansas City, and no plans to expand the program have been announced.