PSC Developing Networking Tool to Speed Big Data Transfers
April 8 — A new, $1 million National Science Foundation grant will enable engineers at Pittsburgh Supercomputing Center (PSC), the National Institute for Computational Sciences, the Pennsylvania State University, the Georgia Institute of Technology, the Texas Advanced Computing Center and the National Center for Supercomputing Applications to create a new tool for high-volume scientific users to achieve faster data transfers over Internet2.
The Developing Applications with Networking Capabilities via End-to-End SDN (DANCES) project will add network bandwidth scheduling capability to the network infrastructure and the supercomputing applications used by the collaborating sites. The DANCES team will develop and integrate file system, scheduling and networking software along with advanced networking hardware. Their aim is to prevent “Big Data” users who are transferring vast amounts of data from being slowed or even halted by periodic surges in competing network traffic.
“There currently is no tool that schedules network resources automatically within our existing scheduling systems,” says Kathy Benninger, PSC Manager of Networking Research and principal investigator for DANCES. “You figure out when you think you need to start your data transfer and then you do it manually.”
But the egalitarian structure of the Internet, and of the protocol underlying the majority of network traffic, causes problems for Big Data users. Such researchers and engineers must compete with many other users of all sizes on an equal footing. For example, a researcher transferring a 100-terabyte data set over a 10 Gbps Internet2 research connection could do the transfer in just over 22 hours. A home user with a typical 15 Mbps Internet connection would need almost 1.7 years to complete the download. But even on the research-only Internet2, such a large transfer could be bumped and sometimes halted by surges of traffic from other users. An automatic tool that protects designated flows from local congestion, essentially creating a “high occupancy vehicle lane” for large-scale data by prioritizing its traffic, would provide dramatically faster network speeds for Big Data users.
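The transfer times quoted above can be checked with a short calculation. The sketch below assumes a decimal terabyte (10^12 bytes), an uncontended link running at its full rated speed, and no protocol overhead; the function and constant names are illustrative only.

```python
def transfer_time_seconds(size_bytes: float, rate_bits_per_second: float) -> float:
    """Ideal transfer time: total bits divided by link rate.

    Ignores protocol overhead and competing traffic (an assumption).
    """
    return size_bytes * 8 / rate_bits_per_second


TERABYTE = 10**12          # decimal terabyte (assumption)
DATA_SET = 100 * TERABYTE  # the 100-terabyte data set from the example

SECONDS_PER_HOUR = 3600
SECONDS_PER_YEAR = 365 * 24 * SECONDS_PER_HOUR

# 10 Gbps research connection
research_hours = transfer_time_seconds(DATA_SET, 10e9) / SECONDS_PER_HOUR
# 15 Mbps home connection
home_years = transfer_time_seconds(DATA_SET, 15e6) / SECONDS_PER_YEAR

print(f"10 Gbps: {research_hours:.1f} hours")
print(f"15 Mbps: {home_years:.2f} years")
```

Under these assumptions the results come out at roughly 22.2 hours and 1.69 years, in line with the figures in the text. Real transfers would take longer once overhead and competing traffic are included.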
“The idea behind the DANCES tool is that you have an idea of how much data you need to transfer, how long you want to take, and how long your computations will take,” says Joe Lappa, Operations Networking Manager for XSEDE. “So the tool will work backwards and grab the data you need from a site and a network path that isn’t crowded.”
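The “work backwards” idea Lappa describes can be illustrated with a minimal sketch. This is not the DANCES implementation: the `Path` class, the site names, the bandwidth figures and every function name here are hypothetical, chosen only to show how a required data rate and a latest start time follow from the data size, the transfer window and the compute time.

```python
from dataclasses import dataclass
from typing import Optional

HOUR = 3600  # seconds


@dataclass
class Path:
    name: str             # hypothetical label for a source site and network path
    available_bps: float  # bandwidth assumed to be free on that path


def required_rate_bps(size_bytes: float, window_seconds: float) -> float:
    """Rate needed to move size_bytes within the transfer window."""
    return size_bytes * 8 / window_seconds


def choose_path(paths: list[Path], size_bytes: float,
                window_seconds: float) -> Optional[Path]:
    """Pick the least crowded path that can sustain the required rate, if any."""
    needed = required_rate_bps(size_bytes, window_seconds)
    candidates = [p for p in paths if p.available_bps >= needed]
    return max(candidates, key=lambda p: p.available_bps, default=None)


def latest_start(deadline_s: float, compute_s: float, transfer_s: float) -> float:
    """Work backwards from the deadline: subtract compute time, then transfer time."""
    return deadline_s - compute_s - transfer_s


# Illustrative numbers only: 100 TB to move in 48 hours, then 30 hours of
# computation, with results needed 100 hours from now.
paths = [Path("site-a", 3e9), Path("site-b", 8e9)]
size = 100 * 10**12

best = choose_path(paths, size, 48 * HOUR)
start = latest_start(100 * HOUR, 30 * HOUR, 48 * HOUR)

print(f"needed rate: {required_rate_bps(size, 48 * HOUR) / 1e9:.2f} Gbps")
print(f"chosen path: {best.name if best else 'none'}")
print(f"start transfer by hour {start / HOUR:.0f}")
```

In this toy case only the second path has enough free bandwidth, so it is selected, and the transfer has to begin no later than hour 22. A real scheduler would also have to reserve that bandwidth for the duration of the transfer, which this sketch does not attempt.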
“Instead of having a bunch of equal competing jobs at one time, you’ll be able to push priority data through at a guaranteed, predictable data rate,” Benninger adds.
In addition to developing new software, the DANCES team will use hardware upgrades at the participating institutions that will in essence provide high-speed on-ramps for Big Data users. While most of the chokepoints that the system is intended to bypass are expected to be at the level of the campuses, Internet2 is also participating by monitoring its network capacity and adding more bandwidth if necessary.
Ultimately, the system will provide larger benefits as well. With the Big Data transfers DANCES is designed to serve, the energy wasted and heat generated by slow network speeds are significant.
“It’s greener,” says Lappa. “Your machine’s not waiting. Everything is queued, everything is where it needs to be for a faster data transfer.”
The DANCES web site is available at http://www.dances-sdn.org
Pittsburgh Supercomputing Center (http://www.psc.edu) is a joint effort of Carnegie Mellon University and the University of Pittsburgh together with Westinghouse Electric Company. Established in 1986, PSC is supported by several federal agencies, private industry and the Commonwealth of Pennsylvania, and is a major partner in the National Science Foundation XSEDE program.
Source: Pittsburgh Supercomputing Center