DALLAS, Texas, Nov. 13, 2018 — Staff and faculty from Indiana University aim to make history this month by building the world’s first single-channel 400-gigabit-per-second network for research and education. The connection will be capable of transmitting 50 gigabytes of data every second — enough to stream 16,000 ultra-high-definition movies simultaneously.
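The headline figures can be checked with simple arithmetic: 400 gigabits per second is 50 gigabytes per second, and dividing the link across 16,000 simultaneous streams gives roughly 25 megabits per stream, a typical ultra-high-definition streaming rate. A quick sketch:

```python
# Back-of-the-envelope check of the headline numbers.
link_gbps = 400                          # link rate, gigabits per second
gigabytes_per_s = link_gbps / 8          # 400 Gb/s -> 50 GB/s
streams = 16_000
mbps_per_stream = link_gbps * 1000 / streams  # megabits per second per stream

print(gigabytes_per_s)    # 50.0 (GB/s)
print(mbps_per_stream)    # 25.0 (Mbps per stream)
```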
Experts from IU’s Research Technologies division will unveil their demonstration, “Wide area workflows at 400 Gbps,” at the 2018 International Conference for High Performance Computing, Networking, Storage, and Analysis, or SC18, in Dallas, Texas, November 11-16. The demo is IU’s submission to the conference’s annual Network Research Exhibition, which spotlights innovation in emerging network hardware, protocols, and advanced network-intensive scientific applications.
IU’s demonstration will leverage the advanced capabilities of SCinet, the dedicated high-capacity network for SC18. For the duration of the conference, SCinet is the fastest and most powerful network in the world.
IU’s Research Technologies division, which is affiliated with the Pervasive Technology Institute at Indiana University, has been an early adopter of cutting-edge network technology since 2003. That year, the nodes of the distributed AVIDD Linux cluster, separated by 82 kilometers, were networked together using 10 Gbps Ethernet, less than a year after the 10 Gbps Ethernet standard was approved in 2002. In 2006, the Lustre-based Data Capacitor was using 10 Gbps Ethernet host adapters long before they were widely deployed. As part of the Data Capacitor project, Indiana University worked with Oak Ridge National Laboratory to examine the feasibility of mounting the Lustre file system across 10 Gbps networks to compute against data in place.
Stephen Simms, manager of IU’s High Performance File Systems team and Data Capacitor grant co-principal investigator, said, “We could see the increasing rate of data production from digital instruments and simulations was creating datasets of such size that they were cumbersome to move. Simple data management tasks, like replication and version control, were becoming tricky operations. In cases like these we thought it might be advantageous to compute in place.”
Neena Imam, deputy director of research collaboration in the Computing and Computational Sciences Directorate at Oak Ridge National Laboratory, whose contribution to the project was supported by the U.S. Department of Defense, said, “Lustre has been used at Oak Ridge National Laboratory for more than a decade at very large scales for supporting multiple U.S. government agencies’ mission needs. Innovative solutions based on Lustre for accessing data over large distances are needed as data sizes grow at unprecedented rates.”
In 2009, IU deployed Data Capacitor WAN, the first production Lustre WAN file system, which allowed researchers to compute against their data across distance, and made it available to the NSF TeraGrid project.
For SC18, in partnership with researchers from Oak Ridge National Laboratory, IU will use a modest compute resource on the SC18 exhibit floor to mount Slate, a newly acquired Lustre file system in Bloomington, Ind. The connection runs over four 100 Gbps links to Chicago, Ill., and onto the new Monon 400 research and education network, which provides a single 400 Gbps channel between Chicago and Bloomington.
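From the client’s perspective, mounting a remote Lustre file system over the WAN looks much like a local mount. The sketch below uses standard Lustre client commands; the hostname “mgs.slate.example” and the file system name “slate” are illustrative placeholders, not the actual production endpoints.

```shell
# Hypothetical sketch: mount a remote Lustre file system over the WAN.
# The MGS hostname and fsname below are illustrative only.
mount -t lustre mgs.slate.example@tcp0:/slate /mnt/slate

# Confirm the client sees the remote storage targets and inspect
# the default striping configuration.
lfs df -h /mnt/slate
lfs getstripe -d /mnt/slate
```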
The team at SC18 intends to benchmark file system performance over distance using multiple protocols and configurations. They will also showcase applications that could benefit from computing in place across distance. For example, IU physicist Matthew Shepherd works on the GlueX program, analyzing collision events created by the particle accelerator at Virginia’s Jefferson Laboratory. The lab produces as much as seven petabytes (PB) of data per year; each PB contains about 50 billion collision events and requires about 10 million CPU hours to reconstruct the trajectories of subatomic particles from raw electronics signals. Even at 400 Gbps, these data would be difficult to move and store on local resources, so the team intends to run applications in Dallas against some of the data in Bloomington without having to bundle it all up and move it.
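The scale of the problem is easy to quantify from the figures above: even saturating the 400 Gbps link, moving a single petabyte takes several hours, and a year of GlueX output would take well over a day of continuous transfer. A rough calculation:

```python
# Rough scale of the GlueX data-movement problem described above.
petabyte_bits = 8e15      # one petabyte (10**15 bytes) in bits
link_bps = 400e9          # 400 Gbps link, fully saturated (idealized)

seconds_per_pb = petabyte_bits / link_bps   # 20,000 s
hours_per_pb = seconds_per_pb / 3600        # ~5.6 hours per PB
hours_per_year = hours_per_pb * 7           # ~39 hours for 7 PB/year

print(round(hours_per_pb, 1))     # 5.6
print(round(hours_per_year, 1))   # 38.9
```

In practice, sustained WAN throughput falls short of line rate, so the real transfer times would be longer still, which is the motivation for computing against the data in place.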
The idea for the most recent demonstration was born at the 2018 Lustre User Group conference, where researchers at Oak Ridge National Laboratory (Nagi Rao, Sarp Oral, Jesse Hanley, and Neena Imam) presented data showing how it would be possible to increase Lustre performance across the WAN through the use of Lustre routers to aggregate and optimize data communication for high latency networks, work likewise supported by the U.S. Department of Defense.
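In Lustre, this kind of routing is handled at the LNet layer: dedicated router nodes forward traffic between the clients’ network and the servers’ network and can be tuned for high-latency paths. A minimal sketch using the standard `lnetctl` tool follows; the network names and NID are illustrative assumptions, not the demonstration’s actual configuration.

```shell
# Hypothetical sketch of the LNet router approach. Network names
# (tcp0, tcp1) and the gateway NID are illustrative only.

# On each router node: enable forwarding between LNet networks.
lnetctl set routing 1

# On the client side: reach the remote server network via the router.
lnetctl route add --net tcp1 --gateway 10.0.0.1@tcp0
```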
Indiana University is working with, and grateful to, the following collaborators who will make this ground-breaking demonstration happen: Ciena, Hewlett Packard Enterprise, DataDirect Networks, Whamcloud, ESnet, Juniper, Mellanox, NVIDIA, PIER Group, SCinet, Starlight, and the U.S. Naval Research Laboratory.
Source: Indiana University