December 09, 2009
Caltech-led high-energy physicists show how long-range networks can be used to support leading-edge science
PASADENA, Calif., Dec. 5 -- Building on eight years of record-breaking developments, and on the restart of the Large Hadron Collider (LHC), an international team of high-energy physicists, computer scientists, and network engineers led by the California Institute of Technology (Caltech) joined forces to capture the Bandwidth Challenge award for massive data transfers during the SuperComputing 2009 (SC09) conference held in Portland, Ore.
Caltech's partners in the project include scientists from Michigan (UM), Fermilab, Brookhaven National Laboratory, CERN, San Diego (UCSD), Florida (UF and FIU), Brazil (Rio de Janeiro State University, UERJ, and the São Paulo State University, UNESP), Korea (Kyungpook National University, KISTI), Estonia (NICPB) and Pakistan (NUST).
Caltech's exhibit at SC09 by the High Energy Physics (HEP) group and the Center for Advanced Computing Research (CACR) demonstrated applications for globally distributed data analysis for the LHC at CERN. It also demonstrated Caltech's worldwide collaboration system, EVO (Enabling Virtual Organizations), developed with UPJS in Slovakia; its global-network and grid monitoring system MonALISA; and its Fast Data Transfer application, developed in collaboration with the Politechnica University (Bucharest). The CACR team also showed near-real-time simulations of earthquakes in the Southern California region, experiences in time-domain astronomy with Google Sky, and recent results in multiphysics multiscale modeling.
The focus of the exhibit was the HEP team's record-breaking demonstration of storage-to-storage data transfer over wide area networks from two racks of servers and a network switch-router on the exhibit floor. The high-energy physics team's demonstration, "Moving Towards Terabit/Sec Transfers of Scientific Datasets: The LHC Challenge," achieved a bidirectional peak throughput of 119 gigabits per second (Gbps) and a data flow of more than 110 Gbps that could be sustained indefinitely among clusters of servers on the show floor and at Caltech, Michigan, San Diego, Florida, Fermilab, Brookhaven, CERN, Brazil, Korea, and Estonia.
Following the Bandwidth Challenge, the team continued its tests and established a world-record data transfer between the Northern and Southern hemispheres, sustaining 8.26 Gbps in each direction on a 10 Gbps link connecting São Paulo and Miami.
By setting new records for sustained data transfer among storage systems over continental and transoceanic distances using simulated LHC datasets, the HEP team demonstrated its readiness to enter a new era in the use of state-of-the-art cyberinfrastructure to enable physics discoveries at the high-energy frontier. The team also demonstrated some of the groundbreaking tools and systems it has developed to enable a global collaboration of thousands of scientists, located at 350 universities and laboratories in more than 100 countries, to make the next round of physics discoveries.
"By sharing our methods and tools with scientists in many fields, we hope that the research community will be well-positioned to further enable their discoveries, taking full advantage of current networks, as well as next-generation networks with much greater capacity as soon as they become available," says Harvey Newman, Caltech professor of physics, head of the HEP team, colead of the U.S. LHCNet, and chair of the U.S. LHC Users Organization. "In particular, we hope that these developments will afford physicists and young students throughout the world the opportunity to participate directly in the LHC program, and potentially to make important discoveries."
One of the features of next-generation networks supporting the largest science programs -- notably the LHC experiments -- is the use of dynamic circuits with bandwidth guarantees crossing multiple network domains. The Caltech team at SC09 used Internet2's recently announced ION service -- developed together with ESnet and GEANT, and in collaboration with U.S. LHCNet -- to create a dynamic circuit between Portland and CERN as part of the bandwidth-challenge demonstrations.
One of the key elements in this demonstration was Fast Data Transfer (FDT), an open-source Java application developed by Caltech in close collaboration with Politechnica University in Bucharest. FDT runs on all major platforms and uses the Java NIO libraries to achieve stable disk reads and writes coordinated with smooth data flow using TCP across long-range networks. The FDT application streams a large set of files across an open TCP socket, so that a large data set composed of thousands of files -- as is typical in high-energy physics applications -- can be sent or received at full speed, without the network transfer restarting between files. FDT can work on its own, or together with Caltech's MonALISA system, to dynamically monitor the capability of the storage systems as well as the network path in real time, and send data out to the network at a moderated rate that achieves smooth data flow across long-range networks.
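The idea of streaming many files over one open socket can be illustrated with a minimal Python sketch. This is not FDT's code or wire format: the 10-byte header layout, the function names, and the use of in-memory payloads in place of disk files are all assumptions made for illustration. Each file is preceded by a small header giving its name length and size, so the receiver can separate files without the connection ever being closed and reopened.

```python
import socket
import struct
import threading

# Hypothetical framing (not FDT's format): 2-byte name length, 8-byte payload size.
HEADER = struct.Struct("!HQ")


def send_files(sock, files):
    """Stream every (name, data) pair over one already-open socket."""
    for name, data in files:
        encoded = name.encode("utf-8")
        sock.sendall(HEADER.pack(len(encoded), len(data)))
        sock.sendall(encoded)
        sock.sendall(data)
    sock.shutdown(socket.SHUT_WR)  # tell the receiver the stream is finished


def read_exact(sock, n):
    """Read exactly n bytes; raise EOFError if the stream ends first."""
    chunks = []
    remaining = n
    while remaining:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise EOFError("stream ended")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_files(sock):
    """Receive files until the sender shuts down its side of the socket."""
    received = {}
    while True:
        try:
            header = read_exact(sock, HEADER.size)
        except EOFError:
            return received  # clean end of stream between files
        name_len, size = HEADER.unpack(header)
        name = read_exact(sock, name_len).decode("utf-8")
        received[name] = read_exact(sock, size)


def demo():
    """Send 50 small in-memory 'files' across a single local socket pair."""
    files = [(f"event_{i:04d}.dat", bytes([i % 256]) * 100_000) for i in range(50)]
    left, right = socket.socketpair()
    sender = threading.Thread(target=send_files, args=(left, files))
    sender.start()
    received = recv_files(right)
    sender.join()
    left.close()
    right.close()
    return files, received
```

Because the connection stays open for the whole data set, TCP does not have to ramp up again for each file, which matters most on long, high-latency paths.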
Since it was first deployed at SC06, FDT has been shown to reach sustained throughputs among storage systems at 100 percent of network capacity where needed in production use, including among systems on different continents. FDT also achieved a smooth bidirectional throughput of 191 Gbps (199.90 Gbps peak) last year at SC08, using an optical system provided by CIENA that carried an OTU-4 wavelength over 80 km.
Another new aspect of the HEP demonstration was large-scale data transfers among multiple file systems widely used in production by the LHC community, with several hundred terabytes per site. This included two recently installed instances of the open-source file system Hadoop, where in excess of 9.9 Gbps was read from Caltech on one 10 Gbps link, and up to 14 Gbps was read on shared ESnet and NLR links, a level just compatible with the production traffic on the same links. The high throughput was achieved through the use of a new FDT/Hadoop adaptor layer written by NUST in collaboration with Caltech.
The SC09 demonstration also achieved its goal of clearing the way to Terabit/sec (Tbps) data transfers. The 4-way Supermicro servers at the Caltech booth -- each with four 10GE Myricom interfaces -- provided 8.3 Gbps of stable throughput each, reading or writing on 12 disks, using FDT. A system capable of one Tbps to or from storage could therefore be built today in just six racks at relatively low cost, while also providing 3,840 processing cores and 3 petabytes of disk space, which is comparable to the larger LHC centers in terms of computing and storage capacity.
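The six-rack estimate can be checked with a short back-of-the-envelope calculation, sketched here in Python. The release does not state how many servers fit in a rack; the figure of 20 per rack used below is an assumption, chosen because it makes the quoted totals divide evenly. The per-server and per-disk results are therefore derived values, not numbers from the HEP team.

```python
# Figures quoted in the release.
PER_SERVER_GBPS = 8.3
RACKS = 6
TOTAL_CORES = 3840
TOTAL_DISK_TB = 3000  # 3 petabytes, in decimal terabytes
DISKS_PER_SERVER = 12

# Assumption, not stated in the release: 20 servers per rack.
SERVERS_PER_RACK = 20


def size_terabit_system():
    """Derive per-server figures implied by the quoted six-rack totals."""
    servers = RACKS * SERVERS_PER_RACK
    tb_per_server = TOTAL_DISK_TB / servers
    return {
        "servers": servers,
        "aggregate_gbps": servers * PER_SERVER_GBPS,
        "cores_per_server": TOTAL_CORES // servers,
        "tb_per_server": tb_per_server,
        "tb_per_disk": tb_per_server / DISKS_PER_SERVER,
    }


if __name__ == "__main__":
    for key, value in size_terabit_system().items():
        print(f"{key}: {value}")
```

Under that assumption, 120 servers at 8.3 Gbps each give an aggregate just under 1 Tbps, with 32 cores and 25 terabytes of disk per server, or roughly 2 terabytes on each of the 12 disks.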
An important ongoing theme of SC09 -- including at the Caltech booth, where the EVOGreen initiative was highlighted -- was the reduction of carbon footprint through the use of energy-efficient information technologies. A particular focus is the use of systems with a high ratio of computing and I/O performance to energy consumption. In the coming year, in preparation for SC10 in New Orleans, the HEP team will be looking into the design and construction of compact systems with lower power and cost that are capable of delivering data at several hundred Gbps, aiming to reach 1 Tbps by 2011 when multiple 100 Gbps links into SC11 may be available.
The two largest physics collaborations at the LHC -- CMS and ATLAS, each encompassing more than 2,000 physicists, engineers, and technologists from 180 universities and laboratories -- are about to embark on a new round of exploration at the frontier of high energies. When the LHC experiments begin to take collision data in a new energy range over the next few months, new ground will be broken in our understanding of the nature of matter and space-time, and in the search for new particles. In order to fully exploit the potential for scientific discoveries during the next year, more than 100 petabytes (10^17 bytes) of data will be processed, distributed, and analyzed using a global grid of 300 computing and storage facilities located at laboratories and universities around the world, rising to the exabyte range (10^18 bytes) during the following years.
The key to discovery is the analysis phase, where individual physicists and small groups located at sites around the world repeatedly access, and sometimes extract and transport, multi-terabyte data sets on demand from petabyte data stores in order to optimally select the rare "signals" of new physics from the potentially overwhelming "backgrounds" from already-understood particle interactions. The HEP team hopes that the demonstrations at SC09 will pave the way toward more effective distribution of the masses of LHC data, and more effective use of those data for discoveries.
The demonstration and the developments leading up to the SC09 Bandwidth Challenge were made possible through the support of the partner network organizations mentioned, the National Science Foundation (NSF), the U.S. Department of Energy (DOE) Office of Science, and the funding agencies of the HEP team's international partners, as well as the U.S. LHC Research Program funded jointly by DOE and NSF.
Further information about the demonstration may be found at http://supercomputing.caltech.edu.
-----
Source: Caltech