April 02, 2009
ARGONNE, Ill., April 2 -- The advantages of cloud computing were dramatically illustrated last week by researchers working on the STAR nuclear physics experiment at Brookhaven National Laboratory's Relativistic Heavy-Ion Collider. New simulation results were needed for presentation at the Quark Matter physics conference; but all the computational resources were either committed to other tasks or did not support the environment needed for STAR computations. Fortunately, working with technology developed by the Nimbus team at the U.S. Department of Energy's (DOE) Argonne National Laboratory, the STAR researchers were able to dynamically provision virtual clusters on commercial cloud computers and run the additional computations just in time.
Nimbus is an open source cloud computing infrastructure that provides tools allowing users to deploy virtual machines on resources, similar to Amazon's EC2, as well as user-level tools such as the Nimbus Context Broker that combines several deployed virtual machines into "turnkey" virtual clusters.
The Nimbus team at Argonne has been collaborating with STAR researchers at Brookhaven's Relativistic Heavy Ion Collider for a few years. Both research groups are supported by DOE's Office of Science.
"The benefits of virtualization were clear to us early on," said Jerome Lauret, software and computing project leader for the STAR experiment. "We can configure the virtual machine image exactly to our needs and have a fully validated experimental software stack ready for use." The image can then be overlaid on top of remote resources using infrastructure such as Nimbus.
With cloud computing, Lauret said, a 100-node STAR cluster can be online in minutes. In contrast, Grid resources available at sites not expressly dedicated to STAR can take months to configure.
The STAR scientists initially developed and deployed their virtual machines on a small Nimbus cloud configured at the University of Chicago. Then they used the Nimbus Context Broker to configure the customized cloud into Grid clusters which served as platform for remote job submission using existing Grid tools. However, these resources soon proved insufficient to support STAR production runs.
"A typical production run will require on the order of 100 nodes for a week or more," said Lauret.
To meet these needs, the Argonne Nimbus team turned to Amazon EC2. A Nimbus gateway was developed to allow scientists to easily move between the small Nimbus cloud and Amazon EC2.
"In the early days, the gateway served as a protocol adapter as well," said Kate Keahey, the lead of the Nimbus project. "But eventually we found it easier to simply adapt Nimbus to be protocol-interoperable with EC2 so that the scientists could move their virtual machines between the University of Chicago cloud and Amazon easily."
Over the past year, the STAR experiment in collaboration with the Nimbus team successfully conducted a few noncritical runs and performance evaluations on EC2. The results were encouraging. When the last-minute production request came for new simulations, the STAR researchers had virtual machine images ready to go.
"It was a textbook case of EC2 usage," said Keahey. "The overloaded STAR resources were elastically 'extended' by additional virtual clusters deployed on EC2." The run used more than 300 virtual nodes at a time, using the default EC2 instances at first and moving on to the high-CPU medium EC2 instances later to speed the calculations.
Using cloud resources to generate last-minute results for the Quark Matter conference demonstrates that the use of cloud resources for science has moved beyond "testing waters" and into real production. According to Keahey, virtualization and cloud computing provide an ideal platform for resource sharing.
"One day a provider could be running STAR images, and the next day it could be climate calculations in entirely different images, with little or no effort," said Keahey. "With Nimbus, a virtual cluster can be online in minutes."
For more information on Nimbus, visit http://workspace.globus.org/.
For more information on STAR, visit http://www.star.bnl.gov/.
About Argonne National Laboratory
Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation's first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America 's scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy's Office of Science.
-----
Source: Argonne National Laboratory
In quieter times, sounding the bell of funding big science with big systems tends to resonate further than when ears are already burning with sour economic and national security news. For exascale's future, however, the time could be ripe to instill some sense of urgency....
Read more...
In a recent solicitation, the NSF laid out needs for furthering its scientific and engineering infrastructure with new tools to go beyond top performance, Having already delivered systems like Stampede and Blue Waters, they're turning an eye to solving data-intensive challenges. We spoke with the agency's Irene Qualters and Barry Schneider about..
Read more...
Large-scale, worldwide scientific initiatives rely on some cloud-based system to both coordinate efforts and manage computational efforts at peak times that cannot be contained within the combined in-house HPC resources. Last week at Google I/O, Brookhaven National Lab’s Sergey Panitkin discussed the role of the Google Compute Engine in providing computational support to ATLAS, a detector of high-energy particles at the Large Hadron Collider (LHC).
Read more...
May 23, 2013 |
The study of climate change is one of those scientific problems where it is almost essential to model the entire Earth to attain accurate results and make worthwhile predictions. In an attempt to make climate science more accessible to smaller research facilities, NASA introduced what they call ‘Climate in a Box,’ a system they note acts as a desktop supercomputer.
Read more...
May 22, 2013 |
At some point in the not-too-distant future, building powerful, miniature computing systems will be considered a hobby for high schoolers, just as robotics or even Lego-building are today. That could be made possible through recent advancements made with the Raspberry Pi computers.
Read more...
May 16, 2013 |
When it comes to cloud, long distances mean unacceptably high latencies. Researchers from the University of Bonn in Germany examined those latency issues of doing CFD modeling in the cloud by utilizing a common CFD and its utilization in HPC instance types including both CPU and GPU cores of Amazon EC2.
Read more...
May 15, 2013 |
Supercomputers at the Department of Energy’s National Energy Research Scientific Computing Center (NERSC) have worked on important computational problems such as collapse of the atomic state, the optimization of chemical catalysts, and now modeling popping bubbles.
Read more...
05/10/2013 | Cleversafe, Cray, DDN, NetApp, & Panasas | From Wall Street to Hollywood, drug discovery to homeland security, companies and organizations of all sizes and stripes are coming face to face with the challenges – and opportunities – afforded by Big Data. Before anyone can utilize these extraordinary data repositories, however, they must first harness and manage their data stores, and do so utilizing technologies that underscore affordability, security, and scalability.
04/15/2013 | Bull | “50% of HPC users say their largest jobs scale to 120 cores or less.” How about yours? Are your codes ready to take advantage of today’s and tomorrow’s ultra-parallel HPC systems? Download this White Paper by Analysts Intersect360 Research to see what Bull and Intel’s Center for Excellence in Parallel Programming can do for your codes.
In this demonstration of SGI DMF ZeroWatt disk solution, Dr. Eng Lim Goh, SGI CTO, discusses a function of SGI DMF software to reduce costs and power consumption in an exascale (Big Data) storage datacenter.
The Cray CS300-AC cluster supercomputer offers energy efficient, air-cooled design based on modular, industry-standard platforms featuring the latest processor and network technologies and a wide range of datacenter cooling requirements.