HPCwire

Since 1986 - Covering the Fastest Computers in the World and the People Who Run Them

HPCwire >> Topic >> Systems

Nimbus and Cloud Computing Meet STAR Production Demands


ARGONNE, Ill., April 2 -- The advantages of cloud computing were dramatically illustrated last week by researchers working on the STAR nuclear physics experiment at Brookhaven National Laboratory's Relativistic Heavy-Ion Collider. New simulation results were needed for presentation at the Quark Matter physics conference; but all the computational resources were either committed to other tasks or did not support the environment needed for STAR computations. Fortunately, working with technology developed by the Nimbus team at the U.S. Department of Energy's (DOE) Argonne National Laboratory, the STAR researchers were able to dynamically provision virtual clusters on commercial cloud computers and run the additional computations just in time.

Nimbus is an open source cloud computing infrastructure that provides tools allowing users to deploy virtual machines on resources, similar to Amazon's EC2, as well as user-level tools such as the Nimbus Context Broker that combines several deployed virtual machines into "turnkey" virtual clusters.

The Nimbus team at Argonne has been collaborating with STAR researchers at Brookhaven's Relativistic Heavy Ion Collider for a few years. Both research groups are supported by DOE's Office of Science.

"The benefits of virtualization were clear to us early on," said Jerome Lauret, software and computing project leader for the STAR experiment. "We can configure the virtual machine image exactly to our needs and have a fully validated experimental software stack ready for use." The image can then be overlaid on top of remote resources using infrastructure such as Nimbus.

With cloud computing, Lauret said, a 100-node STAR cluster can be online in minutes. In contrast, Grid resources available at sites not expressly dedicated to STAR can take months to configure.

The STAR scientists initially developed and deployed their virtual machines on a small Nimbus cloud configured at the University of Chicago. Then they used the Nimbus Context Broker to configure the customized cloud into Grid clusters which served as platform for remote job submission using existing Grid tools. However, these resources soon proved insufficient to support STAR production runs.

"A typical production run will require on the order of 100 nodes for a week or more," said Lauret.

To meet these needs, the Argonne Nimbus team turned to Amazon EC2. A Nimbus gateway was developed to allow scientists to easily move between the small Nimbus cloud and Amazon EC2.

"In the early days, the gateway served as a protocol adapter as well," said Kate Keahey, the lead of the Nimbus project. "But eventually we found it easier to simply adapt Nimbus to be protocol-interoperable with EC2 so that the scientists could move their virtual machines between the University of Chicago cloud and Amazon easily."

Over the past year, the STAR experiment in collaboration with the Nimbus team successfully conducted a few noncritical runs and performance evaluations on EC2. The results were encouraging. When the last-minute production request came for new simulations, the STAR researchers had virtual machine images ready to go.

"It was a textbook case of EC2 usage," said Keahey. "The overloaded STAR resources were elastically 'extended' by additional virtual clusters deployed on EC2." The run used more than 300 virtual nodes at a time, using the default EC2 instances at first and moving on to the high-CPU medium EC2 instances later to speed the calculations.

Using cloud resources to generate last-minute results for the Quark Matter conference demonstrates that the use of cloud resources for science has moved beyond "testing waters" and into real production. According to Keahey, virtualization and cloud computing provide an ideal platform for resource sharing.

"One day a provider could be running STAR images, and the next day it could be climate calculations in entirely different images, with little or no effort," said Keahey. "With Nimbus, a virtual cluster can be online in minutes."

For more information on Nimbus, visit http://workspace.globus.org/.

For more information on STAR, visit http://www.star.bnl.gov/.

About Argonne National Laboratory

Argonne National Laboratory seeks solutions to pressing national problems in science and technology. The nation's first national laboratory, Argonne conducts leading-edge basic and applied scientific research in virtually every scientific discipline. Argonne researchers work closely with researchers from hundreds of companies, universities, and federal, state and municipal agencies to help them solve their specific problems, advance America 's scientific leadership and prepare the nation for a better future. With employees from more than 60 nations, Argonne is managed by UChicago Argonne, LLC for the U.S. Department of Energy's Office of Science.

-----

Source: Argonne National Laboratory


HPCwire on Twitter

Article Tools

  • Print This Page
  • Bookmark This Article

Share Options

(Digg, Technorati, more)


Subscribe

Discussion

There are 0 discussion items posted.  

HPC in the Cloud Part 2
People to Watch 2010


Feature Articles

The Week in Review

The National Science Foundation has awarded funding to four projects as part of the Future Internet Architecture program; and the 3PAR bidding war is won by HP. We recap those stories and more in our weekly wrapup.
Read More...

Intel Flexes Parallel Programming Muscles

Intel Corp has released Parallel Studio 2011, a set of four tools designed to mainstream software development on multicore x86 architectures. The update folds in a number of parallel programming technologies that the company has acquired or developed independently over the past few years, including the Cilk Arts and RapidMind technologies, and Intel's own Ct data parallel language framework.
Read More...

Startup Makes Liquid Cooling an Immersive Experience

There's nothing like a blazing hot summer to focus one's attention on the best ways to keep cool. That goes for datacenter operators as well, who are equally worried about keeping their servers properly chilled. While there is no shortage of innovative cooling solutions being proffered by various vendors, a new liquid immersion cooling solution from startup Green Revolution Cooling could end up being the best of them all.
Read More...

Around the Web

Picking the Right Processor

Sep 03 | Should engineers take advantage of GPU computing? Read more...

HP, Hynix Start Memristor on Path to Commercialization

Sep 02 | Could see first products in three years. Read more...

TED Talks for the IT Crowd

Sep 01 | A hand-picked selection of video presentations from the TED conference -- because the next big thing has to start somewhere. Read more...

LHC Compute Grid Teaches Some Valuble Lessons

Aug 30 | CERN project adapts its computation and storage strategy as hardware gets cheaper and better. Read more...

Godson CPUs Groomed for Supercomputing Duty

Aug 26 | Chinese-made chip adds vector SIMD unit; delivers 128 gigaflops in 40 watts. Read more...

Featured Whitepapers

Effective Backup and Restore

Jul 29 | | Panasas storage solutions deliver high throughput with many concurrent backup IO streams to standard backup applications such as Veritas NetBackup™ or EMC® NetWorker™. Download this whitepaper to understand the essential elements for effective backup and restore: the tape subsystem, networking, file system workload and administrative policy.

GPU Cluster Realities Whitepaper from Platform Computing

Jul 28 | | As compelling economics and performance drive GPUs into HPC clusters, developers are scrambling to catch up. Download this whitepaper from Platform Computing to understand how to capture the benefits of exciting new GPU capabilities.

Multimedia

Webcast: Are you drowning in data?

In this webinar you will hear about the current storage challenges facing the HPC community, how Panasas storage solutions provide exceptional performance, scalability, and manageability, and how you can achieve the lowest total Cost of Ownership with a system that installs and configures in 15 minutes.

Webcast: Virtualized Data Center Roundtable

Join this online panel discussion for live Q&A with leading industry experts, analysts, and end-users to discuss the latest innovations, best practices, barriers to implementation, and measurable benefits of server virtualization with a particular focus on today's real world solutions.

Webcast: Watch SC09 Birds of a Feather Video: Scalable Fault-Tolerant HPC Supercomputers

Learn about scalable fault-tolerant architectures and examples of energy efficient and scalable supercomputing clusters using dual QDR InfiniBand to combine capacity computing with network failover capabilities with the help of programming languages such as MPI and a robust Linux cluster management package.

ISC'10 HPC in the Cloud

Newsletters

Stay informed! Subscribe to HPCwire email Newsletters.






HPC Job Bank


Featured Events

High Performance Computing Financial Markets
Frontiers of Multi-Core Computing
The 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI '10)
Harvard Biomedical HPC Leadership Summit 2010
eResearch Australasia 2010
SC10
  • November 13-19, 2010
    SC10
    New Orleans , LA
    USA