March 24, 2009
ARGONNE, Ill., March 24 -- A novel system is enabling high energy physicists at CERN in Switzerland to make production runs that integrate their existing pool of distributed computers with dynamic resources in "science clouds." The work was presented at the 17th annual conference on Computing in High Energy and Nuclear Physics, held in Prague, Czech Republic, March 21-27.
The integration was achieved by leveraging two mechanisms: the Nimbus Context Broker, developed by computer scientists at the U.S. Department of Energy's Argonne National Laboratory and the University of Chicago, and a portable software environment developed at CERN.
Scientists working on A Large Ion Collider Experiment, also known as the ALICE collaboration, are conducting heavy ion simulations at CERN. They have been developing and debugging compute jobs on a collection of internationally distributed resources, managed by a scheduler called AliEn.
Since researchers can always use additional resources, the question arose: how can one integrate a cloud's dynamically provisioned resources into an existing infrastructure such as the ALICE pool of computers and still ensure that the various AliEn services have the same deployment-specific information? Artem Harutyunyan, sponsored by the Google Summer of Code to work on the Nimbus project, made this question the focus of his investigation. The first challenge was to develop a virtual machine that would support ALICE production computations.
"Fortunately, the CernVM project had developed a way to provide virtual machines that can be used as a base supporting the production environment for all four experiments at the Large Hadron Collider at CERN -- including ALICE," said Harutyunyan, a graduate student at State Engineering University of Armenia and member of Yerevan Physics Institute ALICE group. "Otherwise, developing an environment for production physics runs would be a complex and demanding task."
The CernVM project originally set out to supply portable development environments that scientists could run on their laptops and desktops. It now supports a variety of virtual machine image formats, including the Xen images used by Amazon EC2 as well as by the Science Clouds. The challenge for Harutyunyan was to find a way to deploy these images so that they would dynamically and securely register with the AliEn scheduler and thus join the ALICE resource pool.
Here the Nimbus Context Broker came into play. The broker allows a user to securely provide context-specific information to a virtual machine deployed on remote resources. It places minimal compatibility requirements on the cloud provider and can orchestrate information exchange across many providers.
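The idea of a context broker can be illustrated with a minimal in-memory sketch. It is not the Nimbus Context Broker API: the class name `ContextBroker`, its methods, and the shared-secret check are all hypothetical, chosen only to show virtual machines checking in with a credential and then receiving one common view of the group.

```python
import hmac
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ContextBroker:
    """Hypothetical broker: collects node info and hands back a shared context."""
    expected_nodes: int                 # how many VMs make up the group
    secret: str                         # credential given to each VM at launch
    members: Dict[str, str] = field(default_factory=dict)

    def _check(self, secret: str) -> None:
        # Constant-time comparison so the check does not leak the secret.
        if not hmac.compare_digest(secret, self.secret):
            raise PermissionError("unknown context secret")

    def register(self, secret: str, name: str, address: str) -> None:
        """A freshly booted VM reports its name and address."""
        self._check(secret)
        self.members[name] = address

    def retrieve(self, secret: str) -> Optional[Dict[str, str]]:
        """Return the full group context, or None until every VM has checked in."""
        self._check(secret)
        if len(self.members) < self.expected_nodes:
            return None
        return dict(self.members)


broker = ContextBroker(expected_nodes=2, secret="launch-token")
broker.register("launch-token", "head", "10.0.0.1")
first_view = broker.retrieve("launch-token")    # group is still incomplete
broker.register("launch-token", "worker-1", "10.0.0.2")
full_view = broker.retrieve("launch-token")     # every node now sees the same map
```

In this sketch each node would poll `retrieve` until it returns the full map, then use that map to configure itself, for example to locate the scheduler it should register with.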
"Commercial cloud providers such as EC2 allow users to deploy groups of unconnected virtual machines, whereas scientists typically need a ready-to-use cluster whose nodes share a common configuration and security context. The Nimbus Context Broker bridges that gap," said Kate Keahey, a computer scientist at Argonne and head of the Nimbus project.
Integration of the Nimbus Context Broker with the CernVM technology has proved a success. The new system dynamically deploys a virtual machine on the Nimbus cloud at the University of Chicago, which then joins the ALICE computer pool so that jobs can be scheduled on it. Moreover, with the addition of a queue sensor that deploys and terminates virtual machines based on demand, the researchers can experiment with ways to balance the cost of the additional resources against the need for them as evidenced by jobs in a queue.
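A queue sensor of this kind can be sketched as a small decision function. The policy below is an assumption made for illustration, not the logic used in the project: `ScalingPolicy`, `jobs_per_vm`, and `max_vms` are hypothetical names, and the cap on virtual machines stands in for the cost limit mentioned above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ScalingPolicy:
    jobs_per_vm: int = 4   # hypothetical: queued jobs one VM is expected to absorb
    max_vms: int = 10      # hypothetical: upper bound that caps spending


def target_vm_count(queued_jobs: int, policy: ScalingPolicy) -> int:
    """Number of cloud VMs that should be running for a given queue depth."""
    if queued_jobs <= 0:
        return 0
    needed = -(-queued_jobs // policy.jobs_per_vm)   # ceiling division
    return min(needed, policy.max_vms)


def plan_action(queued_jobs: int, running_vms: int,
                policy: ScalingPolicy) -> Tuple[str, int]:
    """Decide whether to deploy, terminate, or hold, and by how many VMs."""
    delta = target_vm_count(queued_jobs, policy) - running_vms
    if delta > 0:
        return ("deploy", delta)
    if delta < 0:
        return ("terminate", -delta)
    return ("hold", 0)


policy = ScalingPolicy()
busy = plan_action(queued_jobs=9, running_vms=1, policy=policy)
idle = plan_action(queued_jobs=0, running_vms=3, policy=policy)
capped = plan_action(queued_jobs=100, running_vms=10, policy=policy)
```

Raising `jobs_per_vm` or lowering `max_vms` trades longer queue waits for lower cost, which is the balance the researchers describe experimenting with.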
According to Keahey, one of the most exciting aspects of the project is that the work was accomplished by integrating cloud computing into existing mechanisms. "We didn't need to change the users' perception of the system," Keahey said.