HPCwire



DOE Labs to Build Science Clouds


Like many organizations that rely on industrial-strength datacenters, the US Department of Energy (DOE) would like to know if cloud computing can make its life easier. To answer that question, the DOE is launching a $32 million program to study how scientific codes can make use of cloud technology. Called Magellan, the program will be funded by the American Recovery and Reinvestment Act (ARRA), with the money to be split equally between the two DOE centers that will be conducting the work: the Argonne Leadership Computing Facility (ALCF) at Argonne National Laboratory and the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory.

One of the major questions the study hopes to answer is how well the DOE's mid-range scientific workloads match up with various cloud architectures and how those architectures could be optimized for HPC applications. Today most public clouds lack the network performance, CPU capacity, and memory capacity needed to handle many HPC codes. The software environment in public clouds can also be at odds with HPC, since little effort has been made to optimize computational performance at the application level. Purpose-built HPC clouds may be the answer, and much of the Magellan effort will be focused on developing these private "science clouds."

The bigger question, though, is whether the cloud model in general is applicable to the high performance computing applications used at DOE labs and can offer researchers a cost-effective and flexible approach. According to ALCF director Pete Beckman, that means getting the best science for the dollar. In a cloud architecture, the virtualization of resources usually translates into better utilization of hardware. In the HPC realm, though, virtualization can be a performance killer, and utilization is often not the big problem it is in commercial datacenters, where hardware is typically undersubscribed. Perhaps of greater interest for HPC users is the ability to fast-track application deployment by taking advantage of the cloud's ability to encapsulate complete software environments.

"There are a lot of users who spend time developing their own software inside their own software stack," says Beckman. "Getting those running on traditional supercomputers can be quite challenging. In the cloud model, sometimes these people find it easier to bring their software stack with them. That can broaden the community."

The entire range of DOE scientific codes will be looked at, including energy research, climate modeling, bioinformatics, physics codes, applied math, and computer science research. But the focus will be on those codes that are typically run on HPC capacity clusters, which represent much of the computing infrastructure at DOE labs today. In general, codes that require capability supercomputers such as the Cray XT and the IBM Blue Gene are not considered candidates for cloud environments. This is mainly because large-scale supercomputing apps tend to be tightly coupled, relying on high speed inter-node communication and a non-virtualized software stack for maximum performance.

Most of the program's $32 million will, in fact, be spent on new cluster systems, which will form the testbed for Magellan. According to NERSC director Kathy Yelick, the cluster hardware will be fairly generic HPC systems, based on Intel Nehalem CPUs and InfiniBand technology. Total compute performance across both sites will be on the order of 100 teraflops. Yelick says there will also be a storage cloud, with a little over a petabyte of capacity. In addition, flash memory technology will be used to optimize performance for data-intensive applications. The NERSC and ALCF clusters will be linked via ESnet, the DOE's cutting-edge 100 Gbps network. ESnet was also a recipient of ARRA funding, and will be used to facilitate super-speed data transfers between the two sites.
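To put the 100 Gbps link in perspective, the following back-of-the-envelope sketch estimates how long it would take to move a petabyte between the two sites. It assumes an ideal link running at full line rate with no protocol overhead or contention, and a decimal petabyte (10^15 bytes); neither assumption comes from the Magellan program itself, and real transfers would be slower.

```python
def transfer_time_seconds(num_bytes: int, link_gbps: float) -> float:
    """Ideal time to move num_bytes over a link rated at link_gbps.

    Assumes decimal units (1 Gbps = 1e9 bits per second) and no overhead.
    """
    bits = num_bytes * 8
    return bits / (link_gbps * 1e9)


PETABYTE = 10**15  # decimal petabyte (assumption for illustration)

seconds = transfer_time_seconds(PETABYTE, 100)
hours = seconds / 3600
print(f"{seconds:.0f} s (~{hours:.1f} h)")  # prints "80000 s (~22.2 h)"
```

Even under these ideal conditions, a full petabyte takes most of a day to transfer, which suggests why placing storage close to the compute nodes still matters for data-intensive codes.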

One of the challenges in building a private cloud today is the lack of software standards. However, the Magellan work will employ some of the more popular frameworks that have emerged from the cloud community. Argonne, for instance, will experiment with the Eucalyptus toolkit, an open-source package that is compatible with the Amazon Web Services API. The idea is to be able to build a private cloud with the same interface as Amazon EC2.

Apache's Hadoop and Google's MapReduce, two related software frameworks that deal with large distributed datasets, will also be evaluated. Like Eucalyptus, Hadoop and MapReduce grew up outside of the HPC world, so currently there's not much support for them at traditional supercomputing centers. But the notion of large distributed data sets is a feature of many data-intensive scientific codes and is a natural fit for cloud-style computing.
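For readers unfamiliar with the programming model, here is a minimal single-process sketch of the map, shuffle, and reduce pattern these frameworks are built around. It does not use the Hadoop API; the function names and the toy task (counting 3-letter substrings in short DNA-like strings) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(record, k=3):
    """Emit one (key, 1) pair for every k-letter substring of the record."""
    return [(record[i:i + k], 1) for i in range(len(record) - k + 1)]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run the three phases in sequence on an in-memory list of records."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = map_reduce(["GATTACA", "TACAGAT"])
print(counts["GAT"], counts["CAG"])  # prints "2 1"
```

In a real deployment the map and reduce calls would run on many nodes, each working on its own slice of a distributed dataset, with the framework handling the shuffle step.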

The other aspect of the Magellan effort has to do with experimentation with commercial cloud offerings, such as those from Amazon, Google, and Microsoft. Public clouds, in particular, are attracting a lot of interest due to their ability to offer virtually infinite capacity and elasticity. (Private clouds, because of their smaller size, tend to be seen as fixed resources.) Just as important to the DOE, a public cloud has the allure of offloading the development and maintenance of local infrastructure to someone else.

"Will it be more cost effective for a commercial entity to run a cloud, and presumably make a profit on it, than for the DOE to run their own cloud?" asks Yelick. "That is going to be one of the questions most challenging to answer."

Some DOE researchers are already giving public clouds a whirl. Argonne's Jared Wilkening recently tested the feasibility of employing Amazon EC2 to run a metagenomics application (PDF). The BLAST-based code is a nice fit for cloud computing because there is little internal synchronization, so it doesn't rely on high performance interconnects. Nevertheless, the study's conclusion was that Amazon is significantly more expensive than locally owned clusters, due mainly to EC2's inferior CPU hardware and the premium cost associated with on-demand access. Of course, given increased demand for compute-intensive workloads, that could change. Wilkening's paper was published in Cluster 2009, and slides (PDF) are available on the conference Web site.
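The "little internal synchronization" property can be illustrated with a short sketch in which each input is processed independently and no task ever waits on another. This is not Wilkening's code and does not call BLAST; the `gc_fraction` function is a hypothetical stand-in for a per-sequence analysis, and the thread pool is used only to show the structure of the workload.

```python
from concurrent.futures import ThreadPoolExecutor


def gc_fraction(sequence: str) -> float:
    """Stand-in for a per-sequence analysis (NOT BLAST): fraction of G/C bases."""
    if not sequence:
        return 0.0
    return sum(base in "GC" for base in sequence) / len(sequence)


def analyze_all(sequences, workers=4):
    """Process every sequence independently; tasks share no state."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the inputs
        return list(pool.map(gc_fraction, sequences))


results = analyze_all(["GGCC", "ATAT", "GATC"])
print(results)  # prints "[1.0, 0.0, 0.5]"
```

Because the tasks never communicate, the same pattern can be spread across separate processes or cloud instances without a fast interconnect, which is the property that makes such codes candidates for EC2-style resources.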

The Magellan program is slated to run for two years, with the initial clusters expected to be installed sometime in the next few months. At NERSC, Yelick says the hardware could arrive as early as November, and become operational in December or January. Meanwhile at Argonne, Beckman is already running into researchers who can't wait to host their codes on the Magellan cloud. "They're lined up," he says. "They keep coming down to my office asking when it will be here and how soon they can log in."

