September 5, 2013

ANL Hosts HPC and Cloud Community Workshop

Tiffany Trader

Last week, 30 stakeholders from research and industry came together to explore how cloud-based cyberinfrastructure could support existing and emerging use cases across a range of research disciplines. Hosted by Argonne National Laboratory and jointly sponsored by the University of Notre Dame, Internet2 and Rackspace, the event drew participants from major research organizations, top universities and their industry partners.

In a comprehensive summary of the workshop, Rackspace Vice President of Private Cloud Certification Programs Paul Rad discusses how open standards and ease of collaboration propel the research community forward. “When hundreds of researchers contribute to a shared purpose and solve a shared problem in open and transparent ways, everyone benefits,” writes Rad.

The meeting revealed a research community that is keen to have a conversation about cloud-based technical computing. Rad makes the point that while big data and high-performance computing are game-changing, they also bring new challenges. Researchers often face long wait times on oversubscribed machines, and even after waiting months for approval, their allocation may not be sufficient for the workload. Cloud computing offers an alternative way to obtain resources on demand, utility-style.

Rad, who was also an event organizer, presented some of the main points that emerged during the first half of the day:

  • Public clouds were not designed to address the requirements of research communities.
  • Public clouds appear inexpensive until you factor in the costs of networking and I/O. Consequently, technical computing often requires hybrid and community clouds.
  • Provisioning a private cloud for average loads and bursting to external resources for peak loads is the most economical model, but access to data sets can be critical for big data workflows, e.g., in high-energy physics.
  • By working together as a community, we can scale up more appropriately; e.g., a community-owned open cloud might include a number of federated universities and research labs.
  • Within five to 10 years, large supercomputer systems will converge with cloud provider systems.

Some of these concepts showed up again in the afternoon panels. The “Cloud Best Practices” session included an overview of three case studies:

Case Study No. 1: Lessons Learned Running a Technical Cloud – Narayan Desai (Argonne National Lab)

Case Study No. 2: Bridging campus, lab, and commercial research infrastructure with an open cloud for high-energy physics – Dr. Paul Brenner (University of Notre Dame)

Case Study No. 3: OpenStack-based High Performance Cloud – Dr. Rajendra Boppana (UT San Antonio)

The group also came up with several ideas for incubation projects, two of which were selected for immediate action:

  • Big Data Reachback for Cloud Bursting Scientific Applications, such as high-energy physics, led by Notre Dame, Internet2, Rackspace, UTSA, MIT and Cycle Computing

  • Big Data Scale-out Storage Architecture, led by Argonne, the University of Chicago and Nimbus Services

Work on these projects will be a continuing collaborative effort. There are tentative plans in place for the groups to meet up at Supercomputing 2013 and WCSC 2013 in San Antonio, Texas.

One of the strongest messages to come out of the event was an appreciation for open communication and collaboration. In the video wrap-up, Narayan Desai, Principal Systems Engineer at Argonne National Laboratory, echoes this sentiment.

“It really resonated with me,” said Desai. “There’s not a large community that’s already formed around this topic. While there’ve been a lot of parallelized conversations, it seems like everyone’s been working by themselves. What really excited me about this meeting was the potential to crystallize a community around the idea of building clouds for technical workloads.”

For more information on the meeting, Internet2’s Todd Sedmak provides a concise writeup of the major outcomes and action items. One of these steps will be developing a plan or charter for the continuing collaborative effort.

