April 20, 2010

HubZero and Cloud Forecasting at Purdue University

Nicole Hemsoth

On Thursday of this past week, I took a day out of the “office” (which means that I actually had to wear something pretty and meet real live people) and went scampering off to Purdue University shortly after the announcement that the university had a new offering based on the existing nanoHUB coolness, which has been dubbed HUBzero.

While my scampering abilities were somewhat hindered by a poor choice of shoes, I was nonetheless able to meet with some interesting folks, including William Gerry McCartney, CIO and Vice President of Information Technology at Purdue University; Michael McLennan, Senior Research Scientist and hub technology architect; and John P. Campbell, Associate VP of the Rosen Center for Advanced Computing. I was graciously received by Steve Tally, ITaP Communications Manager (and avid reader of HPCwire), who also shared information about Purdue’s desire to reach to the clouds. Sometime soon, anyway.

While the initial intent of the visit was to learn more about the HUBzero project, it naturally led to a series of questions about the grid-to-cloud transition and how it is happening at similarly sized universities that are not recognized as supercomputing superheroes. In fact, while a great deal of information about the hub concept and its application across a university came along with me in recorded form, when I listened to the interviews during the long drive back, I was struck by the fact that many universities are relying on their own “power in numbers” paradigms to set up and allocate resources along both grid lines and cloud objectives. Turns out, these projects are getting overwhelming support from research communities across disciplines, and for good reason.

From nanoHUB to HUBzero

nanoHUB is an international web-based gateway for nanotechnology-related simulation tools, instructional materials, and real-time networking for the community it serves and is fed by. Users can access these advanced simulation and other tools directly from their browsers and tap into the power of the distributed network at Purdue in addition to the Open Science Grid and TeraGrid.

With the success and usefulness of the nanoHUB project, it became clear that this same model could be extended to serve various disciplines, all with their own simulation, modeling, communication, and networking needs based on research requirements and objectives. Accordingly, on April 14, HUBzero launched in open source format as a template for future nanoHUB-like projects.

With Great Infrastructure Comes Great Responsibility

HUBzero’s name is particularly well suited to what it is in raw form. It’s a hub, ready-made and eager to serve, but there is nothing in it beyond what a community decides to inject. All of the basic infrastructure that currently serves the nanoHUB community is delivered: the ability to create and use social networks of researchers in a particular field; the middleware needed to tap into major centers and grid infrastructure; and what it takes to handle the incoming data, tools, kits, and other contributions that deliver research codes onto as many screens as possible.

As the release notes, “a HUBzero-powered site presents a polished, organized collection of tools and resources. Under the hood, powerful middleware serves up interactive simulation sessions that display results from the HUBzero rendering farm and grid computing resources.” The HUBzero project is what Michael McLennan so very aptly called a “hub in a box” for these reasons. While the basic functionality is clear (and proven in the context of nanoHUB), we were able to talk about some of the murky issues that are present in such an undertaking.

With Purdue University Technology Personal Security Aide (PUTPSA), Xena, Warrior Princess, looming over McLennan’s shoulder ensuring informational quality control, I was able to glean some details about the concept behind the HUBzero project and what changes might occur in the future as the university looks at the cloud to extend capabilities.

As you might imagine, security is one of the first problems cited with extending this outward, where the resources can become limitless. McLennan stated, “Typically, you’ll have a website and you’ll have a few developers that are putting the capability into the website to broadcast to the world. In this project, however, we have the framework where hundreds of developers who we don’t know and might not be able to trust can get in and contribute their modeling codes. In order to pull this trick off we have to be able to execute those modeling codes in a secure sandbox environment and the users we don’t know and trust, when running the code, can’t do any damage. So we have this model where we can engage many more people in the community than we might have traditionally been able to have.”

Hubs for All… Supercomputers for Some

The hub concept, along with all of the other grid-based resources at Purdue and other universities without their own ultra-Hal (DynaGrid, for example) is what has ultimately been propelling research initiatives since funding is (surprise) no easy task.

As Purdue CIO William Gerry McCartney said, “The NSF funds individual researchers and big national centers. So campuses are kind of on their own. So we think to where we were in the 80s or 90s: we were on a quest to find big money and buy a supercomputer. This was millions of dollars, of course, and by now people have kind of given up; most campuses just can’t afford such a resource. So one of the things we did was say, ‘we’re not a national center, we understand that, but all these faculty have sort of built these little data centers.’”

So, like other universities without a major center, Purdue is making the most of the grid but with careful eyes on the cloud. They see the possibilities of extending projects like HUBzero outward and skyward, but for now are relying on the power of current infrastructure until the possibilities of the grid are outpaced by rapid adoption of programs such as this one.

Wait, Aren’t We Supposed to be Talking About Clouds Here?

So yes, I agree, it’s not technically full-on cloud. Yet. But it’s a great start in a university setting. Furthermore, it is reliant on the combined efforts of developers and researchers from all over the world. As the program grows, the resource needs will follow suit but until that time, there are, of course, several issues to sort out.

One of the reasons we need to watch carefully for news on the university community-driven front is to see how the efforts of researchers from all over the world are being synthesized. With that established (use the nanoHUB as an example here), the next important question becomes how to most effectively provision resources as these projects grow beyond the grid.

Many thanks to Purdue University staff for welcoming me on short notice and giving me some bottled water without even making me ask. Also, many thanks to the sadists at Nine West for the eight screaming blisters on the tops of my feet.
