Compute-intensive workloads can prove taxing to an institution’s HPC resources. When applications require more horsepower, the process of upgrading an existing cluster or deploying a new system altogether can slow down research and productivity. There is an alternative, though. Cloud service providers offer scalable on-demand resources, giving organizations an anytime boost to their computational capacity.
Last week, Penguin Computing revealed partnerships between a number of universities and its cloud computing division. Penguin On Demand’s (POD) academic clouds make local and remote HPC resources available to these institutions. In the official announcement, Tom Coull, senior vice president and general manager of software and services at Penguin, explained that the model adds compute capacity while reducing upfront costs.
“Penguin Computing has traditionally been very successful with HPC deployments in academic environments with widely varying workloads, many departments competing for resources and very limited budgets for capital expenses,” stated Coull.
HPC in the Cloud caught up with Mr. Coull to discuss the POD service and Penguin’s partnerships with these institutions.
In each scenario, the university houses computing equipment, which is owned and operated by Penguin. How the resources are used determines the nature of the agreement. In some cases, cycles can be resold to outside users, creating a new revenue stream for IT departments.
The academic partnerships typically follow one of three models:
Channel Partnership
A channel partnership between an academic institution and Penguin essentially means the institution becomes a distributor of the POD service. In this configuration, departments that need HPC resources can access Penguin’s virtual cycles on demand. This is a “pay as you go” model, in which upfront capital costs are transformed into operational overhead. The university receives an umbrella invoice from Penguin and charges individual departments for their usage of compute cycles.
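The chargeback step described above can be sketched in a few lines of Python. This is only an illustration of splitting one umbrella invoice in proportion to usage; the function name, department names, core-hour figures and invoice amount are all hypothetical and do not come from Penguin or any university.

```python
from decimal import Decimal, ROUND_HALF_UP


def allocate_invoice(invoice_total, core_hours_by_dept):
    """Split one umbrella invoice across departments by core hours used.

    invoice_total is a string such as "1000.00" (hypothetical figure);
    core_hours_by_dept maps a department name to its core hours.
    """
    total_hours = sum(core_hours_by_dept.values())
    if total_hours <= 0:
        raise ValueError("no usage recorded")

    total = Decimal(invoice_total)
    cent = Decimal("0.01")
    charges = {}
    for dept, hours in core_hours_by_dept.items():
        share = total * Decimal(hours) / Decimal(total_hours)
        charges[dept] = share.quantize(cent, rounding=ROUND_HALF_UP)

    # Give any rounding remainder to the largest consumer so the
    # departmental charges add up to the invoice exactly.
    remainder = total - sum(charges.values())
    largest = max(core_hours_by_dept, key=core_hours_by_dept.get)
    charges[largest] += remainder
    return charges


# Hypothetical usage for one billing period.
usage = {"physics": 6000, "chemistry": 3000, "biology": 1000}
charges = allocate_invoice("1000.00", usage)
for dept, amount in charges.items():
    print(f"{dept}: ${amount}")
```

Using Decimal rather than floating point keeps the departmental charges exact to the cent, which matters when the line items must reconcile with a single invoice.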
The arrangement may seem limited to off-site infrastructure, but it has been in place at Caltech alongside local hardware for nearly two years. When workloads become too compute-intensive for the local cluster, Penguin allows system admins to burst jobs to a company datacenter in Salt Lake City. Coull explained that Caltech developed a software suite with Penguin called POD tools, which manages interactions with the cloud resources.
“The connection to the cloud is only a small part of the problem,” notes Coull. “To really make this useful, you have to be able to migrate data reliably and quickly. You have to be able to migrate applications [ISV software like MathWorks] if you want to run your own applications, and you have to be able to migrate job scripts.”
The management software is designed to securely handle workload migration, monitoring and reporting from behind a firewall. Tracking usage from the various departments can be complicated, so Penguin created custom portals to help universities keep up to date with user access.
Hybrid / Channel Model
This is the model currently in place at Indiana University (IU). The public-private partnership engenders a symbiotic relationship wherein Penguin provides and operates a cluster on campus. The university provides the facilities, power, cooling and the Internet connection for the system. In return for its contribution, the university gets to use cycles free of charge.
The model also allows cycles to be repackaged for external federally funded research and development centers (FFRDCs). Coull gave an example in which IU could provide cycles to another facility:
“UC Berkeley could contact IU and say ‘hey, we’d like some cycles.’ We can turn that on literally with just a white list.”
Hybrid Model
This configuration involves an on-site cluster using a prepaid model. When an institution deploys a system, it typically purchases a number of core hours. In most cases, those hours can be used over the next three years. Penguin also includes tools that enable cloud bursting if more compute power is needed. Bursting is sometimes used in the early stages, before the local cluster is fully deployed.
“What’s more common is that they’ll size their cluster based on an average workload and then they’ll let the peaks burst out to the cloud. That’s a nice model because it’s actually a little more cost effective to do that,” said Coull.
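To make the sizing idea concrete, here is a minimal Python sketch that sizes a local cluster to the average of a demand profile and sends the overflow to the cloud. The hourly demand figures and the function name are invented for illustration; they are not Penguin data, and the sketch says nothing about actual pricing.

```python
def split_demand(hourly_core_demand, local_cores):
    """Return (local_core_hours, burst_core_hours) for a demand profile.

    Each entry in hourly_core_demand is the number of cores wanted in
    that hour. Demand up to local_cores runs on the local cluster;
    anything above it bursts to the cloud.
    """
    local_hours = 0
    burst_hours = 0
    for demand in hourly_core_demand:
        local_hours += min(demand, local_cores)
        burst_hours += max(demand - local_cores, 0)
    return local_hours, burst_hours


# Hypothetical demand profile: mostly steady, with one large peak.
demand = [100, 120, 80, 400, 100]

# Option 1: size the local cluster to the average demand.
average_cores = sum(demand) // len(demand)
local, burst = split_demand(demand, average_cores)
avg_utilization = local / (average_cores * len(demand))

# Option 2: size the local cluster to the peak, with no bursting.
peak_cores = max(demand)
peak_utilization = sum(demand) / (peak_cores * len(demand))

print(f"average-sized cluster: {average_cores} cores, "
      f"{burst} core hours burst, {avg_utilization:.0%} utilized")
print(f"peak-sized cluster: {peak_cores} cores, "
      f"{peak_utilization:.0%} utilized")
```

In this made-up profile the average-sized cluster stays far busier than one built for the peak, which is the intuition behind the model Coull describes; whether it is cheaper in practice depends on the local and cloud rates involved.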
The menu of offerings gives academic institutions access to compute capacity far beyond their local clusters, in the form that best fits their needs.