Improving HPC access to the so-called long tail of science is an ongoing NSF priority. Several initiatives, funded at least in part by NSF, are underway and one – Jetstream – is the first NSF-funded HPC Cloud targeted directly at domain scientists and engineers who typically have limited access to HPC resources and limited expertise.
“This is the first cloud for science and engineering across all areas of NSF. If that sounds a little nuanced, that’s because it is,” said David Hancock, Senior Manager, Research Technologies, Indiana University. “There are numerous other cloud projects (e.g. Chameleon and CloudLab), but they have been focused test-beds for computer science or other audiences.” Hancock provided an update on Jetstream and its progress at the HPC User Forum last month in Norfolk, VA.
Jetstream, led by Indiana University’s Pervasive Technology Institute (PTI), will add cloud-based computation to the national cyberinfrastructure. Researchers will be able to create virtual machines on the remote resource that look and feel like their lab workstation or home machine but can harness thousands of times the computing power.
“Jetstream is focused on a production environment and providing a different level of usability. We hope this environment lies at the border and intersection of existing users that are well-served by HPC and advanced cyberinfrastructure provided by NSF programs and users that are not yet part of XSEDE.”
Here’s Hancock’s brief description of Jetstream:
- NSF’s first cloud for science and engineering research across all areas of activity supported by the NSF
- A user-friendly cloud environment designed to give researchers access to self-provisioned interactive computing and data analysis resources
- Globus for data movement and authorization
- User-selectable library of virtual machines that researchers can customize
- A geographically distributed environment, leveraging Internet2 and XSEDE resources
Rollout plans are aggressive, with test gear installation and development now underway and production gear installation scheduled throughout the summer. Early “friendly user” access is expected by SC2015 and full production operation in January 2016.
As planned, Jetstream is a multi-site project spread between IU, University of Texas (TACC), and University of Arizona – details are shown in the diagram below.
Hancock says dominant use will fall into three areas:
- Researchers needing a handful of cores TODAY rather than thousands next week.
- Software creators and researchers needing to create their own customized virtual machines.
- As a backend supporting science gateways.
Jetstream is expected to serve a diverse range of science domains and use cases. Hancock cited several: biology; earth science/polar science; field station research; geographical information systems; network science; observational astronomy; and climate.
Software, particularly ensuring ease-of-use, is a major challenge. “You don’t want to offer 36 buttons and just say go configure your VLAN. That’s not something most domain scientists want to undertake. We feel we can simplify requirements in this project,” says Hancock.
According to Hancock, software layers will include: Atmosphere interface; OpenStack; KVM; CentOS Linux. The idea is to use tools currently in wide use and provide easy-to-use interfaces. Atmosphere, for example, is now used by the iPlant Collaborative and has 300 or so VMs in the environment. Not surprisingly, they are bio-centric but the expectation is more VMs from many disciplines will be added over time.
Announced last November, the initial Jetstream grant was for $6.6 million. Another roughly $5 million is expected over five years, making it the largest grant IU has ever received to deliver computational and data storage services to the national research community.
The University of Chicago is providing the project with data tools and integrating federated authentication with the existing environment. Johns Hopkins University is another project partner, providing its expertise and deep connection into the Galaxy community. Hancock indicated that more funded partners would be brought on board during maintenance and operation, targeting activities such as providing virtual workshops.
In formulating the proposal, “We examined the XSEDE Cloud Report of 2013 and we used Jetstream to target 10 of the 12 use cases in the report,” said Hancock.
“I think between looking at Wrangler and Bridges and Jetstream, we can triangulate some of these communities. There’s good overlap. The Hadoop usage that doesn’t fit well on Jetstream may be a great fit for Wrangler or for Bridges. Some of the more interactive VM use at a different scale than what is supported on Wrangler will be well supported on Jetstream.”
Here is a link to a video of Hancock’s 15-minute presentation: https://www.youtube.com/watch?v=YEyjFpvGwo0&feature=youtu.be