Want to Snapshot Your Supercomputer?
Nimbix, Inc., one of the original HPC cloud vendors, sought to usher in a new era of heterogeneous cloud computing when it unveiled its JARVICE platform in November. The platform-as-a-service offering uses high-performance cloud hardware to host Nimbix Application Environments (NAEs) for high-throughput batch processing. Once the environments are created and deployed to JARVICE, workloads can be run in the Nimbix Accelerated Compute Cloud (NACC).
Nimbix has always emphasized heterogeneous computing through acceleration hardware such as the latest NVIDIA GPUs, Intel Xeon Phi coprocessors, Texas Instruments DSPs, and FPGAs, and JARVICE puts this technology to work, combining the benefits of cloud with bare-metal performance.
Although the platform debuted several months ago, Nimbix Chief Executive Officer Steve Hebert presents a rather interesting use case in a recent blog entry: snapshotting a supercomputer.
“What if you could quickly build a Hadoop cluster in the cloud, and then snapshot it for later use on demand?” inquires Hebert.
Snapshotting has become a feature of many cloud platforms, allowing users to save the current state of a virtual machine and revert to that snapshot at any point in the future. The mainstreaming of HPC has led to experiments with running compute- and/or data-intensive workloads on VMs. As Nimbix engineers added features to JARVICE, they came upon the idea of snapshotting a supercomputer.
“While the functionality is novel, what’s the benefit and use case?” asks Hebert. “Well, imagine that you are a student or post-doctoral researcher who needs access to a certain class of supercomputing resources to get your work done? I know for some, grant proposals have to be written or budgets have to be scraped to pull together actual hardware to build the supercomputer. I recall my brother’s work as a post-doctoral chemical oceanographer from Texas A&M. He literally had to build his computing environment, which took him several weeks of working with hardware suppliers, getting machine specs, allocating funds, and building the environment before he could even start his science.”
While some users with cloud-suitable HPC applications could submit their jobs to a public cloud platform, JARVICE would let this hypothetical researcher build a cloud supercomputer, complete with elements like GPUs and InfiniBand, in a matter of minutes, writes Hebert.
Once the machine is provisioned, the user can customize the environment by selecting their workload management software, applications and other management tools. At this point, the head node’s Nimbix Application Environment (NAE) can be saved (i.e., snapshotted) for provisioning at a later time or to make a clone supercomputer.
Hebert also describes how he created a four-node Hadoop cluster in JARVICE. Although he is not a Hadoop expert, he was "amazed" at how quick and easy it was to provision, set up, and snapshot.
He writes: “After running the default benchmark, I found that with RDMA enabled, it ran almost 2x faster versus TCP. When I was finished, since I wasn’t going to come back to it for a few days, I ran a snapshot and simply terminated the Hadoop cluster with a mouse click and it was deprovisioned. I can now re-launch it later at any time for further benchmarking activities. Pretty cool!”