Without a proper toolkit, running HPC applications and computations in the cloud can be a tedious exercise, especially for those who do so only infrequently.
StarCluster may help with that problem. StarCluster, according to Admin Magazine’s Gavin Burris, is a project developed by MIT’s Software Tools for Academics and Researchers team, hence the STAR. It caters to those in scientific and research fields, in particular those who wish to use clusters to perform computations but lack the tools to do so in house.
To get started with StarCluster, according to Burris, one needs an Amazon Web Services (AWS) account, as the toolkit, which is written in Python, runs on Amazon’s Elastic Compute Cloud (EC2).
Burris walked through the process of installing and configuring StarCluster before ultimately showing how a test computation would run. His test case was a fairly common one: a Monte Carlo simulation that approximates the value of pi.
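The Monte Carlo approach to pi can be sketched in a few lines of Python. This is a minimal illustration and not Burris's code: the function name `estimate_pi`, the sample count, and the seed are assumptions made for this sketch, and it runs on a single machine rather than across a StarCluster cluster. The idea is to scatter random points over the unit square and count how many fall inside the quarter circle of radius 1; that fraction approaches pi/4.

```python
import random


def estimate_pi(num_samples, seed=None):
    """Approximate pi by sampling random points in the unit square.

    num_samples and seed are illustrative parameters for this sketch.
    """
    rng = random.Random(seed)  # private generator so results are repeatable
    inside = 0
    for _ in range(num_samples):
        x = rng.random()
        y = rng.random()
        # The point lies inside the quarter circle if x^2 + y^2 <= 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / num_samples


if __name__ == "__main__":
    print(estimate_pi(1_000_000, seed=42))
```

Because every sample is independent, the work splits naturally across nodes: each node can run its own batch with a different seed and the counts can be summed afterward, which is part of what makes this a common cluster test case.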
Understanding how to set up these clusters and run jobs on them on a case-by-case basis can be critical for systems programmers who need an HPC cluster only intermittently and thus have no use for an onsite cluster.
“The cloud has become a key resource in the support of HPC,” said Burris in his conclusion, as he discussed the value of StarCluster for HPC in the cloud. “Given the proper use case, cloud offerings are an affordable fit for a variety of different workflows. A key tool in any systems programmer’s arsenal should be the StarCluster toolkit, which provides a powerful interface for harnessing these cloud resources in an effective manner.”
Burris endorsed cloud-based high-performance computing in general, noting that it allows programmers and administrators to build custom ecosystems for researchers. “Cloud computing is the next level of abstraction, allowing for the programmable out-sourcing of the data center,” Burris said. “What would traditionally be a locally managed room, full of physical hardware with a three- to five-year life cycle, situated within a managed facility that provides electricity and cooling, is now available through a programmable API.”
Other advantages, according to Burris, include outsourcing the task of chasing loose red lights and failed servers to a group at Amazon that is trained specifically for that.