Adaptive Computing, the company that powers many of the world’s largest technical computing environments with its Moab optimization and scheduling software, was among the many HPC-oriented vendors assembled at SC13 in Denver.
Ahead of the show, Adaptive’s Chief Solutions Architect Daniel Hardman was getting ready to demo a new technology, Moab Task Manager, that was launching during SC. To run the demonstration he needed an HPC cluster, but shipping the gear across state lines would be impractical, especially if there were an easier, and perhaps less expensive, solution.
Thus begin Hardman’s exploits in building a cluster in the Amazon cloud, which he writes about on the company’s official blog.
Cloud-based HPC is growing increasingly common as cloud vendors like Amazon, Google and Microsoft introduce more instances aimed at satisfying the unique requirements of HPC customers. As a proof point, utility supercomputing specialist Cycle Computing recently created a 156,314-core Amazon Web Services cluster, totaling 1.21 petaflops of aggregate peak compute power, to help advance materials science.
Hardman wondered how difficult it would be to set up a cluster on AWS to run his demo project on the show floor. Was this something that could be done with relative ease?
“Well, now I know,” writes Hardman. “The answer is ‘yes.’ It was amazingly easy.”
His first step was enlisting a devops expert to pick an appropriate AMI – one that “could be easily puppet-ized and managed with standard IT tools.” He then selected Amazon’s c1.medium instance type. The devops specialist built a .deb to configure the node. A short bash script enabled the conversion of generic AMIs into instances with all the necessary HPC apps pre-installed.
Hardman spun up a test instance, and experimented with it for an hour or so. The next part is where the magic happens:
“When I right-clicked my sample instance in the AWS console, and chose ‘more instances like this one,’ I found myself in a wizard where I could request 5 new machines, or 50, or 5000,” writes Hardman.
There was one sticking point: connecting the nodes in the cluster once they’d been turned on. He tried Amazon’s VPC feature, but couldn’t get it to work. He thinks he might have arrived at a solution given enough time and effort, but he ended up going with the default networking scheme, which was good enough. The only other work was writing a script to grab a list of host names, allowing the scheduler to send instructions, receive reports, and attribute work correctly.
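Hardman does not publish his host-list script, so what follows is only a rough sketch of how such a step might look in Python, not his actual method. It assumes the instance data arrives as a dictionary shaped like the response from EC2's DescribeInstances call (for example, from boto3's `describe_instances()`), and the function names and one-host-per-line file layout are hypothetical choices for illustration.

```python
def running_hostnames(response):
    """Return sorted private DNS names of running instances.

    `response` is assumed to follow the EC2 DescribeInstances layout:
    {"Reservations": [{"Instances": [{"PrivateDnsName": ..., "State": {"Name": ...}}]}]}
    """
    hosts = []
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            state = instance.get("State", {}).get("Name")
            name = instance.get("PrivateDnsName")
            # Skip instances that are not up yet or have no DNS name assigned.
            if state == "running" and name:
                hosts.append(name)
    return sorted(hosts)


def write_host_file(hosts, path):
    """Write one host name per line (a hypothetical format, not Moab's own)."""
    with open(path, "w") as fh:
        for host in hosts:
            fh.write(host + "\n")


# Hand-written sample data standing in for a live AWS response.
sample_response = {
    "Reservations": [
        {"Instances": [
            {"PrivateDnsName": "ip-10-0-0-12.ec2.internal", "State": {"Name": "running"}},
            {"PrivateDnsName": "ip-10-0-0-11.ec2.internal", "State": {"Name": "running"}},
        ]},
        {"Instances": [
            {"PrivateDnsName": "ip-10-0-0-13.ec2.internal", "State": {"Name": "stopped"}},
            {"PrivateDnsName": "", "State": {"Name": "pending"}},
        ]},
    ]
}

if __name__ == "__main__":
    for host in running_hostnames(sample_response):
        print(host)
```

Filtering on the running state keeps half-started or stopped nodes out of the list, so a scheduler reading it would only be pointed at machines that can actually accept work.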
The Adaptive rep says the experience left him more certain than ever that “HPC-cloud convergence is the wave of the future.” He doesn’t say how much it cost to rent the cluster or how that compares to the effort and expense of shipping a cluster to the SC show floor, but one gets the impression that he is pleased with the outcome.
In conclusion, Hardman raises an important question: with all the momentum for software-defined everything in the enterprise IT space, will “software-defined supercomputing” be the next big thing?