
June 24, 2013

Cycling through Genomics and Other Cloud HPC Applications

Ian Armas Foster

HPC applications that run in the cloud tend to be experimental in nature. That makes cloud-based HPC a natural fit for scientific work, especially in genomics, to the extent that such efforts are being recognized as a ‘best practice’ in biological IT.

HPC in the Cloud caught up with Cycle Computing CEO and co-founder Jason Stowe, who discussed the company’s work with Schrodinger, Inc., a company focused on chemical simulation for biotechnology and pharmaceutical purposes, which won Bio-IT World’s best practices award last month. Stowe also discussed how Cycle’s Utility HPC software advances the state of scientific HPC in the cloud, as well as the company’s initiatives in the months and years to come.

“Schrodinger won the best practice award,” Stowe said, “for a large-scale run that we did with them where we had a 50,000-core computing environment and ran approximately 12 years of science on it in 3 hours.” For Stowe, the biggest benefits here are cost and speed. Based on conversations with analysts from firms like IDC, the cost of buying and operating a comparable system to run those computations could easily reach the millions of dollars.

That cost is worth it for national labs and large institutions that would continually use those servers. For a company like Schrodinger, however, the cost and space requirements to install such a datacenter would be prohibitive.

As such, through Cycle’s Utility HPC software running in the Amazon Web Services cloud, Schrodinger was able to significantly reduce costs on the simulation. “We turned [the system] off,” Stowe explained, “and the total cost at the time to do this was $4829 to run per hour so about $14,500 total for the workload.”
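As a rough check on those figures, the arithmetic can be sketched in a few lines of Python. The three-hour runtime and the 50,000-core count come from Stowe’s description of the run. The per-core-hour figure is a derived estimate that assumes every core was billed for the full three hours, which the article does not confirm.

```python
# Back-of-the-envelope check of the Schrodinger run figures quoted above.
RUN_HOURS = 3               # "in 3 hours"
COST_PER_HOUR_USD = 4829    # "$4829 to run per hour"
CORES = 50_000              # "50,000-core computing environment"

total_cost_usd = RUN_HOURS * COST_PER_HOUR_USD

# Assumption (not stated in the article): all cores were billed for the whole run.
core_hours = CORES * RUN_HOURS
cost_per_core_hour_usd = total_cost_usd / core_hours

print(f"Total cost: ${total_cost_usd:,}")
print(f"Core-hours: {core_hours:,}")
print(f"Approximate cost per core-hour: ${cost_per_core_hour_usd:.4f}")
```

Three hours at $4829 per hour comes to $14,487, consistent with the “about $14,500” Stowe cited.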

However, as one would surmise from previous HPC in the Cloud articles on organizations like CERN and the European Space Agency running experimental applications in virtualized cloud environments, cloud-based HPC is not limited to those who can ill afford an idle datacenter. “We have customers who use 40 cores and customers who use 40,000 cores,” Stowe said.

According to Stowe, Cycle recently worked with a large pharmaceutical company running genomics simulations to achieve similar cost and time compression. That company reportedly ran “39 years of science in 11 hours” on a ten-thousand-server infrastructure, at a cost of only about $4,400.
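The “years of science in hours” figures can be read as a time-compression ratio. The following is a minimal sketch, assuming that a “year of science” means one year of continuous sequential computation (an interpretation, not something the article states) and using a 365.25-day year. The function name is illustrative only.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average calendar year


def compression_ratio(years_of_science: float, wall_clock_hours: float) -> float:
    """Hours of sequential compute packed into each wall-clock hour."""
    return years_of_science * HOURS_PER_YEAR / wall_clock_hours


# Figures quoted in the article for the two runs.
schrodinger_ratio = compression_ratio(12, 3)   # 12 years in 3 hours
pharma_ratio = compression_ratio(39, 11)       # 39 years in 11 hours
pharma_cost_per_hour_usd = 4400 / 11           # about $4400 over 11 hours

print(f"Schrodinger run: about {schrodinger_ratio:,.0f}x compression")
print(f"Pharmaceutical run: about {pharma_ratio:,.0f}x compression")
print(f"Pharmaceutical run cost: about ${pharma_cost_per_hour_usd:,.0f} per hour")
```

Under those assumptions, both runs compress time by a factor in the tens of thousands, and the pharmaceutical run works out to roughly $400 per hour.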

Stowe explained how Cycle’s software uses server clusters so that they mimic an in-house scientific HPC machine. “Our premise here with utility supercomputing is basically that individual researchers can now grab very large high throughput capability machines.”

High throughput matters because it is the feature that suits the majority of new scientific applications being built and run today. “[The new science is] data parallel, it’s big data, it’s analytics. All of those workloads work well on high throughput computing environments. Basically we have the ability to create large-scale environments that operate quickly to run these newer classes of workloads that require a high throughput,” Stowe said.

Specifically, according to Stowe, Cycle’s Utility HPC software achieves that throughput with a heavy emphasis on job scheduling and workload management. The software also bids automatically for Amazon’s idle computing capacity, acquiring additional resources when jobs require them. “As you accumulate more and more samples from the sequencer, we would be able to deploy large scale clusters that would be capable of analyzing that data and then turn around and managing cost across those clusters by handling spot market bidding, which is Amazon’s marketplace for idle computing.”

To give an example, Stowe spoke of a genomics company whose MPI jobs required many processors and high throughput. “If you’ve got a next gen sequencer, putting data down on a local cloud system, our software would schedule copying the data externally and would deploy clusters to run secondary and tertiary analysis on the genomic data, it would handle automatically archiving a copy of that data into [Amazon Glacier] so you always had a backup at a very low cost point.”

Genomics is one of the more notable use cases for those looking to run certain HPC applications in a virtualized environment. This makes sense: genomic sequencing is now cheap and quick compared with ten years ago, when it took a decade and several billion dollars. Sequencing is also highly data-intensive, and most of that data is needed for the analytics. Stowe noted that Cycle’s goal is to be able to run background analytics while the data is stored in various cloud servers.

However, Cycle does not aim to focus solely on genomics. Stowe noted that cloud-based HPC applications are attracting the attention of the manufacturing and finance sectors, which are looking to run multiple experimental simulations without further taxing their in-house HPC resources, and Cycle hopes to be at the forefront of that.
