In a show of momentum, Cycle Computing today launched the latest version of its flagship product, CycleCloud v5, and announced that the company has been awarded U.S. Patent 9,146,840 for automatically detecting and resolving faults in cloud infrastructure.
The new version of CycleCloud includes a single dashboard for quickly and securely accessing and managing workloads on leading cloud service providers: AWS (NASDAQ: AMZN), Microsoft Azure (NASDAQ: MSFT), and Google Compute Engine (NASDAQ: GOOG). It also includes an improved ability to deploy custom images, which makes it easier to support a wide range of compute use cases.
The patent relates to technology that ensures required infrastructure and services are fully functional before and during the execution of clustered applications in the cloud. Cycle reports that the patented technology is a critical component of its software, “enabling error-free, production-quality clusters to deliver Platform-as-a-Service (PaaS) computation and data analysis capabilities.” (See the patent abstract below.)
After years of “tire kicking,” the use of HPC in the cloud seems poised for fast expansion. Following a recent NSCI workshop at the White House, Doug Burger of Microsoft Research wrote in a blog post, “The scale and explosive growth rate of the cloud market was surprising to many of the attendees. The scale of the cloud market will enable investments in system design that are otherwise unaffordable.”
Cycle Computing has been carefully positioning itself as a cloud-based HPC enabler. Its software, among other things, orchestrates fast provisioning of clouds for HPC workloads, data transfer, and heterogeneous resource management. Recently, Cycle helped The Broad Institute ramp up a genome analysis workflow on 50,000 cores on GCE. This job took advantage of GCE’s newly launched “preemptible virtual machine” capability to help control cost.
Cycle added several other features to CycleCloud, among them:
- AWS-specific Spot Bidding Optimizations provide the ability to minimize compute cost by automatically and efficiently using features such as equivalent instance types, multiple application zones, and real-time cost monitoring among others.
- Support for FDT (Fast Data Transfer, developed by CERN). FDT is an application for efficient data transfers that is capable of reading and writing at disk speed over wide area networks (with standard TCP).
- Event-based workflow support, which enables independent execution of multi-step workflows in the cloud, such as those composed of data transfers and multiple applications or analyses.
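To make the spot bidding item above concrete, here is a minimal Python sketch of the general idea: compare interchangeable instance types across several zones and pick the cheapest offer per core under a price ceiling. It is not Cycle's implementation. The `SpotOffer` type, the `cheapest_per_core` function, and the instance, zone, and price values used with them are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class SpotOffer:
    """One hypothetical spot offer: an instance type in a zone at a price."""
    instance_type: str
    zone: str
    price_per_hour: float
    cores: int

    @property
    def price_per_core(self) -> float:
        return self.price_per_hour / self.cores


def cheapest_per_core(offers: Iterable[SpotOffer],
                      max_price_per_core: float) -> Optional[SpotOffer]:
    """Return the offer with the lowest per-core price at or under the
    ceiling, or None if every offer is too expensive."""
    eligible = [o for o in offers if o.price_per_core <= max_price_per_core]
    if not eligible:
        return None
    return min(eligible, key=lambda o: o.price_per_core)


if __name__ == "__main__":
    # Illustrative numbers only; these are not real prices.
    offers = [
        SpotOffer("type-a", "zone-1", 0.40, 8),
        SpotOffer("type-b", "zone-2", 0.36, 8),
        SpotOffer("type-c", "zone-1", 0.90, 16),
    ]
    print(cheapest_per_core(offers, max_price_per_core=0.05))
```

A production system would refresh prices continuously and re-evaluate the choice as the market moves, which is the "real-time cost monitoring" part of the feature.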
“Cycle continues to focus on developing software that provides users simple, scalable access to cloud-based computing, so that customers can focus on their science, engineering, and making better business decisions,” said Jason Stowe, Cycle Computing CEO.
On the patent front, the technology helps Big Data and HPC users of Cycle’s software deal with the challenges inherent to running at scale in the cloud. When cloud resources are deployed, a subset of them often fails to run the assigned application for a variety of reasons; this is the problem the patented method addresses.
Here’s the patent abstract: “Systems and methods are provided for any party in a cloud ecosystem (cloud providers of such resources, the intermediate management software for such resources, and the end user of such resources) to detect and resolve faulty resources synchronously or asynchronously, before said faults adversely affect the users’ workloads. The system requests a service or set of one or more resources within a cloud, automatically checking the infrastructure for various faults that would cause it to be non-functional, including pre-defined and user-defined checks, and resolving them before including the infrastructure in the working service cluster of resources. The system presents an API to the user that returns only functional, production-quality resources that are not in a faulty state. An API that tests and resolves bad infrastructure can be registered during the request or a preceding/subsequent API call, removing the need for the end-user to deal with various types of infrastructure faults.”
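The pattern the abstract describes (request resources, run pre-defined and user-defined checks, and hand back only healthy ones) can be sketched in a few lines of Python. This is an illustration of the general idea only, not the patented method or CycleCloud's API. The `Node` type, the check functions, and `acquire_healthy_nodes` are hypothetical names, and "resolving" a fault is simplified here to discarding the node and requesting a replacement.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Node:
    """Hypothetical cloud resource with two observable health flags."""
    name: str
    disk_ok: bool = True
    network_ok: bool = True


# A check takes a node and returns True when the node passes.
Check = Callable[[Node], bool]


def disk_check(node: Node) -> bool:
    return node.disk_ok


def network_check(node: Node) -> bool:
    return node.network_ok


def acquire_healthy_nodes(request_node: Callable[[], Node],
                          count: int,
                          checks: Sequence[Check],
                          max_attempts: int = 10) -> List[Node]:
    """Request nodes until `count` of them pass every check.

    Faulty nodes are dropped and replaced (a simplification of
    "resolving"), so the caller only ever sees working nodes.
    """
    healthy: List[Node] = []
    attempts = 0
    while len(healthy) < count and attempts < max_attempts:
        attempts += 1
        node = request_node()
        if all(check(node) for check in checks):
            healthy.append(node)
    if len(healthy) < count:
        raise RuntimeError("could not acquire enough healthy nodes")
    return healthy
```

Because checks are plain callables, a user-defined check (for example, one that verifies a license server is reachable) could be passed alongside the built-in ones, which mirrors the abstract's mention of registering tests with the request.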