News emerged recently that might reshape how genomics researchers think about the speed and accuracy of gene sequencing analysis projects that rely on the Smith-Waterman algorithm.
DRC Computer Corporation, a coprocessor company based in Sunnyvale, California, announced a genetic sequencing analysis appliance that it describes as world record-setting, benchmarked in the range of multiple trillions of cell updates per second, a figure that could have gone higher, according to DRC’s Roy Graham. Although similar claims of supremacy have been made in the past, the company states that this marks a 5x improvement over previously published results.
While it might be tempting to think this is just another acceleration story about toppling old benchmarks, this one does have something of a unique slant.
A single one of the company’s FPGAs delivers performance equivalent to 1,000 cores, which is interesting in itself. But DRC also argues that there is a clear cloud computing angle, since its FPGA-based Accelium board can, ideally, be plugged into a standard x86 server via standard PCIe slots.
DRC claims that the time and cost to complete gene sequence analysis can be reduced by a factor of 20 using standard Intel-based servers fitted with its own DRC Accelium processors and running Windows HPC Server 2008 R2. Beyond the shorter analysis time, the company cites savings of “over 90% the computing cost, power, real estate and infrastructure required to obtain the results.”
The beauty here, as they see it, is that standard commodity hardware can be significantly enhanced in a plug-and-play fashion, making it cloud-enabled and accessible to a broader array of potential users than before.
DRC is pitching this solution as cloud-ready when built in a private cloud, which was the environment they chose for their benchmarking effort. All debates about the validity (or newness) of private clouds aside, there could be changes coming for life sciences companies that want to make use of Smith-Waterman but have been deterred by the high costs of running this hungry algorithm in-house.
Roy Graham from DRC stated that the cloud value of the company’s announcement lies in the fact that many common sequencing services will eventually be cloud-based, and that what they’re looking at right now is a very high-volume, scalable and cost-effective platform. He says the company is currently in discussions with a number of cloud services companies and that, at this point, what they’re looking for is a proof point.
DRC claims that due to the inherent parallelism of their reconfigurable coprocessors, such solutions are extremely scalable and adaptable to modern cloud computing environments where computing resources can be shared across multiple users and applications.
According to Steve Casselman, CEO of DRC Computer Corporation, there is definitely a future in the clouds for Smith-Waterman. During a conversation with HPC in the Cloud last week, he speculated on the concept of a “corporate biocloud” where users will be able to run Smith-Waterman on as much hardware as needed while at the same time running other processes in an on-demand format. This is what he calls an example of “acceleration on-demand,” noting that there are several different algorithms ripe for this kind of capability.
Casselman insists that the main takeaway is that “it doesn’t require a very controlled environment to build this type of network or structure so it lends itself very well to a general cloud environment.”
There are some solid reasons to support efforts to ease the infrastructure burden of an algorithm like Smith-Waterman. It has been around for over two decades and produces refined results, but its user base is small given the high cost of achieving that precise output. Companies that require a specificity other genomic applications cannot match face far higher costs to get it. An accelerated, cloud-hosted option could therefore make good cost sense for companies that need the specificity of results but cannot invest up-front in the hardware required.
While Smith-Waterman is considered by some to be the gold standard for this type of work, the associated costs have led companies to use heuristic applications like BLAST instead, in part because BLAST is a cost-efficient fit for modern CPU architectures, according to Casselman.
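To make the cost argument concrete, here is a minimal Python sketch of the Smith-Waterman scoring recurrence. It is an illustration only, not DRC's implementation: the function names, the scoring values (match, mismatch, gap) and the simple linear gap penalty are all assumptions chosen for brevity. The point it shows is that every query residue is compared against every target residue, so the work grows with the product of the two lengths. Each of those comparisons is one "cell update," the unit behind the cell-updates-per-second figures quoted in benchmarks.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return (best local alignment score, number of matrix cells updated).

    Scoring values and the linear gap penalty are illustrative assumptions.
    """
    cols = len(target) + 1
    prev = [0] * cols   # previous row of the scoring matrix
    best = 0            # highest cell value seen so far
    cells = 0           # count of cell updates performed
    for q in query:
        curr = [0] * cols
        for j in range(1, cols):
            # Extend the alignment diagonally (match or mismatch) ...
            diag = prev[j - 1] + (match if q == target[j - 1] else mismatch)
            # ... or open a gap in either sequence.
            up = prev[j] + gap
            left = curr[j - 1] + gap
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
            cells += 1
        prev = curr
    return best, cells


def estimate_seconds(query_len, database_len, cell_updates_per_second):
    """Rough run time: total cells divided by a (hypothetical) throughput."""
    return (query_len * database_len) / cell_updates_per_second


if __name__ == "__main__":
    score, cells = smith_waterman_score("ACGT", "ACGT")
    print(score, cells)  # 8 16
```

Because no cell can be skipped without risking a missed alignment, the only ways to go faster are to update more cells per second or to give up the exhaustive search, which is the trade heuristic tools such as BLAST make.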
Will Smith-Waterman be delivered as a service (with an application wrapped around it) so more refined results from genetic sequencing projects can be realized by a broader class of researchers and life sciences companies? Would it require a friendly interface and inherent ease of use, and if so, who would champion the middleware cause if it were made attractive enough by efforts from companies like DRC?
Microsoft (which already offers some applications via its Azure cloud to lure in life sciences) might be the source of such a project and did take an initial interest in DRC’s benchmarking effort. The coprocessor company approached Microsoft before undertaking the benchmark because it felt that some of its big life sciences customers who were using Windows HPC Server needed benchmarks not based on Linux (although the results on Linux and HPC Server were, incidentally, comparable).
Jason Stowe, CEO of Cycle Computing, noted that there is demand for Smith-Waterman as a service and that it can be successful. In a short interview, Stowe said, “When it comes to Smith-Waterman, we have nVidia GPU-enabled versions (CUDA SW++) deployed on our CycleCloud Clusters-as-a-Service, that accelerate this algorithm to run 10-50x as fast as BLAST on comparable queries. CycleCloud’s ability to start up 64 GPU clusters on Amazon EC2 in 15 minutes enables users to take advantage of both GPU-acceleration, and cloud cost-cutting, to analyze whole genomes using Smith-Waterman, at a fraction of the cost.”
If the algorithm, which produces superior results but has been prohibitively expensive, finds the acceleration needed to make it more affordable, demand could rise for the decades-old code, especially if it is being run remotely on a pay-as-you-go basis.
While there is certainly some speculation here, what is clear is that there could be a new slant on old tools to cure diseases. As DRC’s Roy Graham stated, “The FPGA is a means to an end, so although this is a nice FPGA story, the real story is now we have the ability to provide highly accurate assessments of an individual’s affinity to a specific disease condition. In predictive diagnosis, accuracy is key and so far it’s been compromised because of a lack of cost effective computing resources. We now have the platform that can bridge the cost/accuracy divide.”