In an HPC market that seems determined to go down the CPU-GPU path, upstart Convey Computer may yet offer a few surprises. The company today unveiled the sequel to the HC-1 platform it introduced in 2008. Called the HC-1ex, the new system adds considerably more performance and capability, but retains the original x86-FPGA co-processor design.
Convey’s first HC-1 design, unveiled at SC08, began production shipment in 2009. Although still in startup mode, Convey seems to be on sound financial footing. It closed its second round of funding last summer, bringing its total to $40 million. Since then the company has increased its head count from 25 to 55.
According to company president and CEO Bruce Toal, Convey now has roughly 30 customer deployments, ranging from single units up to 8-node clusters. Most of the systems have been installed for bioinformatics, government and research applications, with financial services, energy and logic simulation also represented.
Because the platform is so malleable, it can serve virtually any HPC application domain. The basic concept is to offer a standard x86 server platform accelerated by FPGAs in the guise of a co-processor. For a specific application domain (or even just a single application), the FPGAs are programmed to extend the x86 ISA with custom instructions intended to accelerate the target software; Convey calls each such FPGA configuration a "personality." The Convey tools then generate these instructions when they compile the application source. It’s a nifty little design, and worlds away from the FPGAs-as-an-afterthought approach that has typically been used in HPC.
The CPU and FPGAs are glued together via a shared memory subsystem, which combines the x86 memory with the customized high-performance memory on the co-processor side. This lets the CPU and the FPGAs work within the same cache-coherent shared memory space. The approach is quite different from a conventional HPC accelerator, which typically treats the FPGA, GPGPU, or whatever as an I/O device hanging off a PCI-Express slot. In Convey’s model, the FPGAs are virtualized and act as a true co-processor. “It enables you to build a completely integrated compiled environment, which we believe is a fundamental element for hybrid computing,” explains Toal.
The HC-1ex is the higher-end version of the HC-1 but, according to Toal, is not a replacement for the original. In the second-generation product, the company has upgraded the dual-core Xeon to a quad-core part and increased CPU memory capacity from 64 GB to 128 GB. More importantly, though, the HC-1ex has moved up from the Virtex-5 part (the LX330) in the original HC-1 to the latest-generation Xilinx Virtex-6 FPGA (the LX760). The newer 40nm FPGA offers more than three times the gates of its predecessor.
Assuming the application can take advantage of those additional gates, that translates into higher absolute performance, better price-performance and increased performance per watt. For example, on a Smith-Waterman search (a local sequence alignment algorithm that scales extremely well on FPGAs), the HC-1ex ran 401 times faster than a single-core Intel CPU. That’s more than twice the performance of the HC-1. The general idea is to replace multiple racks of conventional servers with a single rack of Convey gear, reducing floor space requirements, power usage and overall total cost of ownership (TCO).
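To see why Smith-Waterman suits FPGAs so well, it helps to look at how the algorithm fills its scoring matrix. The sketch below is a minimal plain-Python illustration, not Convey's FPGA personality; the linear gap penalty and the scoring values (match=2, mismatch=-1, gap=-1) are illustrative assumptions. It fills the matrix one anti-diagonal at a time: each cell depends only on cells from the two previous anti-diagonals, so every cell on the same anti-diagonal could be computed in parallel, which is the kind of wavefront that dedicated hardware can exploit.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local alignment score between sequences a and b.

    Scoring values and the linear gap penalty are illustrative assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1] and b[j-1].
    # Row 0 and column 0 stay 0 (a local alignment may start anywhere).
    H = [[0] * cols for _ in range(rows)]
    best = 0
    # Sweep anti-diagonals d = i + j. Cells sharing an anti-diagonal are
    # mutually independent; here they are computed sequentially, but a
    # hardware pipeline could evaluate them all at once.
    for d in range(2, rows + cols - 1):
        for i in range(max(1, d - cols + 1), min(rows, d)):
            j = d - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                   # start a fresh local alignment
                H[i - 1][j - 1] + s, # align a[i-1] with b[j-1]
                H[i - 1][j] + gap,   # gap in b
                H[i][j - 1] + gap,   # gap in a
            )
            best = max(best, H[i][j])
    return best
```

For example, `smith_waterman("ACGT", "ACT")` returns 5, corresponding to the local alignment ACGT / AC-T (three matches and one gap).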
The first HC-1ex was deployed at Georgia Tech in September. Rich Vuduc, an assistant professor in the School of Computational Science and Engineering, is leading a research team that is applying heterogeneous computing systems to data analysis and data mining applications. With the HC-1ex, Vuduc is developing a custom FPGA personality for his particular data analytics domain. The work is partly funded under a DARPA contract, so one could surmise that it might end up in some interesting defense- or security-related applications.
Beyond the HC-1ex unveiling, Convey is also announcing some new partnerships this week, with Panasas, AutoESL, Impulse, Jacquard Computing, and Voci Technologies. The Panasas collaboration will bring Panasas’s storage client software into Convey’s OS and cluster framework software. The next three, AutoESL, Impulse and Jacquard, are providing higher-level FPGA programming tools to help develop co-processor personalities.
The last-mentioned partner, Voci, is actually OEMing the Convey gear in the form of a speech recognition appliance. Called V-Blaze, the appliance can convert a hundred phone conversations to text in real time. That’s 100x what a single CPU could accomplish and perhaps 10x what a GPGPU implementation could manage. The idea is to turn phone conversations into text that can then be keyword searched for further analysis; call center monitoring is one such application. Purportedly, the V-Blaze appliance also delivers much better resolution and lower error rates than commercial voice recognition products.
The Voci collaboration is a good example of how Convey can expand its market beyond direct end-user sales. But Toal does expect to see sizable growth in such sales over the next year, thanks to a larger distribution channel and the additional technology partnerships, not to mention the new HC-1ex offering. Fighting the GPGPU juggernaut won’t be easy, but the true believers at Convey seem determined to do so.