Few user organizations have had more hands-on experience with accelerators than the National Cancer Institute’s Advanced Biomedical Computing Center (ABCC). We asked Jack Collins, manager of the ABCC’s Scientific Computation and Program Development group, for his take on accelerator appropriateness.
HPCwire: You and others at your site have experimented heavily with accelerators over the years, first Cray bit matrix multipliers, then FPGAs and now GPUs. Why?
Jack Collins: As the sizes of the scientific problems that we encounter scale, so must our solutions to computational demands. For instance, next-gen sequencing is generating terabytes of information and we need to analyze it quickly because whole farms of these machines are being deployed. Also, imaging is now being fully integrated into the workflow and not being treated as a separate area. From the technology side, price/performance is a big driver. Price includes the total cost of ownership: power, cooling, programming, etc. And when you look at new technologies, such as the Tesla card, that offer one teraflop of performance, you can get a lot of bang for the buck.
HPCwire: What applications are you trying to accelerate? Can you talk about them in some detail?
Collins: There are several applications. The most straightforward for GPGPU is molecular dynamics and simulation. There is a lot of computation in the kernel and it maps very well to the hardware. An example would be NAMD. It was ported to the GPGPU by University of Illinois folks, and they got a speedup of about 200x. We have other codes that have a similar kernel so we expect similar results. Small molecule-protein docking is another area where we are using the GPGPU. Right now we’re at about 10x over the latest Xeon processor, and we should see another 2 to 4x with a couple more tweaks.
We’re also looking at imaging applications where the processing and analysis are taking too long. Analyzing 3D images using one GPGPU for the computation and another to display the results is something we’re very interested in. These applications map very well to the architecture. We’re also exploring bioinformatics applications, but the really great thing about the GPGPU and CUDA right now is that post-docs and universities are porting codes and putting them back into the public domain at an incredible rate. This means that the community effort can be used to leverage standard codes without a large investment. Everyone has a GPU, and CUDA can be obtained by just hitting the download button in your browser.
HPCwire: What have you learned about using accelerators?
Collins: In general, you must realize that you’re taking a risk. Things generally sound better than they actually perform. Basically, I now ask about the programming model before I even care about how the hardware works. If there is no reasonable way to program the system, even if it’s buggy and a bit clunky, then pass unless you have resources to burn on a project that is high risk. Another important thing is the market being targeted. If the market isn’t big enough to support the company, then the company and product may disappear whether it works or not.
HPCwire: Why are you working with GPUs today?
Collins: GPUs have several advantages over other accelerators right now. First, they don’t cost that much. Second, everyone has access to both the hardware and the programming tools. Once it’s in the hands of that many people, the number of applications and tools will simply take off. Third, and quite importantly, the performance is truly staggering. And finally, the programming model is being developed hand-in-hand, at least at NVIDIA, with the hardware development, with the goal of making it accessible to general programmers and not just specialists.
HPCwire: Have you abandoned FPGAs?
Collins: No. But our efforts have been scaled back significantly.
HPCwire: GPUs generally lack 64-bit precision and error correction capability. Are those important for any of your applications?
Collins: For many of our applications we can live with these limitations, especially when we’re doing some Monte Carlo or genetic algorithm runs that are averaged over a large number of simulations. However, I think that the new NVIDIA products are addressing these limitations.
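The point about averaged Monte Carlo runs tolerating reduced precision can be illustrated with a minimal sketch. It is not one of the ABCC codes: it assumes a toy problem (estimating pi by random sampling) and emulates 32-bit arithmetic on the CPU by rounding every intermediate value through Python's `struct` module. The names `to_f32` and `estimate_pi` are hypothetical. With the same random samples, the single- and double-precision estimates differ by far less than the statistical sampling error.

```python
import math
import random
import struct


def to_f32(x):
    """Round a 64-bit Python float to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


def estimate_pi(n_samples, seed, single_precision):
    """Monte Carlo estimate of pi from points in the unit square.

    With single_precision=True, each intermediate result is rounded
    to 32 bits to emulate single-precision hardware (an assumption
    made for this illustration).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if single_precision:
            x, y = to_f32(x), to_f32(y)
            r2 = to_f32(to_f32(x * x) + to_f32(y * y))
        else:
            r2 = x * x + y * y
        if r2 <= 1.0:
            hits += 1
    return 4.0 * hits / n_samples


N = 100_000
pi64 = estimate_pi(N, seed=1, single_precision=False)
pi32 = estimate_pi(N, seed=1, single_precision=True)

# One standard deviation of the estimator: 4 * sqrt(p * (1 - p) / N), p = pi/4.
p = math.pi / 4
sampling_error = 4.0 * math.sqrt(p * (1 - p) / N)

print("double precision estimate:", pi64)
print("single precision estimate:", pi32)
print("precision gap:            ", abs(pi64 - pi32))
print("sampling error (1 sigma): ", sampling_error)
```

In this sketch the rounding only matters for the rare sample that lands almost exactly on the circle boundary, so the noise from finite sampling dominates. Codes that accumulate long sums or subtract nearly equal values can behave differently and need their own check.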
HPCwire: What results have you gotten from using GPUs so far?
Collins: We’ve gotten some nice speedups on molecular docking, as I said earlier. Talking to others, preliminary numbers look very good on the other codes I described as well. The results are good enough to change the way we approach problems from a business workflow perspective.
HPCwire: How difficult are GPUs to program and work with compared with other accelerators you’ve tried?
Collins: Compared to earlier accelerators where you needed special libraries that may have severe limitations, or to FPGAs where one needed to understand the basic hardware, the CUDA programming model is relatively straightforward. In my mind it’s more like OpenMP or UPC. You may have to restructure your code or algorithm to get good performance, but you can still recognize the programming language when you’re done.
HPCwire: What are you hoping for from GPUs?
Collins: For problems that map well to the GPU, acceleration can dramatically change the workflow of our scientists. On the desktop it can bring a lot of analyses into the “doable” or “interactive” realm, and that can really change the way we attack a problem. In the computing center, adding GPU nodes that can accelerate an application by 100x can free up that many cores on my compute servers and reduce my power and cooling requirements to keep up with demand. Adding a GPU instead of another 100 cores is much easier if the software supports it. When the next generation of GPU comes out, I can simply replace it. And finally, at home I have a supercomputer in a box. At one teraflop for a new Tesla card, I can do a lot on my home computer now.
HPCwire: What company or companies are you working with, and what kind of products do they have?
Collins: For GPGPUs I’m primarily working with NVIDIA, and I’m focused on the Tesla card that they’ve just announced. We’re also working with Silicon Informatics to help us port code to GPUs in the drug design and discovery area.
HPCwire: What’s the collaborative model you’re using to work with them?
Collins: They’re providing training, advance access to hardware, and actually listening to us about what is important for our problems. We’ve worked through direct communication as well as bringing some of our vendors together with NVIDIA to build a better product before it gets to us.
HPCwire: What advice do you have for other HPC users who are considering adding accelerators to their computing mix?
Collins: See if their problem maps to the accelerator they are considering. Determine what speedup is necessary for their applications to make a good business justification for porting to the new hardware. Are they looking for 2x, 10x, 100x? Is that goal realistic? And check out the programming model that is necessary to take advantage of the accelerator. If it takes six months to compute the problem on today’s hardware or six hours on an accelerator after ten years of coding, the answer becomes obvious when you look at the total time to solution. And we’re really interested in answering the question, right?
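The total-time-to-solution comparison in that last answer can be worked through as a short calculation. This is a rough sketch, not a planning tool: it assumes a 365-day year, treats "six months" as half a year, charges no porting cost to today's hardware, and uses the hypothetical helper name `total_hours`.

```python
import math

HOURS_PER_YEAR = 365 * 24  # assumption: 365-day year, running around the clock


def total_hours(dev_hours, run_hours, n_runs=1):
    """Total time to solution: one-off coding time plus compute time for every run."""
    return dev_hours + run_hours * n_runs


# Figures from the interview's example.
cpu_run_hours = HOURS_PER_YEAR // 2    # six months per run on today's hardware
accel_run_hours = 6                    # six hours per run on the accelerator
accel_dev_hours = 10 * HOURS_PER_YEAR  # ten years of coding before the first run

cpu_total = total_hours(0, cpu_run_hours)
accel_total = total_hours(accel_dev_hours, accel_run_hours)

# Smallest number of runs at which the accelerator route finishes first.
break_even_runs = math.floor(accel_dev_hours / (cpu_run_hours - accel_run_hours)) + 1

print("one run on today's hardware:", cpu_total, "hours")
print("one run via the accelerator:", accel_total, "hours")
print("accelerator wins from run number:", break_even_runs)
```

Under these assumptions a single answer arrives in 4,380 hours on existing hardware versus 87,606 hours by the accelerator route, and the port only pays off if the same problem has to be run at least 21 times.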