Not long ago, the high cost and relative slowness of DNA sequencing were the rate-limiting bottlenecks in biomedical research. Today, post-sequencing data analysis is the biggest challenge. The reason, of course, is that the prodigious output of modern next-generation sequencing (NGS) instruments (e.g., from Illumina and ThermoFisher/Life Technologies) is overwhelming analysis pipelines.
Efficiently sifting the data treasure trove is a huge headache for the bioinformatics community. Moreover, there are many different kinds of post-sequencing data analysis (e.g., assembly, alignment, variant calling, RNA-seq), all of which may stress HPC systems in different ways. Recently, HPC blogger Richard Casey reported achieving a 12X analysis speed-up using the Tesla K80, NVIDIA’s newest GPU in the Tesla series.
Casey detailed the work in his blog. He chose to tackle NGS data alignment, a frequent and time-consuming step. Casey is a bioinformaticist at the Colorado State University Next Generation Sequencing Core. His main role is providing bioinformatics and data analysis support to principal investigators and researchers who use the Core’s NextGen sequencers.
Blog Excerpt: “Typically, the DNA reads coming off the sequencers are aligned to a so-called reference genome. This step attempts to map sample reads to the reference genome, and the results are used for SNP analysis and other types of reports. The current human reference genome contains about 3.1 billion nucleotide bases. This is a fairly large genome (although there are considerably larger genomes in other species).
“Sequence alignment of NGS sample reads against the reference human genome can take a few hours to several hours to run on a reasonably high-end server or cluster. Although this is not excessive for a single sample, with the sheer number of samples handled by current sequencers, and with multiple sequencers in an NGS Core or lab, the accumulated time spent in the sequence alignment step can be problematic.
“To help alleviate this alignment issue, we have been evaluating parallel sequence alignment algorithms and software alignment tools. As mentioned in an earlier blog post, we’re testing the NVBIO suite on NVIDIA GPUs. nvBowtie is a GPU-enabled sequence alignment tool in this software suite. bowtie2 is a popular CPU-only counterpart to nvBowtie. To compare the performance of these two applications, we performed a DNA sequence alignment benchmark test with human genome datasets.”
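For scale, the excerpt’s 3.1-billion-base reference genome would take about 3.1 GB at one byte per base, and aligners commonly store nucleotides more compactly. The sketch below is a minimal Python illustration of 2-bit packing (four bases per byte) and the resulting size estimate. The function names are hypothetical, the code ignores ambiguous bases such as N, and it is not presented as nvBowtie’s or bowtie2’s actual storage format; the index structures aligners build on top of the sequence need additional memory beyond this figure.

```python
# Minimal sketch of 2-bit nucleotide packing (A, C, G, T -> 0..3).
# Assumption: input contains only A/C/G/T; real tools must also handle N, etc.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq: str) -> bytearray:
    """Pack a DNA string at 4 bases per byte, first base in the low-order bits."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return out


def unpack(packed: bytearray, length: int) -> str:
    """Recover the first `length` bases from a packed buffer."""
    return "".join(
        BASES[(packed[i // 4] >> (2 * (i % 4))) & 0b11] for i in range(length)
    )


def packed_size_gb(n_bases: int) -> float:
    """Bytes needed at 2 bits per base, in decimal gigabytes (1e9 bytes)."""
    return ((n_bases + 3) // 4) / 1e9


if __name__ == "__main__":
    seq = "GATTACA"
    packed = pack(seq)
    print(len(packed), unpack(packed, len(seq)))      # 2 GATTACA
    print(f"{packed_size_gb(3_100_000_000):.3f} GB")  # 0.775 GB
```

At roughly 0.78 GB, the packed sequence by itself is a small fraction of the K80’s 24 GB of GDDR5.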
It’s best to refer to Casey’s blog for the full details (sample sources, sequencers used, reference genome used, etc.). The tools benchmarked were nvBowtie v.0.9.9.3 from the NVBIO suite and bowtie2 v.2.2.4. “nvBowtie is designed for highly-parallel GPU-only sequence alignments, whereas bowtie2 is designed for moderately-parallel CPU-only alignments. In a sense this is a comparison of fine-grained vs. coarse-grained parallelism,” wrote Casey.
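Casey’s fine-grained vs. coarse-grained distinction concerns how alignment work is split across processors. The following Python sketch contrasts the two decompositions using a hypothetical exact-match `align_read` stand-in; real aligners perform indexed, gapped, scored alignment. The coarse-grained version gives each of a few workers one contiguous chunk of reads, roughly analogous to a small number of CPU threads; the fine-grained version submits one task per read, loosely mirroring the many lightweight threads of a GPU. It illustrates work decomposition only: because of CPython’s global interpreter lock, these threads will not actually speed up this CPU-bound toy.

```python
from concurrent.futures import ThreadPoolExecutor


def align_read(read: str, reference: str) -> int:
    """Hypothetical stand-in aligner: exact-match start position of `read`, or -1."""
    return reference.find(read)


def coarse_grained(reads, reference, n_workers=4):
    """A few workers, each aligning one contiguous chunk of reads."""
    chunk = max(1, (len(reads) + n_workers - 1) // n_workers)
    chunks = [reads[i:i + chunk] for i in range(0, len(reads), chunk)]

    def work(batch):
        return [align_read(r, reference) for r in batch]

    results = []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for part in pool.map(work, chunks):  # map yields results in chunk order
            results.extend(part)
    return results


def fine_grained(reads, reference, n_workers=4):
    """One task per read; the pool schedules many small independent units."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(lambda r: align_read(r, reference), reads))


if __name__ == "__main__":
    reference = "ACGTTGCAAGGCTTACGGATCC"
    reads = ["TTGCA", "GGATCC", "AAAA"]
    print(coarse_grained(reads, reference))  # [3, 16, -1]
    print(fine_grained(reads, reference))    # [3, 16, -1]
```

Both decompositions return identical results in read order; what differs is the granularity of the units handed to the scheduler.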
Benchmark runs were performed on Microway’s Tesla GPU Test Drive accelerated compute cluster. The performance comparisons were made between a CPU-only and a GPU-only system.
The CPU-only test used bowtie2 with the following system and run configuration:
- Cray XT6m
- (2) 12-core AMD Opteron 6100 CPUs per compute node
- 32 GB RAM per compute node
- 12 CPU-threads
- Ran bowtie2 on a single compute node
The CPU-only sequence alignment run took 206 min (about 3.4 hrs).
The GPU-only test used nvBowtie with the following system and run configuration:
- Intel cluster
- (2) 12-core Xeon E5-2680v3 CPUs per compute node
- 128 GB host CPU RAM per compute node
- NVIDIA Tesla K80 GPU per compute node
- Ran nvBowtie on a single compute node
The GPU-only sequence alignment run took 16 min (about 0.27 hrs).
Overall, the GPU-only nvBowtie run was 12.8X faster than the CPU-only bowtie2 run (206 min / 16 min).
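The arithmetic behind the headline figure is easy to check. This short Python sketch uses only the two wall-clock times reported above: the exact ratio is 12.875, which the article gives as 12.8X, and 16 minutes is about 0.27 hours. Keep in mind that this is a node-to-node comparison: the CPU-only node used older AMD Opteron 6100 processors, while the GPU node also had newer Xeon E5-2680v3 host CPUs, so the ratio reflects differences in hardware generation as well as in software.

```python
def speedup(baseline_min: float, accelerated_min: float) -> float:
    """Ratio of baseline wall-clock time to accelerated wall-clock time."""
    return baseline_min / accelerated_min


# Wall-clock times reported for the single-node runs, in minutes.
cpu_min, gpu_min = 206, 16

print(f"CPU-only: {cpu_min / 60:.2f} h, GPU-only: {gpu_min / 60:.2f} h")  # CPU-only: 3.43 h, GPU-only: 0.27 h
print(f"speed-up: {speedup(cpu_min, gpu_min):.3f}X")                      # speed-up: 12.875X
```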
Casey wrote: “The 12.8X speedup of nvBowtie on a K80 GPU is encouraging. For human genome sequence alignments, this reduced the wall-clock run time from several hours to a few minutes.
“The older AMD Opteron series processors used in these tests have been superseded by newer AMD FX series and Intel 5th generation processors, among others. The speedups seen here would undoubtedly be reduced somewhat when compared to newer CPU processor models. However, nvBowtie is currently at version 0.9 (not even version one yet). We would expect continued algorithm development and optimization in nvBowtie to result in performance improvements for the code, thus maintaining speedups somewhere in the 8X – 10X range. In any event, shaving hours off the sequence alignment runtimes is significant.”
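To put “shaving hours off” in terms of the accumulated time Casey describes for labs running many samples, the sketch below projects serial alignment time for a batch of samples on one node. The batch size of 24 is a hypothetical value chosen for illustration. Applying the 8X to 10X range to the 206-minute Opteron baseline is also a simplification: Casey’s projection is relative to newer CPUs, which would start from a shorter baseline of their own.

```python
def batch_hours(n_samples: int, minutes_per_sample: float) -> float:
    """Total alignment time in hours when samples run one after another on one node."""
    return n_samples * minutes_per_sample / 60


BASELINE_MIN = 206  # CPU-only bowtie2 time per sample from the benchmark
N_SAMPLES = 24      # hypothetical batch size, for illustration only

cpu_h = batch_hours(N_SAMPLES, BASELINE_MIN)
for s in (8, 10, 12.875):  # Casey's projected range, plus the measured ratio
    gpu_h = batch_hours(N_SAMPLES, BASELINE_MIN / s)
    print(f"{s:>6}X: {cpu_h:.1f} h -> {gpu_h:.1f} h (saves {cpu_h - gpu_h:.1f} h)")
```

Under these assumptions, the batch drops from about 82.4 hours to roughly 8.2 to 10.3 hours across the 8X to 10X range.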
The K80 specifications are:
- 4,992 GPU cores
- 24 GB GDDR5 RAM
- 480 GB/sec. memory bandwidth
- 300 W power consumption (important spec!)