Since 1986 - Covering the Fastest Computers in the World and the People Who Run Them

April 24, 2014

‘Sailfish’ Accelerates Gene Expression Analysis

Tiffany Trader

“Sailfish” is a new computational method from Carnegie Mellon University and the University of Maryland that speeds up the analysis of RNA sequencing (RNA-seq) data by a factor of 20 or more.

The method, named after the super-speedy fish, estimates gene expression levels much faster than previous methods: a job that once took hours can now be completed in a few minutes without loss of accuracy. Details of the research have been published online in the journal Nature Biotechnology.

Gene expression is the process by which the information encoded in genes (stretches of DNA) is used to make functional products such as proteins, which in turn give rise to different traits, such as blue eyes or a predisposition toward cancer. Gene expression occurs in all known life: it’s how the genetic code stored in DNA is “interpreted.”

Along with major advances in genomics, gene expression analysis has grown in importance for both basic researchers and medical practitioners. Large stores of RNA-seq data now exist, and scientists are using them to re-analyze experiments; however, the analysis is notoriously time-intensive, with an average run taking about 15 hours.

Fifteen hours might not seem like a lot, but when you multiply that by 100 experiments, it adds up, says paper co-author Carl Kingsford, an associate professor in CMU’s Lane Center for Computational Biology. He adds: “With Sailfish, we can give researchers everything they got from previous methods, but faster.”
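
A back-of-the-envelope version of that multiplication, assuming every run takes the quoted 15-hour average and that the reported 20x to 30x speedup applies uniformly to each run (real runtimes will vary with data size and hardware):

```latex
\begin{aligned}
100 \times 15\ \text{h} &= 1{,}500\ \text{h} \approx 62.5\ \text{days (previous methods)}\\
\frac{1{,}500\ \text{h}}{20} = 75\ \text{h}, \qquad
\frac{1{,}500\ \text{h}}{30} &= 50\ \text{h} \quad \text{(with a 20x to 30x speedup)}
\end{aligned}
```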

An organism’s genetic makeup is static, but the activity of individual genes varies greatly over time, explains the writeup from Carnegie Mellon. Gene expression is the key to that activity, and it is a research area that holds tremendous promise for disease prevention. Although gene activity can’t be measured directly, it can be inferred by tracking RNA, the large molecules that perform vital roles in the coding, decoding, regulation, and expression of genes.

To observe RNA, scientists typically use a method called RNA-seq, which has proven useful in genomic medicine, for example in the analysis of certain cancers. The process produces short segments of RNA sequence called “reads.” Previous methods measured RNA molecules by first mapping each read back to its original position in a larger molecule, like fitting pieces into a puzzle.

The research team eliminated this time-consuming mapping step by instead allocating parts of each read to different types of RNA molecules. Essentially, each read provides several up-votes for a given molecule. By skipping the mapping step, Sailfish performs its RNA analysis 20 to 30 times faster than previous methods.
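
To make the “up-vote” idea concrete, here is a minimal Python sketch under simplifying assumptions. The “parts” of a read are taken to be its overlapping substrings of length k (k-mers), and every k-mer casts one vote, split evenly among the transcripts that contain it, so no read is ever placed at a position. The function names (`kmers`, `build_index`, `vote`), the toy sequences, and the even-split rule are hypothetical illustrations, not Sailfish’s actual code; the published tool uses a more sophisticated statistical procedure to resolve ambiguous votes.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping substring of length k (a "k-mer") in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_index(transcripts, k):
    """Map each k-mer to the set of transcript names that contain it."""
    index = defaultdict(set)
    for name, seq in transcripts.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def vote(reads, transcripts, k):
    """Let each k-mer of each read cast one vote, split evenly among the
    transcripts that contain it. Reads are never placed at a position."""
    index = build_index(transcripts, k)
    votes = {name: 0.0 for name in transcripts}
    for read in reads:
        for km in kmers(read, k):
            owners = index.get(km)
            if not owners:
                continue  # k-mer found in no transcript: it casts no vote
            share = 1.0 / len(owners)
            for name in owners:
                votes[name] += share
    return votes


# Hypothetical toy transcripts that share the segment "CCCGGG".
transcripts = {"A": "AAACCCGGG", "B": "TTTCCCGGG"}
reads = ["AACCCG", "CCGGG"]
print(vote(reads, transcripts, k=4))  # {'A': 3.5, 'B': 1.5}
```

Transcript A collects more votes because two of the first read’s k-mers occur only in A, while the k-mers shared by both transcripts split their votes evenly. An even split is the simplest possible rule; resolving that ambiguity well is where a real quantification method does most of its work.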

Kingsford notes that this numerical approach will be more familiar to computer scientists than to biologists, but it also makes Sailfish more robust and better able to tolerate errors. Errors that would disrupt a mapping are not a problem for the “+1” approach, and the result is increased accuracy.
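
Why errors matter less when votes are counted can be illustrated with a small, hypothetical Python sketch. It assumes, as in the voting idea above, that a read is split into overlapping k-mers; a single miscalled base then spoils only the k-mers that overlap it, and the rest still vote. The whole-read exact-match check is only a naive stand-in for mapping (real aligners do tolerate some mismatches, at extra cost). The sequences and the helper `surviving_votes` are made up for illustration.

```python
def kmers(seq, k):
    """Return every overlapping substring of length k in seq."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def surviving_votes(read, transcript, k):
    """Count how many of the read's k-mers still occur exactly in the transcript."""
    present = set(kmers(transcript, k))
    return sum(1 for km in kmers(read, k) if km in present)


K = 4
transcript = "ACGTTGCAAGGCTTACGA"  # hypothetical toy transcript
clean_read = transcript[2:14]      # "GTTGCAAGGCTT", an error-free 12-base read
# Simulate one sequencing error: base 6 of the read (an A) misread as T.
noisy_read = clean_read[:6] + "T" + clean_read[7:]

print(noisy_read in transcript)                    # False: naive whole-read match fails
print(surviving_votes(clean_read, transcript, K))  # 9: all 9 k-mers vote
print(surviving_votes(noisy_read, transcript, K))  # 5: only the 4 k-mers covering the error are lost
```

With k = 4, one error can invalidate at most four of the read’s nine k-mers, so most of the read’s evidence survives.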

“By facilitating frequent reanalysis of data and reducing the need to optimize parameters, Sailfish exemplifies the potential of lightweight algorithms for efficiently processing sequencing reads,” the authors write in the paper abstract.

The Sailfish code is available for download at http://www.cs.cmu.edu/~ckingsf/software/sailfish/.
