December 01, 2011
The leap forward in genomics technology promises to change health care as we know it. Sequencing a human genome, which cost millions of dollars just a few years ago, now costs thousands. And the prospect of mapping a genome for under a thousand dollars is on the horizon.
But cheap gene sequencing, by itself, won't usher in a health care revolution. An article in the New York Times this week points out that turning those sequenced genomes into something useful is the true bottleneck. Doctors would like to be able to use their patients' genomes to determine their susceptibility to specific diseases or to devise personalized treatments for conditions they already have.
Sequencing all the DNA base pairs is really the easy part of the problem. It just reflects the ordering of these bases -- adenine (A), thymine (T), guanine (G), cytosine (C) -- in the chromosomes. The bioinformatics software necessary to extract useful information from this low-level biomolecular alphabet is much more complex and therefore costly, and necessitates a fair amount of computing power.
According to David Haussler, director of the center for biomolecular science and engineering at the University of California, Santa Cruz, that's why it costs more to analyze a genome than to sequence it, and that discrepancy is expected to grow as the cost of sequencing falls.
The NYT article reports that the cost of sequencing a human genome has decreased by a factor of more than 800 since 2007, while computing costs have only decreased by a factor of four. That has resulted in an enormous accumulation of unanalyzed data that is being generated by all the cheap sequencing equipment.
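The gap between those two figures can be made concrete with a little arithmetic. The sketch below is only illustrative: the function name is hypothetical, and it treats raw computing cost as a stand-in for analysis cost, which is a simplification of what the article describes.

```python
def relative_cost_shift(sequencing_drop: float, computing_drop: float) -> float:
    """Return how much cheaper sequencing became relative to computing.

    Both arguments are the factors by which each cost fell over the same
    period (for example, 800 means the cost is 1/800th of what it was).
    """
    if sequencing_drop <= 0 or computing_drop <= 0:
        raise ValueError("drop factors must be positive")
    return sequencing_drop / computing_drop


# Figures reported in the article for the period since 2007.
# "More than 800" is rounded down to 800, so the result is a lower bound.
shift = relative_cost_shift(800, 4)
print(f"Sequencing became at least {shift:.0f}x cheaper relative to computing")
```

In other words, for every dollar spent on sequencing, the computing needed to keep up is now at least 200 times more expensive in relative terms than it was in 2007.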
According to the article, the current capacity of sequencers worldwide is 13 quadrillion DNA base pairs per year. For this year alone, it is estimated that 30,000 human genomes will be sequenced, a figure that is expected to rise to the millions within just a few years.
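To put 13 quadrillion base pairs in perspective, the rough calculation below converts that capacity into human-genome equivalents. It is a back-of-the-envelope sketch, not a figure from the article: it assumes a human genome of roughly 3 billion base pairs, and the 30x coverage value (the number of times each position is read) is a hypothetical example, since the article does not state one.

```python
# Assumption: a human genome is roughly 3 billion base pairs long.
HUMAN_GENOME_BASE_PAIRS = 3e9


def genome_equivalents(capacity_bp_per_year: float, coverage: float = 1.0) -> float:
    """Return how many human genomes a given sequencing capacity could cover.

    coverage is the average number of times each base is read; 1.0 means
    a single pass, which real projects usually exceed.
    """
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return capacity_bp_per_year / (HUMAN_GENOME_BASE_PAIRS * coverage)


capacity = 13e15  # 13 quadrillion base pairs per year, as reported
print(f"Single-pass equivalents: {genome_equivalents(capacity):,.0f}")
# Hypothetical 30x coverage, chosen only for illustration.
print(f"At 30x coverage:         {genome_equivalents(capacity, 30):,.0f}")
```

Sequencers are also used for non-human organisms, so these numbers are an upper bound on human genomes rather than a forecast.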
Not only is that too much data to analyze in aggregate, it's also too difficult to share that volume of data between researchers. Even the fastest commercial networks are too slow to send multiple terabytes of information in anything less than a few weeks. That's why BGI (Beijing Genomics Institute), the largest genomics research institute in the world, has resorted to sending computer disks of sequenced data via FedEx.
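A quick estimate shows why shipping disks can beat the network. The sketch below computes the transfer time for a data set at a given sustained link rate. The data size and link rates are hypothetical examples, not figures from the article, and the calculation assumes decimal terabytes and ignores protocol overhead.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(terabytes: float, megabits_per_second: float) -> float:
    """Return the days needed to send `terabytes` at a sustained link rate.

    Uses decimal units: 1 TB = 10**12 bytes, 1 Mbit/s = 10**6 bits/s.
    """
    if megabits_per_second <= 0:
        raise ValueError("link rate must be positive")
    bits = terabytes * 1e12 * 8
    seconds = bits / (megabits_per_second * 1e6)
    return seconds / SECONDS_PER_DAY


# Hypothetical example: a 10 TB data set over several sustained rates.
for rate in (10, 100, 1000):
    days = transfer_days(10, rate)
    print(f"10 TB at {rate:>4} Mbit/s: {days:6.1f} days")
```

Sustained throughput over long distances is often well below a link's nominal speed, which is what pushes multi-terabyte transfers from days into weeks.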
Cloud computing may help alleviate these problems. In fact, some believe that Google alone has enough compute and storage capacity to handle the global genomics workload. Others believe that there is just too much raw data and researchers will have to pre-process it to reduce the volume or just hold onto the unique bits.
But there are even more challenging problems ahead. Metagenomics, which aggregates DNA sequences of a whole population of organisms, is even more data-intensive. For example, the microbial species in the human digestive tract represent about a million times as much sequenced data as the human genome. And since that microbial population can have a profound effect on its human host, that genomic data becomes a pseudo-extension of the person's genetic profile.
On top of all that is the data associated with the RNA, proteins and other various biochemicals in the body. To get a complete picture of human health, all of this data has to be integrated as well. Data deluge indeed.
Full story at The New York Times
There is 1 discussion item posted.
Deep Computing for Deep Sequencing
Submitted by
prahalad
on Dec 4, 2011 @ 10:39 AM EST
Analyzing DNA data to arrive at an answer is more complex than handling data volume, transfer, and storage. Data analysis can only be attempted by using sophisticated technologies combined with knowledge of biology, computer algorithms, and statistics. In the past, bioinformatics dealt with low-throughput assays employed to test a single hypothesis. DNA sequencing experiments, in contrast, are hypothesis-generating: they need complex mathematical and computational algorithms to quantify biology. As highlighted by this article, cloud computing can come in handy and mitigate the issues related to storage, processing and even analysis (harnessing a large pool of high-end computers). Thanks to cloud computing, it will now be possible for a researcher in Tanzania to collaborate with a scientist in Thailand to protect the elephant population in the wild. iOmics is one such cloud initiative, offered by Geschickten Biosciences (www.geschickten.com), to address the grand challenge discussed in the article. iOmics (www.genomecomputingcenter.com/iomics) offers biologists the end-to-end computational biology pipelines they need to process and analyze genomics, transcriptomics, small RNA, ChIP-Seq and other data. The beauty of iOmics is that it is backed by a team of biologists, statisticians, and computational scientists to offer even more complex downstream analysis.