HPCwire

Since 1986 - Covering the Fastest Computers
in the World and the People Who Run Them


Genomics Drowning in Data


The leap forward in genomics technology promises to change health care as we know it. Sequencing a human genome, which cost millions of dollars just a few years ago, now costs thousands. And the prospect of mapping a genome for under a thousand dollars is on the horizon.

But cheap gene sequencing, by itself, won't usher in a health care revolution. An article in the New York Times this week points out that turning those sequenced genomes into something useful is the true bottleneck. Doctors would like to be able to use their patients' genomes to determine their susceptibility to specific diseases or to devise personalized treatments for conditions they already have.

Sequencing all the DNA base pairs is really the easy part of the problem. It just reflects the ordering of these bases -- adenine (A), thymine (T), guanine (G), cytosine (C) -- in the chromosomes. The bioinformatics software needed to extract useful information from this low-level biomolecular alphabet is much more complex, and therefore costly, and requires a fair amount of computing power.

According to David Haussler, director of the Center for Biomolecular Science and Engineering at the University of California, Santa Cruz, that's why it costs more to analyze a genome than to sequence it, and that discrepancy is expected to grow as the cost of sequencing falls.

The NYT article reports that the cost of sequencing a human genome has decreased by a factor of more than 800 since 2007, while computing costs have only decreased by a factor of four. That has resulted in an enormous accumulation of unanalyzed data that is being generated by all the cheap sequencing equipment.

According to the article, sequencers worldwide currently have the capacity to produce 13 quadrillion DNA base pairs per year. For this year alone, it is estimated that 30,000 human genomes will be sequenced, a figure that is expected to rise to the millions within just a few years.
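To put the capacity figure in perspective, the arithmetic can be sketched roughly as follows. Only the 13 quadrillion figure comes from the article; the genome size of about 3.2 billion base pairs and the 30x coverage depth are assumptions chosen for illustration, and the function name is hypothetical.

```python
# Back-of-the-envelope estimate: how many human genomes could the stated
# worldwide sequencing capacity cover in one year?

BASES_PER_YEAR = 13e15       # 13 quadrillion base pairs per year (from the article)
HUMAN_GENOME_BASES = 3.2e9   # assumption: approximate haploid human genome size


def genomes_per_year(bases_per_year: float, genome_size: float, coverage: float) -> float:
    """Return how many genomes fit in the yearly output at a given fold coverage."""
    if genome_size <= 0 or coverage <= 0:
        raise ValueError("genome_size and coverage must be positive")
    return bases_per_year / (genome_size * coverage)


if __name__ == "__main__":
    # 1x is a single pass over the genome; 30x is an assumed, illustrative depth.
    for coverage in (1, 30):
        count = genomes_per_year(BASES_PER_YEAR, HUMAN_GENOME_BASES, coverage)
        print(f"{coverage:>2}x coverage: about {count:,.0f} genomes per year")
```

Under these assumptions, the raw capacity exceeds the 30,000 genomes estimated for this year, which fits the article's point that sequencing itself is not the bottleneck.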

Not only is that too much data to analyze in aggregate, it's also too difficult to share that volume of data between researchers. Even the fastest commercial networks are too slow to send multiple terabytes of information in anything less than a few weeks. That's why BGI (Beijing Genomics Institute), the largest genomics research institute in the world, has resorted to sending computer disks of sequenced data via FedEx.
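The transfer problem comes down to simple arithmetic, sketched below. The dataset size (10 TB), the link speeds, and the efficiency factor are illustrative assumptions and do not come from the article; the function name is hypothetical.

```python
# Rough estimate of how long it takes to move a dataset over a network link.

SECONDS_PER_DAY = 86_400


def transfer_days(size_terabytes: float, link_megabits_per_s: float,
                  efficiency: float = 1.0) -> float:
    """Days needed to send size_terabytes (decimal TB) over a link.

    efficiency is the assumed fraction of nominal bandwidth actually
    achieved (1.0 means the link is fully used the whole time).
    """
    if link_megabits_per_s <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    bits = size_terabytes * 1e12 * 8
    seconds = bits / (link_megabits_per_s * 1e6 * efficiency)
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    # Assumed example: 10 TB of sequence data over a few nominal link speeds.
    for mbps in (10, 100, 1000):
        print(f"{mbps:>5} Mbit/s: {transfer_days(10, mbps):.1f} days")
```

At sustained rates in the tens of megabits per second, a multi-terabyte dataset takes weeks to send, which is why shipping disks can be faster.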

Cloud computing may help alleviate these problems. In fact, some believe that Google alone has enough compute and storage capacity to handle the global genomics workload. Others believe that there is just too much raw data and researchers will have to pre-process it to reduce the volume or just hold onto the unique bits.

But there are even more challenging problems ahead. Metagenomics, which aggregates DNA sequences of a whole population of organisms, is even more data-intensive. For example, the microbial species in the human digestive tract represent about a million times as much sequenced data as the human genome. And since that microbial population can have a profound effect on its human host, that genomic data becomes a pseudo-extension of the person's genetic profile.

On top of all that is the data associated with the RNA, proteins and various other biochemicals in the body. To get a complete picture of human health, all of this data has to be integrated as well. Data deluge indeed.


Full story at The New York Times


Discussion

There is 1 discussion item posted.

Deep Computing for Deep Sequencing
Submitted by prahalad on Dec 4, 2011 @ 10:39 AM EST


Analyzing DNA data to arrive at an answer is more complex than handling data volume, transfer, and storage. Data analysis can only be attempted by combining sophisticated technologies with knowledge of biology, computer algorithms, and statistics. In the past, bioinformatics dealt with low-throughput assays employed to test a single hypothesis. DNA sequencing experiments, in contrast, are hypothesis-generating: they need complex mathematical and computational algorithms to quantify biology. As highlighted by this article, cloud computing can come in handy and mitigate the issues related to storage, processing and even analysis (by harnessing a large pool of high-end computers). Thanks to cloud computing, it will now be possible for a researcher in Tanzania to collaborate with a scientist in Thailand to protect the elephant population in the wild. iOmics is one such initiative, offered in the cloud by Geschickten Biosciences (www.geschickten.com) to address the grand challenge discussed in the article. iOmics (www.genomecomputingcenter.com/iomics) offers biologists the end-to-end computational biology pipelines they need to process and analyze genomics, transcriptomics, small RNA, ChIP-Seq data, etc. The beauty of iOmics is that it is backed by a team of biologists, statisticians, and computational scientists who can offer even more complex downstream analysis.

Post #1

