May 7, 2012

Genomic Data Gets Comfy in the Cloud

Robert Gelber

Sequencing a human genome has become increasingly fast and cheap. While this progress is welcome, it also creates problems for delivering and analyzing the resulting data. One company believes the cloud can solve them.

Technological advances have greatly simplified the sequencing process. Deepak Singh, Ph.D., principal product manager for Amazon Web Services, underscores this point:

“It took more than 10 years and billions of dollars to sequence the first human genome. Recent advances in genome sequencing technology have enabled researchers to tackle studies like the 1000 Genomes Project by collecting far more data faster.”

The task can now be accomplished in about 24 hours for roughly $1,000. This is fueling exponential growth in genomic data and creating new storage and delivery challenges.

Last week, Technology Review profiled the startup DNAnexus. The company sees itself as a manager and distributor of the data that sequencing centers produce. Its platform stores and analyzes genomic data on Amazon Web Services (AWS), so customers do not need an in-house cluster.

DNAnexus sees the cloud as the best vehicle for delivering and analyzing sequencing data. The process begins at the sequencing center, where lab data is uploaded to AWS through the DNAnexus website. Once the transfer is complete, the data can be accessed over the Web and analyzed with tools built into the site.

Andreas Sundquist, the company’s CEO and cofounder, is banking on exponential growth in demand for services like DNAnexus. Sundquist estimates that about 20,000 complete genomes have been sequenced so far, and he expects that number to reach a million within the next few years. If that happens, the resulting data could exceed an exabyte.

DNAnexus is not the only organization that recognizes the benefits of cloud services. The National Institutes of Health recently announced that data from the 1000 Genomes Project is publicly available through Amazon Web Services. Since the project began in 2008, its dataset has grown to roughly 200 terabytes of genomic information.

In the future, Sundquist would like to see his company aggregate multiple genetic databases, which could lead to better research into and treatment of genetic diseases. He also believes that, as the technology improves, everyone in developed nations will have their genome sequenced, including newborns. “I think probably you’ll stick your thumb in your cell phone and it will be built-in,” says Sundquist. While there isn’t currently an app for that, it’s not hard to imagine one down the road.