Genomics workloads have proven to be a natural match for the cloud era, a point that has been underscored once again. As part of the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Project, Baylor College of Medicine, Amazon Web Services (AWS), and DNAnexus have teamed up to run the largest-ever cloud-based analysis of genomic data.
It’s become a familiar tale. The CHARGE project had a job to run – a massive analysis load that needed processing. But the job exceeded the computing and storage resources of its partner organization, the Human Genome Sequencing Center (HGSC) at Baylor College of Medicine.
The options were to purchase more hardware for a short-term project; “jam the cluster” in an attempt to get the job done, at the cost of pushing back other important work; or identify a suitable cloud-based solution. The team chose the third option and signed on to work with DNAnexus and Amazon Web Services for this ultra-large-scale genomic analysis project.
As part of its participation in the project, HGSC used the DNAnexus enterprise cloud platform (hosted by AWS) to power its Mercury pipeline, a semi-automated and modular set of tools for the analysis of next-generation sequencing data. With DNAnexus providing the platform-as-a-service (PaaS) on top of AWS infrastructure, HGSC was able to analyze the genomes of over 14,000 patients, encompassing 3,751 whole genomes and 10,771 whole exomes.
As the case study describes, the entire job was run over a four-week period, using approximately 2.4 million core-hours of computational time with a peak of 20,800 cores. It generated 440 TB of results and required nearly 1 PB of data storage. The output from the pipeline and the analysis of the CHARGE data, as well as the tools themselves, were made available to over 300 researchers across five collaborating institutions.
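To put those figures in perspective, the short Python sketch below works through the arithmetic using only the numbers quoted above. It assumes that "four weeks" means 28 days of continuous wall-clock time, which the case study does not state explicitly, so the averages are rough estimates rather than reported results.

```python
# Figures quoted in the article.
CORE_HOURS = 2_400_000      # approximate total compute used
PEAK_CORES = 20_800         # peak concurrent cores
WHOLE_GENOMES = 3_751
WHOLE_EXOMES = 10_771

# Assumption (not from the case study): four weeks = 28 days, running around the clock.
RUN_DAYS = 28

wall_clock_hours = RUN_DAYS * 24                 # hours in the assumed run window
average_cores = CORE_HOURS / wall_clock_hours    # mean cores busy over the window
peak_to_average = PEAK_CORES / average_cores     # how bursty the workload was
hours_at_peak = CORE_HOURS / PEAK_CORES          # run time if held at peak throughout
total_samples = WHOLE_GENOMES + WHOLE_EXOMES     # genomes plus exomes

print(f"Wall-clock hours:      {wall_clock_hours}")
print(f"Average cores in use:  {average_cores:,.0f}")
print(f"Peak-to-average ratio: {peak_to_average:.1f}x")
print(f"Hours if run at peak:  {hours_at_peak:.1f}")
print(f"Samples analyzed:      {total_samples:,}")
```

Under that assumption, average usage works out to roughly 3,600 cores, so the peak was nearly six times the average. That kind of burst is hard to justify with purchased hardware and relatively easy to rent for a short period.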
Being able to run this ultra-large-scale clinical analysis of genomic data without any capital investment brings the CHARGE Consortium closer to its goal of unlocking the mysteries of human genetics with regard to heart disease and aging, paving the way for the development of new medical interventions and analysis tools.