In many biological disciplines, and particularly in the field of genetics, the answers to scientists' questions are buried under mountains of complex data. In genome sequencing — ordering the billions of chemical building blocks that make up the genetic code of a cell — a newly discovered genome is compared to vast databases of well-known and publicly available genomes.
Comparing new genomes to these huge and still-growing databases has become a task too large for any single computer, even a supercomputer, to handle. Grid computing has stepped up to meet the challenge with the help of the Genome Analysis Database Update (GADU) tool.
GADU creates workflows, runs them on the Open Science Grid and TeraGrid, and stores the output. It serves as a backend for applications that geneticists use in fields ranging from biomedicine to environmental cleanup. GADU runs on an average of 650 to 700 CPUs at a time, using more than 30 sites on the Open Science Grid and five clusters on the TeraGrid, which makes it one of the few applications to employ both grids simultaneously.
“The parallelization of data, running different tools as resource independent workflows, allows many sites from multiple grids to send jobs all at once,” says Dinanath Sulakhe from the Computational Biology group at Argonne National Laboratory, who developed GADU with colleague Alex Rodriguez. “This saves on time, expense and human resources.”
Margie Romine, a microbiologist at the Pacific Northwest National Laboratory in Richland, Washington, has used GADU to help her study the genetic code of the bacterium Shewanella oneidensis MR-1, whose metal- and radionuclide-reducing capability can impact the movement of such materials in the environment.
Through GNARE, an application that relies on GADU, she uses genome sequences to better predict the functions of proteins and to map them to metabolic pathways, which describe how proteins work together to synthesize and degrade cellular materials. By also using the system to compare MR-1 with other Shewanella genome sequences (18 in all), she hopes to gain a more comprehensive understanding of the diversity of Shewanella behavior in the environment.
“If you have a gateway to the grid like GADU, you can spend your time answering biological questions rather than worrying how to maintain and support your own computing facility,” says Natalia Maltsev, head of Argonne's Computational Biology group. “Collaborators using this resource really depend on GADU, and we want to open it up to an even wider community.”
Source: Open Science Grid