Just as computers assist detectives in finding people by comparing fingerprints from crime scenes with millions in databases, Argonne National Laboratory scientists are using computers to mine genetic information from pathogens, people and plants. This information is essential to progress in medical science and biotechnology.
“The biology revolution came about with the massive sequencing of the genomes,” said Natalia Maltsev, head of Argonne's bioinformatics group in the Mathematics and Computer Science Division. “Genomes are in essence the blueprints of the organisms. Currently there are 294 completely sequenced genomes publicly available, and more than 1,500 are in the pipeline.
“When the genomes are sequenced, it is just an alphabet soup,” Maltsev said. “The genomic data provide a string of letter-pairs that represent a genome's chemical bases.” The string is long – the human genome alone has 3 billion base pairs. “The information is in there, but we need to extract it.”
By comparing data, these researchers can take a small amount of known information from one genome – for example, genes involved in energy production – and compare it to all other genomes. If the same sequences are found in other genomes, that solves a small segment of the genome under study.
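The comparison described above can be illustrated with a minimal sketch. It assumes exact text matching on made-up toy sequences; the function name `find_known_gene` and the organism names are hypothetical. It is not the method used by the Argonne group, and real tools generally rely on alignment methods that tolerate mismatches rather than exact matches.

```python
def find_known_gene(known_gene, genomes):
    """Return {genome_name: [start positions]} for genomes containing known_gene.

    Uses exact substring matching, which is a simplification for illustration.
    """
    hits = {}
    for name, sequence in genomes.items():
        positions = []
        start = sequence.find(known_gene)
        while start != -1:
            positions.append(start)
            # Continue searching just past the previous match start.
            start = sequence.find(known_gene, start + 1)
        if positions:
            hits[name] = positions
    return hits


# Made-up toy sequences, not real genomic data.
toy_genomes = {
    "organism_a": "TTGACCATGGCGTACGTTAGC",
    "organism_b": "CCGTATTAGGCAATCG",
    "organism_c": "ATGGCGTACGAAATGGCGTACG",
}
known_fragment = "ATGGCGTACG"  # hypothetical stand-in for a characterized gene

hits = find_known_gene(known_fragment, toy_genomes)
for name, positions in hits.items():
    print(name, positions)
```

In this toy case the fragment turns up in two of the three sequences, which is the kind of signal that would let a researcher carry a known function over to the matching region of a less-studied genome.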
Piecing these bits of biological information together by using computers is a relatively new field called bioinformatics. Argonne researchers are the first in the field of bioinformatics to use hundreds of computers around the clock to analyze genomic information.
For example, comparison of pathogenic and nonpathogenic Mycobacterium species revealed that they differ by several genes – which means that these genes can be implicated in causing disease. Knowing these genes, computers can seek them out in other genomes, and when they are found, their presence tells researchers that the organism is potentially pathogenic. Medical researchers can use these data to develop treatments.
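The reasoning in the Mycobacterium example can be sketched as simple set operations. This is a minimal illustration under stated assumptions: each genome is reduced to a set of gene labels, and all strain names, gene names and function names below are hypothetical. It is not a description of the actual analysis.

```python
def genes_unique_to_pathogens(pathogenic, nonpathogenic):
    """Genes found in every pathogenic strain and in no nonpathogenic strain."""
    shared_by_pathogens = set.intersection(*pathogenic.values())
    seen_in_nonpathogens = set().union(*nonpathogenic.values())
    return shared_by_pathogens - seen_in_nonpathogens


def flag_potential_pathogens(candidates, marker_genes):
    """Names of candidate genomes that carry all of the marker genes."""
    return sorted(
        name
        for name, genes in candidates.items()
        if marker_genes and marker_genes <= genes  # subset test
    )


# Made-up gene sets for illustration only.
pathogenic = {
    "strain_p1": {"geneA", "geneB", "geneX", "geneY"},
    "strain_p2": {"geneA", "geneC", "geneX", "geneY"},
}
nonpathogenic = {
    "strain_n1": {"geneA", "geneB", "geneC"},
}
candidates = {
    "unknown_1": {"geneA", "geneX", "geneY"},
    "unknown_2": {"geneA", "geneX"},
}

markers = genes_unique_to_pathogens(pathogenic, nonpathogenic)
flagged = flag_potential_pathogens(candidates, markers)
print(sorted(markers), flagged)
```

A genome flagged this way is only potentially pathogenic, as the article notes; the sketch identifies candidates for follow-up rather than confirming disease-causing ability.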
Argonne's contributions to bioinformatics include developing databases and analytical tools using an Argonne-developed technology to perform rapid calculations, and guiding biological research.
This bioinformatics research is a key component of Argonne's multi-million dollar, multi-disciplinary structural biology program. The program gives researchers bioinformatics guidance that can reduce the cost of identifying unique structures of medical and biotechnological significance.
Argonne's computational biologists have created databases and tools to extract important information from the genetic “alphabet soup.” Their main database, PUMA2, combines information from 22 databases.
“We set up PUMA2,” Maltsev said, “because we are interested in evolution – the fundamental questions – what is the same and what is different in each organism and how it affects function.”
The team has also developed “Pathos” and “Chisel,” software tools that work with PUMA2 to search for specific interests. Pathos is a database for biodefense research. It contains all publicly available genomes of pathogens, including Bacillus anthracis (anthrax) and Yersinia pestis (plague). Chisel enables identification of eukaryotic (multi-celled organisms) and bacterial versions of the same enzyme functions.
The bioinformatics group coupled its extensive network of data and tools with the Grid. Grid technology, spearheaded at Argonne, allows supercomputers in different locations to work together seamlessly. With this kind of computing power, researchers can perform in one week comparisons that would take 18 months for researchers using a single computer.
Elizabeth Glass, a member of Argonne's bioinformatics group, uses this combination of computing power and databases to guide structural biologists, whose methods are more time-consuming and costly, toward unique structures to add to the databases.
Glass steers researchers at two National Institutes of Health-funded Regional Centers of Excellence – the Argonne-based Midwestern Center for Structural Genomics and the Great Lakes Regional Center of Excellence for Biodefense and Emerging Infectious Disease Research. The bioinformatics group also provides valuable resources for the National Institutes of Health's Bioinformatics Research Center and National Microbial Pathogen Data Resource, and the Department of Energy's Microbial Genomes Program in the Office of Biological and Environmental Research.
“The work is fascinating,” said Maltsev, a medical doctor and immunologist. Immunology required her to “spend huge amounts of time to extract small facts. Now all of the biological information is at our disposal, and we can derive how evolution was working. We can see evolution because bacteria are similar to animals, and animals are similar to each other.”
The work is also varied. Argonne bioinformaticists are working with researchers at the Pacific Northwest National Laboratory to find an organism that can clean radioactive materials that have seeped into the ground under tanks at the Hanford site, which produced nuclear materials.
“There are organisms that actually live in this environment of boiling nitric acid with high levels of radiation,” Maltsev explains. “We are searching for microorganisms that could survive and even clean such an environment.”