April 24, 2009
COLLEGE PARK, Md., April 23 -- DNA sequencing is the next frontier in biological research. As new sequencing technology becomes more efficient and affordable, it is increasingly available to small laboratories. Thus, sequencing data is being generated at a faster rate than ever before.
However, the computing capacity needed to analyze such vast amounts of data still has some catching up to do. Large networks of interconnected computers, called computer clusters, are required to analyze these data. Expensive to establish and maintain, these computer clusters are generally available only to labs that can afford them.
Enter Mihai Pop, an assistant professor in the department of computer science and in the Center for Bioinformatics and Computational Biology at the University of Maryland. He and colleague Steven Salzberg, director of the center and Horvitz Professor of computer science, recently received a grant from the National Science Foundation Cluster Exploratory Program (CluE) to fund research aimed at discovering how remote cluster computers, computer networks available over the Internet, might be used to process DNA sequence data.
"There is a new initiative by NSF to figure out what you can do with cluster computers on the internet -- like the ones through Amazon, Google, and IBM," Pop said. "Our NSF grant will be used to find out if remote clusters of computers are a better option for DNA sequence analysis than local clusters of computers."
Pop's goal is to develop the software required to analyze sequence data in parallel (on many computers simultaneously). This massively parallel computing allows faster gene sequence alignment and genome assembly.
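To make the idea of analyzing sequence data in parallel concrete, here is a minimal sketch of the split-and-merge pattern. It is an illustration under stated assumptions, not Pop's software: the function names are hypothetical, counting short subsequences (k-mers) stands in for a real analysis step, and a local thread pool stands in for the nodes of a cluster.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) in a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def split(items, n_chunks):
    """Deal items into n_chunks roughly equal lists."""
    return [items[i::n_chunks] for i in range(n_chunks)]


def parallel_kmer_counts(reads, k, n_workers=4):
    """Count k-mers chunk by chunk, then merge the partial results.

    Assumption: the thread pool is only a stand-in for separate
    machines; each chunk is processed independently of the others.
    """
    chunks = split(reads, n_workers)
    total = Counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for partial in pool.map(lambda chunk: count_kmers(chunk, k), chunks):
            total.update(partial)  # Counter.update adds counts together
    return total


if __name__ == "__main__":
    reads = ["ACGT", "CGTA", "ACGA"]
    print(parallel_kmer_counts(reads, k=3))
```

Because each chunk is handled independently, the same work could in principle be spread across many machines, with only the small partial results sent back to be merged.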
While parallel computing is already being used on locally maintained computer clusters, Pop will be working on programs that will allow researchers to perform their DNA sequence analysis over the web by accessing remote computer clusters maintained by large companies on a pay-per-use basis. This paradigm is known as cloud computing.
So now, rather than buying and maintaining their own computer systems, researchers may simply be able to rent computer time at a fraction of the cost. But there are a few obstacles to overcome before cloud computing becomes a reality for genetic analysts.
"The first question is how to best split up the process of DNA sequence analysis to fit these computer clusters," Pop said. "The second is whether or not the benefits of cloud computing outweigh the costs of data transfer and storage."
The massive amounts of data generated by just one genome may take a significant amount of time to transfer over the internet. This, in addition to the data storage needed before analysis, might add costs that outweigh the benefits of using a remote computer cluster.
"Even if the analysis doesn't take long, the transfer may take forever and cost too much to make the whole thing worthwhile," said Pop.
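The trade-off Pop describes can be sketched as back-of-the-envelope arithmetic. The example below is a rough illustration only: the function names and every number in it are hypothetical, it assumes decimal units (1 GB = 1,000 MB), and it compares time alone, leaving out transfer and storage fees.

```python
def transfer_hours(data_gb, bandwidth_mbps):
    """Hours needed to upload data_gb gigabytes at bandwidth_mbps megabits/s."""
    megabits = data_gb * 1000 * 8          # GB -> MB -> megabits
    seconds = megabits / bandwidth_mbps
    return seconds / 3600


def cloud_is_faster(data_gb, bandwidth_mbps, local_hours, cloud_hours):
    """True if upload time plus cloud analysis time beats the local run.

    Assumption: all inputs are illustrative estimates, not measurements.
    """
    total_cloud = transfer_hours(data_gb, bandwidth_mbps) + cloud_hours
    return total_cloud < local_hours


if __name__ == "__main__":
    # Hypothetical case: 100 GB of reads, 10-hour local run, 2-hour cloud run.
    for mbps in (10, 100):
        print(mbps, "Mbps:", round(transfer_hours(100, mbps), 1), "h upload,",
              "cloud faster:", cloud_is_faster(100, mbps, 10, 2))
```

With these made-up figures, the remote cluster wins on a 100-megabit connection but loses on a 10-megabit one, which is the kind of break-even question the project is meant to answer with real data.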
A Different Kind of Puzzle
DNA is made up of nucleotide bases that are abbreviated by the letters A, C, G, and T. Lined up in a double helix structure, they make up a code that is translated into the proteins that run our body processes. New technology can read this code and compare the genetic makeup of species and organisms.
However, the sequencing process cannot handle a whole genome at once. The DNA strands have to be chopped into small pieces, sequenced, and then those sequences have to be put back together again. Putting the pieces back together is what requires so much computing power.
There are two ways to put the pieces back together. If a reference genome is available from the same species, scientists can use the reference as a guide for piecing together the new sequence. However, if a reference is unavailable, the scientist faces the more difficult task of determining all possible combinations of the loosely fitting pieces and finding the best one.
Pop likens this process to completing a jigsaw puzzle. "If you have a reference genome, it's like having the box with the picture on the front to guide your assembly," he said. "With no reference, it's like having no picture and no idea what the finished product will look like, with lots of sky and ocean pieces that fit very loosely together."
Such a process requires a lot of computing power because of the number of possibilities and the level of uncertainty. Computer clusters can compare all the candidate sequence combinations and decide on the best one. But computing power and the expense of the systems are limiting factors.
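The two approaches can be illustrated with a toy example. The sketch below is a simplification under stated assumptions, not the method Pop's team uses: the function names are hypothetical, reads are assumed to be error-free, and a greedy merge of the largest overlaps stands in for the far harder search that real assemblers perform.

```python
def place_on_reference(reference, reads):
    """With a reference: find where each read matches exactly (-1 if absent)."""
    return {read: reference.find(read) for read in reads}


def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Without a reference: repeatedly merge the pair with the largest overlap.

    Assumption: a toy greedy strategy that compares every pair of reads,
    which is why the work grows quickly with the number of pieces.
    """
    reads = list(reads)
    while len(reads) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # no remaining pieces fit together
        merged = reads[best_i] + reads[best_j][best_n:]
        reads = [r for idx, r in enumerate(reads)
                 if idx not in (best_i, best_j)] + [merged]
    return reads


if __name__ == "__main__":
    reference = "ATGGCGTACGTTAGC"
    reads = ["ATGGCGTA", "CGTACGTT", "CGTTAGC"]
    print(place_on_reference(reference, reads))
    print(greedy_assemble(reads))
```

With the reference in hand, each read needs only one lookup. Without it, every piece must be tried against every other piece, and real data adds sequencing errors and repeated regions that this sketch ignores.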
Pop's team will spend the next two years determining whether it is feasible and beneficial to do this analysis through cluster computers available on the internet. He will write software programs that, if successful, will be made available for researchers to use at no cost, and his results will be made available through journal articles and conference presentations.
Teaching and mentoring of both graduate and undergraduate students will also be a large component of the grant, which Pop hopes will help entice talented computer science students to go into the biotechnology industry, where their skills are needed.
Bioinformatics and Computational Biology at Maryland
Pop is a researcher in the University of Maryland Center for Bioinformatics and Computational Biology (CBCB), a multidisciplinary center dedicated to research on questions arising from the genome revolution. The center is a joint effort between the College of Chemical and Life Sciences and the College of Mathematical, Computer, and Physical Sciences, and is organized as a center within the University of Maryland Institute for Advanced Computer Studies (UMIACS).
The Center for Bioinformatics and Computational Biology is one of several highly interdisciplinary programs at the University of Maryland, bringing together scientists and engineers from many fields, including computer science, molecular biology, genomics, genetics, mathematics, statistics, and physics, all of whom work toward the common goal of understanding life processes.
-----
Source: the University of Maryland