March 27, 2012
Tucson, Ariz., March 23 -- What began as an initiative to give plant biologists access to the computing power needed to analyze the extremely large data sets newly permeating the field of biology has expanded to include projects and scientists from all fields of biology and the biomedical sciences.
The University of Arizona leads the iPlant Collaborative, which is based at the university's BIO5 Institute. Collaborating institutions include the Texas Advanced Computing Center at The University of Texas at Austin, Cold Spring Harbor Laboratory in New York, the University of North Carolina at Wilmington and Purdue University.
"iPlant is empowering people to use high-performance computing to analyze very large data sets," said Stephen Goff, the principal investigator and project director of the iPlant Collaborative. "A big function of iPlant is to bring together high-performance computing experts, build a cyberinfrastructure platform, and use it to advance life science research. Life science is more of a data-driven science now than it has been in the past."
Said Eric Lyons, a senior computational biologist working with iPlant: "We're at this interesting revolution in biology where we've become a much more quantitative science. We're now able to generate gigabytes if not terabytes of data really easily, and in order to get through that much data you need a lot of computational resources. iPlant is the first major investment by the National Science Foundation in building cyberinfrastructure for biologists, to allow researchers to handle and cope with all of this information."
The five-year project received $50 million from the NSF in 2008. The iPlant team engaged with plant scientists from across the nation, as well as with some international scientists, to find out about their research computing needs: what types of data sets they work with and what kinds of questions they ask.
The iPlant team then used this information to develop computer software programs and projects that would be most useful to help the scientists store and process their data.
iPlant is about building cyberinfrastructure for life sciences, said Lyons: "Cyberinfrastructure is the essential 'plumbing' that we need in order to hook together different kinds of computational resources to make it easy for biologists to manage large amounts of data in terms of getting it someplace, keeping it organized, sharing it with their collaborators, and analyzing it to make scientific discoveries."
Traditionally, supercomputing centers have worked mainly with the physical sciences: modeling fluid dynamics, ocean currents, the atmosphere, climate change, and geological processes like earthquakes and fault-line stresses.
"They have advanced visualizations to deal with massive data sets, and now computer science experts are working together with biologists to develop software that makes it easy for biologists to store and analyze their data," Lyons said.
At the UA, a core group of software developers and computer science systems engineers works in collaboration with engineers and scientists at XSEDE (the Extreme Science and Engineering Discovery Environment), an NSF-funded project that coordinates access to supercomputing resources for processing large amounts of data.
iPlant has developed ways for scientists to share information through web-based software and virtual servers built to store huge amounts of data.
"Let's say, for example, that you're a scientist and you're working at a site that manages a huge amount of data," said Lyons. "And you want to take a portion of your data and easily send it somewhere else to be processed, add more data to it and have it sent back, but have it all happen automatically. At iPlant, we leverage all the different technologies that we have, and evaluate which new computational technologies are going to help solve particular problems."
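The automated hand-off Lyons describes can be sketched as a stage-out, process, stage-in loop. The sketch below is purely illustrative: the function names and toy data sets are hypothetical stand-ins, not iPlant's actual interfaces, and the "remote" processing step is simulated locally.

```python
def stage_out(dataset, subset_keys):
    """Select the portion of a local dataset to send for remote processing."""
    return {k: dataset[k] for k in subset_keys}

def remote_process(subset):
    """Stand-in for the remote analysis step: annotate each record."""
    return {k: {"raw": v, "processed": True} for k, v in subset.items()}

def stage_in(dataset, results):
    """Merge processed results back into the local dataset automatically."""
    merged = dict(dataset)
    merged.update(results)
    return merged

# A scientist's local data: only two of three samples need processing.
local = {"sample_a": [1, 2], "sample_b": [3, 4], "sample_c": [5, 6]}
subset = stage_out(local, ["sample_a", "sample_b"])
results = remote_process(subset)
local = stage_in(local, results)
```

In a real workflow each step would be a network transfer or job submission; the point of the cyberinfrastructure "plumbing" is that the chain runs without manual intervention.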
Said Matthew Helmke, who is a senior technical documentation specialist for iPlant: "We're also offering novel ways to interact with iPlant's computational systems."
"People can write their own software to interact with our systems. So you can go from: ‘I'm scared of the computer, but I'm a biologist and I have data and I want to do something with it, can you help me?' to the other extreme of: ‘I've been doing computational analysis since the 1980s; you just have really cool resources that I don't have, can I tap into those resources with my own analysis programs?' And I think we will make people happy coming from both extremes."
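Writing your own software against a job-submission system, as Helmke describes, might look like the following minimal sketch. The client class, field names, and application name are all hypothetical illustrations, not the actual iPlant API.

```python
import json

class JobClient:
    """Toy client illustrating programmatic job submission.

    The request fields below are illustrative; a real service would
    accept them over HTTP and return a job identifier.
    """

    def __init__(self):
        self.queue = []

    def submit(self, app, inputs, parameters=None):
        """Build a job description, queue it, and return a JSON receipt."""
        job = {
            "app": app,
            "inputs": inputs,
            "parameters": parameters or {},
            "status": "QUEUED",
        }
        self.queue.append(job)
        return json.dumps(job)

client = JobClient()
receipt = client.submit("phylo-analysis", {"tree": "data/species.nwk"},
                        parameters={"threads": 16})
```

The appeal for experienced users is exactly this: the same analysis program they have run locally for years can be wrapped in a small script that hands the work to larger resources.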
One of iPlant's flagship products is the Discovery Environment, a web interface to its computing resources.
"Let's say you have something that compares evolutionary relationships or traits with a common ancestral origin," Helmke said. "You can run the appropriate analysis on your own desktop system, and it will work reasonably well, but the analysis might take a considerable amount of time to complete. What if you could run the exact same program with the exact same parameters but have it leverage high performance computing resources, and still do it from your desktop using a web interface? That's what the iPlant Discovery Environment allows researchers to do."
The iPlant Discovery Environment gives researchers a way to store data, add new software they design for their experiments, and collaborate with other scientists using that software, which then remains available for other researchers to use in the future.
"Once the data is in the public iPlant Data Store, and the software tools are made available in the iPlant Discovery Environment, they become available for other researchers to use," said Helmke. "You can also keep parts or all of your data and analysis tools confidential. Your data can be kept completely private or you may share it as you like, but if someone wants to replicate your experiment, how do they do that? The iPlant Discovery Environment and Data Store allows other researchers to use the exact same environment to replicate specific experiments."
One project iPlant is working on is the Taxonomic Name Resolution Service, or TNRS, a software system that compiles different classification schemes and creates links between them, so that scientists can look up an organism under any of its names. This is important because many entries in collections have multiple names, and the TNRS helps resolve these naming discrepancies.
Said Shannon Oliver, a technical documentation specialist for iPlant: "It's really hard to push forward collaborative research without having some sort of standardization across the different studies. How do you collaborate when everyone's using a different name for different plants and how do you know that the data is still applicable across these different species?"
The iPlant Collaborative projects also have major educational components, funded by the NSF, BIO5 and Science Foundation Arizona. Students and teachers can access the iPlant technology resources and open-source data, and there are educational tools designed to help K-12, undergraduate and graduate students understand the data as well as how to use the computational tools provided by iPlant.
The main component of iPlant is its collaborative nature, said Goff: "It's not about one person's research or even a small group's research. It's collaborative across plant biology disciplines, ecology, functional genomics, molecular genetics and evolution. There are many problems in biology that are beyond the scope of a single research lab, but within scope for multiple labs that have different levels of expertise in specific areas. iPlant is designed to empower collaborations across disciplines and facilitate major discoveries."
Source: The University of Arizona