The Texas Advanced Computing Center (TACC) at the University of Texas at Austin has created a tool aimed at simplifying scientific computing. Staff at the center developed AGAVE (A Grid and Virtualization Environment), an API that assists researchers with Web-based science. A TACC article explains what drove the tool’s creation and how it has been used.
A primary motivation for building the AGAVE API was that scientists were getting bogged down in software development. A University of Toronto study found that 84 percent of scientists considered developing scientific software an important aspect of their research. However, researchers spent, on average, 30 percent of their time developing that software. If that share were scaled back, those scientists would have more time to devote to research.
TACC set out to create a tool that would simplify the process of developing scientific software. Like popular Web applications that provide file storage, online shopping and navigation services, the AGAVE API delivers science as a service. Specifically, the API defines a set of rules and specifications that allow programs to communicate with one another. It gives researchers common tools, including profile creation, software authorization and data migration. AGAVE also handles more complex tasks such as job monitoring, metadata creation and auditing.
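To make the idea of job submission and monitoring through an API more concrete, here is a minimal Python sketch. It does not use the real AGAVE API: the `FakeJobService` class, its `submit` and `status` methods, the state names and the `sequence-align` application are all hypothetical, and an in-memory object stands in for the remote service so the example runs without a network connection.

```python
# Hypothetical job states; the real AGAVE API may use different names.
STATES = ["PENDING", "RUNNING", "FINISHED"]


class FakeJobService:
    """In-memory stand-in for a remote job service (illustration only)."""

    def __init__(self):
        self._next_id = 1
        self._jobs = {}

    def submit(self, app, inputs):
        """Register a job and return its identifier."""
        job_id = self._next_id
        self._next_id += 1
        self._jobs[job_id] = {"app": app, "inputs": inputs, "step": 0}
        return job_id

    def status(self, job_id):
        """Report the current state, then advance one step to mimic progress."""
        job = self._jobs[job_id]
        state = STATES[job["step"]]
        job["step"] = min(job["step"] + 1, len(STATES) - 1)
        return state


def wait_for_job(service, job_id, max_polls=10):
    """Poll until the job finishes; return the list of states observed."""
    seen = []
    for _ in range(max_polls):
        state = service.status(job_id)
        seen.append(state)
        if state == "FINISHED":
            return seen
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


service = FakeJobService()
job_id = service.submit("sequence-align", {"reads": "sample.fastq"})
history = wait_for_job(service, job_id)
print(job_id, history)
```

The point of the sketch is the division of labor: the researcher’s program only submits work and asks for its status, while scheduling and execution stay behind the service interface.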
The API is currently being used by the iPlant Collaborative, a group that develops cyberinfrastructure and computational tools for research related to plant genetics. The collaborative uses AGAVE to obtain compute capacity from the Pittsburgh Supercomputing Center (PSC), the San Diego Supercomputer Center (SDSC), and TACC through the Extreme Science and Engineering Discovery Environment (XSEDE). XSEDE is a cyberinfrastructure program funded through a $121 million investment from the National Science Foundation (NSF).
Through the API, scientists can access these resources for their own studies. At the University of South Dakota (USD), researchers created a program called BioExtract Server. The software leverages online informatics tools and databases, allowing users to create and share custom workflows for Web-based genomic analysis. The tool also handles search and analysis of online sequence data. While the software saved time, it struggled with more demanding workloads.
“BioExtract Server couldn’t handle large datasets and it was really hard for our servers to execute analytic tools that are very CPU intensive,” said Dr. Carol Lushbough, computer science professor at the university. “We didn’t have the horsepower.”
iPlant, on the other hand, excelled in these areas. This prompted Lushbough to adopt AGAVE, which gave BioExtract Server users access to iPlant’s tools.
“When a service organization exposes its resources through APIs like iPlant’s Foundation API and our AGAVE platform, we can impact thousands of scientists, cutting across disciplines and demographics,” said Ryan Dooley, a research associate at TACC.
The example demonstrates the added capabilities AGAVE can offer, which may explain its popularity. The API had more than 3,000 users as of July 2012. With 75 compatible applications, AGAVE is used roughly 50,000 times per month.
TACC will release the second major version of its API this summer. New features will focus on adding compatibility with public and private cloud infrastructures.