Thanks to a $1.4 million NSF award, bioinformatics researchers at Washington State University, Clemson University and elsewhere will soon have infrastructure capable of supporting 21st-century big data genomics workloads.
As part of the Tripal Gateway project, the multi-university team will work to enhance the scope and capacity of their existing cyberinfrastructure in order to better share and process large data sets.
“In a single day, some modern DNA sequencers can output as much data as the human genome,” stated project lead Stephen Ficklin, a researcher at Washington State. “We expect the deluge of data to continue to grow exponentially.”
“Genomics scientists who can access large data sets but have limited resources for storing, sharing and analyzing them will benefit from this work,” he added.
Over the next three years, the group will leverage software-defined networking technology to transfer large data sets between computational resources, while databases will also be enhanced to support data sharing and analysis. The objective is to link existing community databases for fruit and hardwood trees, as well as legumes, into a larger network of online research databases.
Led by Washington State University (WSU), the project will use Tripal, an open-source toolkit developed by a team at Clemson, to implement its biological database.
While researchers need shared workload tools that are accessible and easy to use, community databases often lack the robust infrastructure needed to support effective and efficient collaboration. Having already been adopted by multiple community databases, Tripal is uniquely positioned to provide a common infrastructure.
The Tripal Gateway includes a set of modules (extensions) that will be developed by the team:
– Tripal Galaxy – a module integrating Galaxy workflows into a Tripal site, providing both next-generation analytical workflows and seamless transition of results into the community database.
– Tripal Exchange – a module to provide capabilities for cross-site querying, enabling collation and viewing of data from multiple sites, and integration of data into workflows.
– Tripal SDN – a module incorporating software-defined networking (SDN) technology, providing mechanisms to improve the speed of data exchange.
The $1.4 million award is one of 17 grants, worth $31 million in total, allocated by the NSF Data Infrastructure Building Blocks (DIBBs) program.