Soybean Science Blooms with Supercomputers

Aug. 16 — Knowledge of the soybean in the U.S. has come a long way since its humble start as seeds smuggled by ship from China in the 1700s. At the time, a decree from Emperor Qianlong restricted foreign trade to Canton. Undeterred, Samuel Bowen, a former seaman with the East India Company, first brought soybeans to Savannah, Georgia, in 1765. A couple of years later Bowen filed a patent for a new way of making sago (a starchy cake), vermicelli (noodles), and soy sauce from soybeans. Soybeans on colonial soil also caught the attention of Benjamin Franklin, who wrote of their universal use in China to make a kind of cheese, which we now call tofu.

Well into the 20th century, knowledge of soybeans came from the outside, through selective breeding and manipulation of the plant's environment: the warm weather, targeted water, loose soil, and full sunlight it needs to grow.

Today, an ambitious project called the Soybean Knowledge Base (SoyKB), developed at the University of Missouri-Columbia (MU), aims to find and share comprehensive knowledge from within the soybean: its genetic and genomic data, all publicly available and produced with the help of high-performance computing.

Dong Xu, a professor and chair of the computer science department at MU, is one of the principal investigators of SoyKB, which he describes as a web resource for all soybean data, from molecular data to field data, along with several analytical tools.

“Our goal, first of all, is to provide a resource for people to find information about the soybean genes, their behavior, their gene expression, the metabolic pathways, and more,” Xu said. He added that it’s more than just a clearinghouse of data. SoyKB promotes deeper understanding through data analysis, helping scientists who want to improve crops develop and verify their hypotheses. More than 2,000 unique users log on to the SoyKB website every month, and over 10,000 unique users have used SoyKB since it was developed in 2010.

SoyKB started small, initially focusing on the genomics aspects of soybean data, according to co-PI Trupti Joshi. She is the director of Translational Bioinformatics at the School of Medicine Medical Research Office and an assistant research professor in the Department of Molecular Microbiology and Immunology at MU.

“After a year or two,” said Joshi, “we added the USDA germplasm data set, which gives you phenotypic information for about 19,000 soybean germplasm lines.” Germplasm is, in essence, the living genetic material held in seed banks that scientists draw on to improve breeding. “That is when we started building a lot of tools in the informatics suite,” she said. These efforts, she added, are helping researchers find connections between the genomics data and variations in the germplasm lines.

“SoyKB has grown tremendously,” Joshi said. “Over the years, we have had users from academic and industry environments. We have both domestic and international users from Canada, Brazil, India, China, and a lot of different countries in Europe. It’s really been widely accessible.” Times have changed since the days of American colonist Samuel Bowen.

The ultimate goal of SoyKB, said Joshi, is to improve soybean traits and help researchers develop better soybean breeding techniques. “Our focus has been mainly on integrating multi-omics data sets about gene expression, protein expression, variations in the soybean, and then bridging it from this translational genomics side to the molecular breeding side, where it affects the soybean researchers and farmers,” Joshi said.

The SoyKB project began its computing work through XSEDE, the NSF-sponsored eXtreme Science and Engineering Discovery Environment, with an allocation awarded in 2014 on the Stampede supercomputer at the Texas Advanced Computing Center. In all, it has used about 370,000 core hours on a massive effort to sequence and analyze the genomes of over 1,000 soybean germplasm lines.

Source: Jorge Salazar, TACC