Staff in NSF’s XSEDE network have created a script that avoids file-system tangles seen when scaling some common scientific applications for use on HPC systems, according to Antonio Gomez Iglesias of the Texas Advanced Computing Center (TACC). The fix can be useful to many users employing similar applications, he said Tuesday in a presentation at the XSEDE15 conference in St. Louis, Mo.
“Many users in HPC just use scientific applications like black boxes,” said Gomez, a member of XSEDE’s Extended Collaborative Support Service. That can be a boon for researchers who don’t want to learn HPC techniques to do their work, but “the implementation is not always the best” when programs written for desktop computers are moved to a parallel computing environment.
The National Center for Biotechnology Information’s Basic Local Alignment Search Tool (NCBI-BLAST), for example, works by opening the database for a query, reading the data, and then closing the database, Gomez explained. On TACC’s Stampede system, repeating that cycle across many queries was generating a huge number of requests that stressed the shared file system.
The collaborators developed a script that temporarily stores the NCBI-BLAST data on local disk rather than repeatedly going back to the shared file system. This change greatly reduces the load on the file system and cuts the time required for test computations nearly fourfold when using 900 cores on Stampede, Gomez said.
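The article does not reproduce the StampedeBLAST script, so the following is only a rough Python sketch of the general staging idea: copy the database files from the shared file system to node-local storage once, then point every query at the local copy. The function names, the size-based "already staged" check, and the use of `$TMPDIR` as node-local scratch are illustrative assumptions, not the team's implementation. The `-db`, `-query` and `-out` options are standard BLAST+ `blastn` arguments.

```python
import os
import shutil
import tempfile
from pathlib import Path


def node_local_dir() -> str:
    """Hypothetical helper: assume $TMPDIR points at node-local disk,
    falling back to the system temp directory."""
    return os.environ.get("TMPDIR") or tempfile.gettempdir()


def stage_database(shared_dir: str, local_dir: str) -> int:
    """Copy each file of a BLAST database from the shared file system to
    node-local storage, skipping files whose local copy already has the
    same size. Returns how many files were actually copied, so repeated
    calls on the same node add little shared-file-system traffic."""
    src = Path(shared_dir)
    dst = Path(local_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = 0
    for f in sorted(src.iterdir()):
        if not f.is_file():
            continue
        target = dst / f.name
        # Assumption: a matching size means an earlier task already staged it.
        if target.exists() and target.stat().st_size == f.stat().st_size:
            continue
        shutil.copy2(f, target)
        copied += 1
    return copied


def blast_command(db_prefix: str, query: str, out: str) -> list[str]:
    """Build (but do not run) a blastn command that reads the staged local
    database instead of the shared copy."""
    return ["blastn", "-db", db_prefix, "-query", query, "-out", out]
```

In this sketch the shared file system is read once per node during staging, and every subsequent open/read/close cycle that BLAST performs per query hits local disk instead. A real job script would also need to handle concurrent staging by several tasks on one node and clean up scratch space afterward.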
The team that developed the new script includes Gomez, Arun Seetharam of Iowa State University, Catherine Purcell and John Hyde of NOAA, Philip Blood of the Pittsburgh Supercomputing Center and Andrew Severin of Iowa State University. Scripts and documentation for running the HPC-optimized BLAST can be found at https://github.com/ISUgenomics/StampedeBLAST.