Aug. 8, 2018 — The National Center for Supercomputing Applications’ (NCSA) Brown Dog project was awarded “Best Technical Paper” at the annual Practice & Experience in Advanced Research Computing (PEARC) Conference in Pittsburgh, PA.
The award marks the NCSA ISDA Team’s third Best Paper award in as many years, following their “Best Accelerating Discovery in Scholarly Research Paper” awards at PEARC 17 and at XSEDE 16, PEARC’s predecessor.
This year’s paper, “Brown Dog: Making the Digital World a Better Place, a Few Files at a Time,” expands on the sustained improvement of Brown Dog, a project that seeks to make legacy long-tail data sets more accessible for modern research by providing a new level of curation, transformation and storage tools.
Read the winning paper’s full abstract:
Brown Dog is a data transformation service for auto-curation of long-tail data. In this digital age, we have more data available for analysis than ever and this trend will only increase. According to most estimates, 70-80% of this data is unstructured, and together with unsupported data formats and inaccessible software tools, in essence, this data is not either easily accessible or usable to its owners in a meaningful way. Brown Dog aims at making this data more accessible and usable by auto-curation and indexing, leveraging existing and novel data transformation tools.
In this paper, we discuss the recent major component improvements to Brown Dog including transformation tools called extractors and converters; desktop, web and terminal-based clients which perform data transformations; libraries written in multiple programming languages which integrate with existing software and extend their data curation capabilities; an online tool store for users to contribute, manage and share data transformation tools and receive credit for developing them; cyberinfrastructure for deploying the system on diverse computing platforms leveraging scalability via Docker swarm; workflow management service for creatively integrating existing transformations to generate custom, reproducible workflows which meet research needs, and its data management capabilities.
This paper also discusses data transformation tools developed to support some scientific and allied use cases, thereby benefiting researchers in diverse domains. Finally, we briefly discuss our future directions with regard to production deployments as well as how users can access Brown Dog to manage their un-curated unstructured data.
Read the full paper, a collaboration between 20 authors from the University of Illinois, the University of Maryland, Boston University and Southern Methodist University.
NCSA’s Brown Dog project is supported by the National Science Foundation under Grant No. ACI-1261582.
The National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign provides supercomputing and advanced digital resources for the nation’s science enterprise. At NCSA, University of Illinois faculty, staff, students, and collaborators from around the globe use advanced digital resources to address research grand challenges for the benefit of science and society. NCSA has been advancing one third of the Fortune 50® for more than 30 years by bringing industry, researchers, and students together to solve grand challenges at rapid speed and scale.