We live on a planet of more than seven billion people who speak more than 7,000 languages. Most of these are “low-resource” languages for which there is a dearth of human translators and no automated translation capability. This presents a major challenge in emergency situations, where information must be collected and communicated rapidly across linguistic barriers.
To address this problem, linguists at Ohio State University are using the Ohio Supercomputer Center’s Owens cluster to develop a general grammar acquisition technology.
The research is part of an initiative called Low Resource Languages for Emergent Incidents (LORELEI) that is funded through the Defense Advanced Research Projects Agency (DARPA). LORELEI aims to support emergent missions, e.g., humanitarian assistance/disaster relief, peacekeeping or infectious disease response by “providing situational awareness by identifying elements of information in foreign language and English sources, such as topics, names, events, sentiment and relationships.”
The Ohio State group is using high-performance computing and Bayesian methods to develop a grammar acquisition algorithm that can discover the rules of lesser-known languages.
“We need to get resources to direct disaster relief and part of that is translating news text, knowing names of cities, what’s happening in those areas,” said William Schuler, Ph.D., a linguistics professor at The Ohio State University, who is leading the project. “It’s figuring out what has happened rapidly, and that can involve automatically processing incident language.”
Schuler’s team is using Bayesian methods to discover a given language’s grammar and build a model capable of generating grammatically valid output.
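To make the idea concrete, here is a minimal toy sketch of Bayesian grammar learning, not the OSU team's actual system: rule probabilities for a tiny probabilistic context-free grammar are estimated from observed rule counts under a Dirichlet prior (the posterior mean reduces to smoothed relative frequency), and the learned grammar is then used to generate grammatical output. The grammar, counts, and vocabulary are all invented for illustration.

```python
import random

# Hypothetical toy grammar: candidate rewrite rules per nonterminal,
# with counts of how often each rule was observed in (imagined) data.
RULE_COUNTS = {
    "S":   {("NP", "VP"): 40},
    "NP":  {("Det", "N"): 25, ("N",): 15},
    "VP":  {("V", "NP"): 30, ("V",): 10},
    "Det": {("the",): 25},
    "N":   {("storm",): 20, ("city",): 20},
    "V":   {("hits",): 22, ("floods",): 18},
}
TERMINALS = {"the", "storm", "city", "hits", "floods"}
ALPHA = 1.0  # symmetric Dirichlet concentration (acts as a pseudo-count)

def posterior_probs(counts, alpha=ALPHA):
    """Posterior mean of rule probabilities under a Dirichlet prior."""
    total = sum(counts.values()) + alpha * len(counts)
    return {rhs: (c + alpha) / total for rhs, c in counts.items()}

# "Learn" the grammar: one probability distribution per nonterminal.
GRAMMAR = {lhs: posterior_probs(c) for lhs, c in RULE_COUNTS.items()}

def generate(symbol="S", rng=random):
    """Sample a sentence top-down from the learned grammar."""
    if symbol in TERMINALS:
        return [symbol]
    rules = list(GRAMMAR[symbol].items())
    rhs = rng.choices([r for r, _ in rules],
                      weights=[p for _, p in rules], k=1)[0]
    return [word for sym in rhs for word in generate(sym, rng)]

if __name__ == "__main__":
    print(" ".join(generate()))  # e.g. "the storm hits the city"
```

A real induction system must also infer the rules and categories themselves rather than score a fixed rule set, which is what makes the computation expensive enough to require a supercomputer.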
“The computational requirements for learning grammar from statistics are tremendous, which is why we need a supercomputer,” Schuler said. “And it seems to be yielding positive results, which is exciting.”
The team originally used CPU-only servers but is now using the GPU computing capability of the Ohio Supercomputer Center’s Owens cluster to model a larger number of grammar categories. The goal is to have a model that can be trained on a target language in an emergency response situation, so speed is critical. In August, the team ran two disaster simulations in seven days using 60 GPU nodes (one Nvidia P100 GPU per node), but a real-world situation with more realistic configurations would demand even greater computational power, according to one of the researchers.