Billions of tweets, Facebook updates, location-enabled applications and web searches are leading to an unprecedented amount of data “byproduct” that an increasing number of businesses are mining in search of new insights, trends and sentiments. While predictive and real-time analytics hold enormous value for business, as one might imagine, governments too see an opportunity to understand citizens far better than ever.
According to a report this week by John Markoff in the New York Times, this past summer, an obscure government intelligence agency solicited ideas from the academic community about how it might be able to automatically “scan the Internet in 21 Latin American countries.”
This three-year experiment, which is slated to begin in April, would devise an automated data collection system that looks for patterns of “communication, consumption and movement of populations.” Rather vague, yes?
This “data eye in the sky” will use publicly available data to take the digital pulse of an entire region. In the agency’s view, this includes everything from IP traffic and web searches to more “easily” available sources like blogs and social media streams.
This type of research has been in the news quite a bit over the last year. Stories have emerged about everything from mining Twitter for brand sentiments to using supercomputing resources to predict the future. What is different here is that this is no longer a branding-driven or academic institution-geared initiative; this is a project backed with public funds on behalf of a small agency that is refusing to comment on the scope of the analytics endeavor.
The group behind the effort, the Intelligence Advanced Research Projects Activity, is part of the office of the director of national intelligence in the United States. As the NYT report claimed, the agency’s research would “not be limited to political and economic events, but would also explore the ability to predict pandemics and other types of widespread contagion, something that has been pursued independently by civilian researchers and by companies like Google.”
As the article’s author noted, there are potential privacy and more general logistics concerns involved. Markoff writes that “the ease of acquiring and manipulating huge data sets charting Internet behavior causes many researchers to warn that the data mining technologies may be quickly outrunning the ability of scientists to think through questions of privacy and ethics.”