July 23, 2014

Big Data Needs Big Funding: NSF’s Jahanian Makes The Case

Ceci Jones Schrock

There’s no doubt that advanced computing and cyberinfrastructure are accelerating the pace of discovery and innovation in almost every field of inquiry. For Farnam Jahanian, this point was driven home in a recent meeting with senior members of the US Department of Agriculture. Yes, even farmers are getting in on the data/computation revolution.

“At the Department of Agriculture, we talked about important issues such as food safety and vertical farming. You’d be surprised how many times data was mentioned: data integration, data management, data curation, computer simulation, and modeling,” said Jahanian, National Science Foundation assistant director for the Computer and Information Science and Engineering (CISE) Directorate. Jahanian shared this anecdote with attendees of his plenary talk, “The Transformative Impact of Computing and Communication in a Data-Driven World,” at the 2014 Extreme Science and Engineering Discovery Environment (XSEDE) conference, held in Atlanta July 13-18.

As the outgoing CISE assistant director whose tenure ends August 31, Jahanian talked about the societal advances cyberinfrastructure is enabling all over the world – and the need for the United States to increase funding for this important work.

“As a nation, we haven’t recognized that [funding cyberinfrastructure] is the cost of doing business. We have to fund it. … We have to make it a priority for the country,” said Jahanian. “I want to give you my assurance, as I’m entering the end of my tenure at NSF, that these dialogues are taking place within the foundation—they have been for a while—but the volume has increased and the recognition that we need to do more and more has grown.”

To Jahanian, increased funding will lead to advances that benefit people across the globe. From cancer treatments and renewable energy sources to weather models and ideas for reducing traffic congestion, data and computation are yielding more accurate predictions – saving money and lives.

As an example, Jahanian cited the work of Daphne Koller, a computer science professor at Stanford University, whose data mining research is yielding breakthroughs in the study of breast cancer. As the most common cancer among American women, breast cancer kills 40,000 people a year.

“Cyberinfrastructure resources are providing a deeper understanding of causal relationships, not just correlation,” he said. “By doing image analyses of breast cancer biopsies, Koller and her colleagues have identified feature sets that are great predictors for cancer survival. And here’s what’s amazing: these features are not from cancer tissues themselves – they are from adjacent tissues, something that had gone unnoticed by pathologists and clinicians for decades,” Jahanian noted.

Discoveries like these are made possible by learning to harness the data explosion that is all around us – what Jahanian calls a “data tsunami.” Consider these facts: the number of networked devices equaled the global human population just a few years ago. By 2015 or 2016, that number will be twice the population of Earth. By 2015, it will take five years to view all the videos crossing IP networks in one second.

Mobile devices and social media are contributing to the data influx in huge ways. In 2012, approximately 1 billion people spent about 10 minutes a day on Facebook. By 2013, Facebook was collecting 500TB of data every day. In the Twitterverse, more than 250 million active users produce 400 million tweets daily.

The key to managing all that data, according to Jahanian, is a well-funded cyberinfrastructure system. “Discovery and intervention across disciplines demands advanced cyberinfrastructure, but demand has far exceeded our investments over the last decade,” Jahanian said. “The nation really does need another significant infusion of funding, along the scale of the early days of high performance computing. It’s time for us to do that, but by taking a much larger, broader perspective of what cyberinfrastructure is: computation, data, people, and so on.”