TeraGrid ’09: Thriving in an Exponentially Changing World
Before he even took the podium, Ed Seidel was one of the buzz makers at the TeraGrid ’09 conference. The day before his keynote, it was announced that he was stepping in as acting assistant director of the National Science Foundation’s Directorate for Mathematical and Physical Sciences. For his talk at the conference, however, Seidel focused on the issues facing his home office at NSF, the Office of Cyberinfrastructure, and the efforts under way there.
Computational science is “fundamentally changing in very serious ways and on exponential paths,” he said.
Those changes are visible across many different disciplines, he said, and he used his own field, astrophysics, as an example.
In the early 1970s, “brilliant scientists were doing groundbreaking work” alone or in very small groups. Some of Stephen Hawking’s early work was done alone, based on mere kilobytes of data, and was illustrated with a hand-drawn sketch. About 20 years later, Seidel was part of a team of about 10 working with roughly 50 megabytes of data; five years after that, the team was working with 50 gigabytes. Today, astrophysicists work with hundreds of terabytes of data and tackle far more complicated problems that require researchers from a variety of subdisciplines.
Over the past three decades, the amount of data created by astrophysicists’ simulations has increased by 12 orders of magnitude, and “it’s happening everywhere.”
Geophysics problems have followed a similar curve. Atmospheric scientists, hydrodynamics researchers, data experts, economists, sociologists, and civil engineers all come together to understand severe weather like hurricanes and people’s reactions to them. They rely on tools like sensors delivering live data, computational models that couple several aspects of the problem to one another, and fast networks.
These sorts of “grand challenge communities” are intensely dynamic. Not every project requires the expertise of every team member or every tool in the cyberinfrastructure toolbox. What they do require, according to Seidel, is a new way of thinking about computational science.
These communities have massive intellectual and technical capacity; they drive ever-larger simulations and must process and make sense of growing volumes of data. Accordingly, they also require solutions to a set of what Seidel called “crises” in high-performance computing.
Despite these crises, Seidel remains optimistic, and a key theme of his talk was progress: “We are getting there. The TeraGrid is a good example of the fact that we are getting there. I believe it is the world’s best environment for advancing computationally oriented science and engineering research.”
“We are transforming science through cyberinfrastructure.”
The first crisis Seidel discussed, along with the response to it, was technological. Tomorrow’s leadership-class supercomputing systems will have hundreds of thousands of processors, which presents two key challenges. Researchers using these systems to simulate the world around us will have to scale their codes to run on that many processors in order to get the full benefit of the systems. They will also have to employ advanced fault-tolerance strategies.
“With millions of components [processors, memory, disks, etc.], something is constantly failing,” he said. Keeping a simulation running even as the components it is using fail requires gracefully migrating it to other components.
Efforts to address both challenges are under way on resources throughout the TeraGrid. They are also part of NSF’s Petascale Computing Resource Allocations program, which is supporting teams as they prepare their codes to run on NCSA’s Blue Waters sustained-petascale computing system.
Second, Seidel described a data and provenance crisis, pointing out that, by some reckonings, we have generated more unique digital data in the last year than we had collected in all prior human history. A cutting-edge simulation of a gamma-ray burst on a sustained-petascale supercomputer might output five petabytes of data. Moving, storing, and keeping proper track of those data would be a tremendous undertaking.
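To give a rough sense of why moving five petabytes is such an undertaking, the short sketch below estimates transfer time. It is a back-of-the-envelope illustration only: the link speeds (1, 10, and 100 gigabits per second) are hypothetical values chosen for comparison, not figures from the talk, and it assumes decimal units, a fully utilized link, and no protocol overhead.

```python
# Back-of-the-envelope transfer-time estimate.
# Assumptions (not from the talk): decimal units, a link running at its full
# nominal rate, and no protocol or storage overhead.
PETABYTE = 10**15  # bytes


def transfer_days(n_bytes: float, link_gbps: float) -> float:
    """Days needed to move n_bytes over a link sustaining link_gbps gigabits/s."""
    bits = n_bytes * 8
    seconds = bits / (link_gbps * 10**9)
    return seconds / 86_400  # seconds per day


output = 5 * PETABYTE  # simulation output size cited in the talk

# Hypothetical link speeds, chosen only for comparison.
for gbps in (1, 10, 100):
    print(f"{gbps:>4} Gb/s: {transfer_days(output, gbps):6.1f} days")
```

Under these assumptions, even a dedicated 10 Gb/s link would need roughly 46 days to move the data once, before any storage or bookkeeping is considered.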
“Growth of data is [the] biggest unsolved problem in cyberinfrastructure, [and] in some ways in science,” he said. “We don’t know how to collect it from multiple communities and correlate it. Our goal is to catalyze a system of science collections that is open, extensive, and evolvable.”
Here, he pointed to the NSF DataNet program, which will fund five data nodes in a national network. They’ll be “not just repositories but community anchors for data-intensive science.” He also pointed to eXtreme Digital, the program that will be the third phase of the TeraGrid project.
Seidel described a crisis in software as well. There are “millions of lines of code in a single complex application. And when we think about grand challenge communities, we’re connecting codes and modules. Software environments are with us for decades, so we need much more investment here,” he said.
Finally, he discussed a crisis in education. Despite years of effort to teach students to think computationally, the question remains: How do we develop a workforce that can thrive professionally in this exponentially changing world? NSF is currently supporting a set of CAREER grants for young investigators at American universities who are tackling this issue in inventive ways.