SCIENCE & ENGINEERING NEWS
Gainesville, FLA. — The Universities of Florida and Chicago will lead an $11.9 million initiative that will lay the groundwork for a computer data grid of unprecedented speed and power, the National Science Foundation announced.
The initiative, called the Grid Physics Network, or GriPhyN, is funded by the largest grant in the National Science Foundation’s new Information Technology Research program, which supports long-term basic research on networking and information technology.
GriPhyN initially aims to give scientists a tool to interpret the vast amounts of data expected to flow from the world’s most ambitious physics and astronomy experiments, but it also could have applications in the business world and elsewhere, said Paul Avery, lead scientist and UF professor of physics.
“We need to plan for these experiments now, because we can’t wait till they start,” Avery said. “A personal computer today can do about a billion operations per second. The overall computing power we need is about 1 million times more than that.”
GriPhyN involves more than a dozen institutions nationally and will pioneer a new concept called virtual data, in which the entire resources of a scientific collaboration become a single vast computing and storage system. GriPhyN could be thought of as a Napster for scientists, where the tunes being downloaded are not purloined hits but crucial insights into the nature of the universe, said project co-leader Ian Foster, professor in computer science at the University of Chicago and associate director of the Mathematics and Computer Science Division of Argonne National Laboratory.
“Results will be computed only if and when needed,” Foster said. “Much of the time, the result you need will already have been computed by one of your colleagues, and the system will know where to find it.”
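One way to picture the virtual data idea Foster describes is as a shared catalog that is consulted before any new computation is run. The Python sketch below is only an illustration under that assumption: the class `VirtualDataCatalog`, its `request` method, the derivation key and the site names are all hypothetical and are not taken from GriPhyN or the Globus toolkit.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class VirtualDataCatalog:
    """Hypothetical catalog mapping a derivation key to (site, result)."""

    def __init__(self) -> None:
        self._results: Dict[Hashable, Tuple[str, Any]] = {}
        self.computations = 0  # how many times a result was actually computed

    def request(self, key: Hashable, compute: Callable[[], Any], site: str) -> Tuple[str, Any]:
        """Return (site holding the result, result), computing only on a miss."""
        if key in self._results:
            # A colleague already produced this result; reuse it.
            return self._results[key]
        self.computations += 1
        result = compute()
        self._results[key] = (site, result)
        return site, result


catalog = VirtualDataCatalog()
key = ("histogram", "run-42")  # illustrative description of a derived result

# The first request computes the result at the requesting site.
first = catalog.request(key, lambda: sum(range(10)), site="site-a")
# The second request, from another site, finds the stored copy instead.
second = catalog.request(key, lambda: sum(range(10)), site="site-b")
```

In this toy version the second request returns the copy recorded for the first site and triggers no new computation, which is the behavior Foster's quote describes. A real system would also have to move the data between sites and decide when recomputing is cheaper than transferring.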
The initiative initially will benefit four physics experiments that will explore the fundamental forces of nature and the structure of the universe.
Two experiments at the European Laboratory for Particle Physics near Geneva will search for the origins of mass using the Large Hadron Collider, which will become the world’s highest-energy particle collider when it begins operation in 2005. The Laser Interferometer Gravitational-wave Observatory, based in Louisiana and Washington, will probe the gravitational waves of pulsars, supernovae and other phenomena. The Sloan Digital Sky Survey, conducted from Apache Point Observatory in New Mexico, is carrying out a massive automated survey of the stars.
Each of these experiments will produce huge amounts of data that scientists at different institutions around the world will want to search and manipulate. Genomics is another major area of science where data volumes are increasing much faster than analysis capabilities, Foster said. So large are the data collections that scientists anticipate they will be measured in petabytes, where one petabyte is roughly the amount of data that can be contained on 1 million personal computer hard drives. A personal computer hard drive contains approximately 1 gigabyte, which equals 1 billion bytes.
The world’s most powerful supercomputers today can store and process data measured in terabytes, each of which equals 1,000 gigabytes. By tapping into the computing power of multiple institutions around the world, a computational data grid could significantly boost both storage and calculating capacity. The resulting capacity will not reside at one location or in one supercomputer but rather will be spread throughout the institutions, much like power plants connected to an electrical grid.
“The electrical grid is a useful analogy, because users ranging from individuals to large organizations will consume computing and data resources in greatly differing amounts, and they will not care where those resources are located,” Avery said.
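The storage and computing figures quoted in this article can be checked with a few lines of arithmetic. The Python sketch below uses the article's own decimal units (1 gigabyte equals 1 billion bytes) and Avery's estimate of about a billion operations per second for a personal computer; the variable names are illustrative only.

```python
# Decimal storage units as defined in the article.
GIGABYTE = 10**9               # 1 billion bytes
TERABYTE = 1_000 * GIGABYTE    # 1,000 gigabytes
PETABYTE = 1_000 * TERABYTE    # 1,000 terabytes

# The article assumes a personal computer hard drive of about 1 gigabyte.
pc_drive_bytes = 1 * GIGABYTE
drives_per_petabyte = PETABYTE // pc_drive_bytes

# Avery's estimate: a PC does about a billion operations per second,
# and the experiments need about 1 million times more than that.
pc_ops_per_second = 10**9
required_ops_per_second = pc_ops_per_second * 10**6

print(f"Drives per petabyte: {drives_per_petabyte:,}")
print(f"Required operations per second: {required_ops_per_second:.0e}")
```

Under these assumptions one petabyte corresponds to 1 million such drives, matching the article's figure, and the required computing power works out to about 10^15 operations per second.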
Scientists will need not only access to the data but also the ability to carve out chunks of it and manipulate those chunks to produce results. Because of the chunks’ size or limits on the available computing power, their movement around the network will have to be scheduled at different times, a task that will require a kind of “intelligent” network.
“A worldwide community of perhaps thousands of physicists want to be able to have their combined computer, storage and network resources used as a single computing engine to solve their problems,” Foster said. “This requires new technology that can coordinate potentially thousands of processors, petabytes of storage and a variety of high-speed and low-speed networks and cause them to operate in some sense as a single analysis engine.”
GriPhyN will build on a base of proven grid technologies, in particular the Globus toolkit, to provide the basic services and capabilities of a computational grid.
Although intended initially for science, GriPhyN could also prove useful for large business applications, Avery said. For example, companies with multiple sales outlets don’t always store sales data in one central location. But marketers may wish to comb through all the company’s sales data to ferret out consumer buying habits.
“There’s a huge amount of interest in the technology that would allow companies to actually study these large archives of commerce data,” Avery said.
The $11.9 million NSF grant is for research and development only, with no money for hardware, Avery said. Researchers seek a total of $70 million in NSF grants for further research and equipment to build the system. Research and construction should take place simultaneously, with a target completion date of 2005, he said.