May 18, 2007
WEST LAFAYETTE, Ind., May 17 -- The world's largest science experiment, a physics experiment designed to determine the nature of matter, will produce a mountain of data. And because the world's physicists cannot move to the mountain, an army of computer research scientists is preparing to move the mountain to the physicists.
Thomas Hacker, a research assistant professor in Purdue University's Discovery Park Cyber Center and with Information Technology at Purdue (ITaP), says the particle physics collider experiment taking place at the European nuclear physics facility CERN will involve scientists around the world.
"Researchers usually have to be in the same location as the instrument to access to the data," Hacker says. "In this case, to bring the data to the researchers, we are building a huge scientific instrument that spans the globe to bring the data to the researchers."
At universities across the United States and at other institutions around the world, teams of computer research scientists and physicists are preparing for the largest physics experiment ever.
"Like an exercise session getting you ready for the big game, we've been going to the physics gym," Hacker says. "We are testing the ability of the infrastructure using simulation data. At Purdue, everyone is building and testing systems to make sure the computing infrastructure is ready when the detector comes online later this year."
The collider will give protons a pop hoping to catch a glimpse of the Big Bang, or at least the subatomic particles that are thought to have last been seen at the big event 10 billion to 15 billion years ago that led to the formation of the universe. The CERN collider will begin producing data in November, and from the trillions of collisions of protons it will generate 15 petabytes of data per year.
By comparison, 15 petabytes would be the equivalent of all of the information in all of the university libraries in the United States seven times over. It would be the equivalent of 22 Internets, or more than 1,000 Libraries of Congress. And there is no search function.
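For a rough sense of what 15 petabytes per year means for a network, the figure can be converted into an average transfer rate. The short Python sketch below is a back-of-the-envelope illustration, not part of the CMS software. It assumes a decimal petabyte (10^15 bytes), data spread evenly over a 365-day year, and a single copy of the data; the function name is made up for this example. Real traffic would be burstier and would be copied to multiple sites.

```python
# Back-of-the-envelope estimate: average rate needed to move 15 PB in one year.
# Assumptions (not from the article): decimal petabytes, uniform production
# over a 365-day year, and one copy of the data.
PETABYTE = 10**15                   # bytes in a decimal petabyte
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds


def average_rate_bytes_per_s(petabytes_per_year: float) -> float:
    """Return the average sustained rate in bytes per second."""
    return petabytes_per_year * PETABYTE / SECONDS_PER_YEAR


rate = average_rate_bytes_per_s(15)
print(f"{rate / 1e6:.0f} MB/s, or about {rate * 8 / 1e9:.1f} Gbit/s")
# prints: 476 MB/s, or about 3.8 Gbit/s
```

Under those assumptions, simply keeping pace with the detector takes a sustained rate of several gigabits per second before any data is replicated to more than one site.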
"Once this data is distributed to the physicists at the universities, they will require massive amounts of computing power and data storage in order to analyze it," Hacker says. "When the data transfer is live, we will stream data out to physicists as we quickly as we can - real time if possible."
The experiment has a name only a scientist could love: the CERN CMS project. CERN is the abbreviation for the European Organization for Nuclear Research, and CMS is the abbreviation for compact muon solenoid, a type of electromagnet.
CMS is an electronic detector that is searching for never-before-detected subatomic particles, especially a particle known as the Higgs boson, which is a missing piece in the jigsaw puzzle of the theory of particle physics (boson is the name physicists give subatomic particles with particular properties). If discovered, it would be an entirely new type of matter.
Dubbed "the God Particle" nearly a decade ago by Nobel prize-winning physicist Leon Lederman, the Higgs boson would explain why some particles have any mass at all, while others, such as photons, do not. Discovery of the Higgs boson is one of the top prizes in modern physics, and its discovery would validate the Standard Model, a theory of particle physics in place since the 1970s.
Norbert Neumeister, assistant professor of physics and the principal investigator on the CMS project at Purdue, says the CMS experiment, along with a similar experiment also taking place at CERN called ATLAS, will bring new insights about the Standard Model and subatomic particles.
"We believe the unprecedented energy range and sensitivity of this new particle accelerator, combined with the special capabilities of the CMS experiment, will lead to a breakthrough understanding of nature," he says. "Everybody hopes to find the Higgs particle, but the ultimate goal is to discover something new and completely unexpected."
The experiments will take place in CERN's Large Hadron Collider, known as the LHC. In the United States, seven universities - known as Tier-2 sites - will receive the CMS data from Fermi National Accelerator Laboratory outside Chicago (the Tier-1 site). The data will be processed and then analyzed by university physicists. (Brookhaven National Laboratory is the Tier-1 site for the CERN ATLAS project.)
Internationally there are 11 Tier-1 sites and more than 100 Tier-2 sites, although outside the United States the Tier-2 sites are organized in a different, less centralized, manner.
In the United States, CMS Tier-2 facilities are Purdue; the University of California, San Diego; Caltech; University of Nebraska; University of Wisconsin; University of Florida; and Massachusetts Institute of Technology.
Frank Würthwein, professor of physics at the University of California, San Diego, says that although the experiment is taking place at CERN in Geneva, the U.S. Tier-2 sites play an integral role.
"The actual data analysis by physicists will take place at Tier-2 sites, so it's important that we can receive whatever data our physicists need," Würthwein says. "We will take data from CERN and push it across the worldwide networks to these seven places. They will receive it, analyze it, the whole gimbang. Once we have the data in all these places, a physicist will be able to submit jobs from their office computer, or even from a laptop in Starbucks."
In tests so far, the CMS Tier-2 sites have been able to support up to 50,000 jobs per day, and the goal is to be able to support 100,000 computing jobs per day by late spring.
"In an exercise last fall we were able to support 50,000 jobs, so we are getting there," Würthwein says. "The next six to nine months are going to be very hectic to get as close to good tools as we can possibly get. Putting the cyberinfrastructure together for this project is no easy feat. There's a lot of work yet to do, and a lot of people will have to do a lot of heavy lifting. This is not just pushing a few buttons."
Les Robertson, leader of the LHC Computing Grid project, based at CERN, says that the entire system is designed to be as user friendly for the physicists as possible.
"CERN, the Tier-1s and the Tier-2s together form a worldwide computing and data grid," Robertson says. "They are bound together by a layer of software called middleware that is designed to hide the complexity of this network from the user, and use resources at sites around the globe as effectively as possible."
Much of the behind-the-scenes middleware used at the Tier-2 sites is being developed by the Open Science Grid consortium.
Ruth Pordes, executive director of the Open Science Grid, says the middleware used for CMS and ATLAS is an enhancement of existing software.
"The software is useful now for any scientists who need to process, store and access vast amounts of data," Pordes says. "And given the ever-growing internationalism of science, we're working with our peers to create a worldwide interoperable grid."
Grid computing is essential to the success of the project, Hacker says. Purdue and UC-San Diego are the only two Tier-2 sites connected to the National Science Foundation's TeraGrid research network, and Purdue also connects to Fermilab through StarLight and Indiana's I-Light, which are both high-speed fiber-optic networks.
"An excellent network infrastructure is critical for the success of this project," Hacker says. "Purdue is involved in many networking projects focused on high-performance networking for research, such as the TeraGrid and I-Light."
Indiana University is playing a key role in CERN's ATLAS project, which, like the CMS project, aims to discover insights into subatomic physics and the nature of matter.
"Together, the two state universities in Indiana are playing a key role in experimental physics," Hacker says. "Because of its science grid connections and computational resources, the state of Indiana is helping to lead the way at the frontier of science."
-----
Source: Steve Tally, Purdue University