In the northwestern suburbs of Geneva, on the Franco–Swiss border, sits CERN (the European Organization for Nuclear Research), the world’s largest particle physics laboratory. CERN is home to the most ambitious particle physics project of our time: the search for the Higgs boson, a hypothetical subatomic particle that is thought to give mass to all other particles. This elusive speck is so treasured that some have dubbed it the God Particle. To coax the Higgs boson out of hiding, the Large Hadron Collider (LHC), CERN’s giant particle accelerator, smashes together beams of high-energy protons, and its detectors record the results. The most promising collisions are converted into electronic signals and sent to a computer farm, where they undergo a digital reconstruction. But this is only the beginning of the data’s long journey.
An article in Nature looks at the path the data must travel in order to reach member research sites, where the analysis can commence.
Here’s a breakdown of the process:
Even after rejecting 199,999 of every 200,000 collisions, the detector churns out 19 gigabytes of data in the first minute. In total, ATLAS and the three other main detectors at the LHC produced 13 petabytes (13 × 10^15 bytes) of data in 2010, which would fill a stack of CDs around 14 kilometres high. That rate outstrips any other scientific effort going on today, even in data-rich fields such as genomics and climate science (see Nature 455, 16–21; 2008). And the analyses are more complex too. Particle physicists must study millions of collisions at once to find the signals buried in them — information on dark matter, extra dimensions and new particles that could plug holes in current models of the Universe. Their primary quarry is the Higgs boson, a particle thought to have a central role in determining the mass of all other known particles.
The data are sent to the Worldwide LHC Computing Grid, an extensive network of linked computers comprising approximately 200,000 processing cores and 150 petabytes of disk space. From there they are distributed to 34 countries over leased data lines at a rate of 5 gigabytes per second. All the researchers need a copy of the data, but if they all logged into a single system at the same time, it would overload and shut down. So instead, the grid automatically routes copies of the data to the participating research institutions.
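To put the quoted figures on a common scale, here is a rough back-of-envelope sketch in Python. It is not taken from the Nature article. It assumes decimal units (1 gigabyte = 10^9 bytes, 1 petabyte = 10^15 bytes), a 365-day year, and that the 5 gigabytes per second is a sustained rate, which may well not hold in practice. All variable names are illustrative.

```python
# Back-of-envelope arithmetic on the figures quoted in the article.
# Assumption: decimal units (1 GB = 10**9 bytes, 1 PB = 10**15 bytes).
GB = 10**9
PB = 10**15

# Filtering: 199,999 of every 200,000 collisions are rejected,
# so 1 in 200,000 is kept.
kept_fraction = 1 / 200_000

# 19 GB recorded in the first minute, expressed in bytes per second.
first_minute_rate = 19 * GB / 60

# 13 PB produced in 2010, averaged over a full 365-day year
# (an assumption: the collider does not actually run all year).
seconds_per_year = 365 * 24 * 3600
average_rate_2010 = 13 * PB / seconds_per_year

# Time to move 13 PB once at 5 GB/s, assuming the rate is sustained.
transfer_seconds = 13 * PB / (5 * GB)
transfer_days = transfer_seconds / 86_400

print(f"Fraction of collisions kept: {kept_fraction:.1e}")
print(f"First-minute rate:           {first_minute_rate / GB:.2f} GB/s")
print(f"Average rate over 2010:      {average_rate_2010 / GB:.2f} GB/s")
print(f"Days to move 13 PB once:     {transfer_days:.1f}")
```

Under these assumptions, moving a single copy of the 2010 dataset at the full 5 gigabytes per second would take roughly 30 days, which gives a sense of why the grid distributes the data continuously rather than all at once.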
The datasets are split up so that each research group gets the pieces relevant to its work. When the information reaches its destination, the project partners access it and run their analyses. As more and more data are collected, a picture begins to form. With each petabyte of data that flows through the grid, the scientists could be one step closer to finding proof of the God Particle and achieving a deeper understanding of the Big Bang.