Today high-performance computing is at the forefront of a new gold rush, a rush to discovery using an ever-growing flood of information and data. Computing is now essential to science discovery like never before. We are the modern pioneers pushing the bounds of science for the betterment of society. — SC17 General Chair Bernd Mohr, Jülich Supercomputing Centre
In an age defined and transformed by its data, several large-scale scientific instruments around the globe might be viewed as a mother lode of precious data.
With names seemingly created for a techno-speak glossary, these interferometers, cyclotrons, sequencers, solenoids, satellite altimeters, and cryo-electron microscopes are churning out data in previously unthinkable and almost incomprehensible quantities: billions, trillions and quadrillions of bits and bytes of electromagnetic code.
Yet policy-makers from the National Science Foundation (NSF) and others plotting future directions in science believe that hidden within these veritable mountain-sized mines of information are clues to questions that have confounded humanity since its first thoughts: answers about those bits of glitter in the night sky, the nature of matter, the causes of disease, the origins of life and even why and how we think about such things.
For this reason, the ability to convert this seemingly unintelligible digital data into rapid, meaningful discoveries has taken on added significance. Indeed, one of the NSF’s 10 Big Ideas for the future is “Harnessing Data for 21st Century Science and Engineering.”
Enter advanced or high-performance computing (HPC), which sifts and separates waste from valuable digital nuggets and, somewhat like a Rosetta Stone of the information age, decodes and translates this data into valuable insight.
“Advanced computing, along with experts charged with building and making the most of these HPC systems, has been critical to many Nobel Prizes, from work involving traditional modeling and simulation to projects designed for more data-intensive workloads,” said Michael Norman, director of the San Diego Supercomputer Center (SDSC) at UC San Diego.
As evidence, Norman and others point to several recent Nobel Prizes in chemistry and physics — including international collaborations exploring the dark side of the universe and others delving into the dynamics of proteins critical for tomorrow’s targeted therapies.
Each has relied on the marriage of supercomputing technology and expertise with large-scale scientific instruments to achieve its goals, all connected by ever-faster communications networks. And each touches on other Big Ideas from the NSF, such as “The Era of Multi-Messenger Astrophysics,” which includes a collection of approaches to expand our observations and understanding of the universe; a “Quantum Leap” into understanding the behavior of matter and energy at very small – atomic and subatomic – scales; and “Understanding the Rules of Life,” an initiative that will require convergence of research across biology, computer science, mathematics, behavioral sciences, and engineering.
Some of this effort is based on the solution of fundamental mathematical equations to create models or simulations using HPC systems now capable of performing quadrillions of calculations per second, such as Comet, funded by the NSF and housed at SDSC. Other HPC research requires accessing, analyzing, and interpreting previously unfathomable amounts of data generated by a wide cross-section of sensors and detectors, via a modality called high-throughput computing (HTC). Simulation and data analysis, along with experimentation, sometimes complement and even blend with one another for discovery.
“HTC is a way of consuming computer resources, including those we label as HPC,” said Frank Würthwein, professor of physics at UC San Diego and Distributed High-Throughput Computing Lead at SDSC. “The way these large-scale instruments do analysis requires the HTC ‘modality’ of computing. This is distinct from the standard ‘submit a job to the queue’ which is what people traditionally do for simulations.”
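The distinction Würthwein draws can be pictured with a small sketch. It is an illustration only, not the software used by any of the experiments described here: the function name process_block and the workload inside it are hypothetical, and a thread pool on a single machine stands in for the many loosely coupled computers an HTC system would draw on. What it does show is the defining trait of the modality: a large number of independent tasks, each handed to whichever worker happens to be free.

```python
from concurrent.futures import ThreadPoolExecutor


def process_block(block_id):
    """Hypothetical stand-in for analyzing one independent block of detector data."""
    start = block_id * 1000
    return sum((x * x) % 7 for x in range(start, start + 1000))


def run_high_throughput(num_blocks, workers=4):
    """Hand independent tasks to whichever worker is free.

    No task depends on another, so the work can be spread across as many
    workers as are available. The thread pool here is only a single-machine
    stand-in for a distributed pool of computers.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in task order, regardless of completion order.
        return list(pool.map(process_block, range(num_blocks)))


results = run_high_throughput(num_blocks=20)
print(len(results), "blocks processed")
```

A tightly coupled simulation, by contrast, typically runs as one large job whose parts must exchange data with each other at every step, which is why it is usually submitted to a queue as a single unit.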
An Integrated Data Ecosystem
Those on the technological front line recognize that the challenges to keep up with the data explosion are enormous. Among other things, much of the science requires the integration of computational resources in an ecosystem that includes sophisticated workflow tools to orchestrate complex pathways for scheduling, data transfer, and processing. Massive sets of data collected through these efforts also require tools and techniques for filtering and processing, plus analytical techniques to extract key information. Moreover, the system needs to be effectively automated across different types of resources, including instruments and data archives.
Some suggest that all these components should be orchestrated into what’s being called a “super facility.” The goal, according to the U.S. Department of Energy, is to bring together users at multiple institutions “allowing geographically dispersed collaborators to tap into scientific resources and expertise, and analyze and share data with other users—all in real time and without having to leave the comfort of their office or lab.”
Said Würthwein: “These large-scale scientific instruments depend on large international cyberinfrastructures that a ‘super facility’ must integrate into seamlessly. The HPC system cannot be an island unto itself.”
The NSF concurs. “The grand challenges of today – protecting human health; understanding the food, energy, water nexus; exploring the universe on all scales – will not be solved by one discipline alone,” the agency stated in a 2017 report prepared for Congress. “They require convergence: the merging of ideas, approaches, and technologies from widely diverse fields of knowledge to stimulate innovation and discovery.”
Armed with ever-more powerful large-scale scientific instruments, research teams around the globe – some encompassing a wide variety of disciplines – already are converging to build an impressive portfolio of scientific advances and discoveries, with supercomputers serving as a critical linchpin for all these investigations.
On July 4, 2012, at the CERN laboratory for particle physics outside Geneva, Switzerland, a theory first proposed in 1964 by François Englert and Peter W. Higgs was confirmed with the discovery of a Higgs particle. The theory, which garnered the duo the 2013 Nobel Prize in physics, is a central part of the Standard Model of particle physics that describes how the world is constructed at its most fundamental level, from the intense waves of energy and primordial particles released from the “Big Bang,” to the planet we inhabit, to those glittering specks of light we observe in the night sky.
Under a partnership with UC San Diego physicists and the Open Science Grid (OSG), a multi-disciplinary research partnership funded by the U.S. Department of Energy and the NSF, SDSC’s Gordon supercomputer provided auxiliary computing capacity to process massive amounts of raw data generated by the Compact Muon Solenoid (CMS), one of two general-purpose particle detectors at the Large Hadron Collider (LHC). LHC experiments are among the largest ever seen in physics, with each experiment a collaboration of close to 200 institutions in more than 40 countries and in excess of 3,000 scientists and engineers.
“Access to Gordon, and its excellent computing speed due to its flash-based memory, really helped push forward the processing schedule for us,” said Würthwein, a member of the CMS project and executive director of OSG. “This was one of the first-ever integrations of HTC with a large HPC system and with only a few weeks’ notice, we were able to gain access to Gordon and complete the runs, making the data available for analysis in time to provide crucial input toward the international planning meetings on the future of particle physics.”
In February 2016, an international team representing more than 20 countries announced the first-ever detection of gravitational waves in the universe, based on the tell-tale “chirp” signature of two black holes merging about 1.3 billion years ago. The collision sent out what some referred to as a “ripple in the fabric of space-time”: gravitational waves, hypothesized by Albert Einstein a century ago. The signal was detected on Earth, first by the NSF-funded Laser Interferometer Gravitational-Wave Observatory (LIGO) near Livingston, Louisiana, and then seven milliseconds later, and 1,890 miles away, at the second LIGO interferometer in Hanford, Washington. Three members of the team won the 2017 Nobel Prize in Physics for the discovery.
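The seven-millisecond gap between the two detections can be checked with a few lines of arithmetic. General relativity predicts that gravitational waves travel at the speed of light, so the delay between the sites can be no longer than the time light needs to cover the distance between them. The sketch below makes one assumption that is not in the article: it treats the quoted 1,890 miles as the straight-line separation of the two observatories.

```python
MILES_TO_KM = 1.609344              # exact definition of the international mile
SPEED_OF_LIGHT_KM_S = 299_792.458   # exact, by definition of the metre

# Assumption: the 1,890 miles quoted above is the straight-line separation.
separation_km = 1_890 * MILES_TO_KM
max_delay_ms = separation_km / SPEED_OF_LIGHT_KM_S * 1_000
observed_delay_ms = 7.0

print(f"separation: {separation_km:.0f} km")                    # separation: 3042 km
print(f"maximum light-speed delay: {max_delay_ms:.1f} ms")      # about 10.1 ms
print(f"observed delay within limit: {observed_delay_ms <= max_delay_ms}")
```

A delay shorter than the maximum is what one would expect from a wave that arrives at an angle to the line joining the two sites rather than travelling directly along it.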
SDSC’s Comet was one of several supercomputers used by researchers to confirm the landmark discovery.
“LIGO’s discovery of gravitational waves from the binary black hole required large-scale data analysis to validate the discovery claim,” said Duncan Brown, The Charles Brightman Professor of Physics at Syracuse University’s Department of Physics who studies gravitational waveforms for black holes and neutron star binaries. “This includes measuring how significant the signal is compared to noise in the detector, and re-analyzing the data with simulated signals to ensure that we understand the astrophysical sensitivity of the search. Comet’s computer cycles were extremely important for us to complete large-scale simulations and fast validation of the search.”
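The two steps Brown describes, gauging a signal against detector noise and re-analyzing data with simulated signals added, can be illustrated with a toy example. This is a minimal sketch under loud assumptions, not the LIGO analysis: the noise is idealized white Gaussian noise, the "chirp" is a simple rising-frequency sine wave, and every name and number in it (make_chirp, the amplitude of 20, the offset of 300) is hypothetical. The idea it demonstrates is that a known waveform is slid along the data and correlated at each position, and that a simulated signal planted at a known place should be recovered at that place.

```python
import math
import random


def make_chirp(length=64, f_start=0.1, f_end=0.4):
    """Unit-norm toy 'chirp' whose frequency sweeps upward (cycles per sample)."""
    rate = (f_end - f_start) / length
    wave = [math.sin(2 * math.pi * (f_start * t + 0.5 * rate * t * t))
            for t in range(length)]
    norm = math.sqrt(sum(w * w for w in wave))
    return [w / norm for w in wave]


def matched_filter(data, template):
    """Correlate the template with the data at every offset.

    With unit-variance white noise and a unit-norm template, each output
    value is in units of the noise standard deviation.
    """
    n = len(template)
    return [sum(data[i + j] * template[j] for j in range(n))
            for i in range(len(data) - n + 1)]


def inject_and_recover(offset=300, amplitude=20.0, n_samples=1000, seed=1):
    """Plant a simulated signal in simulated noise, then search for it."""
    rng = random.Random(seed)
    data = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]  # toy detector noise
    template = make_chirp()
    for j, w in enumerate(template):                        # the simulated signal
        data[offset + j] += amplitude * w
    snr = matched_filter(data, template)
    best = max(range(len(snr)), key=snr.__getitem__)
    return best, snr[best]


found_at, peak_snr = inject_and_recover()
print(f"injected at 300, recovered at {found_at}, peak SNR {peak_snr:.1f}")
```

Repeating such injections many times, with weaker and weaker signals, gives a measure of how sensitive a search is. Doing that at the scale of a real observatory is the kind of workload that calls for a system like Comet.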
Less than two years after the first discovery of gravitational waves, in October 2017 researchers announced they had detected gravitational waves generated by the collision of two neutron stars more than 130 million light-years from Earth, via the two LIGO instruments and the Europe-based Virgo interferometer, followed shortly by multiple telescopes and satellites built to capture light from the universe. This combination of observational instruments bears testimony to what’s become known as multi-messenger astronomy (MMA), where multiple instruments, built to detect gravitational waves and different forms of electromagnetic radiation, are choreographed with one another, essentially in real time, to view the same patch of sky. Once again, Comet was one of several HPC systems used to verify the signal, with allocations from NSF’s Extreme Science and Engineering Discovery Environment (XSEDE) and the OSG.
“The correlation of the three interferometers, two from LIGO and one from Virgo, significantly shrank the area in the sky for where to look,” said Würthwein.
Added Syracuse University’s Brown: “Comet’s contribution through the OSG and XSEDE allowed us to rapidly turn around the offline analysis in about a day. That, in turn, allowed us to do several one-day runs, as opposed to having to spend several weeks before publishing our findings.”
Ever since Wolfgang Pauli postulated them in December 1930, cosmologists have been hunting for neutrinos: subatomic particles that lack an electric charge, particles once described as “the most tiny quantity of reality ever imagined by a human being.” For the most part, cosmic neutrinos are believed to have been created about 15 billion years ago, soon after the birth of the universe. Others emerged more recently from some of the most violent actions in the universe, such as exploding stars, gamma ray bursts, black holes and neutron stars. But unlike photons and charged particles, neutrinos can emerge from their sources and, like cosmological ghosts, pass through the universe unscathed.
To help catch these near-massless messengers from deep space, an international team of researchers funded by the NSF set up IceCube, an observatory containing an array of 5,160 optical sensors deep within a cubic kilometer of ice at the South Pole. Encompassing 300 physicists from 49 institutions in 12 countries, IceCube already has achieved its primary goal of detecting the extraterrestrial flux of very high-energy neutrinos.
Frank Halzen, principal investigator of the IceCube Observatory and physics professor at the University of Wisconsin-Madison, explained the importance of the Comet supercomputer for isolating the signature pattern of neutrinos: “The IceCube neutrino detector transforms natural Antarctic ice at the South Pole into a particle detector. Progress in understanding the precise optical properties of the ice leads to increasing complexity in simulating the propagation of photons in the instrument and to a better overall performance of the detector.”
“The photon propagation in the ice is very well-suited to run in graphics processing unit (GPU) hardware, such as those on Comet,” Halzen continued. “Pursuing efficient access to a large amount of GPU computing power is therefore of great importance to ensure that future IceCube analysis reaches the maximum precision and that the full scientific potential of the instrument is exploited.”
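Why photon propagation suits GPUs can be seen in a toy version of the calculation. The sketch below is not IceCube's simulation software. It assumes a uniform block of ice, scattering that is equally likely in every direction, and placeholder values of 25 meters and 100 meters for the scattering and absorption lengths, and all of its function names are hypothetical. The point it illustrates is that each photon's random walk is computed without reference to any other photon, so the same small routine can be run for enormous numbers of photons side by side.

```python
import math
import random


def random_direction(rng):
    """Uniformly random unit vector (isotropic scattering, a simplification)."""
    z = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    r = math.sqrt(1.0 - z * z)
    return (r * math.cos(phi), r * math.sin(phi), z)


def propagate_photon(rng, scatter_len_m=25.0, absorb_len_m=100.0):
    """Follow one photon until absorption.

    Returns (total path length, straight-line displacement) in meters.
    Both length parameters are illustrative placeholders.
    """
    remaining = rng.expovariate(1.0 / absorb_len_m)  # distance left before absorption
    path = remaining
    pos = [0.0, 0.0, 0.0]
    while remaining > 0.0:
        # Travel until the next scatter, or until absorption if that comes first.
        step = min(rng.expovariate(1.0 / scatter_len_m), remaining)
        direction = random_direction(rng)
        for axis in range(3):
            pos[axis] += step * direction[axis]
        remaining -= step
    return path, math.sqrt(sum(c * c for c in pos))


def simulate(n_photons=2000, seed=7):
    rng = random.Random(seed)
    # Every photon is independent of every other one; on a GPU each could be
    # assigned to its own thread. Here they are simply processed in turn.
    return [propagate_photon(rng) for _ in range(n_photons)]


photons = simulate()
mean_path = sum(p for p, _ in photons) / len(photons)
mean_disp = sum(d for _, d in photons) / len(photons)
print(f"mean path {mean_path:.1f} m, mean displacement {mean_disp:.1f} m")
```

Because scattering keeps redirecting each photon, the straight-line distance it covers is shorter than the total path it travels. A more detailed model of the ice, as Halzen describes, would add further calculation to every step of every photon.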
Stay tuned for Part II