You might have seen one of those YouTube videos: they begin on Earth, slowly zooming out to the Moon, the Solar System, the Milky Way, beyond – and suddenly, you’re looking at trillions of stars. It’s a lot to take in – even, apparently, for astronomers. According to a new report led by Michael Biehl and Kerstin Bunte of the University of Groningen, the massive amount of astronomical data being collected is growing unwieldy.
Thanks to next-generation sky surveys, space missions, and instrumentation, astronomy is entering the realm of petascale computing. Astronomical data archives are expanding into the multi-terabyte and petabyte domains, and thanks to virtual observatory projects, those data are becoming more and more accessible. Existing datacenters can scale up to meet the storage demand – but that still leaves the matter of effectively handling the data, which are increasingly complex and heterogeneous.
The burgeoning field of time-domain astronomy – the study of how objects in space change over time – exemplifies this complexity. Synoptic sky surveys (which record wide portions of the sky repeatedly and often) are providing a huge amount of sparse, heterogeneous, noisy time series data. These data must be analyzed and sorted as quickly as possible to allow for follow-up observations of time-sensitive events – all without sacrificing quality or rigor.
The answer to these astronomical challenges? Astroinformatics. A relatively new discipline at the nexus of advanced statistics, astronomy, and computer science, astroinformatics aims to address a compound challenge: more people analyzing more astronomical data, and those data growing more complex.
Through astroinformatics, astronomers seek to use new techniques based largely on machine learning at each step of the process: data acquisition, fusion with pre-existing data, data analysis and visualization, and data interpretation. This machine learning approach has proved invaluable, helping astronomers automatically classify stars and galaxies, sort galaxies into morphologies, and identify star-forming regions.
Astroinformatics is still a young field, and the authors outline a litany of key challenges it faces, ranging from “outlier and novelty detection in observational data” to “simulation of astrophysical models and related inference problems.” In particular, they warn, data management tools are not sufficiently scalable and are likely to begin buckling under the weight of astronomical data with greatly increased dimensionality.
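To make one of those challenges concrete, here is a minimal sketch of outlier detection on a handful of brightness measurements. It is not the authors' method: it uses a simple, widely known robust statistic (a modified z-score based on the median absolute deviation), and the function name `flag_outliers`, the threshold of 3.5, and the sample values are all assumptions made up for illustration.

```python
from statistics import median


def flag_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds the threshold.

    The score uses the median and the median absolute deviation (MAD),
    which are less distorted by extreme points than the mean and
    standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median, so the score is undefined.
        return []
    # 0.6745 rescales the MAD so the score is comparable to an ordinary
    # z-score when the data are roughly normally distributed.
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical brightness measurements in arbitrary units (not real data).
brightness = [10.1, 9.9, 10.0, 10.2, 9.8, 14.5, 10.0, 10.1]
flagged = flag_outliers(brightness)
print(flagged)
```

On this toy series only the sixth measurement (index 5) stands out. Real survey data are far larger, noisier, and higher-dimensional, which is exactly why the authors treat the problem as an open challenge rather than a solved one.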
The authors also highlight the looming importance of two further areas: multi-messenger astronomy, which integrates large, distributed data sets from electromagnetic radiation, gravitational waves, neutrinos, and cosmic rays; and reproducibility, to be achieved through standardized preservation of past results.
Still, the authors seem fairly hopeful – computer scientists have already begun to address some of these issues, they say, and the astronomical community has responded “well and in a timely manner” to challenges in the past. After all, as the authors write: “Data, no matter how large and complex they are, are just incidental to the real task of scientists: knowledge discovery.”
About the paper
The paper discussed in this article, “Machine Learning and Data Analysis in Astroinformatics,” was authored by Michael Biehl, Kerstin Bunte, G. Longo, and P. Tino. It is available as a publicly available publication of the University of Groningen.