NASA’s Solar Dynamics Observatory keeps a close watch on the Sun, aiming to better understand how solar changes affect Earth and nearby space. The quantity of data, however, is staggering. The observatory takes an image of the Sun every 1.3 seconds, producing over 60,000 images per day and, to date, amassing over 18 petabytes of data, enough to fill the hard drives of around 18,000 standard laptops. Now, GPU-powered, high-performance data science workstations are helping to sift through that mountain of solar data.
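As a rough sanity check on those figures, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope calculation: it assumes an uninterrupted 1.3-second cadence and decimal units (1 petabyte = 1,000 terabytes), and the per-laptop drive size it derives is implied by the comparison rather than stated by NASA.

```python
# Back-of-the-envelope check of the data volumes quoted above.
# Assumptions (not from NASA): uninterrupted cadence, decimal storage units.

SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds
CADENCE_S = 1.3                  # one image every 1.3 seconds (from the article)

ARCHIVE_PB = 18                  # archive size quoted in the article
LAPTOP_COUNT = 18_000            # laptop comparison quoted in the article
TB_PER_PB = 1_000                # decimal convention (assumption)

# Images captured in one day at a constant cadence.
images_per_day = SECONDS_PER_DAY / CADENCE_S

# Drive size each laptop would need for the comparison to hold.
tb_per_laptop = ARCHIVE_PB * TB_PER_PB / LAPTOP_COUNT

print(f"{images_per_day:,.0f} images per day")   # 66,462 images per day
print(f"{tb_per_laptop:.1f} TB per laptop")      # 1.0 TB per laptop
```

Under these assumptions the cadence works out to roughly 66,000 images per day, consistent with the "over 60,000" figure, and the laptop comparison corresponds to a drive of about 1 terabyte each.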
NASA uses an algorithm to sort through the solar data, removing errors such as bad pixels with high accuracy. However, across 150 million images, the algorithm still misclassifies billions of pixels as “bad pixels,” hindering NASA’s research. NASA found itself computationally bottlenecked, facing entire years of CPU time to sift through all the files.
“For scientists, a year still wouldn’t be enough time because we like to explore and iterate the results we find,” said Raphael Attie, a solar astronomer at NASA’s Goddard Space Flight Center. “Even with one year of computation, it would still take us up to ten years to find concrete results.”
So the researchers turned to GPU-powered workstations: specifically, Z by HP data science workstations, each equipped with two of Nvidia’s Quadro RTX 8000 GPUs. According to HP, the Z workstations are capable of interacting with up to 5 billion dataset rows in milliseconds. Suddenly, a task that had loomed over NASA as a years-long effort was producing results in less than a week.
“The data science workstations completely changed the field of possibility for us,” said Michael Kirk, a research astrophysicist at NASA. “These computations that previously weren’t imaginable, we can now do 10-150x faster than we thought possible.”
The researchers opted for local workstations over cloud environments due to reliability concerns, aiming to avoid workflow interruptions in the data analysis.
“I find that a necessary condition for a responsive workflow is to have the input data rapidly accessible by your GPU devices,” Attie said. “If it’s not possible to have the data locally in the same machine as the GPU device, the network needs to be very fast and resilient, as AI applications often need fast access to the data.”
The researchers are still working to complete their filtering and analysis of the “bad pixels” in the 18 petabytes of solar images, but they already have plans for the future. Next, the team will perform the reverse of their current process: analyzing pixels marked as “good pixels” to check for false positives (bad pixels wrongly marked as good), rather than false negatives (good pixels wrongly marked as bad).