Supercomputing Pipeline Aids DESI’s Quest to Create 3D Map of the Universe

July 21, 2020

July 21, 2020 — As neuroscientists work to better understand the complex inner workings of the brain, a focus of their efforts lies in reimagining and reinventing one of their most basic research tools: the microscope. Likewise, as astrophysicists and cosmologists strive to gain new insights into the universe and its origins, they are eager to observe farther, faster, and with increasing detail via enhancements to their primary instrument: the telescope.

A view of the Mayall Telescope (tallest structure) and the Kitt Peak National Observatory site near Tucson, Arizona. The Dark Energy Spectroscopic Instrument is housed within the Mayall dome. (Image: Marilyn Sargent/Berkeley Lab)

In each case, to unravel scientific mysteries that are either too big or too small to see with a physical instrument alone, they must work in conjunction with yet another critical piece of equipment: the computer. This means more data and increasingly complex datasets, which in turn impacts how quickly scientists can sift through these datasets to find the most relevant clues about where their research should go next.

Fortunately, being able to do this sort of data collection and processing in near real time is becoming a reality for projects like the Dark Energy Spectroscopic Instrument (DESI), a multi-facility collaboration led by Lawrence Berkeley National Laboratory whose goal is to produce the largest 3D map of the universe ever created. Installed on the Mayall Telescope at Kitt Peak National Observatory near Tucson, Arizona, DESI is bringing high-speed automation, high-performance computing, and high-speed networking to its five-year galaxy-mapping mission, capturing light from 35 million galaxies and 2.4 million quasars and transmitting that data to the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy user facility based at Berkeley Lab that serves as DESI’s primary computing center.

“We turn the raw data into useful data,” said Stephen Bailey, a physicist at Berkeley Lab who is the technical lead and manager of the DESI data systems. “The raw data coming off the telescope isn’t the map, so we have to take that data, calibrate it, process it, and turn it into a 3D map that the scientists within the broader collaboration (some 600 worldwide) use for their analyses.”

Over the last several years the DESI team has been using NERSC to build catalogues of the most interesting observational targets, modeling the shapes and colors of more than 1.6 billion individual galaxies detected in 4.3 million images collected by three large-scale sky surveys. The resulting DESI Legacy Imaging Surveys, hosted at NERSC, have generated their catalogues there over the course of eight data releases. The DESI project also leverages the Cosmology Data Repository hosted at NERSC, which contains about 900TB of data, and NERSC’s Community File System, scratch, and HPSS storage systems.

“The previous big survey was a few million objects, but now we are going up to 35-50 million objects,” Bailey said. “It’s a big step forward in the size of the map and the science you can do with it.”

But storage is only one part of what NERSC delivers for DESI. The supercomputing center has also been instrumental in developing and supporting DESI’s data processing pipeline, which moves data from the surveys to the computing center and on to users. The project uses 10 dedicated nodes on the Cori supercomputer, enabling the pipeline to run throughout each night during a survey and ensuring that results are available to users by morning for same-day analysis, often helping to inform the next night’s observation plan. The DESI team also uses hundreds of nodes for other processing and expects to scale to thousands of nodes as the dataset grows. To facilitate data I/O, DESI depends on the NERSC data transfer nodes, which are managed as part of a collaborative effort between ESnet and NERSC to enable high-performance data movement over the 100 Gbps ESnet wide-area network.

“DESI is using the full NERSC ecosystem: computing services, storage, the real-time queue, and real-time data transfer,” Bailey said. “It’s a real game changer for being able to keep up with the data.”
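As an illustration of what driving such a nightly pipeline can look like, the minimal Python sketch below submits a job to a Slurm-managed system such as Cori through a real-time quality-of-service queue. The QOS name, time limit, and batch script are illustrative assumptions rather than details of the actual DESI pipeline; only the 10-node figure comes from the article.

```python
# Minimal sketch: submitting a nightly processing job from Python via Slurm's
# sbatch. The QOS name, time limit, and batch script are illustrative
# assumptions, not taken from the actual DESI pipeline.
import subprocess

def submit_nightly_job(night: str) -> str:
    """Submit one night's processing and return the Slurm job ID."""
    cmd = [
        "sbatch",
        "--qos=realtime",            # hypothetical real-time QOS name
        "--nodes=10",                # the article cites 10 dedicated nodes
        "--time=08:00:00",           # finish before morning analysis
        "--job-name=desi-night-" + night,
        "process_night.sh", night,   # hypothetical batch script and argument
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    # sbatch prints e.g. "Submitted batch job 123456"; return the ID.
    return result.stdout.strip().split()[-1]

if __name__ == "__main__":
    print(submit_nightly_job("20200721"))
```

In a production workflow, a driver like this would typically run under a scheduler or cron-like service so that each night’s exposures are picked up and submitted automatically.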

Optimizing Python for CPUs and GPUs

While gearing up for the five-year DESI survey, which is expected to begin in late 2020, NERSC worked with the DESI team to identify the most computationally intensive parts of the data processing pipeline and implement changes to speed them up. Through the NERSC Exascale Science Applications Program (NESAP), Laurie Stephey, then a postdoctoral researcher and now a data analytics engineer at NERSC, began examining the code.
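Finding the hot spots in a large Python pipeline typically starts with profiling. The snippet below is a minimal sketch of that first step using Python’s built-in cProfile module; `run_pipeline_step` is a hypothetical stand-in for an expensive pipeline function, not part of the DESI code.

```python
# Minimal profiling sketch using only the standard library (cProfile/pstats).
# run_pipeline_step() is a hypothetical stand-in for an expensive pipeline step.
import cProfile
import pstats

def run_pipeline_step():
    """Placeholder workload standing in for a real processing step."""
    total = 0.0
    for i in range(1_000_000):
        total += i ** 0.5
    return total

profiler = cProfile.Profile()
profiler.enable()
run_pipeline_step()
profiler.disable()

# Show the ten entries with the largest cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```

Sorting by cumulative time quickly surfaces the handful of functions worth optimizing first.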

The pipeline is written almost exclusively in Python – a specialty of Stephey’s – which enables domain scientists to write readable and maintainable scientific code in a relatively short amount of time. Stephey’s goal was to improve the pipeline’s performance while satisfying the DESI team’s requirement that the software remain in Python. The challenge, she explained, was staying true to the original code while finding new and efficient ways to speed it up.
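One common way to speed up numerical Python without leaving the language is to replace explicit loops with array operations. The toy example below illustrates that general pattern; it is not code from the DESI pipeline.

```python
# Toy illustration of the general technique (not DESI code): replace a
# pure-Python loop with an equivalent vectorized NumPy expression.
import numpy as np

def weighted_sum_loop(values, weights):
    """Readable but slow: a Python-level loop over every element."""
    total = 0.0
    for v, w in zip(values, weights):
        total += v * w
    return total

def weighted_sum_vectorized(values, weights):
    """Same result, computed inside compiled NumPy code."""
    return float(np.dot(values, weights))

rng = np.random.default_rng(42)
values = rng.random(1_000_000)
weights = rng.random(1_000_000)
assert np.isclose(weighted_sum_loop(values, weights),
                  weighted_sum_vectorized(values, weights))
```

The vectorized version pushes the loop into compiled code while staying in plain Python, the same general trade that NESAP-style optimization work aims for.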

“It was my job to keep their code readable and maintainable and to speed it up on the Cori supercomputer’s KNL manycore architecture,” Stephey said. “In the end, we increased their processing throughput 5 to 7 times, which was a big accomplishment – bigger than I’d expected.” This means that something that previously took up to 48 hours now happens overnight, thus enabling analysis during the day and feedback to the following night’s observations, Bailey noted. It also saves the DESI project tens of millions of compute hours at NERSC annually.

“New experiments funded by DOE approach NERSC for support all the time,” said Rollin Thomas, who runs NESAP for Data. “And experiments that already use NERSC are capitalizing on our diverse capabilities to do new and exciting things with data. DESI’s sustained engagement with NERSC, through NESAP for Data, the Superfacility initiative, and so on, is a model for other experiments. What we learn from these engagements helps us serve the broader experimental and observational data science community better.”

And the optimization effort isn’t over yet. The next challenge is to make the DESI code compatible with the GPUs in NERSC’s Perlmutter system, which is slated to arrive in late 2020. Bailey and Stephey began this process last year – “Stephen was instrumental in rewriting the algorithm in a GPU-friendly way,” Stephey noted – but in April NERSC hired one of its newest NESAP postdocs, Daniel Margala, to take over. As a graduate student, Margala had previously worked with Bailey on the Baryon Oscillation Spectroscopic Survey, a DESI predecessor project, “so I’m familiar with a lot of the data processing that needs to be done for DESI,” he said.

So far, Margala’s focus is on preparing DESI’s code for GPUs so that it will be ready to leverage the full potential of the Perlmutter system. He is currently working with a small subset of DESI data on Cori’s GPU testbed nodes; the long-term goal is to make sure the software is ready to handle DESI’s entire five-year dataset.

“The astrophysicists and scientists on DESI are pretty comfortable using Python, so we are trying to do all of this in Python so that they will be able to understand the code we are writing and learn from it, contribute back to it, and maintain it going forward,” Margala said.
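Staying in Python while targeting GPUs is often done with drop-in array libraries. The sketch below uses CuPy as one example of how a NumPy-style computation can move onto a GPU with few code changes; the article does not say which GPU library DESI adopted, so treat the choice of CuPy, and the linear solve shown, as illustrative assumptions.

```python
# Hedged sketch: running a NumPy-style linear solve on a GPU with CuPy.
# The article does not name the GPU library DESI uses; CuPy is simply one
# common way to stay in Python while targeting GPUs.
import numpy as np

try:
    import cupy as cp
    xp = cp          # use the GPU if CuPy (and a device) is available
except ImportError:
    xp = np          # otherwise fall back to NumPy on the CPU

def solve_normal_equations(A, b):
    """Solve (A^T A) x = A^T b; written once, runs on CPU or GPU arrays."""
    return xp.linalg.solve(A.T @ A, A.T @ b)

rng = np.random.default_rng(0)
A = xp.asarray(rng.standard_normal((2048, 256)))
b = xp.asarray(rng.standard_normal(2048))
coeffs = solve_normal_equations(A, b)
print(coeffs.shape)  # (256,)
```

Because NumPy and CuPy share most of their array API, this style keeps a single readable Python code path that collaborators can maintain whether or not a GPU is present.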

Over the next few years, NERSC resources will also be critical to another, larger goal of the DESI project: reprocessing and updating the data.

“Every year we are going to reprocess our data from the very beginning using the latest version of all of our code, and those will become our data assemblies that will then flow into the science papers for the collaboration,” Bailey said. “We only need 10 nodes at NERSC to keep up with the data in real time through the night, but if you want to go back and process 2, 3, 5 years of data, that’s where being able to use hundreds or thousands of nodes will allow us to quickly catch up on all that processing.”

About NERSC and Berkeley Lab

The National Energy Research Scientific Computing Center (NERSC) is a U.S. Department of Energy Office of Science User Facility that serves as the primary high-performance computing center for scientific research sponsored by the Office of Science. Located at Lawrence Berkeley National Laboratory, the NERSC Center serves more than 7,000 scientists at national laboratories and universities researching a wide range of problems in combustion, climate modeling, fusion energy, materials science, physics, chemistry, computational biology, and other disciplines. Berkeley Lab is a DOE national laboratory located in Berkeley, California. It conducts unclassified scientific research and is managed by the University of California for the U.S. Department of Energy. Learn more about computing sciences at Berkeley Lab.


Source: Kathy Kincade, NERSC and Berkeley Lab

 
