Argonne’s Nexus Initiative Pivotal in DOE’s Integrated Research Infrastructure Development

February 1, 2024

Feb. 1, 2024 — When the massive upgrade at the Advanced Photon Source (APS) at the U.S. Department of Energy’s (DOE) Argonne National Laboratory is completed later this year, experiments at the powerful X-ray light source are expected to generate 100-200 petabytes, or 100-200 million gigabytes, of scientific data per year.

Argonne’s Nexus effort is working to advance data-intensive science via an integrated research infrastructure that connects experimental facilities, supercomputing resources and data technologies. Credit: Argonne.

That’s a substantial increase over the approximately 5 petabytes that were being produced annually at the APS, a DOE Office of Science user facility at Argonne, before the upgrade. Combined with DOE’s four other light sources, the facilities are projected to generate an exabyte, or 1 billion gigabytes, of data per year in the coming decade.
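To put these annual volumes in perspective, the short Python sketch below converts them to an average sustained data rate. It assumes decimal (SI) units, a 365-day year and a perfectly uniform data rate; the function and variable names are illustrative only and do not come from Argonne.

```python
# Back-of-the-envelope conversion of yearly data volumes to average rates.
# Assumptions: SI units (1 PB = 10**6 GB, 1 EB = 10**9 GB), a 365-day year,
# and data arriving at a uniform rate, which real experiments do not do.

SECONDS_PER_YEAR = 365 * 24 * 60 * 60
GB_PER_PB = 10**6
GB_PER_EB = 10**9


def average_rate_gb_per_s(gigabytes_per_year: float) -> float:
    """Return the average rate in GB/s for a given yearly volume in GB."""
    return gigabytes_per_year / SECONDS_PER_YEAR


volumes_gb = {
    "APS before upgrade (5 PB/yr)": 5 * GB_PER_PB,
    "APS after upgrade, low estimate (100 PB/yr)": 100 * GB_PER_PB,
    "APS after upgrade, high estimate (200 PB/yr)": 200 * GB_PER_PB,
    "DOE light sources combined (1 EB/yr)": GB_PER_EB,
}

for label, gigabytes in volumes_gb.items():
    rate = average_rate_gb_per_s(gigabytes)
    print(f"{label}: {rate:.2f} GB/s on average")
```

Under these assumptions, 100-200 petabytes per year works out to roughly 3 to 6 gigabytes per second around the clock. Because experiments run in bursts rather than continuously, peak rates would be higher than these averages.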

“An exabyte of data is equivalent to streaming 1.5 million movies every day for a year,” said Nicholas Schwarz, Argonne computer scientist and lead for scientific software and data management at the APS. “But we need to do a lot more than simply move a lot of data around. For the X-ray experiments carried out at the APS, we need to use advanced computational tools to look at every pixel of every frame, analyze the data in near real time, and use the results to make decisions about the next experiment.”

“To process all this data quickly, we require a lot of computing capabilities, from big computers and data storage, to analysis software, to the computational fabric that ties all of these resources together,” he added.

The growing deluge of scientific data is not unique to light sources. Telescopes, particle accelerators, fusion research facilities, remote sensors and other scientific instruments also produce large amounts of data. And as their capabilities improve over time, the data generation rates will only continue to grow.

“The scientific community’s ability to process, analyze, store and share these massive datasets is critical to gaining insights that will spark new discoveries,” said Michael E. Papka, Argonne deputy associate laboratory director for computing, environment and life sciences. Papka also serves as director of the Argonne Leadership Computing Facility (ALCF), a DOE Office of Science user facility at Argonne, and is a professor of computer science at the University of Illinois Chicago.

Argonne’s Nexus effort is playing a pivotal role in advancing DOE’s vision to build an integrated research infrastructure (IRI). Developing an IRI would accelerate data-intensive research by seamlessly integrating DOE’s cutting-edge experimental facilities with its world-class supercomputing, artificial intelligence (AI) and data resources.

For over a decade, Argonne has been working to develop tools and methods to connect its powerful computing resources with large-scale experiments. Merging ALCF supercomputers with the APS has been a significant focus of the lab’s IRI-related research, but the work has also included collaborations with the DIII-D National Fusion Facility, a DOE Office of Science user facility in California, and CERN’s Large Hadron Collider in Switzerland.

“We’ve been partnering with experimental facilities for several years now to help them use our supercomputing resources to process huge amounts of data more quickly,” Papka said. “With the launch of Nexus, we have a vehicle to coordinate all of our research and collaborations in this space to align with DOE’s broader efforts to lead the new era of integrated science.”

Rachana Ananthakrishnan of Globus (left) and Tom Uram of Argonne (right) give a talk on Nexus at the DOE booth at the SC23 Conference. Credit: Argonne.

Argonne’s ongoing work has led to the creation of tools for managing computational workflows and the development of new capabilities for on-demand computing, giving the lab valuable experience to support the DOE IRI initiative. Globus and the ALCF Community Data Co-Op (ACDC) are critical resources in enabling the IRI vision. Globus, a research automation platform created by researchers at Argonne and the University of Chicago, is used to manage high-speed data transfers, computing workflows, data collection and other tasks for experiments. ACDC provides large-scale data storage capabilities, offering a portal that makes it easy to share data with external collaborators across the globe.

The ALCF’s upcoming Aurora exascale supercomputer will also bolster the lab’s IRI efforts, providing a significant boost in computing power and advanced capabilities for AI and data analysis.

Streamlining Science

The IRI will enable experiments not only to analyze vast amounts of data, but also to process those datasets quickly enough to deliver rapid results. This is crucial because analysis performed while an experiment is running often plays a key role in shaping subsequent experiments.

For the Argonne-DIII-D collaboration, researchers demonstrated how the close integration of ALCF supercomputers could benefit a fast-paced experimental setup. Their work centered on a fusion experiment that used a series of plasma pulses, or shots, to study the behavior of plasmas under controlled conditions. The shots were occurring every 20 minutes, but the data analysis required more than 20 minutes using their local computing resources, so the results were not available in time to inform the ensuing shot. DIII-D researchers teamed up with the ALCF to explore how they could leverage supercomputers to speed up the analysis process.

“Every time they took a shot, we started a job at the ALCF. It fetched the data from DIII-D, ran the analysis, and pushed the results back to them in time to calibrate the next shot,” said Thomas Uram, Argonne computer scientist and the IRI lead at the ALCF. “Because we had more computing power than DIII-D had available locally, we could analyze their data faster and at a resolution 16 times greater than their in-house systems. Not only did they get the results in advance of the next shot, they also got significantly higher resolution analyses to improve the accuracy of their configuration.”
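The between-shot loop Uram describes (fetch the data, run the analysis, push the results back before the next shot) can be sketched in a few lines of Python. This is a minimal illustration, not the software used by the ALCF or DIII-D: `process_shot` and its `fetch`, `analyze` and `push` callables are hypothetical stand-ins, and the only figure taken from the article is the 20-minute interval between shots.

```python
import time
from typing import Any, Callable, Dict

# Shots occur every 20 minutes, per the article; results that arrive
# later than this cannot inform the next shot.
SHOT_INTERVAL_S = 20 * 60


def process_shot(
    shot_id: int,
    fetch: Callable[[int], Any],        # hypothetical: pull raw data from the experiment
    analyze: Callable[[Any], Any],      # hypothetical: run the analysis on the compute side
    push: Callable[[int, Any], None],   # hypothetical: return results to the experiment
    clock: Callable[[], float] = time.monotonic,
    budget_s: float = SHOT_INTERVAL_S,
) -> Dict[str, Any]:
    """Run fetch -> analyze -> push for one shot and report whether the
    whole round trip finished inside the time budget."""
    start = clock()
    raw = fetch(shot_id)
    result = analyze(raw)
    push(shot_id, result)
    elapsed = clock() - start
    return {"shot": shot_id, "elapsed_s": elapsed, "in_time": elapsed < budget_s}


if __name__ == "__main__":
    delivered = {}
    report = process_shot(
        shot_id=1,
        fetch=lambda shot: [1.0, 2.0, 3.0],            # placeholder data
        analyze=lambda raw: sum(raw) / len(raw),       # placeholder analysis
        push=lambda shot, res: delivered.update({shot: res}),
    )
    print(report["in_time"], delivered)
```

The point of the sketch is the timing check: the analysis is only useful to the experiment if the full round trip, including both data transfers, fits inside the interval between shots.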

Scientific visualization of Bragg diffraction peaks in a 15×15 pixel patch of an undeformed bi-crystal gold sample. The height denotes photon counts. This data was acquired at the APS and processed on ALCF supercomputers. Credit: Argonne.

Many experiments at the APS will also benefit from near-real-time data analysis, including battery research, the exploration of materials failure and drug development.

“By getting analysis results in seconds or less instead of hours, days or even weeks, scientists can gain real-time insight into their experiments as they occur,” Schwarz said. “Researchers will be able to use this feedback to steer an experiment and zoom in on a particular area to see critical processes, like the molecular changes that occur during a battery’s charge and discharge cycles, as they are happening.”

A fully realized IRI would also impact the people conducting the research. Scientists must often devote considerable time and effort to managing data when running an experiment. This includes tasks like storing, transferring, validating and sharing data before it can be used to gain new insights.

“The IRI vision is to automate many of these tedious data management tasks so researchers can focus more on the science,” Uram said. “This would substantially streamline the scientific process, freeing up scientists so they have more time to form hypotheses while experiments are being carried out.”

Supercomputing on Demand

Getting instant access to DOE supercomputers for data analysis requires a shift in how the computing facilities operate. Each facility has established policies and processes for gaining access to machines, setting up user accounts, managing data and other tasks.

“If a researcher is set up at one computing facility but needs to use supercomputers at the other facilities, they would have to go through a similar set of steps again for each site,” Uram said. “And that takes time. It takes time away from doing actual science.”

Once a project is set up, researchers submit their “job” to a queue, where they wait their turn to run on the supercomputer. While the traditional queuing system helps optimize supercomputer usage at the facilities, it doesn’t support the rapid turnaround times needed for the IRI.

To make things easy for the end users, the IRI will require implementing a uniform way for experimental teams to gain quick access to the DOE supercomputing resources.

To that end, Argonne has developed and demonstrated methods for overcoming both the user account and job scheduling challenges. The co-location of the APS and the ALCF on the Argonne campus has offered an ideal environment for testing and demonstrating such capabilities. When the ALCF launched the Polaris supercomputer in 2022, four of the system’s racks were dedicated to advancing the integration efforts with experimental facilities.

In the case of user accounts, the existing process can get unwieldy for experiments involving several team members who need to use the computing facilities for data processing. The Argonne team has piloted the idea of employing “service accounts” that provide secure access to a particular experiment instead of requiring each team member to have an active account.

“This is important because many experiments have a team of people collecting data and running analysis jobs over the course of a few days or a week,” Uram said. “We need a way to support the experiment independent of who is operating the instruments that day.”

To address the job scheduling issue, the Argonne team has set aside a portion of Polaris nodes to run with “on-demand” and “preemptable” queues. This approach allows time-sensitive jobs to run on the dedicated nodes immediately.
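As a rough illustration of how on-demand and preemptable queues can interact on a fixed set of nodes, the toy Python model below lets an on-demand job evict preemptable jobs when there is not enough free capacity. It is a simplified sketch under stated assumptions and does not represent the scheduler or policies actually used on Polaris; the `Job` and `NodePool` classes, the eviction order and the job names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)  # compare jobs by identity, not by field values
class Job:
    name: str
    nodes: int
    queue: str  # "on-demand" or "preemptable" (hypothetical labels)


@dataclass
class NodePool:
    total_nodes: int
    running: List[Job] = field(default_factory=list)
    requeued: List[Job] = field(default_factory=list)

    def free_nodes(self) -> int:
        return self.total_nodes - sum(job.nodes for job in self.running)

    def submit(self, job: Job) -> bool:
        """Start the job if it fits. An on-demand job may evict
        preemptable jobs (most recently started first) to make room."""
        if job.nodes > self.total_nodes:
            return False
        if job.queue == "on-demand":
            available = self.free_nodes()
            victims = []
            for candidate in reversed(self.running):
                if available >= job.nodes:
                    break
                if candidate.queue == "preemptable":
                    victims.append(candidate)
                    available += candidate.nodes
            if available < job.nodes:
                return False  # nothing is evicted if the job still cannot fit
            for victim in victims:
                self.running.remove(victim)
                self.requeued.append(victim)  # to be restarted later
        elif self.free_nodes() < job.nodes:
            return False
        self.running.append(job)
        return True


if __name__ == "__main__":
    pool = NodePool(total_nodes=4)
    pool.submit(Job("background-simulation", 3, "preemptable"))
    started = pool.submit(Job("experiment-analysis", 2, "on-demand"))
    print(started, [j.name for j in pool.running], [j.name for j in pool.requeued])
```

In this model the time-sensitive job starts immediately and the displaced work is requeued rather than lost, which is the trade-off that makes preemptable queues a way to keep dedicated nodes busy between experiments.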

The team has completed successful test runs of the service accounts and on-demand and preemptable queues on Polaris using data generated during an APS experiment. The runs were fully automated with no humans in the loop.

“This capability is truly exciting for the experimental integration efforts here at Argonne, but there is much work ahead to develop workable solutions that can be used across all DOE experimental and computing facilities,” Papka said.

Bringing It All Together

While Argonne and its fellow national labs have been working on projects to demonstrate the promise of an integrated research paradigm for the past several years, DOE’s Advanced Scientific Computing Research (ASCR) program made it a more formal initiative in 2020 with the launch of the IRI Task Force. Made up of members from several national labs, including Argonne’s Schwarz, Uram, Jini Ramprakash and Corey Adams, the task force identified the opportunities, risks and challenges posed by such an integration.

In 2022, ASCR launched the IRI Blueprint Activity to create a framework for implementing the IRI. The blueprint team, which included Schwarz and Ramprakash, released an IRI report that describes a path forward from the labs’ individual partnerships and demonstrations to a broader long-term strategy that will work across the DOE ecosystem. Over the past year, the blueprint activities have begun to take formal shape with the introduction of IRI testbed resources and environments. Now in place at each of the DOE computing facilities, the testbeds facilitate research to explore and refine IRI ideas in collaboration with teams from DOE experimental facilities.

“With the launch of the Nexus effort here at Argonne, we will continue to leverage our collective knowledge, expertise and resources to help DOE and the larger scientific community enable and scale this new paradigm across a diverse range of research areas, scientific instruments and user facilities,” Uram said.


Source: Jim Collins, ALCF
