HPC Still Looks Past Security, but New Guidelines and Tools Push it Ahead

By Agam Shah

August 5, 2024

A U.S. government working group has published final guidelines on implementing security in high-performance computers. To support that, government labs and research labs are detailing tools and guidelines to implement security in HPC hardware and workflows.

Some projects, including breaking down processing into isolated slabs and centralized security management, were shared at the High-Performance Computing Workshop held at Wichita State University earlier this year.

The National Science Foundation (NSF) and the National Institute of Standards and Technology (NIST) are leading efforts to create a robust cybersecurity infrastructure around scientific computing. The efforts rely on delivering system stability and creating a trustworthy environment for scientific computing. The workshop was funded by the NSF.

Security is not a priority in supercomputers because it can slow systems down. HPC users place a premium on raw performance and time-to-discovery, and security applications or measures may cut into both.

Vendors building supercomputers have often included few security provisions in system contracts because the top priority is meeting the high-performance acceptance benchmarks, though that is changing. The onus still falls on labs to secure systems with measures that include two-factor authentication, limited root access, and monitoring of logins and system usage.

The large labs have established a walled garden—a type of demilitarized zone—around supercomputers, which restricts access.

“The vendors are saying, ‘Users don’t want it,’ and the users are saying, ‘Don’t get in the way of my performance,’ and pushing back on the vendors. It’s a bit of an impasse,” Albert Reuther, a senior MIT Lincoln Laboratory Supercomputing Center staff member, told HPCwire at the Supercomputing 2023 conference.

Supercomputing security isn’t as simple as overlaying an antivirus to check processes and files. In February, an HPC security group at NIST finalized a new security architecture, which applies security layers to four security zones.

The top layer is the “access zone,” which authenticates user access to systems and authorizes data transfers into the systems. This layer could prevent network scanning and hijacking of user sessions.

The second zone is the “management zone,” which covers the management and configuration of the actual computing work.

The “data storage” zone includes security measures such as mounting file systems like GPFS and Lustre-based PFS within specific boundaries. The file systems store petabytes or exabytes of data accessed regularly for computations.

The “high-performance compute” zone includes security measures for the core hardware and software driving HPC. The security steps could include sanitizing GPUs and securing OS kernels.

Many security projects presented at the security workshop fell into one of those four buckets.
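As a rough illustration of the four-zone model described above, the zones and their example measures can be written down as a small lookup table. This is only a sketch: the `Zone` enum, the `ZONE_MEASURES` table, and the `zone_for` helper are hypothetical names, and the measures listed are limited to the examples mentioned in this article rather than the full NIST guidance.

```python
from enum import Enum
from typing import Optional


class Zone(Enum):
    ACCESS = "access"
    MANAGEMENT = "management"
    DATA_STORAGE = "data storage"
    HPC = "high-performance compute"


# Example measures taken only from the article's description of each zone.
ZONE_MEASURES = {
    Zone.ACCESS: ["authenticate users", "authorize data transfers"],
    Zone.MANAGEMENT: ["manage and configure compute work"],
    Zone.DATA_STORAGE: ["mount parallel file systems within specific boundaries"],
    Zone.HPC: ["sanitize GPUs", "secure OS kernels"],
}


def zone_for(measure: str) -> Optional[Zone]:
    """Return the zone whose example measures include `measure`, if any."""
    for zone, measures in ZONE_MEASURES.items():
        if measure in measures:
            return zone
    return None
```

Calling `zone_for("sanitize GPUs")` returns `Zone.HPC`, which mirrors how a workshop project might be sorted into one of the four buckets.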

In a presentation, Los Alamos National Laboratory detailed how it used Splunk for security across supercomputers.

Splunk is helping LANL system administrators do various management and monitoring activities, including tracking network activity, identifying weaknesses and patching systems, administering systems, tracking the status, and identifying unauthorized logins.

Specifically, LANL has integrated Tenable’s Nessus into Splunk to scan for vulnerabilities, along with dashboards to manage them. An HPC Operations Center monitors cluster activity and status, system utilization, and hardware errors. The system also issues alerts if a system is down, if OSes are outdated, if firewall patterns are irregular, or if it sees out-of-pattern activity such as excessive login attempts.
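The "excessive login attempts" alert can be pictured as a sliding-window counter. The sketch below is a standalone illustration, not LANL's Splunk configuration or any Splunk API: the `LoginMonitor` class, its thresholds, and the assumption that timestamps arrive in non-decreasing order are all hypothetical.

```python
from collections import defaultdict, deque


class LoginMonitor:
    """Flag users with too many failed logins inside a sliding time window."""

    def __init__(self, max_failures=5, window_seconds=60.0):
        # Both thresholds are illustrative assumptions, not values from LANL.
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # user -> timestamps of failures

    def record_failure(self, user, timestamp):
        """Record a failed login; return True if the user exceeds the threshold."""
        events = self._failures[user]
        events.append(timestamp)
        # Drop failures that have fallen out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_failures
```

A monitoring job would call `record_failure` for each failed login it reads from the logs and raise an alert the first time the call returns `True`.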

Another project called Cicada, developed by researchers at Sandia National Laboratories, advances collaboration in high-performance computing, which may be relevant to AI work that draws on multiple scientific data sets.

Cicada’s core concept is simple: enabling collaborative computing while protecting each participant’s input data. This approach is relevant to AI in which multiple HPC users are involved.

The principle is similar to confidential computing, which allows organizations to contribute datasets to AI projects while safeguarding them from unauthorized access or tampering. Cicada can scale to over 100 participants, which could help secure large-scale AI and scientific computing projects.

The project involves the MMULT algorithm, which facilitates secure matrix multiplications among participants. MMULT allows for aggregation techniques on the partial inputs so participants can perform matrix multiplications without revealing individual data.
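The general idea of computing on combined inputs without revealing any one of them can be shown with additive secret sharing, a standard building block of secure multiparty computation. The sketch below is not Cicada's API and is not the MMULT algorithm. It only computes a private sum, which is far simpler than secure matrix multiplication, and the function names and the modulus `PRIME` are assumptions chosen for illustration. Inputs are assumed to be non-negative integers smaller than `PRIME`.

```python
import secrets

PRIME = 2**61 - 1  # modulus for the shares; an assumption chosen for illustration


def share(value, n_parties):
    """Split `value` into n additive shares that sum to it modulo PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % PRIME
    return shares + [last]


def secure_sum(private_inputs):
    """Sum the inputs so that no party sees another party's raw value."""
    n = len(private_inputs)
    # Party i splits its input and would send its j-th share to party j.
    all_shares = [share(v, n) for v in private_inputs]
    # Each party j adds the shares it received and publishes only that partial sum.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % PRIME for j in range(n)]
    # The published partial sums combine into the true total.
    return sum(partial_sums) % PRIME
```

Each individual share is a uniformly random number, so a party that sees only the shares it was sent learns nothing about the other inputs; only the final total is revealed.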

The Cicada library supports multiple communication patterns and incorporates fault tolerance and recovery mechanisms. The library can maintain good performance and improve security across different operational scenarios.

HPC systems can be targets of denial-of-service attacks, where a flood of network traffic from multiple sources can slow down or disrupt workloads. Researchers at Pacific Northwest National Laboratory developed DoDGE (Differential analysis of Generalized Entropy progressions), a lightweight technology designed to detect denial-of-service attacks in communication networks. This technology could potentially be applied to HPC systems without affecting performance.

The researchers use Tsallis entropy to measure and analyze the randomness of network traffic patterns and how they change over time. DoDGE performs local calculations to efficiently detect DoS attacks while preserving network bandwidth. The technique’s generic pattern allows it to be adapted to various scales and system types, potentially making it suitable for HPC environments.

The researchers preferred the Tsallis entropy over techniques like Shannon entropy because it can better capture complex patterns in network traffic, potentially leading to more accurate detection of sophisticated attacks without significantly increasing computational overhead.
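For reference, the Tsallis entropy of a probability distribution p is S_q = (1 - sum(p_i^q)) / (q - 1), and it reduces to Shannon entropy as q approaches 1. The sketch below is not the DoDGE implementation. It is a minimal illustration of tracking how that entropy changes across traffic windows, and the choice of source addresses as the measured quantity, the value of `q`, and the alert threshold are all assumptions.

```python
import math
from collections import Counter


def tsallis_entropy(counts, q=2.0):
    """Tsallis entropy S_q = (1 - sum(p_i**q)) / (q - 1) of a count histogram."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    if math.isclose(q, 1.0):
        # The q -> 1 limit is Shannon entropy (natural log).
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def entropy_progression(windows, q=2.0):
    """Entropy of the source-address distribution in each traffic window."""
    return [tsallis_entropy(Counter(w).values(), q) for w in windows]


def flag_changes(entropies, threshold):
    """Indices of windows whose entropy differs from the previous one by more than `threshold`."""
    return [i for i in range(1, len(entropies))
            if abs(entropies[i] - entropies[i - 1]) > threshold]
```

In this toy setup, a window dominated by a single source has much lower entropy than a window of evenly mixed traffic, so a sharp change between consecutive windows is the signal worth examining.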

Rickey Gregg of the High-Performance Computing Modernization Program, a leading voice in HPC security, presented ways to implement security in HPC.

The NIST working group’s guidance aligns with approaches that are mandatory in some DoD computing projects. Those include the Risk Management Framework (RMF), which goes hand-in-hand with DoD’s RDT&E (Research, Development, Test, and Evaluation Appropriations).

The RMF policy involves protecting data and preventing unauthorized users from accessing systems. That includes “documentation, configuration settings, vulnerability scanning, reviews, and tiered approvals,” according to a presentation slide.

The RDT&E asks questions such as “How do we develop the software or code? How do we build these systems or appliances? How do we accomplish the test?” according to a slide.

The workflows depend on whether a user inherits a system or is building a new one.

One presentation discussed how the HITRUST CSF (Common Security Framework), which was originally designed for healthcare and builds on other standards like ISO 27001/27002, could potentially be adapted for HPC environments. The ISO standard provides guidance on “establishing, implementing, maintaining and continually improving an information security management system,” ISO said on the standards page.

An ISO 27001/27002 certification is a prerequisite for HPC and quantum companies to engage with commercial and government organizations. Quantum software company Q-CTRL, Microsoft’s Azure HPC Cache, and some of Google Cloud’s HPC offerings have already been certified for the standard.

The Ohio State University’s Dhabaleswar Panda presented some options for MPI security in the high-performance computing zone. OSU provides the MVAPICH MPI library for communication in HPC systems, which supports numerous interfaces and protocols. The latest MVAPICH MPI stack supports GPUs, DPUs, and most interconnects for most AI and HPC workloads. It has had 1.78 million downloads from the OSU site.

HPCwire