When Nvidia announced its acquisition of Mellanox, the GPU leader noted that datacenters would eventually be built like high-performance computers. Hence, it’s not surprising that the first fruits of Nvidia’s 2019 acquisition of the networking specialist focus on applying AI to security and predictive maintenance applications in InfiniBand datacenters.
The supercomputer datacenter “cyber-AI platform” unveiled this week is built around the Mellanox Unified Fabric Manager (UFM), which combines real-time network telemetry with AI-enhanced security and predictive analytics. The platform, called UFM Cyber-AI, is intended to reduce downtime in scale-out InfiniBand datacenters by using analytics to detect cyber threats or operational issues and predict costly network failures.
The platform draws on both real-time and historic telemetry and workload data to learn a datacenter’s “operational cadence” and workload patterns, the company said Monday (June 22).
The network monitor applies AI techniques to determine a datacenter’s “vital signs,” then uses those data to predict component failures or spot suspicious usage patterns that might indicate a cyberattack, said Gilad Shainer, Nvidia’s senior vice president of marketing for Mellanox networking.
The network security platform combines deep learning and network elements to reduce datacenter downtime by “bringing security into supercomputing and then, second, enable IT manager to predict failures before they actually happen,” Shainer added. The platform “reads all of the telemetry information it can from the adapters, from the cables from the switches… and it takes that information and stores that and then runs a deep learning algorithm on that database that is being created,” explained the former Mellanox marketing chief.
The deep learning algorithm gathers those vital signs to gauge how datacenters operate, particularly those configured to run as supercomputers with high-performance interconnects. Those operational characteristics represent the “heartbeat of the supercomputer, which is basically defining the network stamps for the workloads that are running on top of that,” said Shainer.
The platform would then generate alerts before a supercomputing datacenter goes down, allowing IT managers to schedule preventive maintenance.
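The general pattern described above (learn a baseline from historical telemetry, then flag readings that depart from it) can be illustrated with a minimal sketch. This is not Nvidia’s method: the company describes a deep learning algorithm, whereas the example below swaps in a simple standard-deviation threshold for clarity. The function names, the threshold and the per-port error counts are all hypothetical.

```python
from statistics import mean, stdev


def learn_baseline(history):
    """Summarize historical readings as (mean, sample standard deviation)."""
    return mean(history), stdev(history)


def find_anomalies(readings, baseline, threshold=3.0):
    """Return indices of readings more than `threshold` standard deviations
    away from the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat baseline: any deviation at all is anomalous.
        return [i for i, x in enumerate(readings) if x != mu]
    return [i for i, x in enumerate(readings) if abs(x - mu) / sigma > threshold]


# Hypothetical per-port error counts sampled at regular intervals.
history = [2, 3, 2, 4, 3, 2, 3, 3, 2, 4]
baseline = learn_baseline(history)

# New readings arriving from the same (hypothetical) port.
new_readings = [3, 2, 15, 3]
alerts = find_anomalies(new_readings, baseline)
print(alerts)  # [2]
```

Here only the third reading (15 errors) is far enough from the learned baseline to raise an alert. A production system would presumably model many signals at once and over time, which is where a learned model would replace the fixed threshold.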
Along with high-performance computing and interconnects, datacenters are also deploying distributed applications via software containers increasingly reliant on the de facto standard Kubernetes cluster orchestrator. Hackers have taken notice, probing microservices infrastructure such as container image registries for vulnerabilities.
The UFM platform is designed to track applications and detect suspicious or unauthorized workloads that might contain malware. Operators could then move to isolate those workloads before they can bring down a datacenter.
Networking devices from cables to boards to adapters have individual signatures. Shainer added that the deep learning security approach can also detect any physical changes in a network configuration, then alert operators to a potential problem. “If anyone’s going to [unplug] a cable from a port, you’re going to get an immediate alarm,” he said.
Nvidia’s acquisition of Mellanox along with its May 2020 deal for Cumulus Networks underscores its concerted push into datacenter networking. “There’s a lot of AI and ML whitewashing by networking infrastructure providers, but Nvidia has a demonstrated track record and deep expertise that is now being applied to interconnect,” said Will Townsend, enterprise networking analyst with Moor Insights & Strategy.
As enterprise datacenters ramp up to handle HPC workloads, demand for automated security tools and operations monitoring is likely to increase. For example, a survey of U.S. government spending on critical infrastructure protection projects that federal cyber defense budgets will total nearly $18.8 billion by 2021. Among the top spenders is the U.S. Energy Department, which operates many of the nation’s supercomputer centers.
A survey released this week by security tool vendor AtlasVPN forecasts the agency will spend more than $665 million on cyber defenses over the next year. That total represents the second-fastest rate of growth among the two dozen federal agencies included in the forecast.
–Tiffany Trader contributed to this report.