Microsoft shared details on how it uses an AMD technology to secure artificial intelligence as it builds out a secure AI infrastructure in its Azure cloud service.
Microsoft has a strong relationship with Nvidia, but it is also deploying AMD's Epyc chips (including the new 3D V-Cache series) and Instinct MI accelerators, and is using Xilinx FPGAs internally for inferencing. The cloud provider has implemented a security layer in its AI computing infrastructure through a feature available only on AMD's Epyc chips (specifically, the third-generation "Milan" parts), said Mark Russinovich, chief technology officer at Microsoft's Azure cloud division, during a presentation at the AI Hardware Summit in Santa Clara, California.
The security feature in AMD's Epyc chips, called SEV-SNP (Secure Encrypted Virtualization-Secure Nested Paging), brings the relatively new concept of confidential computing – encrypting sensitive data while it is being processed – to Azure. Russinovich hailed the feature as a breakthrough that closes a major gap: securing data as it moves through the AI processing cycle.
AMD's feature encrypts AI data when it is loaded into a CPU or GPU. That matters as industries look to mix proprietary and third-party datasets for richer insights; the security feature ensures the data can't be tampered with as it moves through the AI cycle.
“Confidential computing allows people to trust the code and the Trusted Execution Environment to protect the confidentiality of their data. Which means that you can combine your datasets if you trust the code, and you trust the data,” Russinovich said.
AMD's SEV-SNP supports virtual machines and containers. Chips already encrypt data at rest and in transit, but AMD's security feature fills a long-standing gap by encrypting and protecting data while it is being processed.
“What’s been missing is when that data gets loaded on the CPU or the GPU, it is protecting the data there while it’s in use,” Russinovich said.
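Microsoft didn't show code, but the pattern Russinovich describes can be illustrated. The sketch below is a deliberately simplified, hypothetical model of attestation-gated key release: a data owner hands out a decryption key only after the trusted execution environment proves, via a measurement of the code it loaded, that it is running audited code. A real SEV-SNP attestation report is a hardware-signed structure verified against AMD's certificate chain; here an HMAC with a shared secret stands in for that signature so the example stays self-contained.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass

# Hypothetical stand-ins: a real SEV-SNP report is signed by the hardware
# root of trust, not by a shared secret.
PLATFORM_KEY = b"platform-secret"
EXPECTED_MEASUREMENT = hashlib.sha256(b"approved-model-code").hexdigest()

@dataclass
class AttestationReport:
    measurement: str   # hash of the code loaded into the TEE
    signature: bytes   # MAC over the measurement, standing in for a hardware signature

def sign_report(measurement: str) -> AttestationReport:
    sig = hmac.new(PLATFORM_KEY, measurement.encode(), hashlib.sha256).digest()
    return AttestationReport(measurement, sig)

def release_data_key(report: AttestationReport) -> bytes:
    """Data owner's policy: release the dataset key only to a TEE that
    proves it is running the exact code the owner audited."""
    expected_sig = hmac.new(PLATFORM_KEY, report.measurement.encode(),
                            hashlib.sha256).digest()
    if not hmac.compare_digest(report.signature, expected_sig):
        raise PermissionError("attestation signature invalid")
    if report.measurement != EXPECTED_MEASUREMENT:
        raise PermissionError("unexpected code measurement")
    return os.urandom(32)  # per-session data encryption key

# A TEE running the approved code gets the key; anything else is refused.
good = sign_report(hashlib.sha256(b"approved-model-code").hexdigest())
print(len(release_data_key(good)))  # 32
```

The two checks mirror Russinovich's formulation: trust in the code (the measurement) plus trust in the environment (the signature) is what lets parties combine their datasets.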
AMD's security feature is important for multi-party computation and analytics, Russinovich said. He shared the example of Royal Bank of Canada, which combines merchant data, consumer buying habits, and credit-card data in real time, all while keeping each source secure.
“RBC is able to combine the datasets, the merchants, the consumers, and the bank, in a way that no party has access to the data … but yet be able to have very targeted advertisements and offers to those consumers. This is the future of advertising,” Russinovich said.
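RBC's actual system isn't public, but a toy sketch can show the shape of such multi-party analytics: each party's raw records meet only inside a function standing in for the enclave, and only the derived offers leave. All names and data below are illustrative.

```python
from collections import defaultdict

# Hypothetical inputs from three parties.
merchant_offers = {"coffee": 0.10, "books": 0.15}          # merchant: discount by category
card_txns = [("alice", "coffee"), ("alice", "coffee"),
             ("bob", "books")]                              # bank: card transactions
consumer_optin = {"alice", "bob"}                           # consumers who opted in

def match_offers_inside_tee(offers, txns, optin):
    """Stands in for code running inside the enclave: the raw inputs
    meet only here, and only per-consumer offers leave."""
    spend = defaultdict(lambda: defaultdict(int))
    for user, category in txns:
        if user in optin:
            spend[user][category] += 1
    # Emit targeted offers, not the underlying transactions.
    return {user: {cat: offers[cat] for cat in cats if cat in offers}
            for user, cats in spend.items()}

print(match_offers_inside_tee(merchant_offers, card_txns, consumer_optin))
# {'alice': {'coffee': 0.1}, 'bob': {'books': 0.15}}
```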
The feature was needed as the compute capacity consumed by AI models has grown 200,000-fold over the past seven years. Russinovich compared AI's compute growth with Moore's law, the observation that the number of transistors on a chip doubles roughly every two years.
AI hardware requirements are “doubling roughly every two and a half years,” when tracking against Moore’s law, Russinovich said.
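Those two figures are worth putting side by side. Taking the numbers at face value, a 200,000-fold increase over seven years implies a doubling time of under five months, far faster than either Moore's law or the two-and-a-half-year cadence Russinovich cites:

```python
import math

growth = 200_000   # compute growth cited over seven years
years = 7

doublings = math.log2(growth)                  # ~17.6 doublings
months_per_doubling = years * 12 / doublings

print(f"{doublings:.1f} doublings -> one every {months_per_doubling:.1f} months")
# 17.6 doublings -> one every 4.8 months
```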
Moore's law was originally framed around CPUs, but the measure has evolved as accelerators such as GPUs and FPGAs are folded into AI chip packages.
Microsoft has built a dedicated backend network on Azure for AI computing. Microsoft offers AI compute instances with eight Nvidia A100 GPUs per server, on which customers can provision virtual machines that use any number of GPUs. The GPUs within each server are connected via NVLink and NVSwitch, and thousands of servers are linked by an InfiniBand HDR network of 200-gigabit links.
Azure has added an "intelligence" layer in the form of pre-trained AI models offered as a service to customers. Underpinning that layer is a software system called Singularity, which orchestrates efficient use of the hardware.
“You really need an extra software infrastructure that is able to effectively and efficiently utilize that, and to provide reliability as well as efficiency,” Russinovich said.
A critical feature of Singularity is "checkpointing," which provides elasticity and reliability for the AI computing network. The checkpoint system can migrate low-priority jobs to systems in other regions when high-priority jobs come in. This is important for large-scale AI models, which can take weeks or months to train.
The checkpoint process involves carefully synchronizing CPU and GPU state, which Singularity accomplishes through custom techniques and code incorporated into Linux.
"Traditionally that low priority job just gets killed. If you checkpoint, you can resume it at some point later when a job is completed," Russinovich said, adding, "this is abstracted away from the developers through the … runtime and the Singularity infrastructure."
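Microsoft hasn't published Singularity's checkpoint internals, but the preempt-and-resume pattern Russinovich describes is straightforward to sketch. The toy loop below (all names hypothetical) persists its state periodically; a preempted run can then be restarted, on the same machine or another, from the last checkpoint rather than from scratch. Singularity's real mechanism also has to capture GPU state, which is the hard part Russinovich alludes to.

```python
import os
import pickle

CKPT = "job.ckpt"   # hypothetical checkpoint path

def train(total_steps, checkpoint_every, preempt_at=None):
    """Toy training loop: periodically persists its state so a scheduler
    can kill the job and resume it later, possibly on another machine."""
    # Resume from the last checkpoint if one exists.
    if os.path.exists(CKPT):
        with open(CKPT, "rb") as f:
            state = pickle.load(f)
    else:
        state = {"step": 0, "loss": 1.0}

    while state["step"] < total_steps:
        state["step"] += 1
        state["loss"] *= 0.99           # stand-in for a real optimizer update

        if state["step"] % checkpoint_every == 0:
            with open(CKPT, "wb") as f:
                pickle.dump(state, f)   # the real system must also capture GPU state

        if preempt_at is not None and state["step"] == preempt_at:
            return state                # a higher-priority job takes the hardware

    os.remove(CKPT)
    return state

first = train(total_steps=100, checkpoint_every=10, preempt_at=42)  # preempted at step 42
final = train(total_steps=100, checkpoint_every=10)                 # resumes from step 40
print(first["step"], final["step"])  # 42 100
```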
Microsoft has AI accelerators across its 65 public Azure regions. An AI scheduler in Singularity can localize a job to a specific accelerator within a region, or migrate it to another region, depending on the job and the capacity available.
The Singularity system supports both inference and training; it can scale jobs up and down, and suspend and resume them.
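Again as a hypothetical sketch rather than Singularity's actual logic, the placement decision described above might look like this: prefer the job's home region, and fall back to any region with enough free accelerators of the right type.

```python
# Hypothetical capacity table: region -> free accelerators of each type.
capacity = {
    "eastus":  {"A100": 0, "FPGA": 3},
    "westus2": {"A100": 5, "FPGA": 1},
}

def place(job_region, accel, needed):
    """Prefer the job's home region; otherwise migrate to any region
    with enough free accelerators; otherwise queue."""
    if capacity[job_region].get(accel, 0) >= needed:
        return job_region
    for region, free in capacity.items():
        if free.get(accel, 0) >= needed:
            return region
    raise RuntimeError("no capacity anywhere; queue the job")

print(place("eastus", "A100", 4))  # 'westus2' - the home region is full
```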
Russinovich's thumbs-up to AMD hardware surprised Victor Peng, president of AMD's adaptive and embedded computing group and former CEO of Xilinx, who presented next at the AI Hardware Summit. Peng looked pleased, but told the audience that the back-to-back presentations were a coincidence, not a marketing stunt.
“We did not coordinate in any way,” Peng said.