Artificial intelligence is transforming every industry and creating new opportunities for innovation and growth. On top of this, AI models are continually advancing and becoming more complex and accurate. More powerful computers with purpose-built AI accelerators that have resources like high bandwidth memory (HBM), specialized data formats, and exceptional compute performance are needed to fuel these technological advances.
To meet this need, Azure is proud to be the first cloud service to offer general availability of the new Azure ND MI300X v5 virtual machine (VM) series based on AMD’s latest Instinct GPU, MI300X. This new VM series is the first cloud offering of its kind and is designed to give the highest HBM capacity of any available VM with industry-leading speeds, letting customers serve larger models faster and with fewer GPUs.
Unmatched infrastructure optimized at every layer
The new Azure ND MI300X virtual machine series is a product of a long collaboration with AMD to build powerful cloud systems for AI with open-source software. This collaboration includes optimizations across the entire hardware and software stack. For example, these new VMs are powered by 8x AMD MI300X GPUs, giving each VM 1.5 TB of high bandwidth memory (HBM), with 5.3 TB/s of HBM bandwidth per GPU. HBM is essential for AI applications that need to quickly process vast amounts of data, thanks to its high bandwidth, low power consumption, and compact size. The result is a VM with industry-leading performance, HBM capacity, and HBM bandwidth, enabling you to fit larger models in GPU memory, use fewer GPUs, or both. In the end, you save power, cost, and time-to-solution.
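To make the capacity point concrete, here is a rough back-of-the-envelope sketch in Python. It only counts memory for model weights; the 1.5 TB figure comes from the text above, while the model sizes, the 16-bit weight format, and the 20% headroom reserved for activations and caches are illustrative assumptions, not measured values.

```python
# Rough check of whether a model's weights fit in one VM's HBM.
# 1.5 TB per VM is from the text; model sizes and headroom are assumptions.

HBM_PER_VM_GB = 1500  # 1.5 TB per VM, in decimal GB


def weights_gb(params_billions: float, bytes_per_param: int) -> float:
    """Memory for weights alone in GB (billions of params x bytes per param)."""
    return params_billions * bytes_per_param


def fits_in_vm(params_billions: float, bytes_per_param: int,
               headroom: float = 0.2) -> bool:
    """True if the weights fit after reserving `headroom` of HBM
    for activations and caches (an assumed, not measured, fraction)."""
    usable_gb = HBM_PER_VM_GB * (1 - headroom)
    return weights_gb(params_billions, bytes_per_param) <= usable_gb


# Hypothetical model sizes, stored as 16-bit (2-byte) weights.
for name, params_b in [("70B model", 70), ("180B model", 180), ("1T model", 1000)]:
    need = weights_gb(params_b, 2)
    print(f"{name}: {need:.0f} GB of weights, "
          f"fits in one VM: {fits_in_vm(params_b, 2)}")
```

Real deployments also need memory for the KV cache, activations, and framework overhead, so treat this as a first-pass estimate only.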
On the software side, Azure ND MI300X uses the AMD ROCm open-source software platform, which provides a comprehensive set of tools and libraries for AI development and deployment. The ROCm platform supports popular frameworks such as TensorFlow and PyTorch, as well as Microsoft libraries for AI acceleration like ONNX Runtime, DeepSpeed, and MSCCL. The ROCm platform also enables seamless porting of models and solutions from one platform to another, lowering your engineering costs and speeding up time to market for your AI solutions.
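One reason porting PyTorch code tends to be straightforward is that ROCm builds of PyTorch reuse the familiar `torch.cuda` interface, so device-selection code written for other GPUs usually runs unchanged. The sketch below is a minimal check, assuming PyTorch may or may not be installed; the function name `describe_gpu_backend` is hypothetical and only for illustration.

```python
def describe_gpu_backend() -> str:
    """Report which GPU backend the installed PyTorch build targets."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"

    if not torch.cuda.is_available():
        return "no GPU visible to PyTorch"

    # ROCm builds set torch.version.hip; CUDA builds leave it as None.
    hip_version = getattr(torch.version, "hip", None)
    if hip_version:
        backend = f"ROCm/HIP {hip_version}"
    else:
        backend = f"CUDA {torch.version.cuda}"
    return f"{torch.cuda.device_count()} GPU(s) via {backend}"


print(describe_gpu_backend())
```

On a ROCm build, tensors are still moved with `tensor.to("cuda")`, which is why much existing model code needs no device-related changes.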
For customers looking to scale out efficiently to thousands of GPUs, it’s as simple as using Azure ND MI300X v5 VMs with a standard Azure Virtual Machine Scale Set (VMSS). Azure ND MI300X v5 VMs feature high-throughput, low-latency InfiniBand communication between different VMs. Each of the eight GPUs has its own dedicated 400 Gb/s NVIDIA Quantum-2 CX7 InfiniBand link, for a total of 3.2 Tb/s of bandwidth per VM. InfiniBand is the standard for AI workloads needing to scale out to large numbers of VMs/GPUs.
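The per-VM bandwidth figure follows directly from the per-GPU links. A small Python sketch of the arithmetic, using only the numbers quoted above (8 GPUs, 400 Gb/s each); the function names are hypothetical:

```python
GPUS_PER_VM = 8          # from the text
LINK_GBITS_PER_S = 400   # per-GPU InfiniBand link, from the text


def vm_bandwidth_gbits(gpus: int = GPUS_PER_VM,
                       link_gbits: int = LINK_GBITS_PER_S) -> int:
    """Aggregate InfiniBand bandwidth per VM in gigabits per second."""
    return gpus * link_gbits


def gbits_to_gbytes(gbits_per_s: float) -> float:
    """Convert gigabits per second to gigabytes per second (8 bits per byte)."""
    return gbits_per_s / 8


total = vm_bandwidth_gbits()
print(f"{total / 1000} Tb/s per VM, or {gbits_to_gbytes(total):.0f} GB/s")
```

Note the difference between bits and bytes here: 3.2 Tb/s of network bandwidth corresponds to 400 GB/s, which is a peak link rate rather than a guaranteed application throughput.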
Scalable AI infrastructure running capable OpenAI models
These Azure ND MI300X VMs, and the software that powers them, were purpose-built for our own Azure AI services production workloads. We have already optimized the most capable natural language model in the world, GPT-4 Turbo, for these VMs. If you want to generate text, answer questions, summarize documents, or create new applications, you can leverage the power and scalability of the Azure AI infrastructure to run these models at lightning speed, huge scale, and optimized efficiency.
ND MI300X v5 VMs offer leading cost performance for popular OpenAI and open-source models.
Leading with innovation to advance the ecosystem
We are also working closely with our partners and customers so they can take full advantage of these new VMs and accelerate their AI projects and applications. One of our partners, Hugging Face, is a popular provider of open-source natural language processing (NLP) models. Hugging Face ported their models to Azure ND MI300X VMs without any code changes and achieved 2x to 3x performance gains over AMD’s MI250 using these VMs. Now you can use these open-source models and Hugging Face libraries on Azure ND MI300X VMs to create and deploy your own NLP applications with ease and efficiency.
Get started today
We’re excited to see what our customers will do with the new VMs. Whether you want to bring your own models, use our models through the Azure OpenAI Service, or use open models from the Azure AI catalog or from Hugging Face, you can get the best performance at the best price on the new Azure AI infrastructure VMs. You can also scale your VMs up or down as needed, thanks to the flexibility and elasticity of the Azure cloud.
Learn more about Azure ND MI300X and get started today.
Learn more about Azure and AMD
Azure AI Infrastructure
Azure High Performance Computing
Achieve more with Microsoft Azure and AMD