An AMD conference call to discuss its $4.9 billion acquisition of ZT Systems provided an inside look into how Lisa Su is building her AI empire. She laid out an AMD AI landscape that is the polar opposite of Nvidia's proprietary approach.
In her view, customers have a choice: a dystopian Nvidia world in which the company owns the assets, or AMD's world, in which customers pick their own partners, hardware, technologies, and AI tools.
The ZT Systems acquisition was in that spirit: to provide engineers with the ability to build systems optimized for AI processing and power consumption.
Su thinks AMD's AI offerings will be highly differentiated.
“We actually can use our systems capability to allow customers to use whatever they believe is the best capability for their workload and their data center environment,” Su said.
To be sure, the full-stack vendor pitch isn't new. AMD has been building up its systems vendor capability by acquiring critical parts of the computing stack: software, hardware, and networking.
Copying Nvidia’s Strategy
Earlier this year, AMD announced that it would release a new GPU each year, similar to Nvidia. ZT Systems gives AMD 1,000 engineers to build systems, much like Nvidia’s engineers building DGX systems.
“ZT ships hundreds of thousands of servers and tens of thousands of AI racks per year to the largest hyperscale cloud companies with industry-leading quality,” Su said.
That mirrors Nvidia's current strategy: all the major cloud providers have given Nvidia space to install DGX systems, and Nvidia has built its own parallel cloud service, DGX Cloud, that links its GPU systems across those providers.
“We’re trying to give our customers choice while giving them best-in-class design capability with our technology,” Su said.
While AMD is getting accolades, a lot of things need to come together for it to be the next Nvidia.
It took decades for Nvidia to reach where it is today. The transition included:
- Building a software framework with CUDA in 2007.
- Betting early that GPUs could power AI workloads.
- Delivering the first hardware that allowed OpenAI to test its AI models.
AMD is no Nvidia yet, and it's a good time to look at the issues the company needs to resolve.
AMD’s GPU Still Facing Issues
Getting the GPU right is what will allow AMD's AI universe to hold up against Nvidia's onslaught.
AMD is happy with its GPU progress. The MI300X did well with top customers, including Microsoft and Meta.
But a quick reality check: two of the top three cloud providers still don’t want MI300 or MI300X GPUs. Google and AWS haven’t ordered AMD GPUs. That may be a reason AMD bought ZT Systems — to get more cloud providers on board.
AMD's GPUs may be a poor man's Nvidia, with no customers desperate to acquire the hardware. Still, they are the only legitimate alternative to Nvidia, and orders are going up.
“We now expect data center GPU revenue to exceed $4.5 billion in 2024, up from the $4 billion we guided in April,” Su said.
The annual GPU cadence AMD committed to earlier this year includes the MI325X, followed by the MI350 series and then the MI400.
“Our MI400 series powered by the CDNA Next architecture is making great progress in development and is scheduled to launch in 2026,” Su said.
The good news is that AMD has a GPU roadmap, and customers now have a clear view of what they are buying into. The picture could change drastically by 2026 if everything goes in the right direction for AMD.
“It’s about CPUs, GPUs, networking, systems, and clusters. How do you ensure that they are reliable? This team will help us do that because they’ve done it,” Su said.
Systems built around the MI350, which is coming next year, and the MI400 will be complex enough to require the experts acquired from ZT Systems, Su said.
AMD is keeping up with Nvidia on hardware features, memory, and manufacturing.
Bumbling Benchmarking and Software
AMD's benchmarking has been all over the map. The company hasn't submitted AI benchmarks to MLPerf, but Microsoft and Meta have vouched for the performance of its Instinct GPUs.
Intel recently criticized AMD over benchmark claims for its upcoming Turin CPUs, and AMD's Zen 5 PC CPUs drew criticism for modest performance gains.
Benchmarks are tricky and best taken with a grain of salt. The bigger problem is that AMD's software ecosystem is nowhere near the CUDA stack Nvidia has established.
AMD has spent years developing ROCm, its open stack of tools, libraries, drivers, and compilers, but it is still in its early days.
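For a sense of what programming against ROCm looks like, here is a minimal vector-add sketch using HIP, ROCm's CUDA-style C++ API. This example is illustrative rather than drawn from AMD's materials, and it assumes a working ROCm install with the hipcc compiler; the kernel and launch syntax mirror CUDA almost line for line, which is the point of HIP.

```cpp
// Minimal HIP vector-add sketch; assumes ROCm and hipcc are installed.
// Build: hipcc vec_add.cpp -o vec_add
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

// Kernel syntax is the same as CUDA: __global__, blockIdx, threadIdx, blockDim.
__global__ void vec_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

    // hipMalloc/hipMemcpy are drop-in analogs of cudaMalloc/cudaMemcpy.
    float *da, *db, *dc;
    hipMalloc(reinterpret_cast<void**>(&da), n * sizeof(float));
    hipMalloc(reinterpret_cast<void**>(&db), n * sizeof(float));
    hipMalloc(reinterpret_cast<void**>(&dc), n * sizeof(float));
    hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

    // Same triple-chevron launch syntax as CUDA.
    vec_add<<<(n + 255) / 256, 256>>>(da, db, dc, n);

    hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("c[0] = %.1f\n", hc[0]);  // expect 3.0

    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```

The same source can also target Nvidia hardware through HIP's CUDA backend, which is how AMD pitches porting as a mostly mechanical exercise.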
“ROCm from a standpoint of features… we’ve gained a lot of confidence and learned a lot in that whole process,” Su said during the earnings call.
AMD executives have repeated the same line about ROCm at many conferences, a sign that it has been a work in progress for years.
AMD is still stuck at the programming level and lags behind the UXL Foundation's parallel programming framework, which is based on oneAPI.
However, ROCm's open nature fits AMD's goal of letting customers run whatever workloads they choose. The question is whether developers will adopt ROCm.
ROCm vs CUDA
Nvidia's CUDA is light years ahead of ROCm, having matured into a broad catalog of libraries, applications, and data sets. Nvidia ships CUDA-based software for major verticals including robotics, autonomous cars, healthcare, finance, and quantum computing.
CUDA tools are being used to generate synthetic data that is not available in the real world. Those tools and others are wrapped into Nvidia’s AI Enterprise software.
Without a doubt, Nvidia's CUDA is an expensive proposition, but it is also easier to deploy: customers just feed in data and get output. The technical depth of the CUDA tools can be ramped up for those needing further customization.
AMD’s ROCm is complex, but it offers more flexibility on tools and the development of models. AMD is also backing open networking technologies.
“We’re working closely across our consortiums, with Ultra Ethernet Consortium, as well as the UA Link group, to ensure that we have very strong networking technologies that are industry standard,” Su said.
The Right Steps
AMD's acquisition of ZT Systems is the latest in a line of strategic acquisitions to fill holes in the company's portfolio.
AMD has made a string of acquisitions to assemble its master AI plan. In 2022, it closed a $49 billion deal for Xilinx, gaining FPGAs and software. AMD already had the CPUs and GPUs, and Xilinx's FPGAs completed the trifecta.
The company also bought DPU maker Pensando Systems and the AI software firms Silo AI and Nod.ai.
“The Silo team significantly expands our capability to service large enterprise customers looking to optimize their AI solutions for AMD hardware,” Su said on an earnings call.
The company will continue to look for strategic acquisitions.
“We’ll continue to look at how we aggressively add to our capabilities, and that’s on both an organic and inorganic basis,” Su said.