If you are waiting in a giant line for Nvidia’s H100 GPUs, be advised that the next-generation H200 chip is already on its way. The GPU maker earlier this month released its product roadmap of AI GPUs leading into 2025. The new GPUs are intended to continue Nvidia’s dominance in artificial intelligence, a market that took off after ChatGPT’s magic shook the world.
The most surprising takeaway from the roadmap is Nvidia’s plan to release GPUs every year. Nvidia’s H100 was released in 2022, two years after its predecessor, the A100.
Nvidia faces no serious challenge to its dominance, and revving up to yearly GPU upgrades puts pressure on rivals to keep up, analysts said.
Intel is already behind, with its Falcon Shores GPU coming in 2025, while AMD’s MI300 is due by the end of this year. Even Nvidia’s customers, such as OpenAI, are considering developing their own AI chips as the cost of running AI becomes untenable.
“Nvidia may be speeding up its roadmap because it wants to put more distance between itself and the competition, and even its customers, many of whom are now designing their own AI chips,” said Kevin Krewell, principal analyst at Tirias Research.
Nvidia already commands a premium for its H100 and can charge even more for next-generation GPUs. Much like gamers, customers deploying AI workloads will shell out the cash for the latest and greatest hardware.
Chip development is also at an inflection point, making it possible for Nvidia to upgrade GPUs yearly. From an architectural standpoint, Nvidia has many variables and options to play with when integrating processing, I/O, memory, and communications, and when packaging them vertically.
“The semiconductor market is also entering a new era with chiplets, which could be changing how Nvidia designs its chips,” Krewell said.
The yearly upgrades also extend to CPUs and networking products. That cadence may not apply to AI chip competitors, who are still trying to win customers for their chips.
The new roadmap was mentioned in an investor presentation published earlier this month but could easily change.
The New Roadmap
Nvidia’s new roadmap lists yearly products related to compute and communications. It breaks products down into GPUs paired with x86 chips, and GPU-CPU combinations based on Nvidia’s own ARM processor designs.
For x86-paired GPUs, Nvidia’s successor to the H100 will be the H200, due in 2024. The B100 GPU will follow in the same year, and the X100 GPU in 2025.
The roadmap also has a lineup of successors to the L40S, which is based on the Ada Lovelace architecture. The L40S is a poor man’s version of the H100 but is faster than the previous-generation A100 in AI training and inference. Nvidia is redirecting H100 customers in urgent need of GPUs to the L40S.
The successor to the L40S will be the B40 in 2024, followed by the X40 in 2025. The roadmap shows the L40S-B40-X40 lineup under “x86 enterprise and inferencing,” meaning it is optimized for inference.
Nvidia’s CPU roadmap provides yearly upgrades on its ARM processors, which can be paired with the GPUs mentioned above.
For inferencing, an updated GH200 with HBM3E memory will ship next year and is tied to the H200 GPU. The GB200, also due next year, is designed to be used with the B100 GPU, and the GX200, coming in 2025, with the X100 GPU.
Nvidia will add NVLink interconnects for AI training to provide high-speed links between ARM CPUs and related GPUs. The GH200NVL (H200 GPU) and GB200NVL (B100 GPU) will ship in 2024, and the GX200NVL (X100 GPU) will ship in 2025.
CPUs are much less relevant for large-scale models, but Nvidia’s ARM CPU-and-GPU package is a great combination for AI training, said Naveen Rao, vice president for generative AI at Databricks. Rao was previously CEO of MosaicML, an AI startup Databricks acquired for $1.3 billion earlier this year.
“The CPU could become relevant as a programming interface, though…GH200 essentially uses ARM as the programming interface with a fast GPU sitting very close. This design could be an awesome combo,” Rao said.
Technologies like AMX in Intel CPUs could also be super relevant, but they need to go much bigger and enable multi-chip scaling in a big way, Rao added.
The new roadmap also accelerates networking bandwidth, from 400G in 2024 to 1,600G in 2026, across its Quantum products for InfiniBand and its Spectrum-X line for Ethernet and hyperscale infrastructure. Nvidia’s DGX systems use both InfiniBand and Ethernet network technologies.
How Chiplets Could Define the Roadmap
Historically, Intel upgraded chips yearly with either new manufacturing technologies or new features on the same manufacturing processes. But that slowed down as scaling manufacturing became challenging.
Chip design is now at an inflection point, with conventional technologies like FinFET running out of steam, said David Kanter, principal analyst at Real World Technologies.
Conventional chip designs focused on integrating all components into a single chip. An emerging trend is to decompose SoCs into chiplets: smaller compute, I/O, and communications units that can be assembled in 2.5D or 3D packages, Kanter said.
Nvidia declined to comment on its next-generation GPUs. But industry experts said Nvidia has many options it can explore on GPUs, DRAM, I/O, and SRAM integration, and suggested many possibilities of what chips may look like. Nvidia may also have the option of Intel as a manufacturing partner by 2025 for X100.
The H200 chip upgrade includes a new type of HBM3E memory. Nvidia earlier this year announced the GH200 chip, connecting it in the roadmap to the H200 GPU.
After the H200, Nvidia could start using chiplets, modularizing the GPU into blocks with a choice of CPUs, accelerators, and interconnects in a package. These blocks, also called tiles, could be manufactured using different processes.
This method would allow Nvidia to conceive heterogeneous chips with separate tiles for compute, I/O, and SRAM, made on different manufacturing processes.
For example, the compute tiles of B100 and B40 chips could be made on TSMC’s N3 process, with SRAM tiles on an older process. SRAM and analog portions of GPUs do not scale well on N3, so integrating modules made on older, cheaper manufacturing processes would be an advantage.
By 2025, Intel’s 18A process will come online and possibly surpass TSMC, and Nvidia could turn to Intel’s fabs to manufacture X100. Nvidia has manufactured test chips on Intel’s next-generation processes, and CEO Jensen Huang was happy with the outcome.
TSMC is expected to shift from N3 to the two-nanometer N2 by 2025. Both Intel and TSMC will have gate-all-around technologies on those nodes.