AMD Launches Milan-X CPU with 3D V-Cache and Multichip Instinct MI200 GPU

By Tiffany Trader

November 8, 2021

At a virtual event this morning, AMD CEO Lisa Su unveiled the company’s latest and much-anticipated server products: the new Milan-X CPU, which leverages AMD’s new 3D V-Cache technology; and its new Instinct MI200 GPU, which provides up to 220 compute units across two Infinity Fabric-connected dies, delivering an astounding 47.9 peak double-precision teraflops.

“We’re in a high-performance computing megacycle, driven by the growing need to deploy additional compute performance delivered more efficiently and at ever-larger scale to power the services and devices that define modern life,” said Su.

AMD’s new third-generation Epyc CPU with AMD 3D V-Cache, codenamed Milan-X, is the company’s first server CPU with 3D chiplet technology. The processors have three times the L3 cache of standard Milan processors. In Milan, each core complex die (CCD) had 32 megabytes of L3 cache; Milan-X adds 64 megabytes of 3D stacked cache on top for a total of 96 megabytes per CCD. With eight CCDs, that adds up to 768 megabytes of L3 cache. Adding in the L2 and L1 caches brings the total to 804 megabytes of cache per socket.
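
The cache totals are easy to reconstruct with a little arithmetic. The sketch below is a minimal sanity check, assuming Milan-X keeps standard Milan's per-core L2 and L1 sizes (512 KB L2 and 32 KB + 32 KB L1 per core); those per-core figures are assumptions on our part, not numbers given in the announcement, but they are what recovers AMD's 804-megabyte per-socket total.

```python
# Back-of-the-envelope check of AMD's stated cache totals for a 64-core
# Milan-X socket. Per-core L2/L1 sizes are assumptions carried over from
# standard Milan (512 KB L2, 32 KB L1i + 32 KB L1d per core).
CCDS = 8
CORES = 64

l3_per_ccd_mb = 32 + 64                   # 96: native L3 plus 3D V-Cache per CCD
l3_total_mb = CCDS * l3_per_ccd_mb        # 768 MB of L3 per socket
l2_total_mb = CORES * 512 // 1024         # 32 MB of L2 (assumed 512 KB per core)
l1_total_mb = CORES * (32 + 32) // 1024   # 4 MB of L1 (assumed 32 KB I + 32 KB D per core)

print(l3_per_ccd_mb, "MB L3 per CCD")                                    # 96
print(l3_total_mb, "MB L3 per socket")                                   # 768
print(l3_total_mb + l2_total_mb + l1_total_mb, "MB cache per socket")    # 804
```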

Milan-X is built on the same 7nm Zen 3 cores as Milan and has the same maximum core count of 64. The enhanced processors are compatible with existing platforms after a BIOS upgrade.

Milan-X with 3D V-Cache employs a hybrid bonding plus through-silicon via (TSV) approach, providing more than 200 times the interconnect density of 2D chiplets and more than 15 times the density of existing 3D stacking solutions, according to AMD. The die-to-die interface uses a direct copper-to-copper bond with no solder bumps to improve thermals, transistor density and interconnect pitch.

AMD is reporting a 50 percent performance improvement for Milan-X on targeted technical computing workloads compared to Milan processors. The chipmaker demonstrated Milan-X’s performance speedup on an EDA workload, running Synopsys’ verification solution VCS. A 16-core Milan-X with AMD’s 3D V-Cache delivered 66 percent faster RTL verification compared to the standard Milan without V-Cache. VCS is used by many of the world’s top semiconductor companies to catch defects early in the development process before a chip is committed to silicon.

Microsoft Azure is the first announced customer for Milan-X, with upgraded HBv3 instances in preview today, and a planned refresh on the way for its entire HBv3 deployment. Traditional OEM and ODM server partners Dell Technologies, HPE, Lenovo, and Supermicro are preparing Milan-X products for the first quarter of 2022. Named ISV ecosystem partners include Altair, Ansys, Cadence, Siemens and Synopsys.

Manufactured on TSMC’s 6nm process, the MI200 is the world’s first multichip GPU, designed to maximize compute and data throughput in a single package. The MI200 series contains two CDNA 2 GPU dies harnessing 58 billion transistors. It features up to 220 compute units and 880 second-generation matrix cores. Eight stacks of HBM2e memory provide a total of 128 gigabytes of memory at 3.2 TB/s, four times the capacity and 2.7 times the bandwidth of the MI100. Connecting the two CDNA 2 dies are Infinity Fabric links running at 25 Gbps for a total of 400 GB/s of bidirectional bandwidth.
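
Those memory claims are straightforward to sanity-check. The sketch below is a minimal check, assuming 16 GB HBM2e stacks and using the MI100's unrounded published peaks as the baseline; both the per-stack size and the baseline bandwidth figures are assumptions drawn from public spec sheets, not from AMD's announcement.

```python
# Sanity check of the MI200's stated memory gains over the MI100. The
# 16 GB-per-stack figure and the unrounded peak bandwidths (3,276.8 and
# 1,228.8 GB/s) are assumptions drawn from public spec sheets.
hbm2e_stacks = 8
gb_per_stack = 16
mi200_capacity_gb = hbm2e_stacks * gb_per_stack    # 128 GB
mi200_bandwidth_gbs = 3276.8                       # ~3.2 TB/s
mi100_capacity_gb = 32
mi100_bandwidth_gbs = 1228.8                       # ~1.2 TB/s

print(mi200_capacity_gb / mi100_capacity_gb)                  # 4.0x capacity
print(round(mi200_bandwidth_gbs / mi100_bandwidth_gbs, 1))    # ~2.7x bandwidth
print(880 // 220)                                             # 4 matrix cores per compute unit
```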

The MI200 accelerator, with up to 47.9 peak double-precision teraflops, ostensibly answers the question: what if a chip designer dramatically optimized the GPU architecture for double-precision (FP64) performance? The MI250X ramps up peak double-precision performance 4.2 times over the MI100 in one year (47.9 teraflops versus 11.5 teraflops). By comparison, AMD pointed out that Nvidia grew the traditional double-precision FP64 peak performance of its server GPUs 3.7 times from 2014 to 2020. In a side-by-side comparison, the MI200 OAM is nearly five times faster than Nvidia’s A100 GPU in peak FP64 performance, and 2.5 times faster in peak FP32 performance.
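
Those multiples fall out of simple division. In the sketch below, the A100 baselines (9.7 FP64 / 19.5 FP32 teraflops, non-tensor peaks) and the MI100's 11.5 FP64 teraflops are assumptions taken from public spec sheets rather than from AMD's presentation.

```python
# Sanity check of the performance multiples quoted above. The A100 and
# MI100 baselines are assumptions taken from public spec sheets.
mi250x_fp64_tf = 47.9
mi250x_fp32_tf = 47.9
mi100_fp64_tf = 11.5
a100_fp64_tf = 9.7
a100_fp32_tf = 19.5

print(round(mi250x_fp64_tf / mi100_fp64_tf, 1))   # ~4.2x over the MI100 in FP64
print(round(mi250x_fp64_tf / a100_fp64_tf, 1))    # ~4.9x the A100 in peak FP64
print(round(mi250x_fp32_tf / a100_fp32_tf, 1))    # ~2.5x the A100 in peak FP32
```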

Further, the Instinct MI250X delivers 47.9 teraflops of peak single-precision (FP32) performance and 383 teraflops of peak theoretical half-precision (FP16) performance for AI workloads. That dense computational capability doesn’t come without a power cost. The top-of-stack part, the OAM-form-factor MI250X, consumes up to 560 watts, while air-cooled and other configurations will require somewhat less power. Remember, however, that you’re essentially getting two GPUs in one package with that 500-560 watt TDP, and based on some of the disclosed system specs (such as Frontier), the flops-per-watt targets are impressive.
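
For a rough sense of what that power budget buys, peak FP64 divided by the board rating gives a per-accelerator efficiency figure. The sketch below is a simplification of our own, not an AMD metric: it ignores host CPU, memory and interconnect power, so it overstates what a full node delivers per watt.

```python
# Rough peak-FP64 efficiency estimate for the MI250X OAM. Dividing by the
# 560 W board rating alone is a simplifying assumption; a real node also
# spends power on CPUs, memory and interconnect.
peak_fp64_tflops = 47.9
board_power_w = 560
gflops_per_watt = peak_fp64_tflops * 1000 / board_power_w
print(round(gflops_per_watt, 1))   # ~85.5 GF/s per watt at peak FP64
```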

During this morning’s launch event, Forrest Norrod, senior vice president and general manager of the datacenter and embedded solutions business group at AMD, showed head-to-head comparisons for the MI200 OAM versus Nvidia’s A100 (80GB) GPU on a range of HPC applications. In AMD testing, a single-socket 3rd Gen AMD Epyc server with one AMD Instinct MI250X OAM 560 watt GPU achieved a median score of 42.26 teraflops on the High Performance Linpack benchmark.

Norrod also showed a competitive comparison of the MI200 OAM versus the Nvidia A100 (80GB) on the molecular simulation code LAMMPS, running a combustion simulation of a hydrocarbon molecule. In the timelapse of the simulation, four MI250X 560 watt GPUs can be seen completing the job in less than half the time of four A100 SXM 80GB 400 watt GPUs.

The MI200 accelerators introduce the third-generation AMD Infinity Fabric architecture. Up to eight Infinity Fabric links connect the AMD Instinct MI200 with 3rd generation Epyc Milan CPUs and other GPUs in the node to deliver up to 800 GB/s of aggregate bandwidth and enable unified CPU/GPU memory coherency. 

AMD is also introducing its Elevated Fanout Bridge (EFB) technology. “Unlike substrate embedded silicon bridge architectures, EFB enables use of standard substrates and assembly techniques, providing better precision, scalability and yields while maintaining high performance,” said Norrod.


Three models were announced for the new MI200 series: the MI250X and MI250, available in the open-hardware OCP Accelerator Module (OAM) form factor; and the AMD Instinct MI210, a PCIe card that will be forthcoming in OEM servers.

The AMD MI250X accelerator is currently available from HPE in the HPE Cray EX Supercomputer. Other MI200 series accelerators, including the PCIe form factor, are expected in Q1 2022 from server partners, including ASUS, ATOS, Dell Technologies, Gigabyte, HPE, Lenovo and Supermicro.

The MI250X accelerator will be the primary computational engine of the upcoming exascale supercomputer Frontier, currently being installed at the DOE’s Oak Ridge National Laboratory in partnership with HPE. Each of Frontier’s 9,000+ nodes will include one “optimized 3rd Gen AMD Epyc CPU” (not Milan-X) linked to four AMD MI250X accelerators over AMD’s coherent Infinity Fabric.

During this morning’s proceedings, ORNL Director Thomas Zacharia noted that a single MI250X GPU is more powerful than an entire node of ORNL’s Summit supercomputer, currently the fastest system in the United States. With a promised performance target of more than 1.5 peak double-precision exaflops, Frontier could exceed 1.72 exaflops peak owing to its GPUs alone (9,000 nodes x 4 GPUs x 47.9 teraflops).
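
That back-of-the-envelope figure works out as shown below; the 9,000-node count is the approximate number cited above, treated here as a given rather than a final system specification.

```python
# GPU-only peak FP64 estimate for Frontier, using the approximate node
# count cited above (an estimate, not a final system specification).
nodes = 9_000
gpus_per_node = 4
fp64_tflops_per_gpu = 47.9
peak_exaflops = nodes * gpus_per_node * fp64_tflops_per_gpu / 1e6
print(round(peak_exaflops, 2))   # ~1.72 peak FP64 exaflops from GPUs alone
```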

As we detailed recently, the MI200 will be powering three giant systems on three continents. In addition to Frontier, expected to be the United States’ first exascale computer coming online next year, the MI200 was selected for the European Union’s pre-exascale LUMI system and Australia’s petascale Setonix system.

AMD Instinct MI200 OAM accelerator

“The adoption of Milan has significantly outpaced Rome as our momentum grows,” said Su. Looking ahead on the roadmap, the next-gen “Genoa” Epyc platform will have up to 96 high-performance 5nm “Zen 4” cores and will support next-generation memory and I/O capabilities: DDR5, PCIe Gen 5 and CXL. Genoa is now sampling to customers, with production and launch anticipated next year, AMD said.

“We’ve worked with TSMC to optimize 5nm for high performance computing,” said Su. “[The new process node] offers twice the density, twice the power efficiency and 1.25x the performance of the 7nm process we’re using in today’s products.”

Su also unveiled a new version of Zen 4 for cloud-native computing, called “Bergamo.” Bergamo features up to 128 high-performance “Zen 4c” cores and will come with the other features of Genoa: DDR5, PCIe Gen 5, CXL 1.1, and the full suite of Infinity Guard security features. Further, it is socket-compatible with Genoa and shares the same Zen 4 instruction set. Bergamo is on track to start shipping in the first half of 2023, Su said.

“Our investment in multi-generational CPU core roadmaps combined with advanced process and packaging technology enables us to deliver leadership across general purpose technical computing and cloud workloads,” said Su. “You can count on us to continue to push the envelope in high-performance computing.”

AMD also announced version 5.0 of ROCm, its open software platform that supports environments across multiple accelerator vendors and architectures. “With ROCm 5.0, we’re adding support and optimization for the MI200, expanding ROCm support to include the Radeon Pro W6800 workstation GPUs, and improving developer tools that increase end user productivity,” said AMD’s Corporate Vice President, GPU Platforms, Brad McCredie in a media briefing last week.

The company also introduced AMD Infinity Hub, an online portal where developers can access documentation, tools and education materials for HIP and OpenMP, and system administrators and scientists can download containerized HPC apps and ML frameworks that are optimized and supported on AMD platforms.

Commenting on today’s raft of news, market watcher Addison Snell, CEO of Intersect360 Research, said, “AMD has set the new bar for performance in HPC – in CPU, in GPU, and in packaging both together. Either Milan-X or MI200 makes a statement on its own – multiple statements, based on the benchmarks. Having coherent memory over Infinity Fabric is a game-changer that neither Intel nor Nvidia is going to be able to match soon.”
