IBM at Hot Chips: What’s Next for Power

By Tiffany Trader

August 23, 2018

With processor, memory and networking technologies all racing to fill in for an ailing Moore’s law, the era of the heterogeneous datacenter is well underway, and IBM is positioning its chips to be the air traffic controller at the center of it all. That was the high-level takeway of our interview with IBM Power architects Jeff Stuecheli and Bill Starke at Hot Chips this week.

The accomplished engineers were at the 30th iteration of Hot Chips to focus on the Power9 scale-up chips and servers, but they also provided details on upcoming developments in the roadmap, including a new buffered memory system suitable for scale-out processors.

Source: IBM slide (Hot Chips 30)

Having launched both the scale-out and scale-up Power9s, IBM is now working on a third P9 variant with “advanced I/O,” featuring IBM’s 25 GT/s PowerAXON signaling technology with upgraded OpenCAPI and NVLink protocols, and a new open standard for buffered memory.

AXON is an inspired appellation that the Power engineering team came up with and the IBM marketing team signed off on. The A and X are designations for IBM’s SMP buses – X links are on-module and A links are off; the O and N stand for OpenCAPI and NVLINK, respectively. The convenient acronym would be fine at that, but aligning with IBM’s penchant for cognitive computing, axons are the brain’s signalling devices, allowing neurons to communicate, so you could say, as Witnix founder and former Harvard HPC guy James Cuff did, that AI is literally “built right into the wire.”

You can see on these annotated Power9 die shots how going from a DDR memory controller to a redrive memory controller and going to smaller PHYs enabled IBM to double the number of AXON lanes.

“The PowerAXON concept gives us a lot of flexibility,” said Stuecheli. “One chip can be deployed to be a big SMP, it can be deployed to talk to lots of GPUs, it can talk to a mix of FPGAs and GPUs – that’s really our goal here is to build a processor that can then be customized toward these domain specific applications.”

For its future products, IBM is focusing on lots of lanes and lots of frequency. Its Power10 roadmap incorporates 32 GT/s signalling technology that will be able to run in 50 GT/s mode.

The idea that IO is composable is what OpenCAPI and PowerAXON are all about – and now IBM is bringing this same ethos to memory through the development of an open standard for buffered memory, appropriately called OpenCAPI memory.

With both Power8 and Power9, the chips made for scale-out boxes support direct-attached memory, while the scale-up variants, intended for machines with more than two sockets, employ buffered memory. The buffered memory system puts DRAM chips right next to IBM’s Centaur buffer chip (see figure below-right), enabling a large number of DDR channels to be funneled into one processor over SERDES. The agnostic interface hides the exact memory technology that’s on the DIMM from the processor, so the processor can work with different kinds of memory. This decoupling of memory technology from the processor technology means that, for example, enterprise customers upgrading from Power8 to Power9 can keep their existing DDR4 DRAM DIMMs.

Power9 Scale Up chipset block diagram

Stuecheli shared that the current buffered memory system (on Power8 and Power9 SU chips) adds a latency of approximately 10 nanoseconds compared to direct attached. This minimal overhead was accomplished “through careful framing of the packets as they go across the SERDES and bypasses in the DDR scheduling,” said Stuecheli.

While the Centaur-based approach is enterprise-focused, IBM wanted to offer the same buffered memory in its scale-out products. They are planning to introduce this capability as an open standard in the third (and presumably final) Power9 variant, due out in 2019. “We’ve been working through JEDEC to build memory DIMMs based around a thin buffer design,” said Stuecheli. “If you have an accelerator and you don’t like having that big expensive DDR PHY on it and you want to use just traditional SERDES to talk to memory you can do so with the new standardized memory interface we’re building,” he told the audience at Hot Chips. The interface spans from 1U small memory form factors all the way up to big tall DIMMs. The aim is to have an agnostic interface that attaches to a variety of memory types to it, whether that’s storage-class memory, or very high bandwidth, low capacity memory.

While the latency add was 10 nanoseconds on the proprietary design (with one port going to four DDR ports with a 16MB cache lookup), the new buffer IBM is building is a single port design with a single interface. It’s a much smaller chip without the cache, and IBM thinks it can reduce this latency to 5 nanoseconds. Stuecheli said that company-run simulations with loaded latency showed it doesn’t take much load at all before providing much lower latency than a direct-attached solution.

The roadmap shows the anticipated increase in memory bandwidth owing to the new memory system. Where the Power9 SU chip offers 210 GB/s of memory bandwidth (and Stuecheli says it’s actually closer to 230 GB/s), the next Power9 derivative chip, with the new memory technology, will be capable of deploying 350 GB/s per socket of bandwidth, according to Stuecheli.

“If you’re in HPC and disappointed in your bytes-per-flop ratio, that’s a pretty big improvement,” he said, adding “we’re taking what was essentially the Power10 memory subsystem and implementing that in Power9.” With Power10 bringing in DDR5, IBM expects to surpass 435 GB/s sustained memory bandwidth.

IBM believes that it has the right approach to push past DDR limitations. “When you think of Moore’s law kind of winding down, slowing down, you think of single-ended signaling with DDR memory slowing down,” Bill Starke said in a pre-briefing. “This composable system construct [that IBM is architecting] is enabling a proliferation of more heterogeneity in compute technology, along with a wider variation of memory technologies, all in this composable plug-and-play, put-it-together-how-you-want way where it’s all about a big high-bandwidth low-latency switching infrastructure.”

“With the flexibility of the attach on the memory side and on the compute acceleration side, it really boils down to thinking of the CPU chip as this big switch,” Stuecheli followed, “this big data switch that’s just one big pile of bandwidth connectivity that’s enabling any kind of memory to talk to any kind of acceleration, and it all plumbs right past the powerful general-purpose processor cores, so you’re pulling that whole compute estate together.”

HPC analyst Addison Snell (CEO of Intersect360 Research) came away from Tuesday’s Hot Chips talk with a favorable impression of the Power play. “IBM’s presentation at Hot Chips underscored two major themes,” Snell commented by email. “One, Power9 has excellent memory bandwidth and performance. Two, it is a great platform for attaching accelerators or co-processors. It’s an odd statement of direction, but maybe a visionary one, essentially saying a processor isn’t about computation per se, but rather it’s about feeding data to other computational elements.”

Subscribe to HPCwire's Weekly Update!

Be the most informed person in the room! Stay ahead of the tech trends with industy updates delivered to you every week!

Q&A with Altair CEO James Scapa, an HPCwire Person to Watch in 2021

May 14, 2021

Chairman, CEO and co-founder of Altair James R. Scapa closed several acquisitions for the company in 2020, including the purchase and integration of Univa and Ellexus. Scapa founded Altair more than 35 years ago with two Read more…

HLRS HPC Helps to Model Muscle Movements

May 13, 2021

The growing scale of HPC is allowing simulation of more and more complex systems at greater detail than ever before, particularly in the biological research spheres. Now, researchers at the University of Stuttgart are le Read more…

Behind the Met Office’s Procurement of a Billion-Dollar Microsoft System

May 13, 2021

The UK’s national weather service, the Met Office, caused shockwaves of curiosity a few weeks ago when it formally announced that its forthcoming billion-dollar supercomputer – expected to be the most powerful weather and climate-focused supercomputer in the world when it launches in 2022... Read more…

AMD, GlobalFoundries Commit to $1.6 Billion Wafer Supply Deal

May 13, 2021

AMD plans to purchase $1.6 billion worth of wafers from GlobalFoundries in the 2022 to 2024 timeframe, the chipmaker revealed today (May 13) in an SEC filing. In the face of global semiconductor shortages and record-high demand, AMD is renegotiating its Wafer Supply Agreement and bumping up capacity. Read more…

Hyperion Offers Snapshot of Quantum Computing Market

May 13, 2021

The nascent quantum computer (QC) market will grow 27 percent annually (CAGR) reaching $830 million in 2024 according to an update provided today by analyst firm Hyperion Research at the HPC User Forum being held this we Read more…

AWS Solution Channel

Numerical weather prediction on AWS Graviton2

The Weather Research and Forecasting (WRF) model is a numerical weather prediction (NWP) system designed to serve both atmospheric research and operational forecasting needs. Read more…

Hyperion: HPC Server Market Ekes 1 Percent Gain in 2020, Storage Poised for ‘Tipping Point’

May 12, 2021

The HPC User Forum meeting taking place virtually this week (May 11-13) kicked off with Hyperion Research’s market update, covering the 2020 period. Although the HPC server market had been facing a 6.7 percent COVID-re Read more…

Behind the Met Office’s Procurement of a Billion-Dollar Microsoft System

May 13, 2021

The UK’s national weather service, the Met Office, caused shockwaves of curiosity a few weeks ago when it formally announced that its forthcoming billion-dollar supercomputer – expected to be the most powerful weather and climate-focused supercomputer in the world when it launches in 2022... Read more…

AMD, GlobalFoundries Commit to $1.6 Billion Wafer Supply Deal

May 13, 2021

AMD plans to purchase $1.6 billion worth of wafers from GlobalFoundries in the 2022 to 2024 timeframe, the chipmaker revealed today (May 13) in an SEC filing. In the face of global semiconductor shortages and record-high demand, AMD is renegotiating its Wafer Supply Agreement and bumping up capacity. Read more…

Hyperion Offers Snapshot of Quantum Computing Market

May 13, 2021

The nascent quantum computer (QC) market will grow 27 percent annually (CAGR) reaching $830 million in 2024 according to an update provided today by analyst fir Read more…

Hyperion: HPC Server Market Ekes 1 Percent Gain in 2020, Storage Poised for ‘Tipping Point’

May 12, 2021

The HPC User Forum meeting taking place virtually this week (May 11-13) kicked off with Hyperion Research’s market update, covering the 2020 period. Although Read more…

IBM Debuts Qiskit Runtime for Quantum Computing; Reports Dramatic Speed-up

May 11, 2021

In conjunction with its virtual Think event, IBM today introduced an enhanced Qiskit Runtime Software for quantum computing, which it says demonstrated 120x spe Read more…

AMD Chipmaker TSMC to Use AMD Chips for Chipmaking

May 8, 2021

TSMC has tapped AMD to support its major manufacturing and R&D workloads. AMD will provide its Epyc Rome 7702P CPUs – with 64 cores operating at a base cl Read more…

Fast Pass Through (Some of) the Quantum Landscape with ORNL’s Raphael Pooser

May 7, 2021

In a rather remarkable way, and despite the frequent hype, the behind-the-scenes work of developing quantum computing has dramatically accelerated in the past f Read more…

IBM Research Debuts 2nm Test Chip with 50 Billion Transistors

May 6, 2021

IBM Research today announced the successful prototyping of the world's first 2 nanometer chip, fabricated with silicon nanosheet technology on a standard 300mm Read more…

AMD Chipmaker TSMC to Use AMD Chips for Chipmaking

May 8, 2021

TSMC has tapped AMD to support its major manufacturing and R&D workloads. AMD will provide its Epyc Rome 7702P CPUs – with 64 cores operating at a base cl Read more…

Intel Launches 10nm ‘Ice Lake’ Datacenter CPU with Up to 40 Cores

April 6, 2021

The wait is over. Today Intel officially launched its 10nm datacenter CPU, the third-generation Intel Xeon Scalable processor, codenamed Ice Lake. With up to 40 Read more…

Julia Update: Adoption Keeps Climbing; Is It a Python Challenger?

January 13, 2021

The rapid adoption of Julia, the open source, high level programing language with roots at MIT, shows no sign of slowing according to data from Julialang.org. I Read more…

CERN Is Betting Big on Exascale

April 1, 2021

The European Organization for Nuclear Research (CERN) involves 23 countries, 15,000 researchers, billions of dollars a year, and the biggest machine in the worl Read more…

HPE Launches Storage Line Loaded with IBM’s Spectrum Scale File System

April 6, 2021

HPE today launched a new family of storage solutions bundled with IBM’s Spectrum Scale Erasure Code Edition parallel file system (description below) and featu Read more…

10nm, 7nm, 5nm…. Should the Chip Nanometer Metric Be Replaced?

June 1, 2020

The biggest cool factor in server chips is the nanometer. AMD beating Intel to a CPU built on a 7nm process node* – with 5nm and 3nm on the way – has been i Read more…

Saudi Aramco Unveils Dammam 7, Its New Top Ten Supercomputer

January 21, 2021

By revenue, oil and gas giant Saudi Aramco is one of the largest companies in the world, and it has historically employed commensurate amounts of supercomputing Read more…

Quantum Computer Start-up IonQ Plans IPO via SPAC

March 8, 2021

IonQ, a Maryland-based quantum computing start-up working with ion trap technology, plans to go public via a Special Purpose Acquisition Company (SPAC) merger a Read more…

Leading Solution Providers

Contributors

AMD Launches Epyc ‘Milan’ with 19 SKUs for HPC, Enterprise and Hyperscale

March 15, 2021

At a virtual launch event held today (Monday), AMD revealed its third-generation Epyc “Milan” CPU lineup: a set of 19 SKUs -- including the flagship 64-core, 280-watt 7763 part --  aimed at HPC, enterprise and cloud workloads. Notably, the third-gen Epyc Milan chips achieve 19 percent... Read more…

Can Deep Learning Replace Numerical Weather Prediction?

March 3, 2021

Numerical weather prediction (NWP) is a mainstay of supercomputing. Some of the first applications of the first supercomputers dealt with climate modeling, and Read more…

Livermore’s El Capitan Supercomputer to Debut HPE ‘Rabbit’ Near Node Local Storage

February 18, 2021

A near node local storage innovation called Rabbit factored heavily into Lawrence Livermore National Laboratory’s decision to select Cray’s proposal for its CORAL-2 machine, the lab’s first exascale-class supercomputer, El Capitan. Details of this new storage technology were revealed... Read more…

African Supercomputing Center Inaugurates ‘Toubkal,’ Most Powerful Supercomputer on the Continent

February 25, 2021

Historically, Africa hasn’t exactly been synonymous with supercomputing. There are only a handful of supercomputers on the continent, with few ranking on the Read more…

GTC21: Nvidia Launches cuQuantum; Dips a Toe in Quantum Computing

April 13, 2021

Yesterday Nvidia officially dipped a toe into quantum computing with the launch of cuQuantum SDK, a development platform for simulating quantum circuits on GPU-accelerated systems. As Nvidia CEO Jensen Huang emphasized in his keynote, Nvidia doesn’t plan to build... Read more…

New Deep Learning Algorithm Solves Rubik’s Cube

July 25, 2018

Solving (and attempting to solve) Rubik’s Cube has delighted millions of puzzle lovers since 1974 when the cube was invented by Hungarian sculptor and archite Read more…

The History of Supercomputing vs. COVID-19

March 9, 2021

The COVID-19 pandemic poses a greater challenge to the high-performance computing community than any before. HPCwire's coverage of the supercomputing response t Read more…

Microsoft to Provide World’s Most Powerful Weather & Climate Supercomputer for UK’s Met Office

April 22, 2021

More than 14 months ago, the UK government announced plans to invest £1.2 billion ($1.56 billion) into weather and climate supercomputing, including procuremen Read more…

  • arrow
  • Click Here for More Headlines
  • arrow
HPCwire