At the IEEE International Electron Devices Meeting (IEDM), being held virtually this week, IBM is rolling out key research aimed at boosting AI and hybrid cloud technology. One of the more prominent efforts showcased is IBM’s success fabbing the first 14nm node embedded Spin-Transfer-Torque (STT) MRAM (eMRAM). IBM noted the work in a blog posted yesterday.
“The circuit functionality was demonstrated with read/write tests having write pulses as short as 4 ns, and with much reduced write bias for pulse widths in the 10-20ns range. These and other performance metrics indicate great potential for this technology in mobile cache and similar applications,” write IBM researchers in a paper being presented tomorrow at the conference.
The latest demonstration is compatible with existing CMOS logic design rules, according to IBM researchers Abu Sebastian, Griselda Bonilla, and Dan Edelstein, authors of the blog.
Initial STT-MRAM products have focused on eFlash replacement and standalone storage products. STT-MRAM also has the potential to be used as a working memory in more advanced embedded applications, including mobile cache at ~15 ns write times, and ultimately last-level cache at ~2 ns write times, reported IBM.
“However, these advanced applications have been limited by two key challenges: 1) improving MTJ performance to reduce the write currents while controlling distributions; and 2) increasing the MRAM/CMOS circuit and cell density for advanced-node scaling. Previous leading work, all at the 28nm – 22nm nodes, highlighted the challenge of integrating tight-pitch MTJs within the short vertical space available between BEOL metal levels – a challenge which has so far prevented 14nm node eMRAM from being developed,” say IBM researchers in the paper, “A 14 nm Embedded STT-MRAM CMOS Technology.” (See figures from paper below.)
IBM was able to mitigate these issues. “Using a 2Mb eMRAM macro, we achieve an integration at tight MTJ pitch (160 nm), which fits vertically between M1 and M2. This placement maximizes eMRAM circuit performance by eliminating stacked BEOL parasitics, and reduces chip size and cost by clearing upper wiring tracks for logic, and reducing total number of levels to wire large arrays (these may need n+3 Cu levels for MTJs placed on level Mn, hence the advantage of n=1). We demonstrate read and write functionality, including write performance down to 4ns, and show that the eMRAM process module can be added while maintaining the logic BEOL reliability requirements,” reported the researchers.
The blogpost noted, “Data transfer bottlenecks have long been a problem for large workloads and create a challenge for running AI workloads in hybrid cloud environments. STT-MRAM uses electron spin to store data in magnetic domains, combining the high speed of Static RAM (SRAM) and the high density of DRAM—both of which rely on electrical charges for storage—to offer a more dependable storage solution.”
IBM will further discuss the technology in a second STT-MRAM paper, “Demonstration of Narrow Switching Distributions in STT-MRAM Arrays for LLC Applications at 1x nm Node.” This work demonstrates advanced magnetic materials with high-speed switching of 3 ns and tight distributions of the switching current. “Optimizing switching speed characteristics is another key step toward use of MRAM as last-level cache. By speeding up the exchange between memory and compute, this enhanced design promises to deliver a much more efficient, higher-performing system.
“Together, these advances point to MRAM’s steady march toward achieving superior density and increased speed needed to replace SRAM for CPU caches. That would be a whole new application for MRAM, which is typically used today as either a replacement for NAND flash memory or as a stand-alone storage chip, and significantly increase data retrieval performance,” write Sebastian, Bonilla, and Edelstein.
IBM will also report advances in phase change memory:
- The accurate mapping of synaptic weights onto analog non-volatile memory devices for deep learning inference is a considerable challenge to developing analog AI cores. Synaptic weight indicates the strength of a connection between two nodes in a neural network. In the paper, “Precision of Synaptic Weights Programmed in Phase-Change Memory Devices for Deep Learning Inference,” IBM researchers discuss how analog resistance-based memory devices such as PCM in in-memory computing applications could address the mapping challenge. Their work addresses how to accurately map the synaptic weights analytically and through array-level experiments. The paper also analyzes the impact of inaccuracy associated with synaptic weight storage on a range of networks for some common AI applications: image classification and language modeling.
- A second analog AI paper, “Unassisted True Analog Neural Network Training Chip,” details the first analog neural network training chip—a resistive processing unit, or RPU—to demonstrate the elusive “analog advantage” in AI training. Analog advantage occurs when analog neural network training is faster than a comparable digital system in real time. The researchers achieved this speedup by performing all Multiply and Accumulate (MAC) functions in analog cross-point arrays and updating all weights in parallel.
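The two ideas in the items above, storing synaptic weights as device conductances and performing multiply-accumulate inside a cross-point array, can be illustrated with a small simulation. This is only a sketch under stated assumptions, not IBM's method: the differential conductance pair, the Gaussian programming noise, and the function names `program_weights`, `crossbar_mac`, and `rank_one_update` are all hypothetical choices made for illustration.

```python
import random


def program_weights(weights, g_max=1.0, noise_std=0.0, rng=None):
    """Map signed weights onto two non-negative conductance arrays.

    Assumption: each weight is stored as a differential pair (g_plus, g_minus),
    and programming inaccuracy is modeled as additive Gaussian noise.
    """
    rng = rng or random.Random(0)
    w_max = max(abs(w) for row in weights for w in row) or 1.0
    scale = g_max / w_max  # conductance units per unit of weight
    g_plus, g_minus = [], []
    for row in weights:
        gp_row, gm_row = [], []
        for w in row:
            gp = max(w, 0.0) * scale + rng.gauss(0.0, noise_std)
            gm = max(-w, 0.0) * scale + rng.gauss(0.0, noise_std)
            # A physical device cannot leave its conductance range, so clip.
            gp_row.append(min(max(gp, 0.0), g_max))
            gm_row.append(min(max(gm, 0.0), g_max))
        g_plus.append(gp_row)
        g_minus.append(gm_row)
    return g_plus, g_minus, scale


def crossbar_mac(g_plus, g_minus, scale, voltages):
    """Multiply-accumulate: each output sums the currents G * V along one line."""
    return [
        sum((gp - gm) * v for gp, gm, v in zip(row_p, row_m, voltages)) / scale
        for row_p, row_m in zip(g_plus, g_minus)
    ]


def rank_one_update(weights, x, delta, lr):
    """W += lr * outer(delta, x); in an analog array every cell updates at once."""
    return [[w + lr * d * xi for w, xi in zip(row, x)]
            for row, d in zip(weights, delta)]


if __name__ == "__main__":
    w = [[0.5, -1.0], [0.25, 0.75]]
    x = [1.0, 2.0]
    print("ideal MAC:", crossbar_mac(*program_weights(w), x))
    print("noisy MAC:", crossbar_mac(*program_weights(w, noise_std=0.05), x))
```

With `noise_std=0.0` the array reproduces the exact matrix-vector product. Raising it shows how weight-storage inaccuracy carries through to the outputs, which is the kind of effect the first paper is described as analyzing.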
Link to IBM blog, https://www.ibm.com/blogs/research/2020/12/iedm2020-memory-analog-ai/
Link to IEEE IEDM 2020, https://ieee-iedm.org/program/