Samsung today announced what it’s calling the industry’s first high bandwidth memory (HBM) with built-in AI processing capability. The new device – HBM-PIM (processing in memory) – has a programmable compute unit (PCU) embedded in each memory bank.
“[The HBM-PIM] is tailored for diverse AI-driven workloads such as HPC, training and inference. We plan to build upon this breakthrough by further collaborating with AI solution providers for even more advanced PIM-powered applications,” said Kwangil Park, SVP, memory product planning, Samsung, in the announcement.
Samsung says the HBM-PIM brings processing power directly to where the data is stored by placing a DRAM-optimized AI engine inside each memory bank (a storage sub-unit), enabling parallel processing and minimizing data movement. “When applied to Samsung’s existing HBM2 Aquabolt solution, the new architecture is able to deliver over twice the system performance while reducing energy consumption by more than 70%. The HBM-PIM also does not require any hardware or software changes, allowing faster integration into existing systems,” Samsung said.
According to a report from Tom’s Hardware, “[E]ach memory bank has an embedded Programmable Computing Unit (PCU) that runs at 300 MHz. This unit is controlled via conventional memory commands from the host to enable in-DRAM processing, and it can execute various FP16 computations. The memory can also operate in either standard mode, meaning it operates as normal HBM2, or in FIM mode for in-memory data processing.
“Naturally, making room for the PCU units reduces memory capacity — each PCU-equipped memory die has half the capacity (4Gb) per die compared to a standard 8Gb HBM2 die. To help defray that issue, Samsung employs 6GB stacks by combining four 4Gb die with PCUs with four 8Gb dies without PCUs (as opposed to an 8GB stack with normal HBM2).”
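The capacity figures in that passage can be checked with simple arithmetic. The short Python sketch below is illustrative only: the helper name `stack_capacity_gbytes` is hypothetical, and the assumption that a standard 8GB stack consists of eight 8Gb dies is inferred from the quoted figures rather than stated in the report.

```python
GBIT_PER_GBYTE = 8  # 8 gigabits (Gb) per gigabyte (GB)


def stack_capacity_gbytes(dies):
    """Return stack capacity in GB.

    `dies` is a list of (die_count, gigabits_per_die) pairs.
    Hypothetical helper for illustration; not from Samsung or Tom's Hardware.
    """
    total_gbit = sum(count * gbit for count, gbit in dies)
    return total_gbit / GBIT_PER_GBYTE


# HBM-PIM stack as described: four 4Gb PCU dies plus four 8Gb standard dies.
hbm_pim_stack = [(4, 4), (4, 8)]

# Assumed baseline: eight 8Gb dies, which matches the quoted 8GB stack.
standard_stack = [(8, 8)]

pim_gb = stack_capacity_gbytes(hbm_pim_stack)
std_gb = stack_capacity_gbytes(standard_stack)

print(f"HBM-PIM stack:  {pim_gb} GB")
print(f"Standard stack: {std_gb} GB")
print(f"Capacity given up: {std_gb - pim_gb} GB ({(std_gb - pim_gb) / std_gb:.0%})")
```

Under those assumptions, the mixed stack holds 48Gb, or 6GB, which is a quarter less than the 8GB of a standard stack.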
Samsung’s paper (not yet freely available) reportedly describes the underlying tech as: “[A] Function-In Memory DRAM (FIMDRAM) that integrates a 16-wide single-instruction multiple-data engine within the memory banks and that exploits bank-level parallelism to provide 4× higher processing bandwidth than an off-chip memory solution. Second, we show techniques that do not require any modification to conventional memory controllers and their command protocols, which make FIMDRAM more practical for quick industry adoption.”
In the Samsung announcement, Rick Stevens, associate laboratory director for computing, environment and life sciences, Argonne National Laboratory, is quoted as saying, “[The] HBM-PIM design has demonstrated impressive performance and power gains on important classes of AI applications, so we look forward to working together to evaluate its performance on additional problems of interest to Argonne National Laboratory.” Argonne has become an aggressive tester of new AI chip and system technology.