Samsung has announced the availability of a new Aquabolt variant. Unlike the typical clock speed jump or capacity improvement you’d expect, this new HBM-PIM can perform calculations directly on-chip that would normally be handled by a connected CPU, GPU, or FPGA.

PIM stands for Processing-in-Memory, and it’s a noteworthy achievement for Samsung to pull this off. Processors currently burn a great deal of power moving data from one place to another. Moving data takes time and costs power. The less time a CPU spends moving data (or waiting on another chip to provide data), the more time it can spend performing computationally useful work.

CPU developers have worked around this problem for a long time by deploying various cache levels and by integrating functionality that once lived in its own socket. Both FPUs and memory controllers were once installed on the motherboard rather than built directly into the CPU. Chiplets cut directly against this aggregation trend, which is why AMD took pains to establish that its Zen 2 and Zen 3 designs could boost overall performance while disaggregating the CPU die.

If bringing the CPU and memory closer together is good, building processing elements directly into memory could be better still. Historically, this has been difficult because logic and DRAM are built very differently. Samsung has apparently solved this problem, and it has leveraged the die-stacking capabilities of HBM to keep available memory density high enough to interest customers. Samsung claims the design can deliver a more-than-2x performance improvement alongside a 70 percent power reduction, with no hardware or software changes required. The company expects validation to be complete by the end of the first half of this year.
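Taken together, Samsung's two claimed figures imply a much larger jump in energy efficiency than either number suggests alone. A quick back-of-envelope check, using only the 2x performance and 70 percent power numbers from the announcement:

```python
# Combine Samsung's claimed 2x speedup and 70% power cut
# into a single performance-per-watt figure (baseline normalized to 1.0).

baseline_perf = 1.0
baseline_power = 1.0

pim_perf = baseline_perf * 2.0            # "more than 2x performance"
pim_power = baseline_power * (1 - 0.70)   # "70 percent power reduction"

perf_per_watt_gain = (pim_perf / pim_power) / (baseline_perf / baseline_power)
print(f"Perf/W improvement: {perf_per_watt_gain:.1f}x")  # → 6.7x
```

In other words, if both claims hold simultaneously, performance per watt improves by roughly 6.7x.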

Image by THG

THG has some information regarding the new HBM-PIM solution, gleaned from Samsung’s ISSCC presentation this week. The new chip incorporates a Programmable Computing Unit (PCU) clocked at just 300MHz. The host controls the PCU via conventional memory commands and can use it to perform FP16 calculations directly in DRAM. The HBM itself can operate either as normal RAM or in FIM (Function-in-Memory) mode.
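To picture that host-side flow, here is a purely illustrative Python sketch. Everything in it is a hypothetical stand-in — the `FIMDRAMStack` class, the `ENTER_FIM`/`EXIT_FIM` commands, and the `fp16_multiply` operation are invented names; Samsung has only said that the host drives the PCU with conventional memory commands and that the math happens in FP16:

```python
import struct

def fp16(x: float) -> float:
    """Round a value to IEEE 754 half precision, as FP16 compute units would."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

class FIMDRAMStack:
    """Toy model: one HBM-PIM stack that is ordinary DRAM until switched to FIM mode."""

    def __init__(self):
        self.mode = "DRAM"   # behaves as plain HBM2 by default
        self.cells = {}      # address -> stored value (simulated banks)

    def write(self, addr, value):          # conventional memory write
        self.cells[addr] = fp16(value)

    def read(self, addr):                  # conventional memory read
        return self.cells[addr]

    def command(self, cmd):
        # The host flips modes using ordinary memory commands,
        # not a new bus protocol (command names are hypothetical).
        if cmd == "ENTER_FIM":
            self.mode = "FIM"
        elif cmd == "EXIT_FIM":
            self.mode = "DRAM"

    def fp16_multiply(self, a_addr, b_addr, out_addr):
        # In FIM mode the in-memory PCU does the math in place,
        # so neither operand ever crosses the memory bus.
        assert self.mode == "FIM", "compute commands only work in FIM mode"
        self.cells[out_addr] = fp16(self.cells[a_addr] * self.cells[b_addr])

stack = FIMDRAMStack()
stack.write(0x00, 1.5)
stack.write(0x08, 2.0)
stack.command("ENTER_FIM")
stack.fp16_multiply(0x00, 0x08, 0x10)
stack.command("EXIT_FIM")
print(stack.read(0x10))  # → 3.0
```

The point of the sketch is the mode switch: the same addresses serve as ordinary storage in DRAM mode and as compute operands in FIM mode, which is why no new bus protocol or software stack is required.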

Including the PCU reduces the total available memory capacity, which is why the FIMDRAM (another term Samsung is using for this solution) offers only 6GB of capacity per stack instead of the 8GB you’d get with standard HBM2. All of the solutions shown are built on a 20nm DRAM process.

Image by THG

Samsung’s paper describes the design as “Function-In Memory DRAM (FIMDRAM) that integrates a 16-wide single-instruction multiple-data engine inside the memory banks which exploits bank-level parallelism to supply 4× higher processing bandwidth than an off-chip memory solution.”
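That quote can be pictured with a toy model: each bank hosts a 16-lane SIMD engine working on its own local data, so all banks advance in parallel and throughput scales with bank count rather than with the off-chip bus. The bank count, lane behavior, and the element-wise multiply below are simplified illustrations, not Samsung's actual microarchitecture:

```python
# Toy model of a 16-wide SIMD engine exploiting bank-level parallelism.
# Bank counts and the element-wise operation are illustrative only.

SIMD_WIDTH = 16

def simd_fp16_mul(lane_a, lane_b):
    """One SIMD instruction: 16 element-wise multiplies in a single step."""
    assert len(lane_a) == len(lane_b) == SIMD_WIDTH
    return [a * b for a, b in zip(lane_a, lane_b)]

def in_memory_multiply(banks_a, banks_b):
    # Each bank's engine works only on its own local data, so every bank
    # can fire in the same "cycle" -- no trips across the memory bus.
    return [simd_fp16_mul(a, b) for a, b in zip(banks_a, banks_b)]

# Four banks, each holding one 16-element vector.
banks_a = [[float(i)] * SIMD_WIDTH for i in range(1, 5)]
banks_b = [[2.0] * SIMD_WIDTH for _ in range(4)]
result = in_memory_multiply(banks_a, banks_b)
print(result[0][:4])  # → [2.0, 2.0, 2.0, 2.0]
```

With four banks active at once, this toy setup completes 64 multiplies per step versus the 16 a single off-chip SIMD unit would manage — the same kind of bank-level scaling Samsung cites for its 4× bandwidth claim.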

Image by THG.

One question Samsung hasn’t answered is how it deals with thermal dissipation, a key reason why it has historically been difficult to build processing logic inside DRAM. This could be doubly difficult with HBM, in which each layer is stacked on top of another. The relatively low clock speed of the PIM may be one way of keeping the DRAM cool.

We haven’t seen HBM deployed for CPUs much, Hades Canyon notwithstanding, but multiple high-end GPUs from Nvidia and AMD have tapped HBM/HBM2 as primary memory. It’s unclear whether a conventional GPU would benefit from this offload capability, or how such a feature might be integrated alongside the GPU’s own considerable computational capacity. If Samsung can deliver the performance and power improvements it claims to a range of customers, however, we’ll undoubtedly see this new HBM-PIM appearing in products a year or two from now. A 2x performance boost coupled with a 70 percent power consumption decrease is the kind of old-school improvement lithography node transitions used to deliver on a regular basis. It’s not clear whether Samsung’s PIM specifically will catch on, but any promise of a vintage full-node improvement will draw attention, if nothing else.