Samsung HBM-PIM - Processing in Memory

Discussion in 'Graphics and Semiconductor Industry' started by Jawed, Feb 17, 2021.

  1. Jawed

    Jawed Legend

    Wow, literally in the memory:

    Samsung's New HBM2 Memory Has 1.2 TFLOPS of Embedded Processing Power | Tom's Hardware

    Now it can be argued that this much compute is barely worth bothering with (it's only FP16 FLOPS being counted).

    The cost/packaging/thermal constraints of HBM and the 20nm node being used here seem to indicate this is a prototyping sample for partners and I suppose there'll be a couple of years of testing...
     
    Tags:
  2. Tkumpathenurpahl

    Tkumpathenurpahl Oil Monsieur Geezer Veteran

    As an HBM fanboy, this excites me enormously
     
    Mitchings likes this.
  3. HLJ

    HLJ Regular

    Important detail:
    This cut the memory amount in half per chip...performance is never free ;)
     
    Lightman and pharma like this.
  4. Jawed

    Jawed Legend

    By a quarter.
     
  5. iceberg187

    iceberg187 Regular

  6. HLJ

    HLJ Regular

    "Naturally, making room for the PCU units reduces memory capacity — each PCU-equipped memory die has half the capacity (4Gb) per die compared to a standard 8Gb HBM2 die. To help defray that issue, Samsung employs 6GB stacks by combining four 4Gb die with PCUs with four 8Gb dies without PCUs (as opposed to an 8GB stack with normal HBM2)."
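The capacity arithmetic in that quote can be checked with a quick sketch (die sizes are in gigabits, Gb; stack totals in gigabytes, GB):

```python
def stack_capacity_gb(dies_gbit):
    """Total stack capacity in gigabytes from per-die sizes in gigabits."""
    return sum(dies_gbit) / 8  # 8 bits per byte

# PIM stack: four 4Gb PCU-equipped dies + four plain 8Gb dies
pim_stack = stack_capacity_gb([4] * 4 + [8] * 4)
# Normal HBM2 stack: eight 8Gb dies
normal_stack = stack_capacity_gb([8] * 8)

print(pim_stack)                      # 6.0 GB
print(normal_stack)                   # 8.0 GB
print(1 - pim_stack / normal_stack)   # 0.25 -> the stack loses a quarter
```

So each PIM die loses half its capacity, but the stack as a whole only loses a quarter (8GB down to 6GB), which is what the "half" vs. "quarter" exchange above is about.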

     
  7. Jawed

    Jawed Legend

    If you'd said die, I wouldn't have posted :) You said chip though.
     
    HLJ likes this.
  8. HLJ

    HLJ Regular

    Ahhh :D
     
  9. FP16 calculations in the HBM don't seem to be very useful for GPUs, and if the target is AI applications I wonder if it would be better served by Samsung's own NPUs, which consist of CPU cores and MAC engines.
    At first glance this looks like more of an academic exercise, but I can't figure out if Samsung intends to sell these half-RAM/half-FP16-ALU dies as they are right now.

    For GPUs, I wonder if they could put e.g. ROPs in there instead of the FP16 ALUs. It would be akin to Xenos' eDRAM die.
    It would allow for a more modular approach to memory channels on a GPU, as total bandwidth is often scaled according to the number of ROPs enabled (and vice versa).
     
  10. Rootax

    Rootax Veteran

    So the FP16 ALUs have about as low-latency access to the RAM as is possible?
     