Reaction score
526

    • HWS and ACE have been known to be firmware running on embedded cores for a few generations, dating back at least to when “HWS” was introduced...
    • If the GPU and the platform support resizable BAR (“Smart Access Memory”), the GPU can expose its local memory in full in the system...
    • pTmdfx replied to the thread RDNA4.
      You see CP itself as a separable freestanding block. But the graphics pipeline is a monolithic state machine spanning from the central...
    • pTmdfx replied to the thread RDNA4.
      By the looks of the GFX12 LLVM patches so far: 1. No patch has mentioned hardware traversal (yet?); still only image bvh intersect +...
    • pTmdfx replied to the thread AMD Execution Thread [2023].
      I just realized optimizing for LLC locality via access pattern seems a long stretch for MI300A, which has one-fourth of the LLC and...
    • pTmdfx replied to the thread AMD Execution Thread [2023].
      It is interleaving across stacks every 4KiB. Each HBM3 stack has 16 channels, so that most likely implies 256B interleaving within a...
    • pTmdfx replied to the thread AMD Execution Thread [2023].
      It looks like memory interleaving is fairly fine-grained (every 256B like many discrete GPUs). Memory access pattern can still be...