Recent content by fellix

  1. F

    AMD RDNA4 Architecture Speculation

    https://old.chipsandcheese.com/2025/03/23/rdna-4s-out-of-order-memory-accesses/
  2. F

    AMD RDNA4 Architecture Speculation

    Yep. Modern GPUs just threw a ton of on-chip cache at the problem.
  3. F

    AMD RDNA4 Architecture Speculation

    Or has the benefit of HiZ simply diminished to the point of marginal performance obsolescence? Maybe framebuffer color compression more than compensates for the loss.
  4. F

    Dynamic register allocation in GPUs

    With DXR, Radeon is executing RT shaders in the compute queue as indicated by AMD's own profiler:
  5. F

    The AMD 9070 / 9070XT Reviews and Discussion Thread

    Those rates are for vector ops, not WMMA. Here's what the RDNA3 vector ALU is capable of for WMMA:
  6. F

    The AMD 9070 / 9070XT Reviews and Discussion Thread

    Path tracing is still a high bar for team Radeon. Full hardware BVH traversal in RDNA5 maybe?
  7. F

    AMD RDNA4 Architecture Speculation

    I guess in RDNA4 the WMMA ops are handled by an actual dedicated ALU, but one very tightly integrated with the vector units, i.e. sharing the same data path and issue port, so concurrent execution is not possible, just like on RDNA3, but the capabilities are significantly enhanced -- extended type...
  8. F

    AMD RDNA4 Architecture Speculation

    At the 22:40 mark: "Dynamic register allocation..." Optimizations for better VOPD handling?
  9. F

    AMD RDNA4 Architecture Speculation

    RDNA3 already had INT4 support for WMMA ops. But the TOPS rates quoted in those RDNA4 specs are too high even ignoring sparsity. Dedicated WMMA units?
  10. F

    AMD RDNA4 Architecture Speculation

    Benchmark leak:
  11. F

    Nvidia Blackwell Architecture Speculation

    Well, the total L2 size has been known since the die shot of GB202 was published.
  12. F

    Nvidia Blackwell Architecture Speculation

    Dunno. Could be that INT multiplication is implemented at half rate and this drags the IMAD score down? The programming guide's instruction rates chart hasn't been updated since Hopper.
  13. F

    Nvidia Blackwell Architecture Speculation

    Nvidia's own documentation still claims half INT throughput for Blackwell: https://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capability-12-0
  14. F

    The Playstation 3 Before RSX

    Wasn't Intel's Larrabee project aiming for the same goal as the Cell Visualizer? Anyway, both were killed by the advancements in GPU compute by 2009 ~ 2010, when IBM gave up on PowerXCell as more and more enterprise customers began exploring GPU solutions for their deployments.