Recent content by fellix

AMD RDNA4 Architecture Speculation

https://old.chipsandcheese.com/2025/03/23/rdna-4s-out-of-order-memory-accesses/
- fellix
- Post #782
- Mar 24, 2025
- Forum: Architecture and Products
AMD RDNA4 Architecture Speculation

Yep. Modern GPUs just threw a ton of on-chip cache at the problem.
- fellix
- Post #780
- Mar 14, 2025
- Forum: Architecture and Products
AMD RDNA4 Architecture Speculation

Or has the benefit of HiZ simply diminished to the point where it barely matters for performance? Maybe framebuffer color compression more than compensates for the loss.
- fellix
- Post #767
- Mar 13, 2025
- Forum: Architecture and Products
Dynamic register allocation in GPUs

With DXR, Radeon is executing RT shaders in the compute queue as indicated by AMD's own profiler:
- fellix
- Post #19
- Mar 9, 2025
- Forum: Architecture and Products
The AMD 9070 / 9070XT Reviews and Discussion Thread

Those rates are for vector ops, not WMMA. Here's what the RDNA3 vector ALU is capable of for WMMA:
- fellix
- Post #157
- Mar 7, 2025
- Forum: Architecture and Products
The AMD 9070 / 9070XT Reviews and Discussion Thread

Path tracing is still a high bar for team Radeon. Full hardware BVH traversal in RDNA5, maybe?
- fellix
- Post #18
- Mar 5, 2025
- Forum: Architecture and Products
AMD RDNA4 Architecture Speculation

I guess in RDNA4 the WMMA ops are handled by an actual dedicated ALU, but one very tightly integrated with the vector units, i.e. sharing the same data path and issue port, so concurrent execution is not possible, just like on RDNA3. The capabilities, however, are significantly enhanced -- extended type...
- fellix
- Post #740
- Mar 1, 2025
- Forum: Architecture and Products
AMD RDNA4 Architecture Speculation

At the 22:40 mark: "Dynamic register allocation..." Optimizations for better VOPD handling?
- fellix
- Post #668
- Feb 28, 2025
- Forum: Architecture and Products
AMD RDNA4 Architecture Speculation

RDNA3 already had INT4 support for WMMA ops. But the TOPS rates quoted in those RDNA4 specs are too high, even ignoring sparsity. Dedicated WMMA units?
- fellix
- Post #638
- Feb 27, 2025
- Forum: Architecture and Products
AMD RDNA4 Architecture Speculation

Benchmark leak:
- fellix
- Post #600
- Feb 13, 2025
- Forum: Architecture and Products
DirectStorage GPU Decompression, RTX IO, Smart Access Storage
- fellix
- Post #418
- Feb 10, 2025
- Forum: Rendering Technology and APIs
Nvidia Blackwell Architecture Speculation

Well, the total L2 size was known since the die shot of GB202 was published.
- fellix
- Post #1,698
- Feb 6, 2025
- Forum: Architecture and Products
Nvidia Blackwell Architecture Speculation

Dunno. Could it be that INT multiplication is implemented at half rate and this drags the IMAD score down? The instruction rates chart in the programming guide hasn't been updated since Hopper.
- fellix
- Post #1,696
- Feb 5, 2025
- Forum: Architecture and Products
Nvidia Blackwell Architecture Speculation

Nvidia's own documentation still claims half INT throughput for Blackwell: https://docs.nvidia.com/cuda/cuda-c-programming-guide/#compute-capability-12-0
- fellix
- Post #1,694
- Feb 5, 2025
- Forum: Architecture and Products
The Playstation 3 Before RSX

Wasn't Intel's Larrabee project aiming for the same goal as the Cell Visualizer? Anyway, both were killed by the advancements in GPU compute by 2009 ~ 2010, when IBM gave up on PowerXCell as more and more enterprise customers began exploring GPU solutions for their deployments.
- fellix
- Post #11
- Jan 31, 2025
- Forum: Architecture and Products