AMD RDNA4 Architecture Speculation

And do they definitely not support this for graphics shaders at all, or might that be redacted from the public ISA because they consider it too limited/complicated to be exposed publicly?
Now that I think more about it…

It might also have to do with how registers are allocated for Wave64. I recall reading that RDNA 3 seems to:

1. swizzle the VGPR banks used for the wave64 high lanes (0<->1, 2<->3), so that e.g. V0_lo is on bank 0 but V0_hi is on bank 1.
2. store the low halves and high halves as two separate contiguous ranges (drawn in a diagram but not called out in the text, so perhaps not).

This scheme allows wave64 to feed both 32-lane VALU pipes with two 64-lane operands without bank conflicts.

For example, V0_lo, V0_hi, V1_lo and V1_hi can all be read in a single cycle under this scheme.
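
A minimal sketch of the swizzle from point 1 only, to make the lo/hi bank mapping concrete. It assumes four VGPR banks selected by register index mod 4, and that the 0<->1, 2<->3 swap amounts to flipping the lowest bit of the bank number for the high half. The name `vgpr_bank` is hypothetical. This does not model the separate contiguous ranges from point 2, and it does not try to verify the multi-operand case, which depends on details I don't have.

```python
NUM_BANKS = 4  # assumption: four VGPR banks, bank = register index mod 4


def vgpr_bank(reg: int, high_half: bool) -> int:
    """Bank holding one 32-lane half of a wave64 VGPR under the assumed swizzle."""
    base = reg % NUM_BANKS
    # Assumed swizzle for the high half: swap banks 0<->1 and 2<->3,
    # which is the same as flipping bit 0 of the bank number.
    return base ^ 1 if high_half else base


if __name__ == "__main__":
    for reg in range(NUM_BANKS):
        lo = vgpr_bank(reg, high_half=False)
        hi = vgpr_bank(reg, high_half=True)
        print(f"V{reg}_lo -> bank {lo}, V{reg}_hi -> bank {hi}")
```

Under these assumptions the two halves of any single register always sit on different banks, which matches the V0_lo on bank 0 / V0_hi on bank 1 case.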

Depending on how the scheme is actually implemented, it could be at odds (?) with how s_alloc_vgpr works.
Though, naively speaking, it could still work if the lo and hi halves were stored as physically adjacent pairs.
 
For those who have better knowledge of AMD/RDNA's architecture than I do: when they say it doesn't apply to graphics, does that include raytracing? Or is that 'compute' and this is expected to be aggressively used in raytracing? And do they definitely not support this for graphics shaders at all, or might that be redacted from the public ISA because they consider it too limited/complicated to be exposed publicly?
It may depend on the API in question. If you're using RTPSOs, ray tracing is definitely run on their compute pipeline. If you're using the inline RT/ray query API with graphics shaders, I don't know whether it truly runs on the graphics pipeline or whether they just perform a compute dispatch within those APIs, but I'm inclined to believe the latter. They don't have a specialized HW pipeline for RT like you would see on Intel HW, where there is a callable shading pipeline and all RT shaders can be compiled to callable shaders. Using inline RT/ray query, though, prevents their HW from making use of this pipeline, thus forcing them to run RT on their compute pipeline ...

I don't know yet whether the gfx pipeline exception is an inherent HW design limitation or an artificial driver/compiler limitation ...
 