Having re-read the GCN and RDNA whitepapers, I think they indicate that the VRF is multi-banked (at least 4 banks, I reckon?) with a simple bank design (probably 1R1W), and that operand gathering/collecting logic has existed at least since RDNA 1. A research paper whose simulated machine is modelled after GCN/RDNA also seems to support the theory. This would also explain how they managed to have transcendental (8 lanes) and DPFP (2 lanes) as separate, narrower execution units since RDNA 1: the full 32-lane input operands are read out of the VRF all at once, held in the operand buffer, and spoon-fed to these narrower execution units.

Yes, that other stuff has been an unknown to me. But it might be 3R2W, on the basis that 3-operand reads in VALU are rare, so just make these other instructions wait or read slowly. <snip>
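To put rough numbers on the spoon-feeding idea above, here is a back-of-the-envelope sketch. It assumes one slice of lanes is consumed per pass and ignores pipelining and issue details, so it is not a claim about AMD's actual timing. The names `passes_needed` and `feed` are made up for illustration.

```python
import math

WAVE_LANES = 32  # wave32 operand width, as discussed in the post


def passes_needed(unit_lanes: int, wave_lanes: int = WAVE_LANES) -> int:
    """Passes a unit of `unit_lanes` width needs to consume one full operand set."""
    return math.ceil(wave_lanes / unit_lanes)


def feed(operand, unit_lanes):
    """Yield lane-ordered slices of a buffered operand, one slice per pass."""
    for start in range(0, len(operand), unit_lanes):
        yield operand[start:start + unit_lanes]


# Lane widths taken from the post: transcendental = 8 lanes, DPFP = 2 lanes.
for name, lanes in [("full-width VALU", 32), ("transcendental", 8), ("DPFP", 2)]:
    print(f"{name}: {passes_needed(lanes)} pass(es)")
```

Under these assumptions the operand buffer only has to be filled once per instruction, while the narrow unit drains it over 4 (transcendental) or 16 (DPFP) passes.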
In this case, if they do need extra VRF bandwidth, they have the option of either upping the number of banks or amping up the bank design (e.g., +1 read port).
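A toy model of that trade-off, purely as a sketch: it assumes register `r` lives in bank `r % num_banks` (a hypothetical mapping, not something the whitepapers state) and that each bank serves `read_ports` reads per cycle. The function name `gather_cycles` is invented here.

```python
from collections import Counter
import math


def gather_cycles(src_regs, num_banks=4, read_ports=1):
    """Cycles to gather all source operands under a simple bank-conflict model.

    Assumptions (not from any AMD document): bank = register index % num_banks,
    and duplicate source registers are read only once.
    """
    per_bank = Counter(r % num_banks for r in set(src_regs))
    if not per_bank:
        return 0
    # The most heavily loaded bank determines how long the gather takes.
    return max(math.ceil(n / read_ports) for n in per_bank.values())


conflicting = [0, 4, 8]  # all three sources map to bank 0 when there are 4 banks
print(gather_cycles(conflicting, num_banks=4, read_ports=1))  # 4 banks, 1R
print(gather_cycles(conflicting, num_banks=8, read_ports=1))  # more banks
print(gather_cycles(conflicting, num_banks=4, read_ports=2))  # +1 read port
print(gather_cycles([0, 1, 2], num_banks=4, read_ports=1))    # no conflict
```

In this model either knob shortens the worst case for a 3-operand read, and a conflict-free register allocation needs neither, which is in line with the point that 3-operand reads can simply be made to wait.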
It is still unclear to me what role VOPD serves, though, considering it seems to be for wave32 only too. A no-dependency cue for the operand gatherer? But it could also deduce that by itself, couldn't it? Heh. One theory I can come up with is that the SIMD frontend still defaults to prioritizing issue from two different wave32s, so VOPD exists to force co-issuing of (most) VALU instructions from the same wave, eh?