This slide references a "multi core command processor". I can't find any mention of the command processor in RDNA 1 being multi-core.
In the past MS do seem to have liked their custom command processors; perhaps this one is too...
They customized the Command Processor on X1 and customized it further in the X1X, so I would expect further customizations here too.
The "Geometry Engine" appearing here is also interesting...
Not really. It's part of RDNA 1, so it makes sense it's in RDNA 2 as well.
I think RDNA 1 does this as well, but perhaps not as broken down as this.
This is some interesting insight into RDNA 2 too, 7-issue superscalar?
This is a similar rate to GCN and RDNA. Basically it means the arbiter selects up to N instructions from all non-blocked wavefronts, but only one instruction will ever be selected per wavefront. Otherwise, either the ISA needs to be compiler-scheduled VLIW (definitely not the RDNA ISA) to co-issue from the same wavefront, or the hardware itself needs to do scoreboarding, which most GPU vendors deliberately avoid.
So it is not superscalar by the books, and that is unlikely to change.
A bit OT, but how is this normally defined? Doesn't superscalar just mean multiple instructions being executed in parallel on different subsystems?
Yeah, so still new insight, not necessarily RDNA2-specific.
RDNA issues 2 Vector ALU and 2 scalar as per what the white paper says. The 1 Vector Data and 2 Control may belong to the CU. So that's still 7.
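To make the issue rule discussed above concrete, here is a rough sketch in Python: each cycle the arbiter looks only at the oldest instruction of each non-blocked wavefront and issues it if a port of that type is still free. The port counts are the ones quoted in this thread (2 vector ALU, 2 scalar, 1 vector data, 2 control); the names `ISSUE_PORTS` and `select_for_issue` and the data layout are made up for illustration and are not taken from any AMD documentation.

```python
from collections import deque

# Hypothetical per-cycle issue ports, using the counts quoted in the thread
# (2 + 2 + 1 + 2 = 7). This is an assumption, not a verified hardware spec.
ISSUE_PORTS = {"valu": 2, "salu": 2, "vdata": 1, "control": 2}


def select_for_issue(wavefronts, ports=ISSUE_PORTS):
    """Pick instructions for one cycle: at most one per wavefront, bounded by ports.

    `wavefronts` maps a wavefront id to a dict with:
      "blocked": True if the wavefront is waiting on something
      "queue":   deque of instruction-type strings, oldest first
    Returns a list of (wavefront_id, instruction_type) pairs.
    """
    free = dict(ports)  # remaining slots per instruction type this cycle
    issued = []
    for wf_id, wf in wavefronts.items():
        if wf["blocked"] or not wf["queue"]:
            continue
        kind = wf["queue"][0]  # only the oldest instruction is a candidate
        if free.get(kind, 0) > 0:
            free[kind] -= 1
            wf["queue"].popleft()
            issued.append((wf_id, kind))
    return issued


waves = {
    0: {"blocked": False, "queue": deque(["valu", "salu"])},
    1: {"blocked": False, "queue": deque(["valu"])},
    2: {"blocked": True, "queue": deque(["salu"])},
    3: {"blocked": False, "queue": deque(["valu"])},
    4: {"blocked": False, "queue": deque(["salu"])},
}
issued = select_for_issue(waves)
print(issued)
```

Note that wavefront 0's second instruction stays queued even though a scalar port is free that cycle: nothing is co-issued from the same wavefront, which is the point being made about this not being superscalar in the textbook sense.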
Is scoreboarding less costly to implement than Tomasulo's algorithm? Although it seems unnecessary; like you said, GPUs don't need to deal with name and data dependences...
Traditionally I think it implies multiple instructions that belong to the same thread, therefore control and data hazards arise?
Yea, I think I understand where he's going with this.
In another example, an n-way SMT core launches <= n instructions per cycle, but it's not called superscalar.
The dispatcher needs to look at all the commands in the queue and figure out which ones to run in parallel and which ones individually. So in effect there is some form of scoreboarding of instructions happening.
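For anyone unsure what "scoreboarding" means here, a minimal textbook-style sketch in Python: the hardware keeps track of which registers still have a write in flight and refuses to issue an instruction that reads or overwrites one of them. This is a simplified illustration of the general technique (it ignores WAR hazards and functional-unit conflicts), not a description of what any RDNA part does; the `Scoreboard` class and the register names are hypothetical.

```python
class Scoreboard:
    """Tracks registers with a write still in flight (simplified, in-order issue)."""

    def __init__(self):
        self.pending = set()

    def can_issue(self, srcs, dst):
        # RAW hazard: a source is still being written.
        # WAW hazard: the destination is still being written.
        return not (self.pending & set(srcs)) and dst not in self.pending

    def issue(self, srcs, dst):
        if not self.can_issue(srcs, dst):
            return False  # instruction must stall this cycle
        self.pending.add(dst)
        return True

    def complete(self, dst):
        # Called when the result has been written back.
        self.pending.discard(dst)


sb = Scoreboard()
first = sb.issue(srcs=["v1", "v2"], dst="v0")   # v0 = v1 + v2
second = sb.issue(srcs=["v0", "v3"], dst="v4")  # reads v0, so it has to wait
sb.complete("v0")
retry = sb.issue(srcs=["v0", "v3"], dst="v4")   # v0 is ready now
print(first, second, retry)
```

The per-instruction dependency check in `can_issue` is the work a one-instruction-per-wavefront issue scheme does not have to do between instructions issued in the same cycle.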
I believe there is no real dispatcher for the CU, or a single instruction queue for it to look at, in RDNA 1. The instructions (vector, memory and scalar) are split up into separate memory pools on the CU, IIRC.
I am not sure how you can prove this, nor did I remark on its nature as hardware or software; I simply stated that it's custom...
...and does not require manually inserted wait states even though it now exposes the multi-cycle execution pipeline (unlike GCN).
On that, you can play with shader disassembly here; just select Radeon GPU Analyzer as the compiler.
I didn't say you said it was hardware.
Software is inherently custom unless it's OSS.
Once AMD confirmed RDNA 2 was full Tier 2 VRS, I went and re-read the MS VRS patent and came away with the above interpretation you quoted.
But hardware is involved. So perhaps I'm not understanding how their patent is only software-based.
Of course hardware is involved. RDNA 2 is VRS Tier 2 compatible.
- Tiny area cost for 10-30% performance gain
Hot Chips is not about marketing or about consumers; it's about technical matters, and that slide is entirely accurate.