Leoneazzurro5
Regular
I already pointed out that all of these are rumors. But, of course, all we can do at the moment is speculate.
In any case, the point about Ada/Ampere differences is still valid.
Well, Kopite was posting about a new SM level scheduler so maybe it is option "b" here - a separate launch port for non-FP32 instructions?
Exactly. Lots of hype over nothing.
How is it valid? Separating INT SIMD from FP32 SIMD while still having one 32T/clock scheduler will result in the exact same math throughputs as those on Ampere because said scheduler will still be able to load only two (out of three) SIMDs with work.
Complicating SM level scheduling h/w to provide a separate launch port for some execution units (INT+SFU?) would help with filling all three of them with work each cycle but will probably lead to lower overall h/w utilization due to scheduling conflicts (Kepler, anyone?).
Neither of these two options seems like an improvement in comparison to Ampere, especially since moving the INT load onto separate h/w is essentially a regression back to Turing, which would result in more h/w idling inside the GPU while performance gains are likely to be rather limited.
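To put rough numbers on the issue-rate argument above, here is a toy sketch. It assumes each SIMD is 16 lanes wide (an illustrative width, not something stated in the thread) and ignores dependencies, register bandwidth and scheduling conflicts. The function name and parameters are hypothetical.

```python
def peak_lanes_per_clock(simd_widths, issue_threads_per_clock=32, launch_ports=1):
    """Upper bound on SIMD lanes that can be handed new work each clock.

    Toy model only: no dependencies, no register-file limits,
    no scheduling conflicts between launch ports.
    """
    issue_limit = issue_threads_per_clock * launch_ports
    return min(sum(simd_widths), issue_limit)


# Assumed 16-lane SIMDs, chosen purely for illustration.
ampere_like = peak_lanes_per_clock([16, 16])                     # FP32 + shared FP32/INT
split_int = peak_lanes_per_clock([16, 16, 16])                   # FP32 + FP32 + INT, one 32T/clock scheduler
extra_port = peak_lanes_per_clock([16, 16, 16], launch_ports=2)  # same SIMDs, second launch port

print(ampere_like, split_int, extra_port)  # 32 32 48
```

With a single 32T/clock scheduler the third SIMD adds nothing to the peak in this model; only the extra launch port raises it, and the model says nothing about the scheduling conflicts that port could introduce.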
Where was the hype?
I’ve seen the “true double FP32” claim on multiple forums, Reddit and Twitter, with no explanation of how it would be fed alongside INT.
Well, it was not my intention to hype anything; it was only a comment on Ada having an improved architecture with increased performance, which is clearly expected by the competition.
Instead of being related to VOPD, I would rather argue that the removal of SDWA is more an inevitable cleanup, now that FP16 packed math has been around for multiple generations.
FeatureTrue16BitInsts - my theory is that this is the basis of VOPD. This also makes SDWA redundant as a concept.
Navi 31 not MCM? Interesting twist.
Still MCM, just not the way people have been hoping with 2 GCDs. It means 2.5x 6900XT claims are no longer valid with just one compute chiplet.
196 CUs would still mean 2.45x as many CUs as Navi 21 (196 vs. 80). Granted, that doesn't mean it translates to 2.5x gaming performance, but at the same clocks it is roughly 2.5x "compute" (well, theoretical FP32 anyways).
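The arithmetic behind that ratio, as a quick sketch. The 196 CU figure is the rumor from the post above. The 64 lanes per CU is the RDNA 2 value and is only assumed to carry over to the rumored part. The helper name is made up for illustration.

```python
NAVI21_CUS = 80     # full Navi 21 die (6900 XT)
RUMORED_CUS = 196   # rumored figure from the thread, not confirmed
LANES_PER_CU = 64   # RDNA 2 value; assumed unchanged for the rumored part


def theoretical_fp32_tflops(cus, clock_ghz, lanes_per_cu=LANES_PER_CU):
    # 2 FLOPs per lane per clock (one fused multiply-add)
    return cus * lanes_per_cu * 2 * clock_ghz / 1000.0


cu_ratio = RUMORED_CUS / NAVI21_CUS
print(round(cu_ratio, 2))  # 2.45
```

At equal clocks and equal lanes per CU the theoretical FP32 ratio is just the CU ratio, which is why the number says nothing about gaming performance.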