nAo said:
If I were a hardware designer I wouldn't put a register optimizer in my GPU; that's a job for software.
Well I don't have a Cell dev kit to hand, so I don't know what kind of software might be interceding between your "to the metal" code and what the hardware actually executes.
On PC there's always a driver, so that's likely always going to optimise out such dummy registers.
By the way, a while back I hypothesised,
http://www.beyond3d.com/forum/showpost.php?p=715055&postcount=17
about this mechanism of using dummy registers to improve dynamic branching efficiency - but I'd like to see evidence of it working before blindly accepting that it works like that. It's a pretty interesting work-around.
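Roughly, the idea as I read it: the register file is a fixed size, so padding a shader with extra "dummy" registers per fragment shrinks the number of fragments the pipeline keeps in flight, which in turn shrinks the batch over which a dynamic branch has to stay coherent. A back-of-envelope sketch of that arithmetic - the register budget and the batch-size relationship are assumptions for illustration, not measured G7x/RSX figures:

```python
# Back-of-envelope sketch of the dummy-register idea. The register budget
# and the "batch = everything in flight" relationship are ASSUMPTIONS for
# illustration, not measured G7x/RSX figures.

REGISTER_SLOTS_PER_QUAD_PIPE = 800  # assumed FP32 register budget per quad pipeline

def fragments_in_flight(regs_per_fragment: int) -> int:
    """Fragments the pipeline can keep in flight when each fragment
    reserves regs_per_fragment registers (dummy ones included)."""
    return REGISTER_SLOTS_PER_QUAD_PIPE // regs_per_fragment

# If the dynamic-branching batch is simply "everything in flight", then
# declaring dummy registers shrinks the batch:
for regs in (1, 2, 4, 8):
    n = fragments_in_flight(regs)
    print(f"{regs} regs/fragment -> {n} fragments in flight -> DB batch ~{n}")
```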
I suppose NVidia has the option (on PC) to perform some interesting in-driver shader replacements to tweak performance for games that do use DB.
Supposedly there are fewer stages in the G71 pipeline, due to the 90nm process - perhaps this actually means that G71 has nominally fewer fragments in flight than G70. And RSX would be the same as G71. That might explain 220 as opposed to 256.
(Though, separately, Bob noted that there are actually about 800-odd fragments in flight for NV4x/G70 per quad - so, erm, G71/RSX may have a yet-smaller count due to 90nm pipeline-shortening.)
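If the in-flight count really does scale with pipeline depth, a quick bit of proportionality (purely illustrative - the scaling assumption and the per-quad figure are taken at face value from the numbers above):

```python
# If fragments in flight scale roughly with pipeline depth, the quoted drop
# from 256 to 220 corresponds to a ~14% shorter pipeline. ASSUMPTION: simple
# proportionality; the figures are just the ones quoted above.

G70_FRAGMENTS_IN_FLIGHT = 256
G71_FRAGMENTS_IN_FLIGHT = 220

ratio = G71_FRAGMENTS_IN_FLIGHT / G70_FRAGMENTS_IN_FLIGHT
print(f"G71/RSX keeps {ratio:.0%} of G70's fragments in flight "
      f"(a {1 - ratio:.0%} reduction)")

# The same proportional cut applied to Bob's "800-odd per quad" figure:
g70_per_quad = 800  # assumed round figure, per the quote above
g71_per_quad = round(g70_per_quad * ratio)
print(f"Applying that ratio to ~{g70_per_quad} per quad gives ~{g71_per_quad}")
```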
---
An alternative point of view is that the fixed length of the G7x/RSX pipeline is enough to hide double the memory latency of typical GDDR3. So even with half the fragments in flight, texturing latency would still be completely hidden by the pipeline. Though that still doesn't solve the problem that arithmetic operations will now proceed at half-rate.
It would, on the other hand, mean that longer-latency texturing from XDR memory wouldn't have a negative impact on RSX. Without knowing how tightly G7x/RSX's pipeline length is tuned to memory latency, it's hard to know whether XDR texturing would tip performance into doom and gloom.
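For what it's worth, this is the sort of arithmetic I have in mind - with roughly one fragment issued per clock per pipe, keeping N fragments in flight buys roughly N clocks of latency cover. Every figure below (core clock, memory latencies) is a placeholder assumption, not a measured value:

```python
# Rough latency-hiding arithmetic. Every figure below is a placeholder
# ASSUMPTION (core clock, memory latencies), purely to illustrate the argument.

CORE_CLOCK_MHZ = 550        # assumed RSX-class core clock
FRAGMENTS_IN_FLIGHT = 220   # per pipe, as per the discussion above

def latency_budget_ns(fragments_in_flight: int, clock_mhz: float) -> float:
    """With ~1 fragment issued per clock per pipe, N fragments in flight
    cover roughly N clocks' worth of texture latency."""
    return fragments_in_flight * 1000.0 / clock_mhz

budget = latency_budget_ns(FRAGMENTS_IN_FLIGHT, CORE_CLOCK_MHZ)
print(f"Latency the pipeline can cover: ~{budget:.0f} ns")

# Compare against assumed round-trip texture-fetch latencies:
for memory, latency in (("local GDDR3", 150), ("XDR", 300)):
    verdict = "hidden" if latency <= budget else "NOT hidden"
    print(f"{memory}: ~{latency} ns -> {verdict}")

# If GDDR3 latency fits with ~2x headroom, then either halving the in-flight
# count or doubling the latency (i.e. texturing from XDR) should still fit.
```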
To be honest, it seems to me there's a fair chance that XDR texturing won't adversely affect RSX performance.
Jawed