DemoCoder said:Because there are still register limitations, it's just that the limits have increased to the point where, for the average case, FP32 will run at full speed. There are still scenarios where FP16 will run faster, but it's not like it was on the NV3x, where you *had* to use FP16 and the penalty for going over 2 FP32 registers was a 50% reduction in throughput.
Their suggestion is more a rule of thumb. Use only what you need.
ERP said:But the important word here is latency, does this mean that the right selection of instructions can absorb that latency?
DemoCoder said:If you use texture loads inside the branches, it's much harder to predict.
Luminescent said:I don't recall reading an answer to this in any reviews, so I'll ask: how does NV40's theoretical fillrate change when the FP framebuffer is used? Does it remain the same, bandwidth limitations aside? What about the fillrate with FP texture filtering?
ChrisRay said:*has a little sadistic grin on my face knowing something cool has happened*
Hyp-X said:DemoCoder said:If you use texture loads inside the branches, it's much harder to predict.
AFAIK you can't do that on the NV40 ...
demonic said:ChrisRay said:*has a little sadistic grin on my face knowing something cool has happened*
for the nvidiots or the rest of us?
Xmas said:MS removed texture loads inside branches from the spec.
Excellent interview.
ERP said:Actually they claimed each dynamic branch instruction has a latency of 2 cycles....
So IF ENDIF is 4 cycles and IF ELSE ENDIF is 6.
But the important word here is latency, does this mean that the right selection of instructions can absorb that latency?
It would be interesting to see some timings with a lot of ops between the IF ELSE ENDIF clauses, to see what the actual timing impact of the instructions is when there are ops to absorb the latency.
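The arithmetic in the post above can be sketched as follows. The 2-cycle figure per dynamic branch instruction is the one quoted from the interview. The `exposed_overhead` function is purely a hypothetical model, assuming each independent op between the clauses hides one cycle of latency; it only illustrates the open question and is not measured NV40 behaviour. All names here are made up for illustration.

```python
# Latency per dynamic branch instruction, as claimed in the interview (cycles).
BRANCH_LATENCY = 2


def branch_overhead(instructions):
    """Worst case: every branch instruction pays its full latency."""
    return BRANCH_LATENCY * len(instructions)


def exposed_overhead(instructions, independent_ops):
    """Hypothetical model (assumption, not hardware fact): each independent
    op between the clauses hides one cycle of branch latency."""
    return max(0, branch_overhead(instructions) - independent_ops)


print(branch_overhead(["IF", "ENDIF"]))               # 4
print(branch_overhead(["IF", "ELSE", "ENDIF"]))       # 6
print(exposed_overhead(["IF", "ELSE", "ENDIF"], 4))   # 2 under the assumed model
```

Under this assumed model, six or more independent ops would hide the IF ELSE ENDIF latency entirely. Whether real hardware behaves that way is exactly what the timings asked for above would show.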