Xmas said:
The driver couldn't do such a fine-grained decision in the shader pipeline. The hardware decides whether to run one or both branches.

DiGuru said:
Well, the last time we discussed this, we came to the conclusion that the dynamic branching of the NV6x00 is essentially your first method, with batches of about a 1000 pixels each. Only if all pixels in that area take the same path and this can be determined by the driver are the instructions skipped. Which doesn't sound very dynamic to me.

Ok. But Tridam's results showed that the batches are expanded to cover the whole area that uses the shader if any pixels run the other branch, while there is still a penalty of 9 clocks for each branch instruction.
So wouldn't it be better to use a single (linear) shader or multipassing if branches are actually taken? And if they aren't, why not use a different shader for that frame? That would save you at least 9 clocks per pixel.
I'm still trying to understand in what cases that would be useful.
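
To put rough numbers on that question, here is a toy cost model in Python. It is only a sketch under my own assumptions: one clock per instruction, one branch instruction per pixel, and the 9 clock penalty quoted above. The function names and the example path lengths (20 and 5 instructions) are made up for illustration and are not measurements.

```python
# Toy per-pixel cost model for the branching question above.
# Assumption: each shader instruction costs one clock (a simplification).
BRANCH_PENALTY = 9  # clocks per branch instruction, the figure quoted above


def branching_cost(len_a, len_b, batch_diverges, takes_a=True, branch_instrs=1):
    """Clocks per pixel for a shader that uses a dynamic branch.

    If any pixel in the batch takes the other path (batch_diverges),
    every pixel pays for both paths. Otherwise only the taken path runs.
    The branch penalty is paid either way.
    """
    overhead = BRANCH_PENALTY * branch_instrs
    if batch_diverges:
        return overhead + len_a + len_b
    return overhead + (len_a if takes_a else len_b)


def linear_cost(len_a, len_b):
    """Clocks per pixel for a linear shader: both paths run, no branch."""
    return len_a + len_b


def specialized_cost(length):
    """Clocks per pixel for a separate shader that contains only one path."""
    return length


if __name__ == "__main__":
    # Hypothetical path lengths: a long path A and a short path B.
    a, b = 20, 5
    print("branch, batch diverges:", branching_cost(a, b, True))
    print("branch, all pixels take B:", branching_cost(a, b, False, takes_a=False))
    print("linear shader:", linear_cost(a, b))
    print("specialized shader for B:", specialized_cost(b))
```

In this model the branching shader only beats the linear one when the whole batch agrees and the skipped path is longer than 9 instructions, and a specialized shader beats it in every case. That seems to be the point of the question: whether real workloads hit that window often enough to matter.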