Mintmaster
Veteran
Indeed you are right: I wouldn't expect G80 to branch better than R520. I hope you realize that even on CPUs, branching is not always a win. Try branching around a simple assignment, for example, then look at the resulting disassembly. You might be surprised what the compiler does with it.
You do need to skip around some non-trivial amount of code for branching to be a win. Back in GPU land, the number of instructions you need to skip is different for different GPUs. It happens to be more on G7x than on R5xx, but less on G80 than either of those.
This is independent of branch coherence, which is another issue that needs to be considered.
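To make the "branch around a simple assignment" point concrete, here is a minimal sketch in C. The function names are made up for illustration. Both functions return the same value; one is written as a branch around an assignment, the other as a select. What the compiler actually emits depends on the compiler, the optimization level, and the target, so treat this as something to try rather than a guaranteed result.

```c
/* Hypothetical example: a branch that skips a single assignment. */
int clamp_branch(int x, int limit)
{
    if (x > limit)      /* the "branch" only guards one assignment */
        x = limit;
    return x;
}

/* The same logic written as a select, with no explicit control flow. */
int clamp_select(int x, int limit)
{
    return (x > limit) ? limit : x;
}
```

Compiling this with something like `gcc -O2 -S` and comparing the two functions in the assembly output is a quick way to see whether the compiler kept the branch or replaced it with a conditional move. With so little work to skip, the branch often does not survive optimization.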
X1800XT: http://www.behardware.com/articles/592-3/ati-radeon-x1800-xt-xl.html
G80: http://www.behardware.com/articles/644-6/nvidia-geforce-8800-gtx-8800-gts.html
Of course, this is a very simple test, essentially a single if statement. A more practical test is the "New PS" test here:
http://www.digit-life.com/articles2/video/g80-part2.html
http://www.digit-life.com/articles2/video/r580-part2.html
Here G80 is only about 1.5x faster than R520, so I doubt dynamic branching is better on G80. Not that it particularly matters, though, because R520 is way too big for its performance.