Dynamic branching on X1800GTO vs Go7800

I hope you realize that even on CPUs, branching is not always a win. Try branching around a simple assignment, for example, then look at the resulting disassembly. You might be surprised what the compiler does with it.

You do need to skip around some non-trivial amount of code for branching to be a win. Back in GPU land, the number of instructions you need to skip is different for different GPUs. It happens to be more on G7x than on R5xx, but less on G80 than either of those.
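To put the "non-trivial amount of code" point in numbers, here's a toy cost model. It's only a sketch: it assumes every batch is coherent, so a fraction f of the pixels skips a body of L instructions, and every pixel pays a fixed `branch_overhead` in instruction slots. The overhead values below are made-up placeholders, not measured G7x, R5xx, or G80 figures, and the function names are hypothetical. Under those assumptions branching only wins when L > overhead / f.

```python
def branch_cost(skip_fraction: float, body_len: int, branch_overhead: float) -> float:
    """Average instruction slots per pixel with a coherent branch around the body."""
    # Every pixel pays the branch overhead; only the non-skipping pixels run the body.
    return branch_overhead + (1.0 - skip_fraction) * body_len


def flattened_cost(body_len: int) -> float:
    """Average instruction slots per pixel with no branch: every pixel runs the body."""
    return float(body_len)


def break_even_body_len(skip_fraction: float, branch_overhead: float) -> float:
    """Body length above which branching beats flattening.

    branch_overhead + (1 - f) * L < L  <=>  L > branch_overhead / f
    """
    if skip_fraction <= 0.0:
        return float("inf")  # nobody skips, so the branch can never pay off
    return branch_overhead / skip_fraction


if __name__ == "__main__":
    # Hypothetical overheads for two imaginary GPUs, chosen only for illustration.
    for name, overhead in [("gpu_a", 2.0), ("gpu_b", 6.0)]:
        print(name, "break-even body length at 50% skip:",
              break_even_body_len(0.5, overhead))
```

A GPU with a larger per-branch overhead needs a proportionally longer skipped body before the branch pays for itself, which is the "different number of instructions for different GPUs" point above.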

This is independent of branch coherence, which is another issue that needs to be considered.
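Coherence in the same toy terms: when the pixels in one SIMD batch disagree on the condition, the batch runs both sides with the non-participating pixels masked off. This is a simplified sketch assuming a single fixed batch size and no latency-hiding effects; `batch_cost` and `total_cost` are hypothetical helpers, not any vendor's model.

```python
from typing import Sequence


def batch_cost(conds: Sequence[bool], then_len: int, else_len: int, overhead: int) -> int:
    """Instruction slots one SIMD batch spends on an if/else.

    conds holds the branch condition for each pixel in the batch.
    """
    cost = overhead
    if any(conds):        # at least one pixel takes the 'then' side
        cost += then_len
    if not all(conds):    # at least one pixel takes the 'else' side
        cost += else_len
    return cost


def total_cost(pixel_conds: Sequence[bool], batch_size: int,
               then_len: int, else_len: int, overhead: int) -> int:
    """Sum batch costs after splitting the pixels into consecutive batches."""
    total = 0
    for start in range(0, len(pixel_conds), batch_size):
        total += batch_cost(pixel_conds[start:start + batch_size],
                            then_len, else_len, overhead)
    return total


if __name__ == "__main__":
    coherent = [True] * 16 + [False] * 16   # each batch agrees internally
    interleaved = [True, False] * 16        # every batch diverges
    print(total_cost(coherent, 16, 10, 2, 1))     # 14
    print(total_cost(interleaved, 16, 10, 2, 1))  # 26
```

Same pixels, same shader, nearly twice the work once every batch diverges, which is why coherence has to be considered separately from the raw branch overhead.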
I wouldn't expect G80 to branch better than R520.
Indeed you are right:

X1800XT: http://www.behardware.com/articles/592-3/ati-radeon-x1800-xt-xl.html
G80: http://www.behardware.com/articles/644-6/nvidia-geforce-8800-gtx-8800-gts.html

Of course, this is a very simple test, essentially a single if statement. A more practical test is the "New PS" test here:
http://www.digit-life.com/articles2/video/g80-part2.html
http://www.digit-life.com/articles2/video/r580-part2.html

Here G80 is only about 1.5x faster than R520, so I doubt dynamic branching is better on G80. Not that it particularly matters, though, because R520 is way too big for its performance.
 
Maybe G80 has finer instruction granularity, though, i.e. it might be able to branch after every scalar operation.
 
Yes, but considering we're comparing 16-pixel to 32-pixel batches, I'd say G80 is amazingly close.
 
R580 is even faster than R520, and it has 48 pixel batches.
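To see why batch size matters in the 16 vs. 32 vs. 48 comparison, this sketch counts how many batches straddle a circular branch boundary on a 96x96 image. The tile shapes (4x4 for 16 pixels, 8x4 for 32, 8x6 for 48) are my assumption for illustration, not the documented pixel grouping of R520, G80, or R580, and `divergent_fraction` / `inside_circle` are hypothetical helpers.

```python
from typing import Callable

Cond = Callable[[int, int], bool]


def divergent_fraction(width: int, height: int, tile_w: int, tile_h: int, cond: Cond) -> float:
    """Fraction of tile_w x tile_h batches whose pixels disagree on cond."""
    if width % tile_w or height % tile_h:
        raise ValueError("image size must be a multiple of the tile size")
    divergent = 0
    total = 0
    for ty in range(0, height, tile_h):
        for tx in range(0, width, tile_w):
            # Collect the distinct branch outcomes inside this tile.
            values = {cond(x, y)
                      for y in range(ty, ty + tile_h)
                      for x in range(tx, tx + tile_w)}
            total += 1
            if len(values) > 1:  # both True and False present: the batch diverges
                divergent += 1
    return divergent / total


def inside_circle(cx: float, cy: float, r: float) -> Cond:
    """Branch condition: is the pixel center inside a circle?"""
    return lambda x, y: (x + 0.5 - cx) ** 2 + (y + 0.5 - cy) ** 2 < r * r


if __name__ == "__main__":
    cond = inside_circle(48, 48, 30)
    for pixels, (tw, th) in [(16, (4, 4)), (32, (8, 4)), (48, (8, 6))]:
        print(pixels, "pixel batches:", divergent_fraction(96, 96, tw, th, cond))
```

Because each 8x4 tile contains two aligned 4x4 tiles, the divergent fraction for 32-pixel batches can never be lower than for 16-pixel ones in this setup; the 8x6 tiles don't nest that way, so only the trend, not a strict ordering, carries over to them.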

Anyway, I was just replying to Bob's statement that G80 has the best branching performance of any GPU. I suppose it depends on how you define "branching performance", but relative to math and sampling ability (which are a fraction of G80's ability), R520 does better.

Like I said, though, it doesn't really matter. R520 was not a very good design in terms of performance for its die size.
 
R580 is faster overall, but in those branching tests it does worse. G80 gets amazingly close to R520 in the BeHardware tests (which are what I was mostly looking at).

I don't care much for R520 either; IMO it should have been the card released in R420's place back in 2004 (which means I care even less for R420).
 