Can someone tell me why ATI's PS 3.0 is better than Nvidia's?

In a sunlit field with a slight breeze, butterflies flitter over the streams of Diet Coke, and squirrels sniff questioningly at the fruit of the pizza trees. A developer sits cross-legged in the field with his wireless laptop, coding furiously, and humming to himself "I'll tell you what I want, you worry about how to do it".

We'll get the pizza trees first, alas.
 
trinibwoy said:
It just occurred to me that I don't particularly understand why dynamic branching is now getting so much attention. I mean, branching and looping are fundamental programming concepts - why wasn't this built into the earliest shader models - back in the GF3 days?

Or was it just a matter of transistor budget?

Think of it as the heritage of the fixed pipeline ;)
 
trinibwoy said:
It just occurred to me that I don't particularly understand why dynamic branching is now getting so much attention. I mean, branching and looping are fundamental programming concepts - why wasn't this built into the earliest shader models - back in the GF3 days?

Or was it just a matter of transistor budget?
Just? That is far and away the major constraint behind GPU design! You want to maximize performance while not losing face through lack of features.

Imagine if ATI used R520's transistor budget to double the shader/texture units of R420 instead, and hit the same clock speed (very likely, too). They'd probably have enough transistors left to add FP blending as well. That would be an absolute monster.

You need to be reasonably sure that developers will use your features before you sacrifice so much die space for them. NV40 got many devs to fool around with dynamic branching, but in current games it likely gave little performance benefit, or even cost performance. However, they are planning on using it for upcoming games. This made ATI feel the time was right to implement it properly. Obviously, though, it cost them, since their 320M transistor part has only 16 pixel pipes.
 
Tridam said:
A quick example I just tried: the Mandelbrot set algorithm rendered on moving mid-size triangles. The G7x architecture really likes it (lots of MADs, scalar and vec2 instructions). I compute 129 iterations (~400 instructions):

7800GTX : 35.6 MPix/s
X1800XL (my XT is back to ATI) : 13.1 MPix/s

Now I use a loop with a conditional break to exit early when no more iterations are needed:

7800GTX : 17.7 MPix/s
X1800XL : 29.4 MPix/s


So yes, the dynamic branching advantage can compensate for lower ALU throughput, but only in specific cases. There is no objective average here.
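For anyone who wants to see the shape of the two variants, here is a rough CPU-side sketch in Python. It is not the shader used for the numbers above. The function names, the escape radius of 2, and the `lockstep_cost` batch model are all assumptions made for illustration. The fixed version always runs all 129 iterations and just discards results after escape, roughly what you get without dynamic branching. The early-out version breaks as soon as the point escapes. The last function is a deliberately simplified model of pixels running in lockstep batches, where every pixel in a batch pays for the slowest one.

```python
MAX_ITER = 129  # iteration count taken from the post


def mandelbrot_fixed(c, max_iter=MAX_ITER):
    """Run every iteration regardless of escape (no dynamic branching).

    Returns (escape_iteration, iterations_executed).
    """
    z = 0j
    escape_iter = max_iter
    done = False
    executed = 0
    for i in range(max_iter):
        executed += 1
        z_next = z * z + c  # the work is always done
        if not done:
            if abs(z_next) > 2.0:  # assumed escape radius
                escape_iter = i + 1
                done = True  # later results are computed but thrown away
            else:
                z = z_next
    return escape_iter, executed


def mandelbrot_early_out(c, max_iter=MAX_ITER):
    """Break out of the loop as soon as the point escapes.

    Returns (escape_iteration, iterations_executed).
    """
    z = 0j
    executed = 0
    for i in range(max_iter):
        executed += 1
        z = z * z + c
        if abs(z) > 2.0:  # assumed escape radius
            return i + 1, executed
    return max_iter, executed


def lockstep_cost(points, batch_size):
    """Hypothetical cost model: each batch pays for its slowest pixel."""
    total = 0
    for start in range(0, len(points), batch_size):
        batch = points[start:start + batch_size]
        longest = max(mandelbrot_early_out(c)[1] for c in batch)
        total += longest * len(batch)
    return total


if __name__ == "__main__":
    print(mandelbrot_fixed(2 + 2j))      # escapes at once, still runs 129
    print(mandelbrot_early_out(2 + 2j))  # escapes at once, runs 1
    pts = [0j, 2 + 2j, 2 + 2j, 2 + 2j]
    print(lockstep_cost(pts, 1), lockstep_cost(pts, 4))
```

In this toy model, one slow pixel among three fast ones costs 132 iterations with a batch size of 1 but 516 with a batch size of 4. That is one way to read "only in specific cases": the early out only pays off when neighbouring pixels tend to exit at about the same time. Real hardware batch sizes and branch overheads are not modelled here.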

Highly interesting; thanks a lot :)
 
This made ATI feel the time was right to implement it properly. Obviously, though, it cost them, since their 320M transistor part has only 16 pixel pipes.

Let's also not forget the new HQ, less angle-dependent AF algorithm; IMO high angle dependency saves more in transistors than it does in performance.
 