G7x vs R580 Architectural Efficiency

RobertR1 said:
Razor, just because nvidia's mem controller is "old" it should be ignored, and just because nvidia hasn't spent a lot of time on dynamic branching, that should be ignored too???

If you're going to argue, at least be reasonable enough to take everything applicable into consideration. If nvidia has a lead without AA/AF and that lead turns into a big loss with AA/AF applied, that's not a convincing argument for an efficient architecture. Your idea of efficiency is to pick and choose areas where Nvidia does well, but you're quick to disregard its flaws or ATI's strong points since they don't support your efficiency theories.

We all know that nvidia helped your company with development whereas ATI didn't even reply to the request, but you've got to let that go at some point...

Andy made a mistake in his calculations, that's why his numbers were skewed ;)
 
Razor1 said:
In FEAR, which is so far the most shader-intensive game, nV leads all the way up to 1600x1200; there is a similar effect with SCCT.
FEAR is mostly stencil shadowing, and with AA off, it's fillrate that's dominating performance, not shader rate.

Jawed
 
andypski said:
What mistake?


If you would like to look at another example of pure ALU performance you can do the same scaling experiment with the Perlin Noise test from the same review.

OK, then don't multiply and divide your results by the pipeline count to get percentages, and see what you get. You will end up with the same percentages I just got ;) since I used 48/48 where you used 24/16, and multiplying by 1 won't change the outcome ;)

These were your calculations:

Frozen Glass (partial precision)
X1800 at same clock rate as G70 with same pipe count = 632 * 550/625 * 24/16 = 834.2 fps
Per-pipe performance for X1800 compared to 7800GTX = 834.2 / 766 * 100 = 109%
X1900 at same clock rate as G70 with same pipe count = 683 * 550/650 * 24/16 = 866.9 fps
Per-pipe performance for X1900 compared to 7800GTX = 866.9 / 766 * 100 = 113%



This is what I was doing:

X1900 at same clock rate as G70 with same ALU count = 683 * 550/650 * 48/48 = 577.9 fps
Per-ALU performance for X1900 compared to 7800GTX = 577.9 / 766 * 100 = 75.4%



You had per-pipeline performance in there, and you're saying that these shaders have nothing to do with per-pipeline performance, so pipelines don't belong in the equation. Yes, I made the same mistake, but my calculations used 48 ALUs / 48 ALUs, so mine does not affect the overall %.
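
Both scalings, as a quick Python sketch for anyone checking the arithmetic (the fps and clock figures are the digit-life numbers quoted above; nothing else is assumed):
Code:
    # Clock/unit normalization from the two posts above (Frozen Glass, partial precision).
    # Quoted figures: 7800GTX = 766 fps @ 550 MHz, X1800 = 632 fps @ 625 MHz,
    # X1900 = 683 fps @ 650 MHz.
    G70_FPS, G70_CLOCK = 766, 550

    def scaled(fps, clock, unit_ratio):
        """Rescale a result to G70's clock and, via unit_ratio, G70's unit count."""
        return fps * (G70_CLOCK / clock) * unit_ratio

    results = {
        "X1800 per-pipe (24/16)": scaled(632, 625, 24 / 16),  # 834.2 fps
        "X1900 per-pipe (24/16)": scaled(683, 650, 24 / 16),  # 866.9 fps
        "X1900 per-ALU  (48/48)": scaled(683, 650, 48 / 48),  # 577.9 fps
    }
    for name, fps in results.items():
        print(f"{name}: {fps:.1f} fps = {fps / G70_FPS * 100:.0f}% of 7800GTX")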
 
Jawed said:
FEAR is mostly stencil shadowing, and with AA off, it's fillrate that's dominating performance, not shader rate.

Jawed

If it was fillrate limited due to stencil shadows, then the higher the resolution goes, the more it should go in favor of nV, not the opposite. Personally this was a question I had a long time ago and wasn't able to figure out, so it's good that we're talking about it.
 
Razor1 said:
OK, then don't multiply and divide your results by the pipeline count to get percentages, and see what you get. You will end up with the same percentages I just got ;) since I used 48/48 where you used 24/16, and multiplying by 1 won't change the outcome ;)

From the point of view of raw performance per engine clock on these shaders (the texture-intensive versions), the digit-life numbers show the 7800 to be faster clock-for-clock, but it has 50% more texture units and 10% more raw memory bandwidth (not accounted for in your scaling), so I guess this shouldn't be a major surprise. The full precision versions of the same tests show that the gap narrows when running more apples:apples.

On the ALU-intensive versions of these shaders, even after the "rescaling of the rescaling" the best numbers for the 7900 show it equalling X1900 performance at the same clocks, and that's with partial precision. In the full precision cases the numbers show the 7900 losing in per-clock performance, in one case by 50%.

So it seems that you're demonstrating that in heavily texture-bound cases G70 may have an advantage, and in more ALU-bound cases X1900 may have the advantage. Perhaps. (Although this conclusion doesn't really sit well with benchmarks that show that the X1900 performs very strongly with high levels of anisotropic filtering enabled)

Anyway, no major argument here I think, but I can't reconcile that with your earlier statement -
The scary part is that nV is still more optimal for shader-per-MHz efficiency while having fewer ALUs to work with and less ADD/MUL operation capability
The indications from the shader cases discussed here seem to be that as the ALU portion of a shader becomes heavier this is clearly not the case, and moreover they also indicate that in these cases the 7800 still needs partial precision to perform well.

If our "forward thinking" assumption is correct, and ALU performance becomes gradually more and more important with respect to texturing, then I believe that X1900 will only get stronger in the future. On the other hand, if we all go back to playing UT2004 with lots of textures and hardly any shading then maybe things won't change very much.
 
Razor1 said:
You had per-pipeline performance in there, and you're saying that these shaders have nothing to do with per-pipeline performance, so pipelines don't belong in the equation. Yes, I made the same mistake, but my calculations used 48 ALUs / 48 ALUs, so mine does not affect the overall %.
And that had nothing to do with me "making a mistake" - in the context of the original thread (talking about per-pipe efficiency, whatever that is) my calculations were a perfectly reasonable way of looking at things. Not the only way, perhaps, but still perfectly reasonable, particularly since I think that the numbers you quote here are from the texture-intensive versions, where the G70 has a 24:16 texture unit differential.

So I take exception to your comment that I made a mistake - in the original thread context everything was reasonable; it was only when you took the numbers out of the context of the original discussion that they no longer fit, and I'm certainly quite happy to discuss the results after your rescaling (for a pure per-clock comparison).

Anyway, in the context of this thread I have really only talked about per-clock performance, ignoring pipelines.
 
andypski said:
From the point of view of raw performance per engine clock on these shaders (the texture-intensive versions), the digit-life numbers show the 7800 to be faster clock-for-clock, but it has 50% more texture units and 10% more raw memory bandwidth (not accounted for in your scaling), so I guess this shouldn't be a major surprise. The full precision versions of the same tests show that the gap narrows when running more apples:apples.

On the ALU-intensive versions of these shaders, even after the "rescaling of the rescaling" the best numbers for the 7900 show it equalling X1900 performance at the same clocks, and that's with partial precision. In the full precision cases the numbers show the 7900 losing in per-clock performance, in one case by 50%.

So it seems that you're demonstrating that in heavily texture-bound cases G70 may have an advantage, and in more ALU-bound cases X1900 may have the advantage. Perhaps. (Although this conclusion doesn't really sit well with benchmarks that show that the X1900 performs very strongly with high levels of anisotropic filtering enabled)

Anyway, no major argument here I think, but I can't reconcile that with your earlier statement -

The indications from the shader cases discussed here seem to be that as the ALU portion of a shader becomes heavier this is clearly not the case, and moreover they also indicate that in these cases the 7800 still needs partial precision to perform well.

If our "forward thinking" assumption is correct, and ALU performance becomes gradually more and more important with respect to texturing, then I believe that X1900 will only get stronger in the future. On the other hand, if we all go back to playing UT2004 with lots of textures and hardly any shading then maybe things won't change very much.

Well, I see your point as well, but I think both are necessary; more texture ops and more ALUs will give better overall performance than just more ALUs. Even the more complex shaders are still using at least the same number of texture ops that today's shaders use, so that will still be the bottleneck for some time.
 
Razor1 said:
If it was fillrate limited due to stencil shadows, then the higher the resolution goes, the more it should go in favor of nV, not the opposite. Personally this was a question I had a long time ago and wasn't able to figure out, so it's good that we're talking about it.
But then bandwidth starts cutting in.

There's roughly a 2:1 relationship between core:memory clocks and FPS in FEAR with AA/AF off.
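
(One way to read that 2:1 claim: fps responds to core clock changes roughly twice as strongly as to memory clock changes. A back-of-envelope sketch, with made-up numbers purely for illustration:)
Code:
    # Hypothetical reading of the "2:1 core:memory" sensitivity claim above:
    # FEAR fps (AA/AF off) moves about twice as much for a core overclock as
    # for a same-size memory overclock. All numbers are illustrative only.
    def estimated_fps(base_fps, core_scale, mem_scale):
        return base_fps * (2 * core_scale + mem_scale) / 3

    print(estimated_fps(60, 1.10, 1.00))  # +10% core   -> 64.0 fps (+6.7%)
    print(estimated_fps(60, 1.00, 1.10))  # +10% memory -> 62.0 fps (+3.3%)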

It would be interesting if someone with access to NVidia and ATI cards compared FEAR performance with shadowing on and off.

Jawed
 
andypski said:
If our "forward thinking" assumption is correct, and ALU performance becomes gradually more and more important with respect to texturing, then I believe that X1900 will only get stronger in the future. On the other hand, if we all go back to playing UT2004 with lots of textures and hardly any shading then maybe things won't change very much.
I wonder if Oblivion will blaze a trail...

Jawed
 
Razor1 said:
Well, I see your point as well, but I think both are necessary; more texture ops and more ALUs will give better overall performance than just more ALUs. Even the more complex shaders are still using at least the same number of texture ops that today's shaders use, so that will still be the bottleneck for some time.
I don't agree. The most heavily used shader in F.E.A.R., for example:
Code:
    ps_2_0
    def c3, -0.5, 1, 0, 0
    dcl t0.xy
    dcl t1.xyz
    dcl t2.xyz
    dcl_2d s0
    dcl_2d s1
    dcl_2d s2
    texld r2, t0, s2           // 3 texture fetches
    texld r1, t0, s1
    texld r0, t0, s0
    dp3 r6.x, t1, t1           // normalize the two interpolated vectors:
    dp3 r3.x, t2, t2           // squared lengths...
    rsq r2.w, r6.x             // ...reciprocal square roots...
    rsq r0.w, r3.x
    mul r3.xyz, r2.w, t1       // ...unit vector from t1
    mad r5.xyz, t2, r0.w, r3   // sum of the two unit vectors...
    nrm r4.xyz, r5             // ...renormalized (half-vector construction)
    add r5.xyz, r2, c3.x       // unbias the sampled normal (c3.x = -0.5)
    nrm r2.xyz, r5
    dp3_sat r4.x, r2, r4       // N.H
    dp3_sat r2.x, r2, r3       // N.L
    mul r0.w, r1.w, c2.x
    mul r1.xyz, r1, c1
    pow r1.w, r4.x, r0.w       // specular term
    mov_sat r0.w, r6.x
    mul r1.xyz, r1, r1.w
    mul r0.xyz, r0, c0
    add r0.w, -r0.w, c3.y      // 1 - d^2...
    mad r0.xyz, r0, r2.x, r1   // diffuse * N.L + specular
    mul r0.w, r0.w, r0.w       // ...squared: distance falloff
    mul r0.xyz, r0, r0.w       // apply attenuation
    mov r0.w, c3.y
    mov oC0, r0
That's 3 tex and 21 ALU instructions. Other recent games show similar trends.
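(If you want to reproduce that tally from a listing like the one above, here's a quick sketch. The convention of skipping declarations and plain movs is an assumption, but it is what makes the 3 tex / 21 ALU count fall out:)
Code:
    # Tally tex vs. ALU instructions in a ps_2_0 disassembly like the one above.
    # Declarations (ps_2_0 / dcl / def) are skipped, and plain movs are not
    # counted as ALU work -- that convention reproduces the 3 tex / 21 ALU figure.
    def count_ops(disassembly):
        tex = alu = 0
        for line in disassembly.splitlines():
            parts = line.split()
            if not parts:
                continue
            op = parts[0]
            if op.startswith(("ps_", "dcl", "def")) or op == "mov":
                continue  # declarations and plain movs don't count
            if op.startswith("texld"):
                tex += 1
            else:
                alu += 1
        return tex, alu

    # Feeding it the F.E.A.R. listing above gives (3, 21): a 7:1 ALU:tex ratio.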
Razor1 said:
Agreed with your last statement; pertaining to the previous thread, yes, your numbers are OK, but pertaining to what we are talking about now, they aren't. But these numbers show nV and ATi are headed in a similar direction, just that ATi jumped the gun a bit with the number of ALUs, which won't be seen as useful in the short term of at least a year.
We are already seeing the benefits. Have you not noticed the performance difference between the X1800s vs. the X1900s?

P.S. Nice try with the stealth edit ;)
 
Jawed said:
I wonder if Oblivion will blaze a trail...

Jawed

Oblivion was being demoed on an X1900XT/X... I suppose they could just as well have used some 7800GTX 512s if those performed better.
 
Why doesn't F.E.A.R. show a bigger lead for the R580? :???:
The R520 also outperforms the 512MB G70, which, given the ALU:tex ratios, seems odd to me.
 
True, the X1800 was very ALU limited though. Damn, it just gets more complex as we look into it more, lol; you can't purely say shader limited, since texture ops are part of that too.

Well, let's take Doom 3 and ATI's shader replacement. Yes, Doom 3 was shader intensive when it came out, but ATI's shader replacement removed a texture lookup and replaced it with ALU calculations. Now for FEAR, Oblivion, or anything made on Unreal Engine 3, this can't be done, since it wouldn't end up with a similar look, because the lightmaps are hand made, unless the developer wants yesteryear's graphics. I see it going both ways, and even at a 1:7 tex:ALU ratio it doesn't seem that the G70s are ALU limited; rather, it looks like the X1900s are texture-op limited, otherwise according to the pixel shader tests the X1900s should have a much larger lead than 5% at high res.
 
radeonic2 said:
Why doesn't F.E.A.R. show a bigger lead for the R580? :???:
Because much of the workload in F.E.A.R. is stencil shadows. As I stated previously, if you look at R520 performance in F.E.A.R., you can easily see that the stencil shadows account for about half of the rendering time (turn off shadows and notice your performance doubles). Thus, tripling the speed of the other half of the rendering can net a maximum of 50% performance improvement. (If x is the time to render a frame on R520, x/2 is the time to render the shadows and x/2 is the time to render the shaders. Thus, x/6 is the time to render the shaders on R580, assuming everything can take advantage of 3 ALUs, thus total time is 2x/3. So framerate will be at most 50% higher.)
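
(That's Amdahl's law in miniature; as a quick sketch of the frame-time split just described:)
Code:
    # Amdahl's-law form of the argument above: half the R520 frame time is
    # stencil shadows (untouched by extra ALUs); the shader half gets 3x faster.
    def fps_speedup(serial_fraction, parallel_speedup):
        new_frame_time = serial_fraction + (1 - serial_fraction) / parallel_speedup
        return 1 / new_frame_time

    print(fps_speedup(0.5, 3))  # 1.5 -> at most 50% more fps than R520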
 
andypski said:
You seem to find it remarkable when the 7800 manages to do well with no AA or AF (when in many current benchmarks these cards are actually limited by the CPU more than anything else)

With your biased viewpoint you just managed to come out with a great oxymoron.

"In cpu limited games the 7800 shines !"

Sorry, where a video card manages to do well is, by definition, not a cpu limited bench.
 
OpenGL guy said:
Because much of the workload in F.E.A.R. is stencil shadows. As I stated previously, if you look at R520 performance in F.E.A.R., you can easily see that the stencil shadows account for about half of the rendering time (turn off shadows and notice your performance doubles). Thus, tripling the speed of the other half of the rendering can net a maximum of 50% performance improvement. (If x is the time to render a frame on R520, x/2 is the time to render the shadows and x/2 is the time to render the shaders. Thus, x/6 is the time to render the shaders on R580, assuming everything can take advantage of 3 ALUs, thus total time is 2x/3. So framerate will be at most 50% higher.)
Oh... duh :oops:
And the reason for the R580's ShaderMark 2.1 performance not being tripled?
 
OpenGL guy said:
We are already seeing the benefits. Have you not noticed the performance difference between the X1800s vs. the X1900s?

Yes indeed, it's not approaching 3x... everyone has commented on how it seems to be handicapped somehow.

(edited for clarity)
 
Razor1 said:
True, the X1800 was very ALU limited though. Damn, it just gets more complex as we look into it more, lol; you can't purely say shader limited, since texture ops are part of that too.

Well, let's take Doom 3 and ATI's shader replacement. Yes, Doom 3 was shader intensive when it came out, but ATI's shader replacement removed a texture lookup and replaced it with ALU calculations. Now for FEAR, Oblivion, or anything made on Unreal Engine 3, this can't be done, since it wouldn't end up with a similar look, because the lightmaps are hand made, unless the developer wants yesteryear's graphics. I see it going both ways, and even at a 1:7 tex:ALU ratio it doesn't seem that the G70s are ALU limited; rather, it looks like the X1900s are texture-op limited, otherwise according to the pixel shader tests the X1900s should have a much larger lead than 5% at high res.
You misunderstand what DOOM 3 is doing with the texture lookups. The shader was trying to save on math calculations and used texture lookups as function tables in some cases. Obviously this is not good if your machine is very fast at math.

In general, texture instructions can't be optimized, but math instructions can.
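
To make the DOOM 3 point concrete, here is the lookup-table-vs-math trade-off in miniature. This is purely illustrative Python, not DOOM 3's actual shader code: a precomputed table stands in for a texture used as a function table, and pow() stands in for the ALU version a shader replacement would swap in.
Code:
    import math

    # A function table sampled like a 1D texture vs. evaluating the math
    # directly. On hardware with fast ALUs the direct version wins; on
    # hardware with fast texture units the table can. Illustrative only.
    TABLE = [(i / 255) ** 16 for i in range(256)]  # precomputed specular curve

    def specular_via_table(n_dot_h):
        return TABLE[int(n_dot_h * 255)]   # the "texture lookup" version

    def specular_via_math(n_dot_h):
        return math.pow(n_dot_h, 16.0)     # the math a replacement swaps in

    print(specular_via_table(0.9), specular_via_math(0.9))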
 
radeonic2 said:
Oh... duh :oops:
And the reason for the R580's ShaderMark 2.1 performance not being tripled?
The shaders are texture heavy? That's the best answer I can give, as I am not intimately familiar with ShaderMark's shaders. I've looked at many of our Ashli shaders and the R580 really shines there, as most of those shaders are very ALU-limited.
 