G7x vs R580 Architectural Efficiency

Razor1 said:
Well, let's take Doom 3 and ATi's shader replacement. Yes, Doom 3 was shader intensive when it came out. But ATi's shader replacement removed a texture op (well, a lookup) and replaced it with ALU calculations. Now in F.E.A.R., Oblivion, or anything made on Unreal Engine 3, this can't be done, since it will not end up with a similar look if this is done: the lightmaps are hand made, unless the developer wants yesteryear's graphics.

Actually, it was the other way around: Since the ALU computation was slow, JC (or whoever did the shader) traded it for a texture lookup. This was only possible because the results of the computations were very predictable and easily saved in a 1x8 (or similar) sized texture. This won't be as easily possible in a more advanced scenario...
 
OpenGL guy said:
The shaders are texture heavy? Best answer I can give as I am not intimately familiar with ShaderMark's shaders. I've looked at many of our Ashli shaders and the R580 really shines there as most of the shaders are very ALU-limited.
After doing a search, sireric said the shader compiler is still being worked on for "the whole 3 alu thing"
 
radeonic2 said:
After doing a search, sireric said the shader compiler is still being worked on for "the whole 3 alu thing"

In English, does that mean more performance to come from future drivers???
 
OpenGL guy said:
Because much of the workload in F.E.A.R. is stencil shadows. As I stated previously, if you look at R520 performance in F.E.A.R., you can easily see that the stencil shadows account for about half of the rendering time (turn off shadows and notice your performance doubles). Thus, tripling the speed of the other half of the rendering can net a maximum of 50% performance improvement. (If x is the time to render a frame on R520, x/2 is the time to render the shadows and x/2 is the time to render the shaders. Thus, x/6 is the time to render the shaders on R580, assuming everything can take advantage of 3 ALUs, thus total time is 2x/3. So framerate will be at most 50% higher.)

This is what I don't get: if this game is so limited by shadows, it should go in favor of nV as res goes up, not in favor of ATi, unless the bottleneck is shifting. That is a possibility, although unlikely, since I would expect it to scale equally then.
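
For reference, the arithmetic in the quoted post is Amdahl's-law style reasoning and can be checked with a minimal Python sketch. The function name and parameter names are made up for illustration, and the inputs (half the frame spent on shaders, a 3x shader speedup) are the post's own assumptions, not measurements:

```python
def max_speedup(shader_fraction: float, shader_speedup: float) -> float:
    """Overall frame-rate gain when only the shader part of a frame gets faster.

    shader_fraction: share of the original frame time spent in shaders (0..1).
    shader_speedup:  how many times faster that share becomes.
    """
    # The shadow part is unchanged; the shader part shrinks by shader_speedup.
    new_time = (1.0 - shader_fraction) + shader_fraction / shader_speedup
    return 1.0 / new_time


# Assumed inputs from the quoted post: x/2 shadows, x/2 shaders, 3 ALUs.
speedup = max_speedup(shader_fraction=0.5, shader_speedup=3.0)
print(f"new frame time: {1.0 / speedup:.3f} of the old one")
print(f"frame rate:     {speedup:.2f}x")
```

With those inputs the frame time works out to 2/3 of the original, so the frame rate is at most 1.5x, which is the 50% ceiling the post describes. Even an unlimited shader speedup would stay below 2x while half the frame is stencil shadows.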
 
dizietsma said:
With your biased viewpoint you just managed to come out with a great oxymoron.

"In cpu limited games the 7800 shines !"

Sorry, where a video card manages to do well is, by definition, not a cpu limited bench.
I believe the original comment that I was replying to was stating that the X1900 wasn't demonstrating clear wins in many cases with AA and AF turned off, not that 7800 was particularly shining in these cases, so I think it's perfectly reasonable to say that in some cases this could be because of CPU limitations without being oxymoronic (if that's a word ;))
 
Razor1 said:
This is what I don't get: if this game is so limited by shadows, it should go in favor of nV as res goes up, not in favor of ATi, unless the bottleneck is shifting. That is a possibility, although unlikely, since I would expect it to scale equally then.
Scaling resolution should affect shadows and shaders pretty much equally in F.E.A.R. as both have to process the same increase in pixels. If the game was doing lots of render-to-texture effects with a fixed resolution buffer, then I'd expect the Z load to increase more than shader load, but F.E.A.R. doesn't fall into this category.
 
OpenGL guy said:
Scaling resolution should affect shadows and shaders pretty much equally in F.E.A.R. as both have to process the same increase in pixels. If the game was doing lots of render-to-texture effects with a fixed resolution buffer, then I'd expect the Z load to increase more than shader load, but F.E.A.R. doesn't fall into this category.

Correct, that's what I was thinking. Hmm, well then Jawed may be correct with his bandwidth theory, but I still don't see how it could be :LOL:
 
andypski said:
I believe the original comment that I was replying to was stating that the X1900 wasn't demonstrating clear wins in many cases with AA and AF turned off, not that 7800 was particularly shining in these cases, so I think it's perfectly reasonable to say that in some cases this could be because of CPU limitations without being oxymoronic (if that's a word ;))

Sorry to correct you but the original comment was not saying that X1900 wasn't demonstrating clear wins with no AA and AF, the original comment was that nvidia was winning

"I don't think ATi has a single win if there is no aa and af involved"

How does that equate with you now saying "not that 7800 was particularly shining in these cases" when the other poster said nvidia was winning everything? All you could come up with originally was that the 7800 was shining because it was cpu limited. Now you are denying it was shining at all?

ATi have really good AA/AF performance, so it is no wonder you wish to plug that, but cpu limitations should not be brought into it when that is not patently the case.
 
Razor1 said:
True, the X1800 was very ALU limited though. Damn, it just gets more complex as we look into it more lol. We can't purely say shader limited, since texture ops are part of that too.

Well, let's take Doom 3 and ATi's shader replacement. Yes, Doom 3 was shader intensive when it came out. But ATi's shader replacement removed a texture op (well, a lookup) and replaced it with ALU calculations.
And, ironically, appears to make R520 slightly ALU-limited! Sigh: still waiting for someone to verify if this is the case by running D3 with this shader optimisation turned off...

Jawed
 
Dave Baumann said:
You'll see some numbers that pretty much confirms that soon.

Reading the tea leaves...Dave is testing a 7900 and sees the effects of lower BW compared to the 7800GTX 512.

:)
 
What's also interesting to me is the number of instructions involved in computing norms


Code:
    dp3 r6.x, t1, t1
    dp3 r3.x, t2, t2
    rsq r2.w, r6.x
    rsq r0.w, r3.x
    mul r3.xyz, r2.w, t1
    mad r5.xyz, t2, r0.w, r3

Ok, 6 instructions performing 2 norms

Code:
    nrm r4.xyz, r5
norm++

Code:
    nrm r2.xyz, r5
norm++

That's 3 tex and 21 ALU instructions. Other recent games show similar trends.

And 1/3 of the ALU instructions (well, the NRM macros expand to 3 instrs anyway, so it's really 27 ALU ops) are performing normalization.

That means the ratio is 3:27 on ATI, but could potentially be 3:15 on NV because of the possibility of free NRM_16. That to me says that if 30-40% of your instructions are doing normalization, it makes sense to have single-cycle throughput of NRM if possible. It would be nice if NRM was just a source modifier you could put on any register, like "r1_nrm" :)
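
The instruction-count arithmetic here can be laid out in a small Python sketch. The constants are the figures quoted in this post; the count of four normalizations at three instructions each (dp3, rsq, mul) is an assumption inferred from the drop from 27 to 15, and the variable names are purely illustrative:

```python
TEX_OPS = 3              # texture instructions, as quoted in the post
ALU_OPS_EXPANDED = 27    # ALU instructions once NRM macros are expanded, as quoted
NORMS = 4                # assumed number of normalizations (implied by 27 -> 15)
INSTRS_PER_NORM = 3      # dp3 + rsq + mul

# Instructions spent purely on normalization.
norm_ops = NORMS * INSTRS_PER_NORM

# ALU count if every normalization were free (the NRM_16 best case).
alu_if_norm_free = ALU_OPS_EXPANDED - norm_ops

# Share of the work that is normalization, by two different denominators.
norm_share_of_alu = norm_ops / ALU_OPS_EXPANDED
norm_share_of_all = norm_ops / (TEX_OPS + ALU_OPS_EXPANDED)

print(f"tex:ALU with explicit norms: {TEX_OPS}:{ALU_OPS_EXPANDED}")
print(f"tex:ALU with free norms:     {TEX_OPS}:{alu_if_norm_free}")
print(f"norm share of ALU ops:       {norm_share_of_alu:.0%}")
print(f"norm share of all ops:       {norm_share_of_all:.0%}")
```

Under those assumptions the normalization share lands at 40% of all instructions, which is the top of the 30-40% range mentioned above.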
 
dizietsma said:
Sorry to correct you but the original comment was not saying that X1900 wasn't demonstrating clear wins with no AA and AF, the original comment was that nvidia was winning "I don't think ATi has a single win if there is no aa and af involved"
Hmmm... unless I'm very much mistaken (Not winning != losing) as there is such a thing as a draw, so unless I misread that statement it doesn't say anything more or less than that. You can choose to read it as nVidia winning with noAA/AF, but that is pure inference.
How does that equate with you now saying "not that 7800 was particularly shining in these cases" when the other poster said nvidia was winning everything? All you could come up with originally was that the 7800 was shining because it was cpu limited. Now you are denying it was shining at all?
The original post that I replied to makes no comment about the 7800 "shining" with no AA/AF, merely that the X1900 wasn't really racking up any significant wins until you turn them on. Perhaps the intention was to state "7800 clearly wins in many cases with no AA or AF" but that was not how the original post was phrased.
ATi have really good AA/AF performance, so it is no wonder you wish to plug that, but cpu limitations should not be brought into it when that is not patently the case.
Ok - I've completely lost track of the earlier posts in this thread since it was divided, but the original post that I replied to did not quote any numbers at all or refer to any specific benchmark results as far as I know - the only comment was "I don't think ATi has a single win if there is no aa and af involved". It was itself a reply to another post that also seemingly had no link to any specific benchmarks. As such, my comment that many titles tend to be CPU limited on parts this powerful with noAA/AF does not seem unreasonable. I don't see why you're reading so much more into it than that.

In a later post (after the original one, but before my reply) Razor1 gives a link to a specific Xbitlabs review which does show a number of wins for 7800 GTX with no AA/AF (eg. Doom3, Chronicles of Riddick). It also shows a number of flat performance cases that are outright draws (CPU limited cases where both cards are flat across resolutions), and some very strange results in some cases (Really low scores for X1900 in SeriousSam2 - far lower than any that I have seen here, and the performance increasing at higher resolutions on 7800 on UT2004 - what's up with that?), and some wins for X1900XTX with no AA/AF like Call of Duty 2 (so, more than just Half-Life2 after all).

However, that link was posted while I was replying to the first comment, so unless I missed some earlier post with some results then at the time that I was writing my post as far as I can tell I didn't have any specific cases to refer to, and could only talk in generalities. I do not see any reason to go back to my original post and edit it after the fact - the remaining discussion evolved on its own without any need for that.
 
DemoCoder said:
That means the ratio is 3:27 on ATI, but could potentially be 3:15 on NV because of the possibility of free NRM_16.
Of course, assuming that this is an accurate transcription of the shader nVidia should not be using their NRM_16 because the shader is full precision all the way through, so the only way you could use it in this case would be to knowingly violate the DX spec. :oops:

But I'll take your point on the rest ;)
 
Of course, but it's well known that only about 17 bits of precision are needed to represent a norm adequately (no real perceptual loss). FP16 has about 10. So what Nvidia really needs is NRM_24 (FP24 s15e8), which would probably be good enough to be perceptually lossless.

I would bet that in most cases even NRM_16 isn't that bad, given that we have been looking at games with no renormalization at all, or with FX8 cube-map lookups, for a long time.
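
To get a rough feel for what 10 mantissa bits cost, here is a minimal Python sketch that normalizes a vector in double precision and then rounds the result to IEEE half precision. This is only an illustration of FP16 storage error on an arbitrary example vector; it does not model how NRM_16 is actually implemented in NV hardware, and all helper names are made up:

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def normalize(v):
    """Scale a 3-vector to unit length, in double precision."""
    inv_len = 1.0 / math.sqrt(sum(c * c for c in v))
    return tuple(c * inv_len for c in v)


def angle_between(a, b):
    """Angle in radians between two 3-vectors; atan2 keeps tiny angles accurate."""
    cross = (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )
    dot = sum(x * y for x, y in zip(a, b))
    return math.atan2(math.sqrt(sum(c * c for c in cross)), dot)


reference = normalize((1.0, 2.0, 3.0))            # arbitrary example vector
quantized = tuple(to_fp16(c) for c in reference)  # same normal stored as FP16

angle_error = angle_between(reference, quantized)
length_error = abs(math.sqrt(sum(c * c for c in quantized)) - 1.0)

print(f"angular error: {math.degrees(angle_error):.5f} degrees")
print(f"length error:  {length_error:.2e}")
```

Whether an error of that size is visible depends on what the normal feeds into (a high specular exponent is far less forgiving than diffuse lighting), so this is a sanity check rather than an answer to the perceptual question.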
 
ERK said:
Reading the tea leaves...Dave is testing a 7900 and sees the effects of lower BW compared to the 7800GTX 512.

:)


It'll be interesting, to say the least, because in my experience and tests F.E.A.R. still seems to respond to core changes more significantly than to bandwidth. To be clear, I ran a few tests a while back with F.E.A.R.'s built-in benchmark, adjusting memory bandwidth and core while running at 1600x1200, and the larger gains/losses were seen from core adjustments. Bandwidth changes did produce performance gains and deficits, but they were nowhere near as significant as those from the core adjustments.
 
DemoCoder said:
Code:
    dp3 r6.x, t1, t1
    dp3 r3.x, t2, t2
    rsq r2.w, r6.x
    rsq r0.w, r3.x
    mul r3.xyz, r2.w, t1
    mad r5.xyz, t2, r0.w, r3

Ok, 6 instructions performing 2 norms

Code:
    nrm r4.xyz, r5
norm++

Code:
    nrm r2.xyz, r5
norm++
I'm not quite sure what you're trying to say in the above. If you wanted to rewrite the top block of code with norms, it would be:
Code:
    nrm r3.xyz, t1
    nrm r2.xyz, t2
    add r5.xyz, r3, r2
...wouldn't it?
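
As a sanity check on that rewrite, here is a small Python sketch that emulates both sequences on plain tuples and compares the results. The helpers `dp3`, `rsq` and `nrm` are simplified stand-ins for the shader instructions (full float precision, no write masks or swizzles beyond what the snippet uses), so this is an illustration under those assumptions, not an emulator:

```python
import math


def dp3(a, b):
    """3-component dot product."""
    return sum(x * y for x, y in zip(a, b))


def rsq(x):
    """Reciprocal square root."""
    return 1.0 / math.sqrt(x)


def nrm(v):
    """Normalize, i.e. the dp3 + rsq + mul expansion of the macro."""
    scale = rsq(dp3(v, v))
    return tuple(scale * c for c in v)


def expanded(t1, t2):
    """The six-instruction sequence from the original shader."""
    r6x = dp3(t1, t1)                                   # dp3 r6.x, t1, t1
    r3x = dp3(t2, t2)                                   # dp3 r3.x, t2, t2
    r2w = rsq(r6x)                                      # rsq r2.w, r6.x
    r0w = rsq(r3x)                                      # rsq r0.w, r3.x
    r3 = tuple(r2w * c for c in t1)                     # mul r3.xyz, r2.w, t1
    return tuple(c * r0w + d for c, d in zip(t2, r3))   # mad r5.xyz, t2, r0.w, r3


def with_macros(t1, t2):
    """The nrm / nrm / add rewrite."""
    r3 = nrm(t1)
    r2 = nrm(t2)
    return tuple(a + b for a, b in zip(r3, r2))


t1, t2 = (1.0, 2.0, 3.0), (-4.0, 0.5, 2.0)  # arbitrary test inputs
a, b = expanded(t1, t2), with_macros(t1, t2)
print(all(math.isclose(x, y, rel_tol=1e-12, abs_tol=1e-12) for x, y in zip(a, b)))
```

Both paths compute normalize(t1) + normalize(t2). The expanded form gets there in six instructions because the mad folds the second scale and the add into one op, whereas each nrm macro expands to three instructions on its own.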
 
I'm just counting the norms. (2 norms expanded out; norm++ means "+1 norm".) There are 4 norms being done. These potentially execute on NV HW with single-cycle throughput if the compiler can recognize them. I'm not saying anything about whether you could rewrite them with macros or not (although using NRM would probably be superior, since it is easier for the compiler to map it).

I think it is a relevant observation that roughly 1/3 of the instructions are performing norms.
 