G7x vs R580 Architectural Efficiency

Mintmaster said:
Anyway, FP16 normalization doesn't get you anywhere near 17-bits of precision, but rather 11 bits. That's a big difference, and not enough for a highly specular reflection on a smooth surface.
Has anyone tried to measure the fp16 normalization precision and compared it to doing the normalization as a dp3/rsq/mul sequence?
 
Chalnoth said:
It could be a lot of things, but it's not memory bandwidth.

It could be due to ATI's memory bandwidth savings techniques preferring high resolution.
It could be due to nVidia having lower CPU usage in their drivers, which would tend to inflate the scores for low resolutions.
It could be due to ATI's texture caches being a bit better for magnification, which is more prevalent at high res.

But if it's CPU bound, then nV would not lead ATi across two or more resolutions (well, it could, but the frame rates would not change until it's no longer CPU bound), since the frame rates change at every resolution.

Ok, let's say it's fillrate limited; actually, before this discussion started, I think that is what I had in mind, as I mentioned over at Rage a couple of weeks back.

So we have nV leading at lower resolutions until it gets to 1600x1200, where ATi takes the lead; this is plausible in a fillrate-limited situation. But the only other thing, which I stated in my first post, is that nV has an ALU structure that is much better suited to FEAR's shaders. And if FEAR is using a 1 to 7 ratio of texture ops to ALU ops, well, that just shoots down ATi's goal with their 3 ALUs per pipe.

If it's fillrate limited at lower res, ATi would have a lead there, as it does at 1600x1200.

If it's CPU limited and nV has drivers that reduce the CPU workload, then we would see equal numbers at different resolutions, at least until the CPU limit is passed.

Neither of these happened.

The only other thing that could throw this all in the garbage is a bug in nV's drivers at high resolutions, or in ATi's drivers at low resolutions.

This is also seen in SCCT, so it's probably not a bug ;)
 
Chalnoth said:
Why would that be?
Not sure what you're asking.

If x^2 + y^2 + z^2 = 1, then max(x,y,z) >= 1/sqrt(3).

For N dot H lighting, you need unit vectors for N and H, and specular lighting gives you a highlight when N dot H is near one, i.e. N is near H, so all components matter.

Even if N is constant (0,0,1), consider H varying (non-linearly, of course) from (0, 0.01, 0.99995) to (0, -0.01, 0.99995). You can see that it's only the Z component that matters. For less trivial cases, you'll still have the largest components - those lying between 0.5 and 1.0 where I think FP16 is accurate to 2^-11 - affecting your result the most.
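For reference, the two facts being used there can be restated in symbols (FP16 has a 10-bit stored mantissa, 11 bits with the implicit leading 1):

```latex
x^2 + y^2 + z^2 = 1 \;\Rightarrow\; 3\max(x^2,y^2,z^2) \ge 1 \;\Rightarrow\; \max(|x|,|y|,|z|) \ge \tfrac{1}{\sqrt{3}} \approx 0.577,
\qquad
\mathrm{ulp}_{\mathrm{FP16}}\!\left([\tfrac{1}{2},1)\right) = 2^{-1}\cdot 2^{-10} = 2^{-11} \approx 4.9\times 10^{-4}.
```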
 
Mintmaster said:
I was hoping you'd reply to my post in the other thread when you mentioned this. Especially the part about how cheap a 17-bit renormalizer would be.

Sorry, I never saw your response. Here is the derivation for 17 bits. Note: don't confuse "you only need 17 bits of storage" with "17-bit precision". 17 bits are needed to specify a normal, but the decompressed value is still 48-96 bits. The original terminology I used was "17 bits of precision needed to represent without perceptual loss", and that representation is the one I link to here, which uses 17 bits with a lookup table to return any one of 100,000 normals uniformly distributed over the unit sphere. I also don't think a high-frequency change in a specular highlight is such a big deal, and if it becomes one, interpolation hides it rather well. In fact, specular highlight artifacts are among the least of our problems today. Aliasing in shaders dominates the artifacts IMHO. (The original paper is of course based on geometry compression, so interpolation is implied anyway, since you're compressing vertex normals.)
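As a rough way to picture that figure (my own back-of-the-envelope reading, not from the paper): 2^17 = 131,072 codes cover the 100,000 table entries, and spreading 100,000 points uniformly over the sphere gives each one a solid angle of about 4π/100,000 steradians, i.e. an angular spacing of very roughly

```latex
\sqrt{\frac{4\pi}{100\,000}} \approx 0.011\ \text{rad} \approx 0.64^{\circ},
```

which is the ~0.01 rad spacing that Mintmaster takes issue with further down.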


Anyway, FP16 normalization doesn't get you anywhere near 17-bits of precision, but rather 11 bits.

I didn't say that it would, I said FP24 would probably be closer to ideal.



I don't see normalization (esp. FP16) being an appreciable percentage of the workload for shaders in the future. Might as well use the math logic for other uses.

Well, from my theory, the "normalization" hardware is really just a reuse of existing environment mapping hardware, since to do a cube map or sphere map lookup, they need to calculate the intersection of the vector and the cube/sphere anyway. And once you calculate the intersection, you've got the normal. With respect to cube maps, this is an RCP (of the dominant axis) followed by a MUL (to scale the minor axes). This gives you the intersection on the face of the unit cube. Now one can use these coordinates to look into a 2000-entry on-chip ROM table of normals, stored at 16 bits per component (48 bits per normal). I don't really remember from my comp arch days how many transistors per bit are needed for ROM, but I assume it's fairly cheap; whatever the per-bit cost C, the total is roughly C*100,000 bits, so at 6 transistors per bit that's 600k transistors, or less than 0.3% of a modern GPU core. As for sphere mapping, I don't really know if it's done in HW or via the driver.
(yes, the cubic coordinate projection may yield some distortion away from uniform distribution, but I'm sure there's a trick to alleviate it, I don't have time to work on it right now)
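To make the scheme concrete, here's a toy software model of that idea. The table layout, entry count, and nearest-entry lookup are my assumptions for illustration only, not a description of what any actual GPU does (real hardware would presumably interpolate between neighbouring entries):

```python
import numpy as np

def build_table(entries=2000):
    """Toy stand-in for the on-chip ROM: precomputed unit normals for the +Z
    cube face, indexed by the two face coordinates (~sqrt(entries) steps per axis)."""
    side = int(np.sqrt(entries))
    u = np.linspace(-1.0, 1.0, side)
    s, t = np.meshgrid(u, u, indexing="ij")
    dirs = np.stack([s, t, np.ones_like(s)], axis=-1)
    return dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)

def lookup_normalize(v, table):
    """Normalize via cube-face projection (one RCP + one MUL) and a table lookup."""
    major = int(np.argmax(np.abs(v)))
    face = v / abs(v[major])                      # RCP of dominant axis, MUL of the rest
    s, t = np.delete(face, major)                 # coordinates on the cube face, in [-1, 1]
    side = table.shape[0]
    i = int(round((s + 1.0) / 2.0 * (side - 1)))  # nearest table entry only
    j = int(round((t + 1.0) / 2.0 * (side - 1)))
    n = table[i, j]                               # unit normal for face point (s, t, 1)
    out = np.empty(3)
    minors = [k for k in range(3) if k != major]
    out[minors[0]], out[minors[1]] = n[0], n[1]
    out[major] = n[2] * np.sign(v[major])
    return out

table = build_table()
rng = np.random.default_rng(0)
worst = 0.0
for _ in range(20000):
    v = rng.normal(size=3)
    exact = v / np.linalg.norm(v)
    err = np.degrees(np.arccos(np.clip(np.dot(lookup_normalize(v, table), exact), -1.0, 1.0)))
    worst = max(worst, err)
print(f"worst angular error, ~2000-entry nearest-entry table: {worst:.2f} deg")
```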


I also disagree that somehow the length of shaders is going to reduce the number of nrm ops in the future. Maybe if those shader ops consist purely of blends and scalar functions, but I don't see it. I also don't see shader lengths increasing dramatically beyond what they are today. I think there will be more passes and more render-to-texture ops, but dramatically longer shaders are not in the cards (e.g. 1 or 2 orders of magnitude bigger, e.g. 500-1000 instructions). I believe they will represent a pathological edge case.
 
Bob said:
Has anyone tried to measure the fp16 normalization precision and compared it to doing the normalization as a dp3/rsq/mul sequence?
The only thing I've seen is something in ATI's presentation:
http://www.ati.com/developer/Dark_Secrets_of_shader_Dev-Mojo.pdf - page 54

Not exactly the most unbiased source, though (I remember a similar NV PR slide back in the FX days showing the limitations of FP24, but this seems much more plausible).

It should be pretty easy to test for anyone with NVidia hardware. I'll see if I can quickly whip up an application where I think precision might be an issue.
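In the meantime, a quick CPU-side simulation at least gives a ballpark for the two paths. Where the rounding to FP16 happens below is my assumption for the sake of the sketch, not a statement about where NV hardware actually rounds:

```python
import numpy as np

def f16(x):
    """Round to FP16 and back, to model an FP16 intermediate."""
    return np.asarray(x, dtype=np.float64).astype(np.float16).astype(np.float64)

def nrm_fp16(v):
    """FP16 model of the dp3/rsq/mul sequence: every intermediate held at FP16."""
    v = f16(v)
    d = f16(np.dot(v, v))          # dp3
    inv = f16(1.0 / np.sqrt(d))    # rsq
    return f16(v * inv)            # mul

rng = np.random.default_rng(1)
worst = 0.0
for _ in range(100_000):
    v = rng.normal(size=3)
    exact = v / np.linalg.norm(v)  # full-precision reference of the same sequence
    approx = nrm_fp16(v)
    worst = max(worst, np.degrees(np.arccos(np.clip(np.dot(approx, exact), -1.0, 1.0))))
print(f"worst angular error of the all-FP16 path: ~{worst:.3f} degrees")
```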
 
Razor1 said:
So we have nV leading at lower resolutions until it gets to 1600x1200, where ATi takes the lead; this is plausible in a fillrate-limited situation. But the only other thing, which I stated in my first post, is that nV has an ALU structure that is much better suited to FEAR's shaders. And if FEAR is using a 1 to 7 ratio of texture ops to ALU ops, well, that just shoots down ATi's goal with their 3 ALUs per pipe.
I think you are taking too simplistic a view of which areas of workloads change as you alter resolution - decreasing resolution affects many areas of the 3D pipeline, and may reveal differences in architectural decisions in many areas beyond merely shaders - distribution of work, ratio of pixels to vertices, behaviour of compression techniques, behaviour of early Z techniques, behaviour of texture caches etc. etc. etc.

The list of things that can change with scaling resolution is much longer than you seem to think - it's by no means some kind of simple static linear scaling as you are making out.
 
andypski said:
I think you are taking too simplistic a view of which areas of workloads change as you alter resolution - decreasing resolution affects many areas of the 3D pipeline, and may reveal differences in architectural decisions in many areas beyond merely shaders - distribution of work, ratio of pixels to vertices, behaviour of compression techniques, behaviour of early Z techniques, behaviour of texture caches etc. etc. etc.

The list of things that can change with scaling resolution is much longer than you seem to think - it's by no means some kind of simple static linear scaling as you are making out.

I understand that very well, but FEAR and SCCT are two games that are not bound by the same things as other games are; this happens in no other games. Actually, all other games scale very well and similarly with respect to AA and AF off and on; unless these two engines are made completely differently, it doesn't add up, and SCCT is using the UT03 or 04 engine? So the only commonality these games have is their heavy use of shadows, and this will play to fillrates again, at least on the surface.
What you have been saying is that nV has an efficiency advantage when it comes to shaders with less than a 1 to 3 ratio, but that is not the case here, since, as Ogl guy pointed out, FEAR is using a 1 to 7 ratio.

By saying what you just said, you're saying ATi has weaknesses relative to its nV counterparts when going to low res (a weakness in one of those areas you were mentioning) when programs are made in a certain way, which I don't think is the case, because it would have been seen elsewhere other than in these two games.

Explain to me why Doom 3 and Quake 4 run better with no AA and AF on, and why they scale exactly like most DX games that use shaders as well with no AA and AF.

The original statement was that nV has a shader advantage for today's games and the near future, and that the X1900 XTX will not see its 3 ALUs come to any use any time soon.
 
Razor1 said:
I understand that very well, but FEAR and SCCT are two games that are not bound by the same things as other games are; this happens in no other games. Actually, all other games scale very well; unless these two engines are made completely differently, it doesn't add up, and SCCT is using the UT03 or 04 engine? So the only commonality these games have is their heavy use of shadows, and this will play to fillrates again, at least on the surface.
Not equal at all - as I recall FEAR uses stencil shadows and SCCT uses shadow buffers. These two shadow approaches have such wildly different performance characteristics that attempting to claim this as a point of commonality between the two titles is ill conceived, although I can agree that most shadow rendering typically caters to simple fillrate. I would not say, for example, that the shadow rendering in SCCT is any heavier than that in 3DMark05, and that application has quite different performance characteristics. Commonality or similarity in one area is not sufficient to decide a performance profile.
What you have been saying is that nV has an efficiency advantage when it comes to shaders with less than a 1 to 3 ratio, but that is not the case here, since, as Ogl guy pointed out, FEAR is using a 1 to 7 ratio.
Certainly I would expect nVs relative performance to improve as the texture:ALU ratio gets lower, but predicting exactly where any advantage from their larger number of texture units may be realised with real content is clearly difficult, as the differences in AF performance demonstrate.

Also saying that 1 shader in FEAR has a 1:7 ratio doesn't tell you the typical case or anything much about the overall performance characteristics - even if this is the most heavily used shader it doesn't necessarily make it the be-all and end-all - it might be used 15% of the time and yet there could be a whole slew of shaders in the 14,13,12,11,10,9... etc. range that overall would account for far more rendering time, and the time taken for the stencil shadows (which use no shaders) wouldn't show up in such an analysis at all.
By saying what you just said, you're saying ATi has weaknesses relative to its nV counterparts when going to low res, which I don't think is the case.
I'm not saying that at all - I'm saying that different areas of the architectures will become stressed differently and may become the bottlenecks at different resolutions - depending on the exact workload supplied this might work for or against any given architecture. It would be as application dependent as any other performance issue.
Explain to me why Doom 3 and Quake 4 run better with no AA and AF on, and why they scale exactly like most DX games that use shaders as well.
Well I would suspect that the nVidia hardware's nominally doubled Z rate when rendering without AA or colour might have something to do with the D3 and Q4 performance, as both those games make heavy use of stencil shadows. You would expect that this might give some level of boost with no AA, but the doubled rate doesn't carry over to high-quality rendering modes with AA. As to your comment about D3 and Q4 scaling similarly to most DX games, I don't think I can agree with that statement at all - the range of scaling in DX titles is pretty wide, and while no doubt some do scale in a similar manner to Doom3 I don't think it's the majority by any means.
The original statement was that nV has a shader advantage for today's games and the near future, and that the X1900 XTX will not see its 3 ALUs come to any use any time soon.
...and I have yet to see any convincing proof of the first statement, and the second statement simply cannot be proved or disproved at the moment and is mere speculation.
 
The question as to whether a title is CPU limited without AA and AF is pretty easily answered. You can run a title like Serious Sam 2, F.E.A.R., or any other DirectX title and do some core clock scaling tests, and you will find that they generally are not CPU limited in these cases (certainly not in any recent game). I have never subscribed to the philosophy that turning on AA is the only way to make a title realistically GPU limited. But it is certainly a way to illustrate one of the many bottlenecks a piece of hardware can have.
 
andypski said:
Not equal at all - as I recall FEAR uses stencil shadows and SCCT uses shadow buffers. These two shadow approaches have such wildly different performance characteristics that attempting to claim this as a point of commonality between the two titles is ill conceived, although I can agree that most shadow rendering typically caters to simple fillrate. I would not say, for example, that the shadow rendering in SCCT is any heavier than that in 3DMark05, and that application has quite different performance characteristics.

Agreed, but that's all we have to go on for the time being ;), other than synthetics, which rarely show us real gaming results.

Certainly I would expect nVs relative performance to improve as the texture:ALU ratio gets lower, but predicting exactly where any advantage from their larger number of texture units may be realised with real content is clearly difficult, as the differences in AF performance demonstrate.

Also saying that 1 shader in FEAR has a 1:7 ratio doesn't tell you the typical case or anything much about the overall performance characteristics - even if this is the most heavily used shader it doesn't necessarily make it the be-all and end-all - it might be used 15% of the time and yet there could be a whole slew of shaders in the 14,13,12,11,10,9... etc. range that overall would account for far more rendering time, and the time taken for the stencil shadows (which use no shaders) wouldn't show up in such an analysis at all.

I'm not sure which shader that was, but all we have to do is look at the shaders that FEAR uses most: their surface shaders for their levels.

I'm not saying that at all - I'm saying that different areas of the architectures will become stressed differently and may become the bottlenecks at different resolutions - depending on the exact workload supplied this might work for or against any given architecture. It would be as application dependent as any other performance issue.

Agreed

Well I would suspect that the nVidia hardware's nominally doubled Z rate when rendering without AA or colour might have something to do with the D3 and Q4 performance, as both those games make heavy use of stencil shadows. You would expect that this might give some level of boost with no AA, but the doubled rate doesn't carry over to high-quality rendering modes with AA. As to your comment about D3 and Q4 scaling similarly to most DX games, I don't think I can agree with that statement at all - the range of scaling in DX titles is pretty wide, and while no doubt some do scale in a similar manner to Doom3 I don't think it's the majority by any means.

That is true about the scaling in terms of AA and AF on, but with no AA and AF it isn't.

...and I have yet to see any convincing proof of the first statement, and the second statement simply cannot be proved or disproved at the moment and is mere speculation.

This is true, we will have to wait and see, unless FEAR is using that 1 to 7 ratio for its surface shaders ;). Then there will be a situation where I think FEAR's shaders give nV a major advantage even against ATi's superior shader performance, and this will also be seen in other games.
 
DemoCoder said:
Here is the derivation for 17-bits.
Heh, funny thing is I did similar calculations in my reply from the other thread.

That calculation is very primitive in that it assumes our mapping of the normal to an intensity (in our case via a dot product) is linear over the sphere, which is far from the case for specular reflection.

100,000 unique normals is woefully inadequate for lighting, as 0.01 radians is quite large. If you consider cos(theta)^16, which is actually quite a broad specular highlight with a FWHM of 33 degrees, then the difference between 0.25 rad and 0.26 rad gives you an intensity difference of 1/40, equivalent to a 5-bit gradient. Banding won't be pretty here, let alone higher specularity.
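For what it's worth, both numbers in that paragraph drop straight out of the cos^16 falloff; a quick sanity check only:

```python
import math

# FWHM: cos(t)^16 falls to half its peak at t = arccos(0.5**(1/16))
half = math.degrees(math.acos(0.5 ** (1.0 / 16.0)))
print(f"half-width {half:.1f} deg -> FWHM ~ {2 * half:.0f} deg")     # ~33 deg

# Intensity step for a 0.01 rad change in angle near t = 0.25 rad
step = math.cos(0.25) ** 16 - math.cos(0.26) ** 16
print(f"intensity step {step:.4f}, i.e. about 1/{1.0 / step:.0f}")   # ~1/40
```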

EDIT: I never said high-frequency change of specular highlights is a problem. In fact, that would hide the problem. I'm more concerned with smooth areas where the specular highlight does not disappear over a distance of a few pixels, in which case intensity gradients don't need many bits. Imagine a sheet of metal, a smooth floor, a car, etc. Any time you go from lit to shaded over the span of a few degrees change between N and H, you need high resolution in your dot product. The reason we haven't seen this problem is that we're using full precision most of the time. If you recall the FarCry artifacts we used to see on NVidia cards, I'm pretty sure that was often due to inadequate N dot H precision.

DemoCoder said:
Well, from my theory, the "normalization" hardware is really just a reuse of existing environment mapping hardware, since to do a cube map or sphere map lookup, they need to calculate the intersection of the vector and the cube/sphere anyway. And once you calculate the intersection, you've got the normal. With respect to cube maps, this is an RCP (of the dominant axis) followed by a MUL (to scale the minor axes). This gives you the intersection on the face of the unit cube. Now one can use these coordinates to look into a 2000-entry on-chip ROM table of normals, stored at 16 bits per component (48 bits per normal).
Sphere mapping is not handled in hardware per pixel, so the best you can do is deal with the coordinate you got from the division by the major axis in cube mapping. A 2000 entry lookup table hardly seemed enough when I addressed this in that other thread, but now that I see you're using the environment map calculation as an intermediate, I see how interpolation between the table values would work. Still, you need some hardware to do a bilinear interpolation.

Maybe NVidia uses the FP16 filtering hardware for this, which suddenly makes a lot of sense. The table would work sort of like a built-in cube map renormalization texture, and becomes the only additional hardware cost. According to NVidia, though, the free FP16 normalization can occur alongside a texture fetch, right? That doesn't jibe with my theory, unless the FP16 blenders are separate.

Neat, but I still think dealing with FP16 precision for normals is more trouble than it's worth.
 
Razor1 said:
But if it's CPU bound, then nV would not lead ATi across two or more resolutions (well, it could, but the frame rates would not change until it's no longer CPU bound), since the frame rates change at every resolution.
Not necessarily, because no bound is absolute. What if, at some given resolution, 30% of the frames are bounded by the CPU, 70% by the GPU? If you increase the res, this could increase to a 20%/80% ratio, or something different. It's all dependent upon the game demo.

So we have nV leading at lower resolutions until it gets to 1600x1200, where ATi takes the lead; this is plausible in a fillrate-limited situation. But the only other thing, which I stated in my first post, is that nV has an ALU structure that is much better suited to FEAR's shaders. And if FEAR is using a 1 to 7 ratio of texture ops to ALU ops, well, that just shoots down ATi's goal with their 3 ALUs per pipe.
Bear in mind that with anisotropic filtering, the texture units will, on average, take something like 2-4 cycles to finish the filtering for a texture request (depending on how many texture stages anisotropic filtering is requested for, whether or not trilinear filtering is enabled, what type of anisotropic algorithm is used, and what, if any, other filtering optimizations are enabled).

Anyway, I'll have to step back and clarify something about what I've been trying to say, because I've realized that I may have been a bit misleading in my insistence that it's not memory bandwidth.

Specifically, what I mean is that it can't be memory bandwidth alone, but it can be related to how memory bandwidth limitations change with resolution. For example, if the game is entirely memory bandwidth-limited, and if ATI's memory bandwidth savings techniques scale more with resolution than nVidia's, then it is possible that ATI will gain more in such a scenario from upping the resolution.

Additionally, if you're talking about AA benchmarks, nVidia has the additional optimization of performing the recombination of AA samples at scanout: this allows them to do a buffer swap without using any memory bandwidth or GPU processing. This means that they will use more GPU memory than ATI at a given res/AA setting. But this means that the scanout bandwidth becomes a larger portion of overall memory bandwidth (usually, depends on refresh), meaning that this optimization helps them less and less as the resolution is increased.

So I guess my objection was more that you seemed to be saying (to me) that this had something to do with some absolute bandwidth limit, but the details are much richer than that.
 
Mintmaster said:
100,000 unique normals is woefully inadequate for lighting, as 0.01 radians is quite large. If you consider cos(theta)^16, which is actually quite a broad specular highlight with a FWHM of 33 degrees, then the difference between 0.25 rad and 0.26 rad gives you an intensity difference of 1/40, equivalent to a 5-bit gradient. Banding won't be pretty here, let alone higher specularity.
Well, here's my little analysis.

For consideration, I took the case where x, y, z are all equal to 1/sqrt(3). In this case, if we take one value and add 1/2^10 to it, then the difference in angle between the two settings will be:
arcsin(1/sqrt(3) + 1/2^10) - arcsin(1/sqrt(3)) = 0.001197.

(I've assumed that the other coordinates remain fixed, while this one is changed by one bit)

With that difference in angle, starting at 0.25 rad, you'd only have a brightness difference of about 1/300, which is within the accuracy of 8-bit lighting.
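Here are those two numbers worked through numerically, assuming the same cos^16 highlight from Mintmaster's example; a quick check of the arithmetic, nothing more:

```python
import math

x = 1.0 / math.sqrt(3.0)
dtheta = math.asin(x + 2.0 ** -10) - math.asin(x)
print(f"angle change for one LSB at x = 1/sqrt(3): {dtheta:.6f} rad")  # ~0.001197

# Intensity change of a cos(t)^16 highlight at t = 0.25 rad for that angle step
di = math.cos(0.25) ** 16 - math.cos(0.25 + dtheta) ** 16
print(f"intensity change {di:.5f}, about 1/{1.0 / di:.0f}")            # ~1/340, same ballpark as the 1/300 quoted
```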
 
Razor1 said:
But if it's CPU bound, then nV would not lead ATi across two or more resolutions (well, it could, but the frame rates would not change until it's no longer CPU bound), since the frame rates change at every resolution.
You have to understand that things do not suddenly change from CPU bound to GPU bound. During the transition, you will have parts of the frame that are both.

The same thing holds for geometry. If you were completely fillrate/shader-rate/bandwidth limited, you'd see a ~1.6x increase in framerate for each standard resolution reduction. The fact is, we rarely see this.

Here's an example:
A frame consists of two triangle batches.
Batch #1 of 1M uniform triangles covers 1k pixels.
Batch #2 of 1k uniform triangles covers 1M pixels.

Let's say you can process 2 pixels/clk and 1 triangle/clk. As it is, the chip will take 1,500,000 cycles.
At 1/100 the resolution, it will take 1,005,000 cycles.
At 1/10 the resolution, it will take 1,050,000 cycles.
At 10 times the resolution, it will take 6,000,000 cycles.
At 100 times the resolution, it will take 51,000,000 cycles.

Imagine a second chip that does 5 pixels/clk and 1 triangle/clk. As it is, the chip will take 1,200,000 cycles.
At 1/100 the resolution, it will take 1,002,000 cycles.
At 1/10 the resolution, it will take 1,020,000 cycles.
At 10 times the resolution, it will take 3,000,000 cycles.
At 100 times the resolution, it will take 21,000,000 cycles.

This chip has 2.5 times the shading power, but its lead at the different resolutions is as follows: 0.3%, 3%, 25%, 100%, 143%. CPU and geometry limitations do not show up as easily as you're implying.
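The cycle counts come from a very simple model: each batch is limited by whichever of its triangle work or its pixel work takes longer, and the two batches are summed. Here it is written out so the scaling is easy to replay (my reading of the example above):

```python
def frame_cycles(pixel_rate, tri_rate, batches, res_scale=1.0):
    """Toy model: a batch costs max(setup time, fill time); a frame is the sum."""
    return sum(max(tris / tri_rate, pixels * res_scale / pixel_rate)
               for tris, pixels in batches)

batches = [(1_000_000, 1_000),     # batch 1: 1M triangles over 1k pixels
           (1_000, 1_000_000)]     # batch 2: 1k triangles over 1M pixels

for scale in (0.01, 0.1, 1, 10, 100):
    a = frame_cycles(2, 1, batches, scale)   # 2 pixels/clk, 1 triangle/clk
    b = frame_cycles(5, 1, batches, scale)   # 5 pixels/clk, 1 triangle/clk
    print(f"{scale:>6}x pixels: {a:>12,.0f} vs {b:>12,.0f} cycles, lead {a / b - 1:.0%}")
```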

Now, can you be a little bit more clear about what results you're talking about? What's so confusing, and what's so different about FEAR and SC:CT? I'm having a really hard time deciphering your posts and your logic. Also, make sure you read the previous post I directed towards you.
 
Chalnoth said:
Well, here's my little analysis.

For consideration, I took the case where x, y, z are all equal to 1/sqrt(3). In this case, if we take one value and add 1/2^10 to it, then the difference in angle between the two settings will be:
arcsin(1/sqrt(3) + 1/2^10) - arcsin(1/sqrt(3)) = 0.001197.

(I've assumed that the other coordinates remain fixed, while this one is changed by one bit)

With that difference in angle, starting at 0.25 rad, you'd only have a brightness difference of about 1/300, which is within the accuracy of 8-bit lighting.
You're crossing conversations here. With DC I was talking about the 100,000 normals from his link, so I intentionally used a low specular power. Imagine a flat floor lit by the sun, and the reflection (where intensity is >0.5 of peak) occupies 33 degrees of your field of view. Not exactly a sharp reflection, maybe good for plastic. Note that 0.01 rad = 0.6 deg, so I could easily have chosen a shinier highlight.

In my previous post, when I was talking about FP16, I considered an intensity function with a FWHM of 10 degrees. cos(5 degrees) = 0.9962. Your calculated normal precision gives a DP3 precision of ~0.0007, so I get a whopping 5 levels between 1 and 0.5. (Remember cos(x) ~= 1 - 1/2*x^2, so small angles have much smaller effect on the dot product. arccos(1-1/2^11) = 2 degrees!)
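Putting numbers on that (assuming a highlight with a 10-degree FWHM, which works out to a specular exponent of roughly 180, since cos(5°)^182 ≈ 0.5 - my inference, not stated above):

```python
import math

# Coarsest angle FP16 can distinguish from zero when encoded in a dot product near 1
print(f"arccos(1 - 2^-11) = {math.degrees(math.acos(1.0 - 2.0 ** -11)):.1f} deg")   # ~1.8 deg

# Dot-product range between the peak and the half-maximum of a 10-deg-FWHM highlight
span = 1.0 - math.cos(math.radians(5.0))
print(f"N.H range from peak to half-max: {span:.4f} -> "
      f"{span / 0.0007:.1f} representable levels at ~0.0007 per step")
```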

Do a texture lookup using the reflection vector (e.g. a shiny car like ATI's demo or any racing game) and you're even worse off. Better off just sticking with per vertex normalization and interpolation of eye and halfway vectors so that you at least have continuity if not accuracy.
 
Mintmaster said:
You have to understand that things do not suddenly change from CPU bound to GPU bound. During the transition, you will have parts of the frame that are both.

The same thing holds for geometry. If you were completely fillrate/shader-rate/bandwidth limited, you'd see a ~1.6x increase in framerate for each standard resolution reduction. The fact is, we rarely see this.

Here's an example:
A frame consists of two triangle batches.
Batch #1 of 1M uniform triangles covers 1k pixels.
Batch #2 of 1k uniform triangles covers 1M pixels.

Let's say you can process 2 pixels/clk and 1 triangle/clk. As it is, the chip will take 1,500,000 cycles.
At 1/100 the resolution, it will take 1,005,000 cycles.
At 1/10 the resolution, it will take 1,050,000 cycles.
At 10 times the resolution, it will take 6,000,000 cycles.
At 100 times the resolution, it will take 51,000,000 cycles.

Imagine a second chip that does 5 pixels/clk and 1 triangle/clk. As it is, the chip will take 1,200,000 cycles.
At 1/100 the resolution, it will take 1,002,000 cycles.
At 1/10 the resolution, it will take 1,020,000 cycles.
At 10 times the resolution, it will take 3,000,000 cycles.
At 100 times the resolution, it will take 21,000,000 cycles.

This chip has 2.5 times the shading power, but its lead at the different resolutions is as follows: 0.3%, 3%, 25%, 100%, 143%. CPU and geometry limitations do not show up as easily as you're implying.

Now, can you be a little bit more clear about what results you're talking about? What's so confusing, and what's so different about FEAR and SC:CT? I'm having a really hard time deciphering your posts and your logic. Also, make sure you read the previous post I directed towards you.

Good point, Chalnoth and Mintmaster, I wasn't thinking along those lines. Well, so far in the benchmarks we have seen at almost all sites (I took Xbit for example), in FEAR and SCCT nV leads all the way up to 1600x1200, then ATi takes over by a bit, a small percentage, but still, a lead is a lead. It could possibly be purely fillrates, but as Ogl guy said, it should not do this, since fillrate due to shadows and shader usage should scale with resolution. These are the only 2 games this ever happens in.

http://www.xbitlabs.com/articles/video/display/radeon-x1900xtx_20.html

http://www.xbitlabs.com/articles/video/display/radeon-x1900xtx_28.html

As Andy was mentioning, both games stress different portions due to their methods of shadow implementation.

So it just complicates the analysis. But going along the lines that the CPU is being hampered at low res gives a better explanation of the results ;)
 
One point on Doom3 / Quake4's appearance relative to other games. There are many games that use stencil shadows + some unified lighting model on both D3D and OpenGL. They all do roughly the same operations, but generally scale differently.

I suspect the main reason is that stencil shadow / shader games generally have two or three basic renderer configurations that they spend a long time in (e.g. a stencil shadow rendering pass or an interaction rendering pass). Each of these different states bottlenecks the hardware very differently, and so even changing the ratio between A, B and C may expose small differences between different pieces of hardware.
 