The anisotropic filtering perf. now makes sense...

Typedef Enum · Apr 23, 2002

nVidia had better get on the ball here...The very day the numbers were released by the online press, I just knew they made no sense.

I would link to the images, but I cannot figure out why the Server is telling me it cannot locate them?

At any rate, here's the bottom line...

I slapped a GF3 clocked to Ti500 level in the system, and performed the same tests used in my recent review...

In Multi-Texturing performance (which is the only thing I care about), here's the breakdown:

1. 3DMark2001: Ti500 yields 50% greater fill-rate than the Ti4400.
2. Serious Sam2: Ti500 yields 48% greater fill-rate than the Ti4400.

I *really* hope those guys take notice and figure out (finally) what the heck is going on.

Richthofen · Apr 23, 2002

I discovered the same problem here when comparing my GF3 Ti500 with the GF4 Ti4400. The GF4 loses a lot of multitexturing performance. The numbers are about the same as with single-texturing.

With the new 28.90 beta drivers leaked by Guru3D I still have the problem in 3DMark, so I assume that the aniso performance in Direct3D is about the same as with the official 28.32 drivers.

BUT

I get a massive performance boost with anisotropic filtering in OpenGL.
My Quake3 scores jumped by about 40 fps.
Seems to me that nVidia found the problem and got it fixed under OpenGL.
Maybe you should try those new beta drivers and see if there are any improvements in your scores.
Because I don't have any screenshots with anisotropy on the 28.32 drivers, I can't tell if the image quality is the same or a little worse. I just don't know, but I think you might have lots of them.

PS: Please forgive me if my English sucks. I am from Germany, so writing in English is not something I do every day.

Typedef Enum · Apr 23, 2002

I re-tested using 28.90, and I got the same results in both OpenGL (Serious Sam2) as well as D3D (3DMark2001)....

I just retested Jedi Knight, and I got about 3 FPS higher than with 28.80 using 4x + 8x Aniso.

I'm convinced that there's something wrong in the drivers...

Sharkfood · Apr 23, 2002

Hmmm.. So I guess my question is- what is the correlation between this "finding" and actual performance using Anisotropy?

If you model a GF3 in these conditions:
No Anisotropic Filtering
Fill Rate (Single-Texturing) 648.5 MTexels/s
Fill Rate (Multi-Texturing) 1332.1 MTexels/s
2x Anisotropic Filtering
Fill Rate (Single-Texturing) 411.0 MTexels/s
Fill Rate (Multi-Texturing) 804.2 MTexels/s
4x Anisotropic Filtering
Fill Rate (Single-Texturing) 411.0 MTexels/s
Fill Rate (Multi-Texturing) 804.2 MTexels/s
8x Anisotropic Filtering
Fill Rate (Single-Texturing) 411.0 MTexels/s
Fill Rate (Multi-Texturing) 804.2 MTexels/s
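
To put the figures above in relative terms, here is a small sketch that computes the percentage drop from the no-aniso baseline. The numbers are the ones listed above; the function and variable names are only illustrative.

```python
# Fill-rate figures quoted above for the GF3, in MTexels/s.
BASELINE = {"single": 648.5, "multi": 1332.1}
# The listed 2x, 4x and 8x results are identical, so one entry covers all three.
WITH_ANISO = {"single": 411.0, "multi": 804.2}


def percent_drop(before, after):
    """Return the percentage lost going from `before` to `after`."""
    return (before - after) / before * 100.0


drops = {mode: percent_drop(BASELINE[mode], WITH_ANISO[mode]) for mode in BASELINE}

for mode, drop in drops.items():
    print(f"{mode}-texturing: {drop:.1f}% lower at every aniso level")
```

Both modes lose a bit over a third of their measured fillrate as soon as any aniso level is enabled, and nothing more after that.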

Now, if there were some correlation between the synthetic fillrate measurement and actual performance, I could see how this might raise questions, but surely there isn't. Given the same measured fillrate at every level, any other test run through no aniso, 2x, 4x and 8x will show a steady drop-off in performance at each level, regardless of the fillrate measurement.

This would suggest to me that the visible "hit" in fillrate is nothing more than some sort of worst-case measurement due to the nature of the test, while actual fillrate throughput in *3D games/engines* varies depending on how anisotropy is really distributed and/or applied.

Simply put: worst case in the synthetic test, widely varying in a real 3D scene.

If this synthetic measurement correlated with actual performance, you'd wind up with similar measured performance through the levels of anisotropic filtering (none, 2x, 4x, 8x), but you simply do not.

So I'm not altogether convinced there is anything truly wrong here.. (yet). Just a difference in how the GF4 sets the anisotropic "mode" in the hardware, and in how that affects a raw-fillrate synthetic test to yield a false reading. Surely the GF3 has the *exact* same behavior, with the exception of multitexturing not taking the same *measured* worst-case hit in this synthetic test: a very similar "same hit regardless of anisotropy level" across the board, but with wildly varying results in anything else BUT this fillrate test.

Make sense?

Lukas · Apr 23, 2002

Many have said that there shouldn't be any difference in speed in the fillrate test with aniso on any card... at least not if it is working right. The 8500 doesn't show ANY difference in speed.

tb · Apr 23, 2002

Anisotropic texture filtering has a fillrate impact, because it uses more sub-pixels to generate the final pixel color.

Thomas

Mephisto · Apr 23, 2002

No, this depends on the implementation. On GeForce, it HAS a fillrate impact, as the GeForce can only read 8 texels/clock/TMU. Imagine a core that can read 64 texels/clock/TMU: then 8:1 anisotropic filtering has no fillrate impact.
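
A rough way to see the arithmetic behind this, as a sketch only: assume every anisotropic tap is a full trilinear sample of 8 texels (that is my assumption, chosen because it matches the 8 vs. 64 figures above), and that the TMU simply needs enough clocks to read all of them. The function name and parameters are hypothetical.

```python
import math

# Assumption: one trilinear sample reads 2 mip levels x 4 texels.
TRILINEAR_TEXELS = 8


def clocks_per_pixel(aniso_taps, texels_per_clock, texels_per_tap=TRILINEAR_TEXELS):
    """Clocks one TMU needs to read every texel for one filtered pixel."""
    return math.ceil(aniso_taps * texels_per_tap / texels_per_clock)


no_aniso = clocks_per_pixel(aniso_taps=1, texels_per_clock=8)
geforce_like = clocks_per_pixel(aniso_taps=8, texels_per_clock=8)
wide_core = clocks_per_pixel(aniso_taps=8, texels_per_clock=64)

print(no_aniso, geforce_like, wide_core)
```

In this toy model the 8 texels/clock core needs 8 clocks per pixel at 8:1, while the imaginary 64 texels/clock core stays at 1 clock, the same as with no aniso at all. Real hardware only takes the full number of taps where the surface is oblique enough, so this is the worst case.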

tEd · Apr 23, 2002

Sharkfood said:
Hmmm.. So I guess my question is- what is the correlation between this "finding" and actual performance using Anisotropy?

If you model a GF3 in these conditions:
No Anisotropic Filtering
Fill Rate (Single-Texturing) 648.5 MTexels/s
Fill Rate (Multi-Texturing) 1332.1 MTexels/s
2x Anisotropic Filtering
Fill Rate (Single-Texturing) 411.0 MTexels/s
Fill Rate (Multi-Texturing) 804.2 MTexels/s
4x Anisotropic Filtering
Fill Rate (Single-Texturing) 411.0 MTexels/s
Fill Rate (Multi-Texturing) 804.2 MTexels/s
8x Anisotropic Filtering
Fill Rate (Single-Texturing) 411.0 MTexels/s
Fill Rate (Multi-Texturing) 804.2 MTexels/s

Now, if there were some correlation between the synthetic fillrate measurement and actual performance, I could see how this might raise questions, but surely there isn't. Given the same measured fillrate at every level, any other test run through no aniso, 2x, 4x and 8x will show a steady drop-off in performance at each level, regardless of the fillrate measurement.

This would suggest to me that the visible "hit" in fillrate is nothing more than some sort of worst-case measurement due to the nature of the test, while actual fillrate throughput in *3D games/engines* varies depending on how anisotropy is really distributed and/or applied.

Simply put: worst case in the synthetic test, widely varying in a real 3D scene.

If this synthetic measurement correlated with actual performance, you'd wind up with similar measured performance through the levels of anisotropic filtering (none, 2x, 4x, 8x), but you simply do not.

So I'm not altogether convinced there is anything truly wrong here.. (yet). Just a difference in how the GF4 sets the anisotropic "mode" in the hardware, and in how that affects a raw-fillrate synthetic test to yield a false reading. Surely the GF3 has the *exact* same behavior, with the exception of multitexturing not taking the same *measured* worst-case hit in this synthetic test: a very similar "same hit regardless of anisotropy level" across the board, but with wildly varying results in anything else BUT this fillrate test.

Make sense?

I thought about that too! But something just isn't right with the aniso performance on the GF4. I don't know if this fillrate thing has something to do with it or not.

Take a look here: http://www.digit-life.com/articles/digest3d/0202/itogi-video-q3ani.html

In almost every case the GF4 Ti4600 takes a bigger hit with aniso than the GF3 Ti500.

I would like to see some benchmarks with an underclocked GF4 (240/550) against a GF3 Ti500.

ERP · Apr 23, 2002

Just to add fuel to the fire, I did some aniso benchmarks a while ago in our game, just to measure relative cost.
Although NV2A isn't GF3 or 4, the nice thing is that the driver layer is extremely thin so we can remove it from the equation, and it's easy to guarantee you're not CPU bound.

The numbers represent the amount of time to render a particular game scene in milliseconds with various levels of anisotropy forced on every polygon.
Most of the polygons in the scene use 2 or more textures, but nothing uses more than 4.
As a bonus I've compared the cost of LOD bias in the various aniso modes. LOD bias settings run across the top, aniso down the side. The aniso numbers are the values the hardware takes, rather than the level of anisotropy.

LOD bias    0      -1     -2     -3
Linear      5.2    5.4    6.0    6.3
Aniso 2     5.7    6.2    7.1    7.4
Aniso 3     6.1    6.8    8.1    8.3
Aniso 4     6.4    7.4    8.7    8.8

In general enabling Aniso is about the same cost as increasing the LOD Bias by one point, and the lowest level of aniso costs just under 10%, the highest about 23%.
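
For anyone who wants to check the percentages, a quick sketch using the render times from the table above. The dictionary layout and function name are just for illustration.

```python
# Render times in ms from the table above; columns are LOD bias 0, -1, -2, -3.
TIMES_MS = {
    "Linear":  [5.2, 5.4, 6.0, 6.3],
    "Aniso 2": [5.7, 6.2, 7.1, 7.4],
    "Aniso 3": [6.1, 6.8, 8.1, 8.3],
    "Aniso 4": [6.4, 7.4, 8.7, 8.8],
}


def cost_vs_linear(mode, bias_index=0):
    """Extra render time of `mode` over plain linear filtering, in percent."""
    base = TIMES_MS["Linear"][bias_index]
    return (TIMES_MS[mode][bias_index] - base) / base * 100.0


for mode in ("Aniso 2", "Aniso 3", "Aniso 4"):
    print(f"{mode}: +{cost_vs_linear(mode):.1f}% at LOD bias 0")
```

At LOD bias 0 this gives roughly +9.6% for the lowest setting and +23.1% for the highest, which lines up with the "just under 10%" and "about 23%" figures.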

Note that LOD bias is expensive on GF3/4. I believe this relates to the cache architecture: it's optimised for a 1:1 texel-to-pixel ratio, and changing the LOD bias changes this.

Typedef Enum · Apr 23, 2002

In many ways, I suggest that there's a direct relationship between the 2 (that being fill-rate and Aniso. perf)...I do, however, have some doubts as well..

That being said, I'm still pretty much convinced that nVidia didn't say, at some point, "OK...Let's go ahead and change this...change that...at the cost of effectively slicing our fill-rate by 75%"

The thing that makes me wonder is the fact that, through it all, the performance is still high (relatively speaking)...It's not like the Radeon 8500 bug where the performance just tanks to 10 FPS...Is it possible that nVidia redid some things from the GeForce3, knowing full well that they would drop some serious fill-rate...BUT, in the end, would still end up significantly higher than the GF3? Possible, yes....

But...It just doesn't seem 'right' that they would sacrifice that much Multi-Texturing performance in doing so...

We need to get some direct clarification from nVidia on this issue...

Richthofen · Apr 23, 2002

What I did a few minutes ago with these new 28.90 drivers was run gl_extreme. I did the fillrate test without anisotropy and then with anisotropy.
The Fillrate numbers were quite ok.

I think something in OpenGL works better with these drivers with Anisotropy enabled.

Lukas · Apr 23, 2002

Well, why isn't the 8500 showing any difference in fillrate speed in 3DMark? If the GF4 takes a large hit then the 8500 should at least take some hit....

Typedef Enum · Apr 23, 2002

Lukas,

That's not true at all...In order for that statement to be true, both chips would have to use the same implementation, which they clearly do not...and even then, there could be a good number of things happening 'under the hood' which would make that statement null/void.

This is what I've found with the 28.90 drivers...

Anisotropic filtering performance IS up...I did a slew of benchmarks last night, so here's a few:

Quake3 (10x7 4xFSAA + 8x)
---------------------------
28.32: 96.6 FPS
28.90: 108 FPS

Quake3 (10x7 4xFSAA + 4x)
---------------------------
28.32: 96.7 FPS
28.90: 108 FPS

Quake3 (10x7 No FSAA + 8x)
---------------------------
28.32: 127 FPS
28.90: 139 FPS

Quake3 (10x7 No FSAA + 4x)
---------------------------
28.32: 137 FPS
28.90: 145 FPS

These drivers do bump up the performance some ~12 FPS or so...which is a good thing; however, the fill-rate tests remained the same. I ran 3DMark2001 and Serious Sam2...same methodology...same result(s).

The interesting thing to note is the fact that there's no performance difference in going from 4x to 8x when FSAA is enabled.
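
The deltas are easy to tabulate. A minimal sketch using only the Quake3 numbers above (the data structure and names are mine):

```python
# Quake3 at 1024x768, FPS as (28.32, 28.90), keyed by (FSAA mode, aniso level).
RESULTS = {
    ("4xFSAA", "8x"): (96.6, 108.0),
    ("4xFSAA", "4x"): (96.7, 108.0),
    ("No FSAA", "8x"): (127.0, 139.0),
    ("No FSAA", "4x"): (137.0, 145.0),
}

# Absolute FPS gained by moving from 28.32 to 28.90.
gains = {key: new - old for key, (old, new) in RESULTS.items()}


def step_cost(fsaa, driver_index):
    """FPS lost going from 4x to 8x aniso on one driver (0 = 28.32, 1 = 28.90)."""
    return RESULTS[(fsaa, "4x")][driver_index] - RESULTS[(fsaa, "8x")][driver_index]


for (fsaa, aniso), gain in gains.items():
    old = RESULTS[(fsaa, aniso)][0]
    print(f"{fsaa} + {aniso}: +{gain:.1f} FPS ({gain / old * 100:.1f}%)")

print("4x -> 8x cost with FSAA on 28.90:", step_cost("4xFSAA", 1))
print("4x -> 8x cost without FSAA on 28.90:", step_cost("No FSAA", 1))
```

With FSAA on, the 4x to 8x step costs nothing on 28.90, while without FSAA it costs 6 FPS.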

ben6 · Apr 24, 2002

Scott, what do you get without aniso at 1024x768 with 4x FSAA on? I would suggest something is turned off in this instance (4x FSAA + 4x aniso, 4x FSAA + 8x aniso).

Ben

Sharkfood · Apr 24, 2002

type-

These drivers do bump up the performance some ~12 FPS or so...which is a good thing; however, the fill-rate tests remained the same. I ran 3DMark2001 and Serious Sam2...same methodology...same result(s).

That is kind of my theory as well, since the GF3 pretty much parallels that data. There can be noticeable improvements in anisotropic filtering performance, yet 3DMark fillrate tests remain flat-lined through all levels of anisotropy. My theory is that the fillrate tests in 3DMark/SS and the like are exploiting a strange behavior of the NVIDIA GeForce3/4 that has no bearing on the real world, i.e. "fillrate" from these tests with anisotropic filtering enabled is not accurate and the results simply should be discarded.

So although I would expect anisotropic filtering performance to improve, I'm not overly convinced synthetic fillrate tests will necessarily change.

The interesting thing to note is the fact that there's no performance difference in going from 4x to 8x when FSAA is enabled.

I think this is a driver bug as well in the 28.xx drivers. I do not think the new registry key is being parsed correctly and anisotropy is simply "topping out" at 4x. I can duplicate the same results as you with the GF3 using the 28.32 drivers- in both OpenGL and D3D (I used 3dmark2000 @ 10x7x32,32b textures, adventure med detail). I got the same exact scores- and identical IQ between the top most two anisotropy reg-values (using both NVMax and RivaTuner).

It's also possible that the registry value is simply incorrect or tuners and NV display panel (for OGL) are simply not setting the correct "top end" value. But from my tests, 2x/4x work, 8x falls back to 4x. This isn't the case in older Det4's (12.xx->23.xx)

Althornin · Apr 24, 2002

tb said:
Anisotropic texture filtering has a fillrate impact, because it uses more sub-pixels to generate the final pixel color.

Thomas

NOT if polygons are parallel to the view plane
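
As a sketch of why: the number of taps is normally driven by how stretched the pixel's footprint is in texture space. The formula below follows the illustrative one in the EXT_texture_filter_anisotropic extension text as I remember it; actual hardware is free to do something different, and the function name and the derivative values are made up for the example.

```python
import math


def aniso_taps(dudx, dvdx, dudy, dvdy, max_aniso=8):
    """Number of taps along the major axis of the pixel footprint."""
    px = math.hypot(dudx, dvdx)  # footprint length along screen x, in texels
    py = math.hypot(dudy, dvdy)  # footprint length along screen y, in texels
    p_max, p_min = max(px, py), min(px, py)
    if p_min == 0.0:
        return max_aniso  # degenerate footprint, clamp to the limit
    return min(math.ceil(p_max / p_min), max_aniso)


face_on = aniso_taps(1.0, 0.0, 0.0, 1.0)   # polygon parallel to the view plane
oblique = aniso_taps(1.0, 0.0, 0.0, 6.0)   # floor receding into the distance
grazing = aniso_taps(1.0, 0.0, 0.0, 20.0)  # nearly edge-on, clamped to max_aniso

print(face_on, oblique, grazing)
```

A face-on polygon has a square footprint, so it gets a single tap and costs the same as ordinary filtering no matter which aniso level is selected.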

Babel-17 · Apr 24, 2002

Hi, I may be off-base, but is it possible that texture sharpening is enabled, possibly by way of a "tweaker" such as RivaTuner?

I've noticed that it bumps aniso one notch. Thus, if you have texture sharpening enabled with 4x aniso you are already maxed out aniso-wise, and enabling 8x aniso is redundant.

Just my 2 cents.

Typedef Enum · Apr 24, 2002

It's a bug....shhhh.

Joe DeFuria · Apr 24, 2002

Bug? Surely it couldn't be...aren't you constantly extolling the virtues of nVidia's drivers as the "gold standard?" How could such a major bug like this make it into their drivers at all, let alone still exist?

This makes the whole "High Poly Bug" thing that ATI went through look like child's play.

Gubbi · Apr 24, 2002

ERP said:
The numbers represent the amount of time to render a particular game scene in milliseconds with various levels of anisotropy forced on every polygon.
Most of the polygons in the scene use 2 or more textures, but nothing uses more than 4.
As a bonus I've compared the cost of LOD bias in the various aniso modes. LOD bias settings run across the top, aniso down the side. The aniso numbers are the values the hardware takes, rather than the level of anisotropy.

LOD bias    0      -1     -2     -3
Linear      5.2    5.4    6.0    6.3
Aniso 2     5.7    6.2    7.1    7.4
Aniso 3     6.1    6.8    8.1    8.3
Aniso 4     6.4    7.4    8.7    8.8

In general enabling Aniso is about the same cost as increasing the LOD Bias by one point, and the lowest level of aniso costs just under 10%, the highest about 23%.

Note that LOD bias is expensive on GF3/4. I believe this relates to the cache architecture: it's optimised for a 1:1 texel-to-pixel ratio, and changing the LOD bias changes this.

Very interesting.

I would imagine that the texture cache achieves (some of) its multiple read ports by interleaving banks. Then when lowering the LOD or increasing aniso you get more requests hitting the same banks/ports and hence stalls. This could (in part) explain why the performance seems to drop to a plateau (must be some kind of worst case though).
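
A toy model of what I mean, purely as a sketch: one-dimensional addresses, a hypothetical 4-bank cache with one request per bank per cycle, and a `step` that stands in for the texel distance between neighbouring pixels (1 is the 1:1 case, doubling for each step of negative LOD bias). None of these numbers come from real hardware.

```python
from collections import Counter


def cycles_for_quad(step, banks=4, pixels=4):
    """Cycles needed to serve one texel fetch for each of `pixels` adjacent pixels.

    Each bank serves one request per cycle, so the busiest bank sets the time.
    """
    addresses = [p * step for p in range(pixels)]
    per_bank = Counter(addr % banks for addr in addresses)
    return max(per_bank.values())


for lod_bias, step in ((0, 1), (-1, 2), (-2, 4), (-3, 8)):
    print(f"LOD bias {lod_bias}: step {step}, {cycles_for_quad(step)} cycle(s)")
```

In this model the cost goes 1, 2, 4 and then stays at 4 cycles: once every request lands in the same bank, things cannot get any worse, which is the plateau.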

Cheers
Gubbi
