Different fillrate test behaviour on 9700 and GFFX

DOOM III

Newcomer
Can anyone tell me why the multi-texture fillrate of the FX drops when AA is enabled, while the multi-texture fillrate of the 9500/9700 stays stable whether AA is enabled or not? Is it due to insufficient RAM bandwidth on the FX, or an inherent disadvantage of the 4x2 architecture that the FX applies?
 
People, just say what comes to mind when you first see this thread. I just want some response, even if it's not 100% accurate. :cry:
 
Have someone change the RAM speed on the GF FX to see if it is bandwidth limited or core limited.
 
First thing: Wait for B3D's preview.

Second: Someone help MDolenc out! :)
 
noko said:
Have someone change the RAM speed on the GF FX to see if it is bandwidth limited or core limited.

I tested my GF4, which is also a 4x2 architecture, and found it's heavily memory-bandwidth limited. I don't know what the case is on the FX.
 
Perhaps they use something like my "bit mask" idea while ATI uses something like my "address line" idea. If the test has no edges, I think that would explain the results.

The thing is, I don't see how color compression could work reliably using my "address line" idea and something like the F-Buffer (someone just pointed out that a person who presented that concept works at nvidia), so perhaps having a method of color compression with fixed overhead is related to the direction nvidia is heading in that regard.
 
MDolenc said:
The first thing that comes into my mind:
Would someone be kind enough to post results that this thing makes with FSAA on? :oops:
surely
Radeon 9700 Pro w/ cat 3.1
Athlon XP 2700+
512MB DDR333
KT400 mobo
WinXp Pro

0x AA
2x AA
4x AA
6x AA

Note that my system wasn't "clean": some open apps, no reboot in a while, etc.
 
Geforce Quadro FX 2000 (standard 400/800 clock)
P4 1.8 (yeh ok)
Driver 42.90
512MB ram

http://homepage.ntlworld.com/pocketmoon/Results0.txt
http://homepage.ntlworld.com/pocketmoon/Results2.txt
http://homepage.ntlworld.com/pocketmoon/Results4.txt
http://homepage.ntlworld.com/pocketmoon/Results4s.txt


Plus some interesting (normalised) graphs for you.

First off a graph of the above results. Wow those zixels get hit hard with AA!

g1.jpg


Secondly, the test at 4xAA run three ways: at standard clock (400/800), with underclocked memory (400/600), and with underclocked core (300/800).


g2.jpg
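For anyone who wants to turn the three clock configurations above into a bandwidth-vs-core verdict, here is a rough Python sketch. The function names are made up for illustration, and the scores in the usage line are placeholders, not the measured results; plug in the numbers from the result files. It assumes the score would scale linearly with whichever clock is the bottleneck, which is a simplification.

```python
# Hypothetical helper: how much of a clock reduction shows up in the score.
def clock_sensitivity(base_rate, scaled_rate, base_clock, scaled_clock):
    """Return rate_drop / clock_drop.

    ~1.0: the score fell in proportion to the clock (limited by that clock).
    ~0.0: the score barely moved (not limited by that clock).
    """
    clock_drop = 1.0 - scaled_clock / base_clock
    rate_drop = 1.0 - scaled_rate / base_rate
    return rate_drop / clock_drop


def classify(base_rate, mem_underclock_rate, core_underclock_rate):
    """Compare the 400/600 and 300/800 runs against the 400/800 baseline."""
    mem = clock_sensitivity(base_rate, mem_underclock_rate, 800, 600)
    core = clock_sensitivity(base_rate, core_underclock_rate, 400, 300)
    if mem > core:
        label = "memory-bandwidth limited"
    elif core > mem:
        label = "core limited"
    else:
        label = "mixed"
    return mem, core, label


# Placeholder scores (NOT from the posted results): baseline, mem-down, core-down.
print(classify(1000.0, 760.0, 990.0))
```

Both underclocks happen to be a 25% reduction (800 to 600 and 400 to 300), so the two sensitivities are directly comparable per test.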



discuss :D


EDIT
Check out the 4xS impact on zixels compared to 4x...
 
Solid proof the extra zixel power indeed comes via the extra z-units already there for MSAA.

Interesting that 4xMSAA seems to completely shut off whatever z-only magic (presumably something in the drivers) it is that allows NV30 to decide it can just skip running the shader programs.
 
I'm not so sure, Dave H. Notice that the 2x scores are identical to the no-FSAA scores, while the 4x scores are below half the performance of 2x or no AA (with color writes disabled).

I really think that nVidia screwed up big time with the AA of the FX.
 
In the long run, I can't tell from benchmarks what nVidia has done with its technology: whether they have disabled it all in the drivers (not likely) or simply broken the hardware itself.
 
I'm not so sure, Dave H. Notice that the 2x scores are identical to the no-FSAA scores

Which is exactly what we'd expect. FWIW, the setup I envision is that each pixel pipeline has 4 z-units, in pairs of two, hardwired to calculate z-values at particular offsets to a given pixel. (Incidentally--and I'm sure there's a simple way around this--but in my most naive idea of this it seems like it would be easier for 2xMSAA to be limited to OGMS where stencil/z-only passes are concerned.)

In any case, with 4 z-units and only double-rate zixeling, there's absolutely no reason why 2xMSAA would affect things any.

while the 4x scores are below half the performance of 2x or no AA (with color writes disabled).

Pure fillrate is about half performance, which is as expected. As for the other results, it's abundantly clear that when in "z-only mode" NV30 is simply ignoring everything to do with colors--texture reads and shader ops. Presumably, with 4xMSAA the drivers don't even consider the possibility of entering "z-only mode" and therefore go through all the motions, failing only to write color when it's done. (Which is exactly what R300 appears to do on this benchmark).
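A quick Python sketch of the z-unit arithmetic in this post, to make the "2x is free, 4x is half" expectation concrete. Everything here is the guessed setup from the thread, not confirmed hardware detail: 4 pipelines (the 4x2 layout), 4 z-units per pipeline, and z-only output capped at double rate. The function name and parameters are hypothetical.

```python
def z_only_pixels_per_clock(msaa_samples, pipelines=4,
                            z_units_per_pipe=4, max_rate_per_pipe=2):
    """Z-only pixels per clock under the guessed model.

    Each pixel needs one z-value per MSAA sample, so a pipeline can finish
    z_units_per_pipe / samples pixels per clock, capped at the double-rate
    limit that applies even without AA.
    """
    samples = max(1, msaa_samples)  # treat "0x AA" as one sample per pixel
    per_pipe = min(max_rate_per_pipe, z_units_per_pipe / samples)
    return pipelines * per_pipe


for aa in (0, 2, 4):
    print(f"{aa}x AA: {z_only_pixels_per_clock(aa)} z-only pixels/clock")
```

Under these assumptions no AA and 2x both come out at 8 per clock, and 4x drops to 4, which lines up with the "about half" pure fillrate result. It says nothing about the larger drops in the shader tests, which the post attributes to the drivers not entering "z-only mode" at 4x.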
 
Matches up well with the NV30 working like an NV2x, which also relies on the extra Z units to write 8 z/stencils per clock on the XBox.
 
antlers4 said:
Matches up well with the NV30 working like an NV2x, which also relies on the extra Z units to write 8 z/stencils per clock on the XBox.

So the XBox is also optimised for "Z-buffer first" rendering, à la Doom3?
 