Weird NV40 Fillrate results

I wrote a benchmark tool myself, but I found that once z read/write is enabled, there's no way I can reach the ideal z fillrate because bandwidth becomes the bottleneck. I'm wondering how other benchmark tools tackle this problem.
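A rough sketch of why bandwidth caps the z fillrate, assuming a simplified model with no z compression or early rejection, where every tested zixel costs one z read and one z write of 4 bytes each. The function name `z_fillrate_mzixels` and the sample figures (8 pipes, 325 MHz, 16 GB/s) are hypothetical and only there for illustration, not measurements.

```python
def z_fillrate_mzixels(pipes, core_mhz, bandwidth_gb_s, bytes_per_z=4,
                       z_read=True, z_write=True):
    """Return (ideal, bandwidth_limit, achievable) in MZixel/s.

    Simplified model: no z compression, no caching, no early z rejection.
    """
    # Ideal rate: one zixel per pipe per clock.
    ideal = pipes * core_mhz
    # Memory traffic per zixel for the enabled z operations.
    bytes_per_zixel = bytes_per_z * (int(z_read) + int(z_write))
    if bytes_per_zixel == 0:
        return ideal, float("inf"), ideal
    # GB/s * 1000 = MB/s; divided by bytes per zixel gives MZixel/s.
    bandwidth_limit = bandwidth_gb_s * 1000.0 / bytes_per_zixel
    return ideal, bandwidth_limit, min(ideal, bandwidth_limit)


# Hypothetical card: 8 pipes at 325 MHz with 16 GB/s of memory bandwidth.
ideal, bw_limit, achievable = z_fillrate_mzixels(8, 325, 16.0)
print(ideal, bw_limit, achievable)
```

With z read and write both enabled, the bandwidth limit in this toy model falls below the ideal rate, which matches the behaviour described above. Real hardware compresses z data, so actual limits will differ.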
 
jolle said:
But it's not related to NV's capability to sort of simulate 32 pipes in Z passes..

I just think xbitlabs may have a different understanding of "a rendered zixel". They probably counted every zixel sent into the pipeline, regardless of whether it's processed by the ROP.

I did a quick test using front-to-back order: my 9700Pro showed a z fillrate of 1588.59 MZixel/s (nowhere near the ideal value because of bandwidth). Now if you consider the overdraw, the effective fillrate is 1588.59 * 2 = 3177.18 MZixel/s, much higher than the ideal value of 2600 MZixel/s.
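The arithmetic above can be checked with a few lines. The overdraw factor of 2 and the measured figure come from the post; the 8 pipes at 325 MHz used to reproduce the 2600 MZixel/s ideal are an assumption about how that number was derived.

```python
measured = 1588.59   # MZixel/s, figure reported in the post
overdraw = 2         # overdraw factor used in the post
ideal = 8 * 325      # assumption: 8 pipes at 325 MHz, giving 2600 MZixel/s

# Count every zixel sent down the pipeline, not only those written.
effective = measured * overdraw

print(f"effective: {effective:.2f} MZixel/s")
print(f"ratio to ideal: {effective / ideal:.2f}")
```

A ratio above 1 is what makes the "every zixel sent into the pipeline" reading look like a possible explanation for above-ideal numbers.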
 
should be pretty easy to find out..
the same test gives the 9800XT 3112 Mp/sec from 0-4 textures, and
the 5950Ultra about 3300 at 0 textures, then 2880 at 1-4 textures

is that about right? or does that fit your theory?
I don't know the frequencies off the top of my head hehe..
 
Oops, my theory doesn't add up. :oops:

If they applied the same calculation to other cards, those should also show a much higher z fillrate than expected, but the results we saw from both the 5950U and 9800XT are just as expected.
 
hehe, I just read that Albatron is gonna have the core at 600MHz..

funny thing about that is that when I first saw this fillrate "anomaly" mentioned at
the Rage3D forum, I wrote this:

Then I thought of some sort of function similar to Overdrive on ATI's XT cards,
where it dynamically clocks itself higher..
but that would mean about 618MHz on the core, up from 400MHz, and that
would be, well, insane..

all of a sudden it doesn't seem THAT crazy, but then again it wouldn't make
any sense whatsoever if it DID have something similar to Overdrive and
it ONLY kicked in during this 1 benchmark out of all the tests that have
been done...
 
Lol, man, that's the craziest idea I've heard of these days though I couldn't prove it wrong.
I'd rather tend to believe there're some sort of virtual z pipelines. ;)
 
991060 said:
Lol, man, that's the craziest idea I've heard of these days though I couldn't prove it wrong.
I'd rather tend to believe there're some sort of virtual z pipelines. ;)

Well, if you already have "virtual pipelines" why not add "virtual virtual pipelines"? :)
 
Hmm, good question. The fact is that I don't see the additional Z ROPs (C ROPs in normal mode) as quite "virtual", because we know how they work.

Maybe I should say "ghost pipeline" instead? :LOL:
 
991060 said:
Lol, man, that's the craziest idea I've heard of these days though I couldn't prove it wrong.
I'd rather tend to believe there're some sort of virtual z pipelines. ;)

yeah that would be neat, especially if it were something like shifting the
load onto the VS units, which sounds unreal, but if it turned out to be true it would be an
impressive stunt no matter how useful it would actually be in real games...

Oh yeah, the source on the 600MHz core isn't totally unreliable I guess
http://www.albatron.com.tw/english/news/news_detail.asp?news_id=77

hehe, there is a minute chance that something in the driver kicked in and
sent it up to around 600MHz core, if that is indeed where it will be in retail..
Or that it's a sick mind game Nvidia is playing on us lol..
However (extremely) unlikely that may be..
 
The way I see this possibly working is simply this:
Color blends require a read from the framebuffer, and a write to the framebuffer.

Z-compares require a read from the z-buffer. If blending is not active, it is conceivable that the NV40 will reallocate this framebuffer read to instead become a z-buffer write.

Z-writes require a write to the z-buffer. Color output typically goes to the framebuffer, but it appears the NV40 can also output a z value to the z-buffer.

In other words, if xbitlabs' results are valid, then we can start describing the NV40 as a 32x0 + 16x1, instead of 32x0/16x1.
 
32+18 would give a closer theoretical peak of 20,000, which is above the result and not below it..

then again 18 is weird too in any case..

If we throw away all realistic ways of thinking and just look at the numbers as a
puzzle:

they got 19900 roughly..
(32+16) x 400MHz = 19200, somewhat below the result..
(32+18) x 400MHz = 20000, about 100 above the test result
32 x 625MHz = 20000, again just above the result..
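The same puzzle can be laid out as a small script. It only restates the numbers from the post (roughly 19900 observed, 400MHz core, 32 z units) and works backwards to the clock or unit count the result would imply; the variable names are made up for illustration and nothing here says which explanation, if any, is right.

```python
observed = 19900  # approximate result quoted in the thread, millions per second

# Theoretical peaks for each guess: units * clock in MHz.
candidates = {
    "(32+16) units @ 400MHz": (32 + 16) * 400,
    "(32+18) units @ 400MHz": (32 + 18) * 400,
    "32 units @ 625MHz": 32 * 625,
}

for label, peak in candidates.items():
    # A positive margin means the guess leaves headroom above the result.
    print(f"{label}: peak {peak}, margin {peak - observed:+d}")

# Working backwards from the observed number instead:
implied_clock = observed / 32    # MHz needed with 32 units
implied_units = observed / 400   # units needed at 400MHz
print(f"implied clock: {implied_clock:.1f}MHz, implied units: {implied_units:.2f}")
```

Only the guesses with a non-negative margin are consistent with the result being at or below a theoretical peak, which is why 32+16 at 400MHz looks too low.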

Well, that it would all of a sudden run at 625MHz in just 1 test is too far-fetched to be considered, even though it seems like it will be running close to
that in its final form..

the number 18 is hard to fit in there, UNLESS the also very far-fetched
idea about it processing 3 pixels per VS unit is entertained..
The only reason I even mentioned it is a rumour that was floating around earlier that it would actually do something like that..

Even though those things can be dismissed as impossible/most unlikely, the real explanation should put the
theoretical limit at least ON, or just above, the result shown..
Or so I would think at least..
 
Chalnoth, is this what you're suggesting?

[Attached image: post-1-1082355048.png]
 
found the source of the VS units rumour, it's not of the very credible type, lol
http://www.theinquirer.net/?article=14878

pretty much everything else it mentions about NV40 is wrong, but can anyone comment on this part here:
What Nvidia is using is the ability of the Vertex Shader model 3.0 (known as PS 3.0 or VS 3.0) where the Shader can actually render pixels as a virtual pipeline. But it can only render them without filtering information other than Point Sampling, a basic filter technique.

To get into more details, the Vertex Shader model 3.0 feature has the ability to fetch textures but you could use your VS 3.0 only where you don’t have any geometry - when you have to process just pixels. This cannot be used in modern games as you process just point sample textures.

EDIT..

More suggestions that need some expert calls..

http://www.beyond3d.com/previews/nvidia/nv40/index.php?p=13
The VP unit within the GPU.. it's a 16-way SIMD processor

Any way to make that one act as 16 extra pipes under special conditions..
such as ONLY Z writes and no reads, as Uttar suggests as a possible
difference between the fillrate benchmark tools..
Another long shot...
 