Interpreting 3DMark fillrate figures

I was playing around with 3DMark 2006 tonight and the results somewhat puzzle me.

On my 7800 (256MB), single texturing scores around 4200 MTexels/s.
Multitexturing, however, scores around 6600 MTexels/s.

Admittedly, it's been a while since I ran these tests. The last time was with my Voodoo 3 using 3DMark99!

I can understand the Voodoo 3 giving higher numbers in multi-texturing, since it has two TMUs per pipe. But why does the G70, with one TMU per pipe, also manage to output more texels when multi-texturing? I surfed around and the R580 also has this characteristic.

Another issue is the massive disparity between theoretical and actual fillrate numbers. The G70 should render in excess of 10 GTexels/s (430 * 24). But it is not even performing at half that number. Given that 3DMark06 uses tiny 2x2 textures and only 64 quads, I just don't see what could be preventing the hardware from reaching its peak. I recall my Voodoo 3 performing very close to its theoretical 166/333 (pixel/texel) fillrate numbers. What's up with today's cards?
 
Sounds like you may be hitting the memory bandwidth limits of your card - when you are bandwidth-limited, dual-texturing generates only half as much framebuffer bandwidth per applied texture layer as single-texturing does.
 
3DMark's "single-texture fillrate" test has been entirely bandwidth limited for all versions of 3DMark. It is grossly mislabelled. Everyone should just ignore it. Or better: it should finally be removed from 3DMark.

*shakes fist*
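A back-of-the-envelope sketch of why the single-texture test ends up bandwidth-bound rather than TMU- or ROP-bound. All figures are illustrative stock 7800 GTX values (24 TMUs, 16 ROPs, 430 MHz core, 256-bit GDDR3 at 600 MHz), not the actual scores from this thread:

```python
# Sketch: why an alpha-blended single-texture fillrate test can be
# bandwidth-bound. Illustrative stock 7800 GTX figures.
CORE_MHZ = 430
ROPS = 16
BANDWIDTH_GBS = 38.4   # 256-bit bus * 1200 MT/s effective

# Theoretical single-texture rate: one pixel (== one texel) per ROP per clock.
theoretical_mt = ROPS * CORE_MHZ                     # MTexels/s

# With alpha blending, each pixel costs a 4-byte framebuffer read
# plus a 4-byte write = 8 bytes of framebuffer traffic.
bytes_per_pixel = 8
bandwidth_limit_mt = BANDWIDTH_GBS * 1000 / bytes_per_pixel  # MTexels/s

print(theoretical_mt)       # 6880
print(bandwidth_limit_mt)   # 4800.0 - the lower of the two limits wins
```

The bandwidth ceiling sits well below the ROP ceiling, so the test measures memory bandwidth, not texturing rate.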
 
JF_Aidan_Pryde said:
On my 7800 (256MB), single texturing scores around 4200 MTexels/s.
Multitexturing, however, scores around 6600 MTexels/s.

What's up with today's cards?
24 TMUs, but only 16 ROPs :)
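The arithmetic behind that remark, sketched in Python. The 430 MHz clock is JF's own figure; the unit counts are the G70's published specs:

```python
# Why multitexturing can exceed single-texturing on a G70:
# with one texture per pass, output is capped by the 16 ROPs;
# with 8 textures per pass, the 24 TMUs become the limit instead.
CORE_MHZ = 430
TMUS, ROPS = 24, 16

single_tex_cap = ROPS * CORE_MHZ   # one texel per pixel, ROP-bound
multi_tex_cap = TMUS * CORE_MHZ    # texels/s; pixel output is 1/8 of this

print(single_tex_cap)   # 6880 MTexels/s
print(multi_tex_cap)    # 10320 MTexels/s
```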
 
...and one more thing - the single texture fillrate test in 3DMark performs alpha blending on the textures being processed, so the final rate is bounded even further.
Just use the nDAW benchmark to evaluate the fillrates.
 
JF_Aidan_Pryde said:
On my 7800 (256MB), single texturing scores around 4200 MTexels/s.
Multitexturing, however, scores around 6600 MTexels/s.
As per normal, people are jumping all over 3DMark without actually stopping to look at the figures you're posting. What kind of 7800 are you using - a GTX or a GT? A quick check of some figures of my own (source) shows something more in line with expectations for the multitexturing test compared to yours.

...and one more thing - the single texture fillrate test in 3DMark performs alpha blending on the textures being processed, so the final rate is bounded even further.
Just use the nDAW benchmark to evaluate the fillrates.
Both tests use alpha blending - the difference is how it's done, as the multitexturing test blends 8 textures per quad rather than 64 separate quads. It's certainly the better of the two tests, though the nDAW app isn't without fault either.
 
Thanks Nick.

Your results make much more sense. My 7800 GTX is underclocked because I don't have the PCI-E power connector. I forgot about this while testing.

Your multi-texture results seem to exceed the theoretical by a little: 450MHz * 24 = 10800 MT/s. You scored 11519 MT/s. Any idea why this may be?
 
JF_Aidan_Pryde said:
Your multi-texture results seem to exceed the theoretical by a little: 450MHz * 24 = 10800 MT/s. You scored 11519 MT/s. Any idea why this may be?
That's because it's not running at 450MHz - it's actually at 490MHz; 3DMark isn't recording the appropriate 3D clock domains correctly.
 
Neeyik said:
As per normal, people are jumping all over 3DMark without actually stopping to look at the figures you're posting. What kind of 7800 are you using - a GTX or a GT? A quick check of some figures of my own (source) shows something more in line with expectations for the multitexturing test compared to yours.
And it still shows that your "single-texturing fillrate" benchmark doesn't measure single-texturing fillrate. That's why I've been jumping a lot in the past years.
You score 5280MT/s. You should score 16T*490MHz=7840MT/s and you will score very close to that in ... tata ... a useful single-texture fillrate benchmark.

You score 5280MT/s because of a bandwidth limitation. You have roughly 40GB/s of memory bandwidth on your card and the predicted score for that subtest is bandwidth/(4+7*8)*8~=5350MT/s. Close enough? Am I still jumping or am I simply stating the truth?

4+7*8: 4 bytes per texel for the pure writes in the first layer, 8 bytes per texel for the remaining 7 layers to account for both reads and writes.
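zeckensack's prediction above, reproduced as a Python sketch. The 40 GB/s figure and the byte accounting are his; nothing else is added:

```python
# zeckensack's bandwidth model for the 3DMark fillrate subtest:
# 8 texture layers per pixel, where the first layer is a pure 4-byte
# framebuffer write and each of the remaining 7 blended layers costs
# a 4-byte read plus a 4-byte write.
bandwidth_bytes = 40e9         # ~40 GB/s, as quoted in the thread

bytes_per_pixel = 4 + 7 * 8    # 60 bytes of framebuffer traffic per pixel
layers = 8                     # 8 texels applied per pixel

predicted_mt = bandwidth_bytes / bytes_per_pixel * layers / 1e6
print(round(predicted_mt))     # 5333, close to the 5280 measured
```

The prediction lands within about 1% of the measured score, which is the point: the subtest tracks memory bandwidth, not texel throughput.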
Neeyik said:
Both tests use alpha blending - the difference is how it's done, as the multitexturing test blends 8 textures per quad rather than 64 separate quads. It's certainly the better of the two tests, though the nDAW app isn't without fault either.
You make it sound like it's just a minor detail but that difference is in fact very significant.
 
In the context of what JF_Aidan_Pryde was referring to (i.e. unusually low fill rate results), you and others were "jumping in". It's all very well criticising the product (I am well aware of the failings of the tests, and I can guarantee that I've been vocal about it towards FM as long as anybody else here - what would seem to be an important difference is that I don't get worked up about it!) but you're not actually helping JF's problem.

The point about the use of alpha blending is indeed very important but I wasn't attempting to reduce its significance - I was merely adding to Felix's comment about its usage.
You score 5280MT/s. You should score 16T*490MHz=7840MT/s and you will score very close to that in ... tata ... a useful single-texture fillrate benchmark.
http://www.beyond3d.com/reviews/leadtek/7800r/index.php?p=05#fill

Same product in both tests - it would seem that the 3DMark fill rate results fit in between the pure colour fill rate and single texture alpha blend test results. So when does a benchmark become "useful" or "useless"? Who actually decides such qualities? The end user, of course - to you, it might be useless, but to me it's useful in that it tells me "something". Solely relying on 3DMark to measure a graphics card's fill rate is obviously flawed but then the same is true of any other fill rate benchmark around - wait until you see the fun and games I've been having with a 7300GS ;)
 
Neeyik said:
In the context of what JF_Aidan_Pryde was referring to (i.e. unusually low fill rate results), you and others were "jumping in".
His "unusually low fillrate results" in 3DMark are meaningless. It is pure coincidence that in the case of the 7800 series we must indeed expect an MT fillrate significantly above the ST fillrate, due to the narrow ROP architecture of the chip.

3DMark shows such results on almost every card ever built, with the only exception I'm aware of being the extremely unbalanced Parhelia. That's because it measures bandwidth. It was an educational jump. It didn't sound like JF_Aidan_Pryde was aware of the problem.

So even while it may not be so terribly wrong here as it usually is, it is still important that people are educated to disregard 3DMark whenever they seek to measure [pause] single-textured fillrate.
Neeyik said:
It's all very well criticising the product (I am well aware of the failings of the tests, and I can guarantee that I've been vocal about it towards FM as long as anybody else here - what would seem to be an important difference is that I don't get worked up about it!) but you're not actually helping JF's problem.
Excuse me, but the amount of being worked up seems perfectly adequate to me after more than six years of this, and still counting.
Neeyik said:
<...>
http://www.beyond3d.com/reviews/leadtek/7800r/index.php?p=05#fill

Same product in both tests - it would seem that the 3DMark fill rate results fit in between the pure colour fill rate and single texture alpha blend test results.
Coincidence.
Neeyik said:
So when does a benchmark become "useful" or "useless"? Who actually decides such qualities? The end user, of course - to you, it might be useless, but to me it's useful in that it tells me "something".
It tells you something other than it claims to tell you. You have a bandwidth benchmark there, but instead of displaying bandwidth you multiply by some arbitrary constant and display it as something else. The test is highly misleading, not to me, mind you, but to "end users". And to reviewers.
Scott Wasson said:
Hmm. The S8 Nitro's multitextured fill rate is faster than the competition, as expected, but its single-textured fill rate is well below what we'd expect from an eight-pipe design. This low performance raises red flags for us, because both NVIDIA and SiS have misled the press and the public about the pipeline configurations of their GPUs in the recent past. NVIDIA led us to believe its NV30 was a 8x1 pipeline configuration, but it turned out to be a 4x2 design—that is, it had four pipes with two texture units per pipe. Similarly, SiS said its Xabre was a 4x2 design, but it turned out to be a rather unorthodox 2x4 configuration.

S3, however, is adamant DeltaChrome S8 is an eight-pipeline GPU. I inquired about this issue pointedly and repeatedly, and the answer was consistent. S3 says the pipeline config doesn't switch to 4x2 at any point.
Source

This is just one example of many. In fact I had a short email exchange with Scott about that because it seemed to be an ongoing trend at The Tech Report anyway, and the quote I've just given finally got me "worked up" enough to, pardon, tell him the truth about that 3DMark subtest.

"Fillrate" is a word with a predefined meaning. People have an expectation about what fillrate is, and they can (and will, as seen above) work with "fillrates" based on these expectations. You don't change that. You don't go around and take triangle throughput figures, multiply by 17, tack on a different unit and display them as "Fillrate" either, do you? But that's essentially what you do with your bandwidth test: you measure some property of the hardware and arbitrarily display it as something else.

It would be much more useful if it were actually labelled as a bandwidth test and had the proper unit attached. It's not really useless as it is, it's just ... grossly misleading.
Neeyik said:
Solely relying on 3DMark to measure a graphics card's fill rate is obviously flawed but then the same is true of any other fill rate benchmark around - wait until you see the fun and games I've been having with a 7300GS ;)
Yeah, well ...
I really don't think all other fillrate benchmarks are as flawed as 3DMark's. In fact I don't think that any other is. At least I know of two that are not.

You know it would be all fine and dandy to just concentrate on "real world" things. E.g. Serious Sam had a built-in "real world" fillrate benchmark and I don't think many people have complained about that. It's just that I'd prefer that if you do synthetics, you do them properly, and if you don't do synthetics, you don't pretend to.
 
zeckensack said:
His "unusually low fillrate results" in 3DMark are meaningless. It is pure coincidence that in the case of the 7800 series we must indeed expect an MT fillrate significantly above the ST fillrate, due to the narrow ROP architecture of the chip.
No, it's not meaningless because you're still not getting it.
JF_Aidan_Pryde said:
Another issue is the massive disparity between theoretical and actual fillrate numbers. The G70 should render in excess of 10 GTexels/s (430 * 24). But it is not even performing at half that number.
Apart from not fully understanding what the tests were doing, JF was also concerned about the fact that his results were far lower than he was expecting. I then showed him that this was not the norm (providing my own results) and that MT results are typically within the expected figures given by theoretical values, accounting for bandwidth losses. This is what I was responding to, with an additional comment about the use of alpha blending - that's all. There was no need for a self-righteous tirade about the fucking program.
 
zeckensack,
According to 3DMark06, it uses 2x2 textures in the fillrate tests. Surely this would always be in the texture cache during the tests -- how could the result be bandwidth limited?
 
JF_Aidan_Pryde said:
zeckensack,
According to 3DMark06, it uses 2x2 textures in the fillrate tests. Surely this would always be in the texture cache during the tests -- how could the result be bandwidth limited?
Framebuffer bandwidth, not texture bandwidth.
 
arjan de lumens said:
Framebuffer bandwidth, not texture bandwidth.
The drop in texture size only helped the single texture test; it had virtually no effect on the multitexture one, which does raise the question of why they bothered with the texture size change at all.
 
It has been seen and known for many years that single-texture fillrate is limited by bandwidth; I don't understand this sudden outrage about it, and it's certainly not meaningless at all.
A fillrate benchmark that didn't suffer from the bandwidth limitation - THAT would be perfectly meaningless.
 
Hello.
N00b here...

Always thought that fill-rate depended on colour depth (or rather texture depth) and data rate.
Cause I have a 7800 GT (490MHz/1.2GHz)

That gives an ST of 4638.2 MTexels/sec
MT: 9521.1 MTexels/sec

As I can see, the ST fill-rate is less than half of the theoretical maximum, while the MT is closer to where it should be. Using 16-bit colour I got:
ST: 7432.2 MTexels/sec
MT: 9521.1 MTexels/sec

Why does colour depth have such a massive effect on single texture fill-rate?
 
Gouhan said:
Hello.
Why does colour depth have such a massive effect on single texture fill-rate?
The single texture test performs 64 alpha blends per frame - half the amount of data being read and written for each blend, and you'll get better performance.
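A sketch of that arithmetic in Python. The 38.4 GB/s figure is an assumption derived from the quoted 1.2 GHz effective memory clock on a 256-bit bus, and the ROP cap uses the quoted 490 MHz core:

```python
# Sketch: effect of colour depth on a bandwidth-bound alpha-blend test.
# Each blended pixel is a framebuffer read plus a write: 8 bytes at
# 32 bpp, only 4 bytes at 16 bpp.
BANDWIDTH_GBS = 38.4               # assumed: 256-bit bus * 1.2 GHz effective
ROP_LIMIT_MT = 16 * 490            # 7840: the 7800 GT's non-bandwidth cap

def blend_limit_mt(bytes_per_texel):
    """Bandwidth-limited blend fillrate in MTexels/s."""
    return BANDWIDTH_GBS * 1000 / bytes_per_texel

print(min(blend_limit_mt(8), ROP_LIMIT_MT))   # 4800.0 - bandwidth-bound
print(min(blend_limit_mt(4), ROP_LIMIT_MT))   # 7840 - now ROP-bound
```

Both predictions land within a few percent of the measured 4638.2 and 7432.2 MTexels/s: at 32-bit the test hits the bandwidth wall, and at 16-bit it runs up against the ROP limit instead.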
 
Oh okay that's a bit better.
Very weird though that at the lower screen sizes the fill-rate is worse than at the higher screen sizes. I lose about 400MP in the ST test when going from 1280x1024 to 640x480...
 
Gouhan said:
Oh okay that's a bit better.
Very weird though that at the lower screen sizes the fill-rate is worse than at the higher screen sizes. I lose about 400MP in the ST test when going from 1280x1024 to 640x480...
I would guess at state changes. When performing a 'state change' (swapping to another texture, another blend mode, another framebuffer or whatever), the GPU may be unable to draw any pixels for a short period. When reducing the resolution, the amount of time needed for state changes is unchanged, but the number of pixels drawn between each state change is greatly reduced, so the percentage of cycles lost to state changes increases.
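That argument can be sketched numerically. The stall figure below is purely hypothetical, chosen only to show the shape of the effect, not a measured value:

```python
# Sketch of the state-change argument: a fixed stall per quad costs the
# same number of cycles at any resolution, so at a lower resolution it
# eats a larger share of the frame time. Stall figure is made up.
CORE_MHZ = 490
PEAK_MT = 16 * CORE_MHZ            # 7840: ROP-limited peak, MTexels/s
STALL_CYCLES = 2000                # hypothetical cycles lost per state change
QUADS = 64                         # quads drawn per frame in the test

def effective_fillrate(width, height):
    """Fillrate in MTexels/s after per-quad state-change stalls."""
    draw_cycles = width * height * QUADS / 16   # 16 pixels per clock
    stall_cycles = STALL_CYCLES * QUADS
    return PEAK_MT * draw_cycles / (draw_cycles + stall_cycles)

print(round(effective_fillrate(1280, 1024)))   # close to the 7840 peak
print(round(effective_fillrate(640, 480)))     # a few hundred MT/s lower
```

With the drawn-pixel count shrinking by a factor of ~4 while the stall stays constant, the measured rate drops at 640x480 even though the hardware is identical.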
 