Blending rate confusion on NVIDIA GPUs...

fellix

Veteran
I don't know whether someone has noted this already, but anyway - I've dug up some nDAW Fillrate bench numbers from here and there (by "here" I mean the B3D reviews) and done some simple math on them:

blendrate0bo.png

^^Updated

The last column shows the color/alpha-blend fillrate ratio, expressed as the % reduction relative to the color rate.
As you'd guess, the higher the percentage, the less efficient the blending, e.g. the 6800U is in the worst position here (except for the ATi part).
//I've also included the theoretical rates for every card, instead of listing the raw MHz, because I think these are the key factor in the issue.

So the 0.2 cent question is: what is the reason for this "mess" with the blend rates, and has the NV42 (in its 6800GS flavor) found the sweet spot of balance? ;)

Too bad, I don't have a 12-pipe NV40 score here.
 
The blend rate drop is based on the color/blend rate ratio, not just the simple color/texel ratio - the latter doesn't imply anything in this case. Note that I haven't included the actual blend rate in the table; I've listed the raw numbers (incl. the bandwidth) only to show each GPU's base capacity that actually affects the blend rate.
However, you can browse for some fillrate numbers through the b3D reviews, as I did.
 
There is no mess. It's a combination of bandwidth limitation and a limited number of blend units. NV40 can output 16 pixels per clock, but only 8 with blending. G70 can do 16 with or without blending, but is bandwidth limited with blending. For NV43 it's 4 and 2, respectively. I don't know for sure about NV42, but it could be 12 and 8.
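A rough way to picture how those two limits interact: the blend rate is capped both by blend units times core clock and by how many read-modify-write cycles the memory bus can carry. The sketch below is only an illustration. The function name, the clocks, the 256-bit bus and the 8 bytes per blended pixel (a 4-byte read plus a 4-byte write of 32-bit color, ignoring compression and caching) are all assumptions, not figures from this thread.

```python
def blend_rate_limits(core_mhz, blends_per_clock, mem_mhz_effective,
                      bus_bits=256, bytes_per_blend=8):
    """Return (rop_limit, bandwidth_limit, expected) in Mpixels/s.

    Assumptions (not from the thread): each blended pixel costs one
    4-byte read plus one 4-byte write, and the bus is 256 bits wide.
    """
    rop_limit = core_mhz * blends_per_clock
    bandwidth_mb_s = mem_mhz_effective * bus_bits / 8  # bus bytes per transfer * MHz
    bandwidth_limit = bandwidth_mb_s / bytes_per_blend
    return rop_limit, bandwidth_limit, min(rop_limit, bandwidth_limit)


# Hypothetical clocks: 400 MHz core, 1000 MHz effective memory.
nv40_like = blend_rate_limits(400, 8, 1000)   # 8 blends per clock
g70_like = blend_rate_limits(400, 16, 1000)   # 16 blends per clock

print(nv40_like)  # the ROP limit is the smaller of the two
print(g70_like)   # the bandwidth limit is the smaller of the two
```

With these made-up clocks, the 8-blend part stays under the bandwidth ceiling while the 16-blend part runs into it, which is the pattern described above.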
 
Just so everyone's on the same page, fellix is trying to tie up the loose ends prompted by the B3D 6800GS review's discussion thread, in which we've kinda maybe settled that NV42 has 8 ROPs--or not. (Ha ... ha?)

It may well be 12, and Dave's explanations may hold:
The colour and Z fill-rate performances of the 6800 GS are a little further behind that of the 6800 GT than its theoretical differences would suggest it should be. In this test the GS's floating point texture fill-rate is about half the theoretical colour fill-rate, indicating that an FP16 texture takes two cycles per Bilinear sample, but the Blend rate is higher than half the fill-rate, indicating that all of the ROPs may be blend capable.

fellix, I can't figure out how you got your % figures. Could you explain your formula for, say, the 6600GT #s in the 6800GS review (1983 color, 1636 blend)?
 
OK, here is a "classic" example (with my own results):

6800GT (433/1150) -- 6452|3266 (color|blend);
6800GS (433/1150) -- 3472|3233 (color|blend);

...and the math:

6800GT >> 100 - 6452/3266 = 98%
6800GS >> 100 - 3472/3233 = 7%

We see that the color rate of the NV40 card is nearly twice higher... or, I prefer to say - the blend rate is twice lower, assuming that in this particular case NV40 has double the ROP rate (count) compared to NV42.
And I can hardly accept that not all (half?) of the ROPs of NV40 are blend capable.
What about the G70 flavors - 1/4 can't blend, or 2/3, or 9/10?
Most probably, not all of them can sustain blending, constrained by the FB bandwidth... or maybe something else!?
 
fellix said:
OK, here is a "classic" example (with my own results):

6800GT (433/1150) -- 6452|3266 (color|blend);
6800GS (433/1150) -- 3472|3233 (color|blend);

...and the math:

6800GT >> 100 - 6452/3266 = 98%
6800GS >> 100 - 3472/3233 = 7%

We see that the color rate of the NV40 card is nearly twice higher... or, I prefer to say - the blend rate is twice lower, assuming that in this particular case NV40 has double the ROP rate (count) compared to NV42.
Your math is wrong: A being 98% higher than B is not the same as B being 98% lower than A. 98% lower is not twice lower, it's fifty times lower! The correct way is 1 - 3266/6452 = 0.49 = 49% lower.
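To make the asymmetry concrete, here is a small sketch using the 6800GT numbers quoted above. The function names are just illustrative helpers, not anything from a benchmark tool.

```python
def percent_higher(a, b):
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1) * 100


def percent_lower(b, a):
    """How much smaller b is than a, as a percentage of a."""
    return (1 - b / a) * 100


color, blend = 6452, 3266  # 6800GT color and blend rates from the post above

print(round(percent_higher(color, blend)))  # color relative to blend
print(round(percent_lower(blend, color)))   # blend relative to color
```

The same pair of numbers gives about 98% in one direction and about 49% in the other, because the base of the percentage changes.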
 
Yep, you're right about the percentage representation, except for the "fifty TIMES" line!? ;)
It looks like I've scrambled the only ALU in my head.

But that doesn't alter the main topic of the rates, anyway:

6800GT >> 1 - 3266/6452 = 49%
6800GS >> 1 - 3233/3472 = 7%

//I'll update the table later on.
 
fellix said:
Yep, you're right about the percentage representation, except for the "fifty TIMES" line!? ;)
If you mean that saying "fifty times lower" is a meaningless phrase, then maybe you're right. But if you're saying that the difference is not 50 times, then you're wrong. If A is 100, and B is 98% lower, then B = 100 - 98 = 2. 100 / 2 = 50, A is fifty times as large as B.
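The conversion from "x% lower" to "how many times as large" can be sketched in a couple of lines. The helper name is made up for illustration.

```python
def times_as_large(percent_lower):
    """If B is `percent_lower` percent below A, return the ratio A / B."""
    return 100 / (100 - percent_lower)


print(times_as_large(98))            # A = 100, B = 2
print(round(times_as_large(49), 2))  # the corrected 6800GT figure
```

So 98% lower means a factor of fifty, while 49% lower means roughly a factor of two.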
 
ok, ok - it's all set for correct now. ;)

The chart was updated.

P.S.: Anyone with unlocked 6800nu here?
 
fellix said:
And I can hardly accept that not all (half?) of the ROPs of NV40 are blend capable.
What about the G70 flavors - 1/4 can't blend, or 2/3, or 9/10?
Most probably, not all of them can sustain blending, constrained by the FB bandwidth... or maybe something else!?
AFAIK NV40 can blend 8 pixels per clock (4 with FP16), G70 can blend 16 per clock (8 with FP16), and that rate can be sustained with AA.
It's a combination of transistor budget and bandwidth restrictions that led to this design.
 
Looking at some of the results for ATI chipsets, it would seem that the blend rate is half the colour rate (e.g. X1800 - colour = 16 pixels per clock, blend = 8 pixels per clock):
Code:
            fill-rate   colour test   blend test   % diff
X1800XL     8000        7059.0        2999.8       -58%
X850 Pro    6060        4967.3        2733.0       -45%
X800 XT     8000        5634.6        2967.1       -47%
X800 XL     6400        4980.3        2750.6       -45%
9800 Pro    3040        2665.1        1822.0       -32%
X700 XT     3800        3440.2        1708.8       -50%
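For anyone who wants to check the last column, here is a short sketch that recomputes it from the colour and blend test numbers in the table. The variable and function names are my own.

```python
# (card, colour test, blend test) copied from the table above
results = [
    ("X1800XL", 7059.0, 2999.8),
    ("X850 Pro", 4967.3, 2733.0),
    ("X800 XT", 5634.6, 2967.1),
    ("X800 XL", 4980.3, 2750.6),
    ("9800 Pro", 2665.1, 1822.0),
    ("X700 XT", 3440.2, 1708.8),
]


def percent_diff(colour, blend):
    """Blend rate relative to colour rate, as a signed percentage."""
    return (blend / colour - 1) * 100


diffs = {card: round(percent_diff(colour, blend))
         for card, colour, blend in results}

for card, diff in diffs.items():
    print(f"{card:<10}{diff:>5}%")
```

This uses the blend-relative-to-colour form, so a value near -50% corresponds to blending at half the colour rate.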
 
fellix said:
ok, ok - it's all set for correct now. ;)

The chart was updated.

P.S.: Anyone with unlocked 6800nu here?

I'm not sure what you mean by "unlocked", but I've got two machines that have plain 6800s in them.
 
Xmas said:
NV40 can output 16 pixels per clock, but only 8 with blending. G70 can do 16 with or without blending, but is bandwidth limited with blending.
Hmm, so G70 has an updated ROP unit, and NV42 might simply have 8 of these ROP units? fellix's GS numbers track with 434 MHz * 8 ROPs (blends per clock) = 3472 Mblends/s: the color rate matches exactly, and the blend rate (3233) is close.
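The same arithmetic can be run in reverse: dividing a measured rate by the core clock gives the number of pixels (or blends) written per clock. A quick sketch with fellix's own numbers from earlier in the thread, taking the 433 MHz core clock he listed; the function name is hypothetical.

```python
def pixels_per_clock(measured_mpixels_s, core_mhz):
    """Measured fill rate divided by core clock: pixels (or blends) per clock."""
    return measured_mpixels_s / core_mhz


CORE_MHZ = 433  # core clock fellix listed for both cards

for card, color, blend in [("6800GT", 6452, 3266), ("6800GS", 3472, 3233)]:
    print(card,
          round(pixels_per_clock(color, CORE_MHZ), 1),
          round(pixels_per_clock(blend, CORE_MHZ), 1))
```

The per-clock figures are only as good as the clock you plug in, so treat them as estimates.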

fellix's chart also shows how bandwidth-limited blends are. See especially the 7800 vs. the 6800GS w/ the same bandwidth (lower efficiency b/c more ROPs), or the two 6600GTs (same thing, higher core clock but same mem clock means less bandwidth per ROP per clock).
 