7800/6800 fillrate details

Mintmaster

Back in the R3xx vs. NV3x days we knew all the details of fillrate under different conditions, such as with various levels of FSAA, when blending, etc. But I'm having a really hard time finding all the details for the 7800GTX and even the 6800U.

Here's what I know about the fillrate (ROP only) of the 6800U:

-32 pix/clk for z/stencil only
-16 pix/clk for colour writes
-8 pix/clk for blending
-4 pix/clk for FP16 blending
-32 samples/clk with AA

I still have these questions though:

1. What is the fillrate with AA and blending? Each sample needs to be blended separately.

2. What is the FP16 fillrate without blending? I would assume it's 8 pix/clk.

3. Most importantly, are these numbers all the same for the 7800GTX?

I tried to look at Futuremark's ORB for 7800GTX AA fillrate (I assume 3DMark05 still uses blending), but don't have the pro version.

Thanks in advance to anyone who runs any tests.
 
Mintmaster said:
Here's what I know about the fillrate (ROP only) of the 6800U:

-32 pix/clk for z/stencil only
-16 pix/clk for colour writes
-8 pix/clk for blending
-4 pix/clk for FP16 blending
-32 samples/clk with AA

I still have these questions though:

1. What is the fillrate with AA and blending? Each sample needs to be blended separately.

2. What is the FP16 fillrate without blending? I would assume it's 8 pix/clk.

3. Most importantly, are these numbers all the same for the 7800GTX?

2. I seem to get half the peak pixel fillrate when writing FP16, so 8 pix/clk seems right.

And the rest of the numbers you've got are correct on both NV45 and G70 (quickly tested with a 6800 Ultra and a 7800 GTX, both on PCIe), assuming you mean alpha blending :)

I don't have a fillrate test that uses floating point blending though, do you have one to try?
 
small 64-bit float texture w/ bilinear filtering, 64-bit floating point render target, no blending: 3837.1 Mpix/s

Half that with alpha blending. 490MHz ROPs, GeForce 7800 GTX, 78.03.

So yeah, 8 pix/clock for FP16 and 4 pix/clock when blending :D You were correct from start to finish, Mintmaster.
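For anyone who wants to redo the arithmetic, here is a minimal sketch of the Mpix/s to pix/clk conversion using the two figures quoted above (3837.1 Mpix/s and 490 MHz ROPs). The function and constant names are made up for illustration, and "half that with alpha blending" is taken literally rather than from a separate measurement.

```python
def pixels_per_clock(mpix_per_s: float, clock_mhz: float) -> float:
    """Convert a measured fillrate in Mpix/s to pixels per clock."""
    return mpix_per_s / clock_mhz


ROP_CLOCK_MHZ = 490.0         # ROP clock quoted in the post above
FP16_NO_BLEND_MPIX = 3837.1   # measured FP16 fillrate, no blending

fp16_no_blend = pixels_per_clock(FP16_NO_BLEND_MPIX, ROP_CLOCK_MHZ)
# Assumption: the blended figure is exactly half, as stated in the post.
fp16_blend = pixels_per_clock(FP16_NO_BLEND_MPIX / 2, ROP_CLOCK_MHZ)

print(f"FP16, no blending: {fp16_no_blend:.2f} pix/clk")
print(f"FP16, blending:    {fp16_blend:.2f} pix/clk")
```

Both results land a little under the theoretical 8 and 4 pix/clk, which is what you would expect from a real measurement.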
 
Mintmaster said:
Back in the R3xx vs. NV3x days we knew all the details of fillrate under different conditions, such as with various levels of FSAA, when blending, etc. But I'm having a really hard time finding all the details for the 7800GTX and even the 6800U.

Here's what I know about the fillrate (ROP only) of the 6800U:

-32 pix/clk for z/stencil only
-16 pix/clk for colour writes
-8 pix/clk for blending
-4 pix/clk for FP16 blending
-32 samples/clk with AA

I still have these questions though:

1. What is the fillrate with AA and blending? Each sample needs to be blended separately.

2. What is the FP16 fillrate without blending? I would assume it's 8 pix/clk.

3. Most importantly, are these numbers all the same for the 7800GTX?
1. As long as all samples in a destination pixel are identical (i.e. it is not an edge pixel), they don't need to be blended separately. In the best case, NV40/G70 blending with AA is just as fast as without AA. There are two possibilities for NV40/G70:
- if a framebuffer tile is compressible, the blend rate is independent of AA; if not, the fillrate is divided by the number of samples, or
- NVidia has implemented a recently published patent (6,927,781), so the chips average subsamples before blending.

2. 8 pix/clk

3. Yes

-4 pix/clk for FP32
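
The first possibility in point 1 can be put into numbers with a rough sketch. This is only a toy model under stated assumptions, not how the hardware is known to work: `compressible_fraction` is a hypothetical parameter for the share of blended pixels that fall in compressible tiles, compressible pixels are assumed to cost one blend and all others one blend per sample.

```python
def blend_rate_with_aa(base_pix_per_clk: float,
                       samples: int,
                       compressible_fraction: float) -> float:
    """Average blended pixels/clock under the toy model described above.

    Pixels in compressible tiles cost 1/base clocks each; all other
    pixels cost samples/base clocks each (one blend per sample).
    """
    if not 0.0 <= compressible_fraction <= 1.0:
        raise ValueError("compressible_fraction must be in [0, 1]")
    cost_fast = 1.0 / base_pix_per_clk
    cost_slow = samples / base_pix_per_clk
    avg_cost = (compressible_fraction * cost_fast
                + (1.0 - compressible_fraction) * cost_slow)
    return 1.0 / avg_cost


# 8 pix/clk blend rate (the 6800U figure from the thread), 4x AA.
best_case = blend_rate_with_aa(8.0, 4, 1.0)    # every tile compressible
worst_case = blend_rate_with_aa(8.0, 4, 0.0)   # no tile compressible
mostly_flat = blend_rate_with_aa(8.0, 4, 0.9)  # hypothetical 90% interior pixels

print(best_case, worst_case, round(mostly_flat, 2))
```

Averaging the per-pixel cost rather than the rates is deliberate: the slow pixels take more clocks, so they weigh more heavily than their share of the pixel count suggests.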
 
Code:
large 32-bit integer texture w/ point sampling, 32-bit integer render target, no blending:   4270.5 Mpix/s
large 32-bit integer texture w/ point sampling, 32-bit integer render target, alpha blending:   3097.5 Mpix/s
large 32-bit integer texture w/ bilinear filtering, 32-bit integer render target, no blending:   4196.1 Mpix/s
large 32-bit integer texture w/ point sampling, 64-bit floating point render target, no blending:   2646.4 Mpix/s
small 32-bit integer texture w/ point sampling, 32-bit integer render target, no blending:   9681.4 Mpix/s
small 32-bit integer texture w/ point sampling, 32-bit integer render target, alpha blending:   5032.6 Mpix/s
small 64-bit float texture w/ point sampling, 32-bit integer render target, no blending:   5094.3 Mpix/s
large 64-bit float texture w/ point sampling, 32-bit integer render target, no blending:   2492.6 Mpix/s

That's an X1800 XT on an FX-57.
 
Thanks for the info guys. I'm still curious about measured results for fillrate with AA and blending both enabled.

Xmas said:
1. As long as all samples in a destination pixel are identical (i.e. it is not an edge pixel), they don't need to be blended separately. In the best case, NV40/G70 blending with AA is just as fast as without AA. There are two possibilities for NV40/G70:
- if a framebuffer tile is compressible, the blend rate is independent of AA; if not, the fillrate is divided by the number of samples, or
- NVidia has implemented a recently published patent (6,927,781), so the chips average subsamples before blending.
I know they don't need to be blended separately, but current GPUs only push out 2 samples per clock per pipe anyway, so do they even bother with this optimization? I guess it would be smart for NVidia since they only have 8 blend units. Regardless, I'd love to see the measured results.

Rys, can you please check? Fillrate with 4xAA and alpha blending (X850/X1800 would be interesting). Thanks for your test data so far.
 
Mintmaster said:
3. Most importantly, are these numbers all the same for the 7800GTX?
Not quite. G70 has a full blender per ROP, so the numbers should look more like:

-32 pix/clk for z/stencil only
-16 pix/clk for colour writes
-16 pix/clk for blending
-8 pix/clk for FP16 blending


Of course, there is not enough memory bandwidth to blend 16 pixels/clock (or 8 pixels/clock in FP16), so those rates are unlikely to be achieved in real apps.
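
A back-of-the-envelope sketch of the bandwidth point, under loudly stated assumptions: each blended pixel is taken to cost one destination read plus one write, the 490 MHz clock is the ROP clock quoted earlier in the thread for one particular card, and the 38.4 GB/s memory bandwidth is the reference 7800 GTX figure, which does not come from this thread. Z traffic, texture reads and framebuffer compression are ignored.

```python
def blend_bandwidth_gb_s(pix_per_clk: float,
                         clock_mhz: float,
                         bytes_per_pixel: int) -> float:
    """Framebuffer traffic in GB/s needed to sustain a given blend rate.

    Blending reads the destination pixel and writes the result back,
    so each pixel moves 2 * bytes_per_pixel bytes.
    """
    bytes_per_second = pix_per_clk * clock_mhz * 1e6 * 2 * bytes_per_pixel
    return bytes_per_second / 1e9


ROP_CLOCK_MHZ = 490.0          # clock quoted earlier in the thread
ASSUMED_MEMORY_BW_GB_S = 38.4  # assumption: reference-spec figure, not from the thread

int8_blend = blend_bandwidth_gb_s(16, ROP_CLOCK_MHZ, 4)  # 32-bit integer target
fp16_blend = blend_bandwidth_gb_s(8, ROP_CLOCK_MHZ, 8)   # 64-bit FP16 target

print(f"32-bit blend at 16 pix/clk: {int8_blend:.2f} GB/s")
print(f"FP16 blend at 8 pix/clk:    {fp16_blend:.2f} GB/s")
print(f"assumed memory bandwidth:   {ASSUMED_MEMORY_BW_GB_S:.2f} GB/s")
```

Both cases need the same traffic, since halving the pixel rate and doubling the pixel size cancel out, and both come out well above the assumed memory bandwidth.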
 
Code:
small 32-bit integer texture w/ point sampling, 32-bit integer render target, no blending:   7781.8 Mpix/s
small 32-bit integer texture w/ bilinear filtering, 32-bit integer render target, no blending:   7776.1 Mpix/s
small 32-bit integer texture w/ point sampling, 32-bit integer render target, alpha blending:   4419.8 Mpix/s
large 32-bit integer texture w/ point sampling, 32-bit integer render target, no blending:   3461.4 Mpix/s
large 32-bit integer texture w/ bilinear filtering, 32-bit integer render target, no blending:   3455.1 Mpix/s
large 32-bit integer texture w/ point sampling, 32-bit integer render target, alpha blending:   2625.3 Mpix/s

All with 4X AA forced in CCC. Sorry they're not the same tests as last time, I'll get some parity later on and repost fresh numbers.
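
Four of these tests do line up with the earlier no-AA run, so the cost of 4X AA can be estimated by dividing one by the other. A quick sketch, assuming both runs were made on the same X1800 XT (the thread does not say so explicitly); the short dictionary keys are just labels made up for the matching tests.

```python
# Mpix/s from the earlier run without AA (32-bit integer texture,
# point sampling, 32-bit integer render target).
no_aa = {
    "small, no blending": 9681.4,
    "small, alpha blending": 5032.6,
    "large, no blending": 4270.5,
    "large, alpha blending": 3097.5,
}

# Mpix/s from the run above with 4X AA forced, same four tests.
aa_4x = {
    "small, no blending": 7781.8,
    "small, alpha blending": 4419.8,
    "large, no blending": 3461.4,
    "large, alpha blending": 2625.3,
}

# Fraction of the no-AA fillrate that survives with 4X AA enabled.
retained = {name: aa_4x[name] / no_aa[name] for name in no_aa}

for name, ratio in retained.items():
    print(f"{name:24s} {ratio:.2f}")
```

On these numbers every test keeps roughly 80 to 88 percent of its no-AA fillrate, which is nowhere near the drop to one quarter that blending every sample separately would suggest.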
 