FP Blending / Filtering Benchmark?

Dave Baumann

Gamerscore Wh...
Moderator
Legend
Any of the coders out there care to knock up a theoretical FP Filtering and Blending fill-rate benchmark? I think this might be useful.

Cheers.
 
Jpaana's test does the following:

jpaana said:
Simple "benchmark" for testing floating point textures, render
targets, filtering and blending. It draws 200 layers of screen sized
quads per frame.

Here are the results of my BFG 6800GT AGP (8x agp rate, 370MHz core, and default memory clock) w/Forceware 65.73, at 1024x768 (no aniso/aa) with everything set to high quality (no optimizations):

large 32-bit integer texture w/ point sampling, 32-bit integer render target, no blending: 3577.8 Mpix/s

large 32-bit integer texture w/ bilinear filtering, 32-bit integer render target, no blending: 3576.7 Mpix/s

large 32-bit integer texture w/ bilinear filtering, 32-bit integer render target, alpha blending: 2688.4 Mpix/s

large 32-bit integer texture w/ bilinear filtering, 64-bit floating point render target, no blending: 1967.2 Mpix/s

large 32-bit integer texture w/ bilinear filtering, 64-bit floating point render target, alpha blending: 1271.6 Mpix/s

large 64-bit float texture w/ point sampling, 32-bit integer render target, no blending: 577.5 Mpix/s

large 64-bit float texture w/ bilinear filtering, 32-bit integer render target, no blending: 591.1 Mpix/s

large 64-bit float texture w/ bilinear filtering, 32-bit integer render target, alpha blending: 582.3 Mpix/s

large 64-bit float texture w/ bilinear filtering, 64-bit floating point render target, no blending: 530.6 Mpix/s

large 64-bit float texture w/ bilinear filtering, 64-bit floating point render target, alpha blending: 500.0 Mpix/s
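To put the Mpix/s figures in perspective, here is a minimal sketch of how they would map to frame rate. It assumes the benchmark simply counts 200 full-screen layers per frame at 1024x768 and reports frames/s times pixels per frame; that formula is a guess at how the tool reports its numbers, and the function names are made up for illustration.

```python
# Assumed relation (not confirmed by the benchmark's source):
#   Mpix/s = frames/s * layers * width * height / 1e6
LAYERS_PER_FRAME = 200      # "200 layers of screen sized quads per frame"
WIDTH, HEIGHT = 1024, 768   # resolution used for the results above


def pixels_per_frame(layers=LAYERS_PER_FRAME, width=WIDTH, height=HEIGHT):
    """Pixels written per frame if every layer covers the whole screen."""
    return layers * width * height


def mpix_to_fps(mpix_per_s):
    """Convert a reported fill rate (Mpix/s) to frames per second."""
    return mpix_per_s * 1e6 / pixels_per_frame()


def fps_to_mpix(fps):
    """Convert frames per second back to a fill rate in Mpix/s."""
    return fps * pixels_per_frame() / 1e6


if __name__ == "__main__":
    for rate in (3577.8, 591.1, 500.0):
        print(f"{rate:7.1f} Mpix/s is about {mpix_to_fps(rate):5.2f} frames/s")
```

Under those assumptions the fastest large-texture case works out to roughly 22.7 frames per second, and the slowest to a little over 3.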


Seems like bilinear FP16 texture filtering is taking a lot more than 2 cycles per pixel; it's more like 6 cycles per pixel (3576.7 vs. 591.1 Mpix/s).
 
If you try the smaller texture size (key S), float textures become much faster, so the texture cache probably has insufficient bandwidth. Filtering itself doesn't seem to be the problem.
 
Oops, missed the part about toggling "s" to vary texture size :oops: .

Here are the results of the bench (same as above) using a small texture:

small 32-bit integer texture w/ point sampling, 32-bit integer render target, no blending: 5904.2 Mpix/s

small 32-bit integer texture w/ bilinear filtering, 32-bit integer render target, no blending: 5922.4 Mpix/s

small 32-bit integer texture w/ bilinear filtering, 64-bit floating point render target, no blending: 3058.8 Mpix/s

small 32-bit integer texture w/ bilinear filtering, 32-bit integer render target, alpha blending: 3749.3 Mpix/s

small 32-bit integer texture w/ bilinear filtering, 64-bit floating point render target, alpha blending: 1482.4 Mpix/s

small 64-bit float texture w/ point sampling, 32-bit integer render target, no blending: 3029.9 Mpix/s

small 64-bit float texture w/ bilinear filtering, 32-bit integer render target, no blending: 3028.4 Mpix/s

small 64-bit float texture w/ bilinear filtering, 64-bit floating point render target, no blending: 2939.7 Mpix/s

small 64-bit float texture w/ bilinear filtering, 32-bit integer render target, alpha blending: 2977.7 Mpix/s

small 64-bit float texture w/ bilinear filtering, 64-bit floating point render target, alpha blending: 1489.8 Mpix/s

From the above results, it seems NV40 (A1 core) is sticking to its claimed theoretical performance and is not "broken" in the blending/filtering department.

We can conclude the following about the performance of NV40 (according to this bench) when the texture size is small enough to fit the texture cache (as stated by Jpaana):

With Int textures...

-Bilinear filtering/Point sampling takes 1 cycle

-Rendering to an FP64 framebuffer takes 2 cycles

-Performing Alpha Blending takes 2 cycles

-Rendering to an FP64 framebuffer and performing alpha blending takes 4 cycles

With FP16 Textures...

-Bilinear filtering/Point sampling takes 2 cycles

-Rendering to an FP64 framebuffer takes 2 cycles

-Performing Alpha Blending on an Int32 framebuffer takes 2 cycles

-Rendering to an FP64 framebuffer and performing Alpha Blending takes 4 cycles
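The cycle counts above can be checked against the small-texture numbers with a quick ratio. This is only a sketch: it assumes the int texture / int target / no blending bilinear case (5922.4 Mpix/s) costs one cycle per pixel, and treats every other result as a multiple of it. The dictionary keys are labels invented here.

```python
# 6800GT small-texture results quoted above, in Mpix/s.
BASELINE = 5922.4  # int texture, bilinear, int target, no blending (assumed = 1 cycle)

MEASURED = {
    "int tex, FP target":         3058.8,
    "int tex, blend":             3749.3,
    "int tex, FP target + blend": 1482.4,
    "FP tex, int target":         3028.4,
    "FP tex, FP target":          2939.7,
    "FP tex, blend":              2977.7,
    "FP tex, FP target + blend":  1489.8,
}


def cycles_per_pixel(rate, baseline=BASELINE):
    """Estimated cycles per pixel relative to the one-cycle baseline."""
    return baseline / rate


if __name__ == "__main__":
    for label, rate in MEASURED.items():
        print(f"{label:28s} {cycles_per_pixel(rate):4.2f} cycles")
```

Most cases land within a few percent of 2 or 4 cycles. The int texture with alpha blending comes out nearer 1.6 than 2, so that entry in the list is the loosest fit.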
 
DaveBaumann said:
Any of the coders out there care to knock up a theoretical FP Filtering and Blending fill-rate benchmark? I think this might be useful.

BTW, suggestions as to how these can be useful in a game would be good.
 
Bit of testing with a 9800:

Code:
large 32-bit int texture w/ point , 32-bit int  target, no blending:   1502.3 Mpix/s
large 32-bit int texture w/ point , 32-bit int  target, alpha blending:   1239.9 Mpix/s
large 32-bit int texture w/ bilinear , 32-bit int  target, no blending:   1553.7 Mpix/s
large 32-bit int texture w/ bilinear , 32-bit int  target, alpha blending:   1216.7 Mpix/s
large 32-bit int texture w/ point , 64-bit FP  target, no blending:   1157.9 Mpix/s
large 32-bit int texture w/ point , 64-bit FP  target, alpha blending:   1282.0 Mpix/s
large 32-bit int texture w/ bilinear , 64-bit FP  target, no blending:   1290.7 Mpix/s
large 32-bit int texture w/ bilinear , 64-bit FP  target, alpha blending:   1204.6 Mpix/s
large 64-bit FP texture w/ point , 64-bit FP  target, no blending:    881.6 Mpix/s
large 64-bit FP texture w/ point , 64-bit FP  target, alpha blending:    888.0 Mpix/s
large 64-bit FP texture w/ bilinear , 64-bit FP  target, alpha blending:    893.6 Mpix/s
large 64-bit FP texture w/ bilinear , 64-bit FP  target, no blending:    893.6 Mpix/s

Code:
small 32-bit int texture w/ point , 32-bit int  target, no blending:   2612.2 Mpix/s
small 32-bit int texture w/ bilinear , 32-bit int  target, no blending:   2729.2 Mpix/s
small 32-bit int texture w/ point , 32-bit int  target, alpha blending:   2103.7 Mpix/s
small 32-bit int texture w/ bilinear , 32-bit int  target, alpha blending:   2065.6 Mpix/s
small 32-bit int texture w/ point , 64-bit FP  target, no blending:   1436.3 Mpix/s
small 32-bit int texture w/ bilinear , 64-bit FP  target, no blending:   1439.4 Mpix/s
small 32-bit int texture w/ point , 64-bit FP  target, alpha blending:   1407.2 Mpix/s
small 32-bit int texture w/ bilinear , 64-bit FP  target, alpha blending:   1394.8 Mpix/s
small 64-bit FP texture w/ point , 64-bit FP  target, no blending:   1375.7 Mpix/s
small 64-bit FP texture w/ bilinear , 64-bit FP  target, no blending:   1397.7 Mpix/s
small 64-bit FP texture w/ point , 64-bit FP  target, alpha blending:   1382.9 Mpix/s
small 64-bit FP texture w/ bilinear , 64-bit FP  target, alpha blending:   1376.1 Mpix/s
 
On my 6600GT clocked at 520MHz:

Code:
small 32-bit integer texture w/ point sampling, 32-bit integer render target, no blending:   2086.3 Mpix/s
small 32-bit integer texture w/ bilinear filtering, 32-bit integer render target, no blending:   2092.3 Mpix/s
small 32-bit integer texture w/ bilinear filtering, 64-bit floating point render target, no blending:   1049.1 Mpix/s
small 32-bit integer texture w/ bilinear filtering, 32-bit integer render target, alpha blending:   2118.5 Mpix/s
small 32-bit integer texture w/ bilinear filtering, 64-bit floating point render target, alpha blending:    541.8 Mpix/s
small 64-bit float texture w/ point sampling, 32-bit integer render target, no blending:   2086.1 Mpix/s
small 64-bit float texture w/ bilinear filtering, 32-bit integer render target, no blending:   2081.0 Mpix/s
small 64-bit float texture w/ bilinear filtering, 64-bit floating point render target, no blending:   1044.0 Mpix/s
small 64-bit float texture w/ bilinear filtering, 32-bit integer render target, alpha blending:   2077.0 Mpix/s
small 64-bit float texture w/ bilinear filtering, 64-bit floating point render target, alpha blending:    529.1 Mpix/s

Looks like FP16 textures don't take a hit as they do on the NV40.
 
It seems the 6600GT takes a negligible performance hit for FP *filtering, while the 6800's fillrate is cut in half.

With full HDR it comes down to 1489.8 Mpix/s vs. 529.1 Mpix/s (6800GT vs. 6600GT, respectively) when bilinearly filtering a small (32x32) FP texture, rendering it to a 64-bit FP buffer, and performing alpha blending. The 6800GT is ahead by a factor of about 2.8, which corresponds to the clockspeed and pipeline differences between the two processors.

*Edited word from blending to filtering
 
Graphics_Krazy said:
On my 6600GT clocked at 520MHz:
Looks like FP16 textures don't take a hit as they do on the NV40.
Keep in mind, though, that the results with 32-bit textures/targets are much lower to begin with (I assume because of the single quad-ROP unit?).
Also, what are the results with large texture sizes?
 
[edit]sorry, i just saw this is discussed in another thread already, missed that :oops: :LOL: [/edit]






Luminescent said:
It seems the 6600GT takes a negligible performance hit for FP blending, while the 6800's fillrate is cut in half.

Is this the reason why the 6600GT is quicker than the 6800GT/U with HDR enabled?

training_hdr.gif


http://www.xbitlabs.com/articles/video/display/farcry13_7.html


However, in another bench the 6600GT is definitely much slower than the 6800GT/U. :? (But maybe that is because these are all PCI-E cards, whereas in the X-bit bench above the 6800U/GT are AGP.)

fc_hdr7.png


http://www.hexus.net/content/reviews/review.php?dXJsX3Jldmlld19JRD05MzAmdXJsX3BhZ2U9Ng==
 
Stock 6800:

Code:
large 32-bit int texture w/ point sampling, 32-bit int render target, no blending:   1876.3 Mpix/s
large 32-bit int texture w/ bilinear filtering, 32-bit int render target, no blending:   1794.6 Mpix/s
large 32-bit int texture w/ point sampling, 32-bit int render target, alpha blending:   1158.8 Mpix/s
large 32-bit int texture w/ bilinear filtering, 32-bit int render target, alpha blending:   1144.0 Mpix/s
large 64-bit float texture w/ point sampling, 32-bit int render target, no blending:    362.8 Mpix/s
large 64-bit float texture w/ bilinear filtering, 32-bit int render target, no blending:    342.0 Mpix/s
large 64-bit float texture w/ point sampling, 32-bit int render target, alpha blending:    391.9 Mpix/s
large 64-bit float texture w/ bilinear filtering, 32-bit int render target, alpha blending:    377.0 Mpix/s

Code:
large 32-bit int texture w/ point sampling, 64-bit FP render target, no blending:   1433.7 Mpix/s
large 32-bit int texture w/ bilinear filtering, 64-bit FP render target, no blending:   1450.3 Mpix/s
large 32-bit int texture w/ point sampling, 64-bit FP render target, alpha blending:    966.1 Mpix/s
large 32-bit int texture w/ bilinear filtering, 64-bit FP render target, alpha blending:    947.1 Mpix/s
large 64-bit float texture w/ point sampling, 64-bit FP render target, no blending:    355.4 Mpix/s
large 64-bit float texture w/ bilinear filtering, 64-bit FP render target, no blending:    353.4 Mpix/s
large 64-bit float texture w/ point sampling, 64-bit FP render target, alpha blending:    332.4 Mpix/s
large 64-bit float texture w/ bilinear filtering, 64-bit FP render target, alpha blending:    300.5 Mpix/s

Code:
small 32-bit int texture w/ point sampling, 32-bit int render target, no blending:   1935.9 Mpix/s
small 32-bit int texture w/ bilinear filtering, 32-bit int render target, no blending:   1949.3 Mpix/s
small 32-bit int texture w/ point sampling, 32-bit int render target, alpha blending:   1502.4 Mpix/s
small 32-bit int texture w/ bilinear filtering, 32-bit int render target, alpha blending:   1462.2 Mpix/s
small 64-bit float texture w/ point sampling, 32-bit int render target, no blending:   1564.6 Mpix/s
small 64-bit float texture w/ bilinear filtering, 32-bit int render target, no blending:   1586.0 Mpix/s
small 64-bit float texture w/ point sampling, 32-bit int render target, alpha blending:   1297.9 Mpix/s
small 64-bit float texture w/ bilinear filtering, 32-bit int render target, alpha blending:   1256.0 Mpix/s

Code:
small 32-bit int texture w/ point sampling, 64-bit FP render target, no blending:   2561.3 Mpix/s
small 32-bit int texture w/ bilinear filtering, 64-bit FP render target, no blending:   2578.0 Mpix/s
small 32-bit int texture w/ point sampling, 64-bit FP render target, alpha blending:   1246.0 Mpix/s
small 32-bit int texture w/ bilinear filtering, 64-bit FP render target, alpha blending:   1253.1 Mpix/s
small 64-bit float texture w/ point sampling, 64-bit FP render target, no blending:   1940.8 Mpix/s
small 64-bit float texture w/ bilinear filtering, 64-bit FP render target, no blending:   1948.8 Mpix/s
small 64-bit float texture w/ point sampling, 64-bit FP render target, alpha blending:   1260.8 Mpix/s
small 64-bit float texture w/ bilinear filtering, 64-bit FP render target, alpha blending:   1259.5 Mpix/s
 
Luminescent said:
It seems the 6600GT takes a negligible performance hit for FP blending, while the 6800's fillrate is cut in half.

???

small 64-bit float texture w/ bilinear filtering, 32-bit integer render target, alpha blending: 2077.0 Mpix/s
small 64-bit float texture w/ bilinear filtering, 64-bit floating point render target, alpha blending: 529.1 Mpix/s
 
Graphics_Krazy said:
Looks like FP16 textures don't take a hit as they do on the NV40.

Don't forget that while the 6600 has 8 pixel pipelines, it has only 4 ROPs. That's why you can't see the huge performance drop.
 
Tridam said:
Luminescent said:
It seems the 6600GT takes a negligible performance hit for FP blending, while the 6800's fillrate is cut in half.

???

small 64-bit float texture w/ bilinear filtering, 32-bit integer render target, alpha blending: 2077.0 Mpix/s
small 64-bit float texture w/ bilinear filtering, 64-bit floating point render target, alpha blending: 529.1 Mpix/s
I'm sorry, I meant filtering. :oops:

I was referring to the following:
6600GT
small 32-bit integer texture w/ bilinear filtering, 32-bit integer render target, no blending: 2092.3 Mpix/s
small 64-bit float texture w/ bilinear filtering, 32-bit integer render target, no blending: 2081.0 Mpix/s
as opposed to
6800GT
small 32-bit integer texture w/ bilinear filtering, 32-bit integer render target, no blending: 5922.4 Mpix/s
small 64-bit float texture w/ bilinear filtering, 32-bit integer render target, no blending: 3028.4 Mpix/s
 
Luminescent said:
I'm sorry, I meant filtering. :oops:

I was referring to the following:
6600GT
small 32-bit integer texture w/ bilinear filtering, 32-bit integer render target, no blending: 2092.3 Mpix/s
small 64-bit float texture w/ bilinear filtering, 32-bit integer render target, no blending: 2081.0 Mpix/s
as opposed to
6800GT
small 32-bit integer texture w/ bilinear filtering, 32-bit integer render target, no blending: 5922.4 Mpix/s
small 64-bit float texture w/ bilinear filtering, 32-bit integer render target, no blending: 3028.4 Mpix/s

Not seeing such a big hit on my vanilla 6800 (66.81). Still worse than the 6600GT though.

Code:
small 32-bit int texture w/ bilinear filtering, 32-bit int render target, no blending:   1949.3 Mpix/s 
small 64-bit float texture w/ bilinear filtering, 32-bit int render target, no blending:   1586.0 Mpix/s
 
That's because the NV43 is ROP-limited when only doing a single bilinear 32-bit texture access per pixel.

But those 6800 results look pretty weird, sometimes exactly half of what they should be.

Theoretical peak values in Mpix/s (x = enabled; the columns are FP texture, FP render target, blending):
Code:
FP   FP          6800GT    6800      6600GT
tex  rt  bl      NV40-16   NV40-12   NV43

-    -    -       5600      3900      2000
-    -    x       2800      2600      2000
-    x    -       2800      2600      1000
-    x    x       1400      1300       500
x    -    -       2800      1950      2000
x    -    x       2800      1950      2000
x    x    -       2800      1950      1000
x    x    x       1400      1300       500
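One way to read the table is as clock times the slower of two per-clock limits: texture fetches and ROP writes. The sketch below reproduces every entry under that reading. The clocks (350/325/500 MHz) and the per-clock texture and ROP rates are assumptions picked to match the table, not vendor figures, and all names are made up for illustration.

```python
# Assumed model: peak Mpix/s = clock_MHz * min(texels/clk, ROP pixels/clk).
# An FP16 texture is assumed to halve the texel rate.
# ROP rates are keyed by (fp_render_target, blending).
CHIPS = {
    "6800GT": {"clock_mhz": 350, "tex_per_clk": 16,
               "rop_per_clk": {(False, False): 16, (False, True): 8,
                               (True, False): 8, (True, True): 4}},
    "6800":   {"clock_mhz": 325, "tex_per_clk": 12,
               "rop_per_clk": {(False, False): 16, (False, True): 8,
                               (True, False): 8, (True, True): 4}},
    "6600GT": {"clock_mhz": 500, "tex_per_clk": 8,
               "rop_per_clk": {(False, False): 4, (False, True): 4,
                               (True, False): 2, (True, True): 1}},
}


def peak_mpix(chip, fp_tex, fp_rt, blend):
    """Theoretical peak fill rate in Mpix/s under the assumed model."""
    c = CHIPS[chip]
    tex = c["tex_per_clk"] / (2 if fp_tex else 1)
    rop = c["rop_per_clk"][(fp_rt, blend)]
    return c["clock_mhz"] * min(tex, rop)


if __name__ == "__main__":
    for fp_tex in (False, True):
        for fp_rt in (False, True):
            for blend in (False, True):
                row = [peak_mpix(chip, fp_tex, fp_rt, blend) for chip in CHIPS]
                print(fp_tex, fp_rt, blend, row)
```

In this reading the 6600GT is ROP-bound in every row, which is why an FP texture alone costs it nothing, while the 6800GT drops to half as soon as either the texture or the target goes FP.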
 