New GLSL / Pbuffer benchmark [Update: version 1.4 / ORCv0.4]

Athlon XP 1700+ overclocked to 2.1GHz - 6800NU 128MB 400/800 - ForceWare 71.84

Minus the last test (PMBuffered at 1024x1024), as it was taking far too long.
Also worth noting: this program was eating up over 100MB of my system RAM during that last test, whereas before that test it was only using 20MB.

Code:
GL filter framework 1.3 test application by Peter Thoman 2004-2005

Gui initialized successfully.
DevIL initialized successfully.
 - DevIL Version: 167
OpenGL initialized successfully.
ILUT OpenGL mode set successfully.
Loaded required OpenGL extensions for GLPixelShader.
Loaded required OpenGL extensions for GLRenderTexture.
Loaded required OpenGL extensions for GLFilterStep.
Initialization complete.

Press return key to start benchmark...



Testing 32x32 image:
BufferCreateINT: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
No suitable INT format found. Trying FP... (Flaky 6x00 workaround)

BufferCreateINT16: msecs: 110 || ms/i: 18.3333 || i/s: 54.5455
BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateFP32: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
JustCopy: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42
SimpleSmooth: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
TexNoise: msecs: 328 || ms/i: 0.164 || i/s: 6097.56
3x3Conv: msecs: 297 || ms/i: 0.297 || i/s: 3367
TEncode: msecs: 188 || ms/i: 0.188 || i/s: 5319.15
TDecode: msecs: 203 || ms/i: 0.203 || i/s: 4926.11
LinDiffINT: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
LinDiffINT16: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
LinDiffFP16: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
LinDiffFP32: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
PMTEncoded: msecs: 547 || ms/i: 0.547 || i/s: 1828.15
PMStandard: msecs: 437 || ms/i: 0.437 || i/s: 2288.33
PMBuffered: msecs: 31 || ms/i: 0.062 || i/s: 16129

Testing 64x64 image:
BufferCreateINT: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateINT16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateFP32: msecs: 78 || ms/i: 13 || i/s: 76.9231
JustCopy: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
SimpleSmooth: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42
TexNoise: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42
3x3Conv: msecs: 109 || ms/i: 0.109 || i/s: 9174.31
TEncode: msecs: 93 || ms/i: 0.093 || i/s: 10752.7
TDecode: msecs: 109 || ms/i: 0.109 || i/s: 9174.31
LinDiffINT: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
LinDiffINT16: msecs: 250 || ms/i: 0.125 || i/s: 8000
LinDiffFP16: msecs: 250 || ms/i: 0.125 || i/s: 8000
LinDiffFP32: msecs: 250 || ms/i: 0.125 || i/s: 8000
PMTEncoded: msecs: 437 || ms/i: 0.437 || i/s: 2288.33
PMStandard: msecs: 438 || ms/i: 0.438 || i/s: 2283.11
PMBuffered: msecs: 31 || ms/i: 0.062 || i/s: 16129

Testing 128x128 image:
BufferCreateINT: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateINT16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateFP32: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
JustCopy: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
SimpleSmooth: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
TexNoise: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
3x3Conv: msecs: 109 || ms/i: 0.109 || i/s: 9174.31
TEncode: msecs: 93 || ms/i: 0.093 || i/s: 10752.7
TDecode: msecs: 125 || ms/i: 0.125 || i/s: 8000
LinDiffINT: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
LinDiffINT16: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
LinDiffFP16: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
LinDiffFP32: msecs: 563 || ms/i: 0.2815 || i/s: 3552.4
PMTEncoded: msecs: 453 || ms/i: 0.453 || i/s: 2207.51
PMStandard: msecs: 844 || ms/i: 0.844 || i/s: 1184.83
PMBuffered: msecs: 140 || ms/i: 0.28 || i/s: 3571.43

Testing 256x256 image:
BufferCreateINT: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
BufferCreateINT16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateFP16: msecs: 79 || ms/i: 13.1667 || i/s: 75.9494
BufferCreateFP32: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
JustCopy: msecs: 218 || ms/i: 0.109 || i/s: 9174.31
SimpleSmooth: msecs: 250 || ms/i: 0.125 || i/s: 8000
TexNoise: msecs: 250 || ms/i: 0.125 || i/s: 8000
3x3Conv: msecs: 266 || ms/i: 0.266 || i/s: 3759.4
TEncode: msecs: 110 || ms/i: 0.11 || i/s: 9090.91
TDecode: msecs: 172 || ms/i: 0.172 || i/s: 5813.95
LinDiffINT: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
LinDiffINT16: msecs: 625 || ms/i: 0.3125 || i/s: 3200
LinDiffFP16: msecs: 609 || ms/i: 0.3045 || i/s: 3284.07
LinDiffFP32: msecs: 2109 || ms/i: 1.0545 || i/s: 948.317
PMTEncoded: msecs: 969 || ms/i: 0.969 || i/s: 1031.99
PMStandard: msecs: 3250 || ms/i: 3.25 || i/s: 307.692
PMBuffered: msecs: 859 || ms/i: 1.718 || i/s: 582.072

Testing 512x512 image:
BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateINT16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateFP16: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateFP32: msecs: 78 || ms/i: 13 || i/s: 76.9231
JustCopy: msecs: 234 || ms/i: 0.234 || i/s: 4273.5
SimpleSmooth: msecs: 437 || ms/i: 0.437 || i/s: 2288.33
TexNoise: msecs: 344 || ms/i: 0.344 || i/s: 2906.98
3x3Conv: msecs: 453 || ms/i: 0.906 || i/s: 1103.75
TEncode: msecs: 109 || ms/i: 0.218 || i/s: 4587.16
TDecode: msecs: 281 || ms/i: 0.562 || i/s: 1779.36
LinDiffINT: msecs: 422 || ms/i: 0.422 || i/s: 2369.67
LinDiffINT16: msecs: 1015 || ms/i: 1.015 || i/s: 985.222
LinDiffFP16: msecs: 1016 || ms/i: 1.016 || i/s: 984.252
LinDiffFP32: msecs: 3719 || ms/i: 3.719 || i/s: 268.889
PMTEncoded: msecs: 1766 || ms/i: 3.532 || i/s: 283.126
PMStandard: msecs: 5968 || ms/i: 11.936 || i/s: 83.7802
PMBuffered: msecs: 1000 || ms/i: 4 || i/s: 250

Testing 1024x1024 image:
BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateINT16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateFP16: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateFP32: msecs: 78 || ms/i: 13 || i/s: 76.9231
JustCopy: msecs: 891 || ms/i: 0.891 || i/s: 1122.33
SimpleSmooth: msecs: 1656 || ms/i: 1.656 || i/s: 603.865
TexNoise: msecs: 1141 || ms/i: 1.141 || i/s: 876.424
3x3Conv: msecs: 1750 || ms/i: 3.5 || i/s: 285.714
TEncode: msecs: 375 || ms/i: 0.75 || i/s: 1333.33
TDecode: msecs: 1094 || ms/i: 2.188 || i/s: 457.038
LinDiffINT: msecs: 1687 || ms/i: 1.687 || i/s: 592.768
LinDiffINT16: msecs: 3968 || ms/i: 3.968 || i/s: 252.016
LinDiffFP16: msecs: 4187 || ms/i: 4.187 || i/s: 238.834
LinDiffFP32: msecs: 15563 || ms/i: 15.563 || i/s: 64.255
PMTEncoded: msecs: 6891 || ms/i: 13.782 || i/s: 72.5584
PMStandard: msecs: 23719 || ms/i: 47.438 || i/s: 21.0801
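
For anyone who wants to compare these logs programmatically, here is a minimal sketch of a parser for the result lines. It assumes the three columns are related the way the numbers above suggest (ms/i is total msecs divided by the iteration count, and i/s is 1000 divided by ms/i); that is inferred from the output, not taken from the benchmark's source. The names `parse_line` and `iterations` are made up for this example.

```python
import re

# Matches lines such as "JustCopy: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42"
LINE_RE = re.compile(
    r"^(?P<test>\S+): msecs: (?P<msecs>[\d.]+)"
    r" \|\| ms/i: (?P<ms_per_iter>[\d.]+)"
    r" \|\| i/s: (?P<iters_per_sec>[\d.]+)$"
)


def parse_line(line):
    """Return a dict for one result line, or None for any other line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return {
        "test": m.group("test"),
        "msecs": float(m.group("msecs")),
        "ms_per_iter": float(m.group("ms_per_iter")),
        "iters_per_sec": float(m.group("iters_per_sec")),
    }


def iterations(row):
    """Iteration count implied by total time divided by time per iteration."""
    return round(row["msecs"] / row["ms_per_iter"])


row = parse_line("BufferCreateINT: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298")
print(row["test"], "iterations:", iterations(row))
print("i/s recomputed from ms/i:", round(1000.0 / row["ms_per_iter"], 2))
```

Headings such as "Testing 32x32 image:" and the workaround notice do not match the pattern, so `parse_line` returns None for them and they can simply be skipped.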
 
Hmm, would this critter be exercising any of the famous "broken" part of NV40's video engine?
 
geo said:
Hmm, would this critter be exercising any of the famous "broken" part of NV40's video engine?

Well, if you look at the tests, they mostly confirm what we already know: NV40 is very fast with 8/16-bit color components, and still quite fast with 32-bit. You shouldn't count the one odd test; apart from that, the performance is great.

Also, the T-encoded PM test (compared to standard PM) shows that NV4x is a bit better at swizzling than the Rxxx.
Another observation is that buffering actually offers a greater performance advantage on NV cards, which could mean either that context switches are slower with NV drivers or that subtexture copying is better optimized.
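
To put rough numbers on that, here is a small sketch that computes how much faster the T-encoded and buffered PM variants are than PMStandard. It uses the 512x512 ms/i figures from the 6800 log above and from the Radeon 9600XT log posted in this thread; the helper name `speedup_over_standard` is made up for this example, and a single image size on two cards is only an illustration, not a general result.

```python
# ms/i at 512x512, copied from the logs in this thread
MS_PER_ITER = {
    "GeForce 6800": {"PMTEncoded": 3.532, "PMStandard": 11.936, "PMBuffered": 4.0},
    "Radeon 9600XT": {"PMTEncoded": 3.468, "PMStandard": 8.406, "PMBuffered": 11.56},
}


def speedup_over_standard(results, variant):
    """How many times faster `variant` is than PMStandard (>1 means faster)."""
    return results["PMStandard"] / results[variant]


for card, results in MS_PER_ITER.items():
    t_encoded = speedup_over_standard(results, "PMTEncoded")
    buffered = speedup_over_standard(results, "PMBuffered")
    print(f"{card}: T-encoded {t_encoded:.2f}x, buffered {buffered:.2f}x")
```

A value below 1 means the variant is actually slower than PMStandard on that card at that size.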
 
PeterT said:
geo said:
Hmm, would this critter be exercising any of the famous "broken" part of NV40's video engine?

Well, if you look at the tests, they mostly confirm what we already know: NV40 is very fast with 8/16-bit color components, and still quite fast with 32-bit. You shouldn't count the one odd test; apart from that, the performance is great.

Also, the T-encoded PM test (compared to standard PM) shows that NV4x is a bit better at swizzling than the Rxxx.
Another observation is that buffering actually offers a greater performance advantage on NV cards, which could mean either that context switches are slower with NV drivers or that subtexture copying is better optimized.

Ah well, another "usual suspect" turns out to be entirely uninvolved. :LOL: Edited my results to include the fact that it is an AGP card and system RAM is 1024MB.
 
P4 2GHz, Radeon 9600XT AIW, Win2k, Catalyst 5.2
Code:
Testing 32x32 image:
BufferCreateINT: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateINT16: msecs: 735 || ms/i: 122.5 || i/s: 8.16327
BufferCreateFP16: msecs: 734 || ms/i: 122.333 || i/s: 8.17439
BufferCreateFP32: msecs: 734 || ms/i: 122.333 || i/s: 8.17439
JustCopy: msecs: 844 || ms/i: 0.422 || i/s: 2369.67
SimpleSmooth: msecs: 890 || ms/i: 0.445 || i/s: 2247.19
TexNoise: msecs: 938 || ms/i: 0.469 || i/s: 2132.2
3x3Conv: msecs: 500 || ms/i: 0.5 || i/s: 2000
TEncode: msecs: 437 || ms/i: 0.437 || i/s: 2288.33
TDecode: msecs: 828 || ms/i: 0.828 || i/s: 1207.73
LinDiffINT: msecs: 937 || ms/i: 0.4685 || i/s: 2134.47
LinDiffINT16: msecs: 953 || ms/i: 0.4765 || i/s: 2098.64
LinDiffFP16: msecs: 968 || ms/i: 0.484 || i/s: 2066.12
LinDiffFP32: msecs: 1000 || ms/i: 0.5 || i/s: 2000
PMTEncoded: msecs: 1563 || ms/i: 1.563 || i/s: 639.795
PMStandard: msecs: 1532 || ms/i: 1.532 || i/s: 652.742
PMBuffered: msecs: 172 || ms/i: 0.344 || i/s: 2906.98

Testing 64x64 image:
BufferCreateINT: msecs: 766 || ms/i: 127.667 || i/s: 7.8329
BufferCreateINT16: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateFP16: msecs: 765 || ms/i: 127.5 || i/s: 7.84314
BufferCreateFP32: msecs: 750 || ms/i: 125 || i/s: 8
JustCopy: msecs: 890 || ms/i: 0.445 || i/s: 2247.19
SimpleSmooth: msecs: 907 || ms/i: 0.4535 || i/s: 2205.07
TexNoise: msecs: 937 || ms/i: 0.4685 || i/s: 2134.47
3x3Conv: msecs: 469 || ms/i: 0.469 || i/s: 2132.2
TEncode: msecs: 453 || ms/i: 0.453 || i/s: 2207.51
TDecode: msecs: 828 || ms/i: 0.828 || i/s: 1207.73
LinDiffINT: msecs: 985 || ms/i: 0.4925 || i/s: 2030.46
LinDiffINT16: msecs: 968 || ms/i: 0.484 || i/s: 2066.12
LinDiffFP16: msecs: 985 || ms/i: 0.4925 || i/s: 2030.46
LinDiffFP32: msecs: 954 || ms/i: 0.477 || i/s: 2096.44
PMTEncoded: msecs: 1562 || ms/i: 1.562 || i/s: 640.205
PMStandard: msecs: 1515 || ms/i: 1.515 || i/s: 660.066
PMBuffered: msecs: 172 || ms/i: 0.344 || i/s: 2906.98

Testing 128x128 image:
BufferCreateINT: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateINT16: msecs: 765 || ms/i: 127.5 || i/s: 7.84314
BufferCreateFP16: msecs: 766 || ms/i: 127.667 || i/s: 7.8329
BufferCreateFP32: msecs: 765 || ms/i: 127.5 || i/s: 7.84314
JustCopy: msecs: 906 || ms/i: 0.453 || i/s: 2207.51
SimpleSmooth: msecs: 906 || ms/i: 0.453 || i/s: 2207.51
TexNoise: msecs: 938 || ms/i: 0.469 || i/s: 2132.2
3x3Conv: msecs: 484 || ms/i: 0.484 || i/s: 2066.12
TEncode: msecs: 438 || ms/i: 0.438 || i/s: 2283.11
TDecode: msecs: 828 || ms/i: 0.828 || i/s: 1207.73
LinDiffINT: msecs: 969 || ms/i: 0.4845 || i/s: 2063.98
LinDiffINT16: msecs: 984 || ms/i: 0.492 || i/s: 2032.52
LinDiffFP16: msecs: 1031 || ms/i: 0.5155 || i/s: 1939.86
LinDiffFP32: msecs: 984 || ms/i: 0.492 || i/s: 2032.52
PMTEncoded: msecs: 1563 || ms/i: 1.563 || i/s: 639.795
PMStandard: msecs: 1515 || ms/i: 1.515 || i/s: 660.066
PMBuffered: msecs: 406 || ms/i: 0.812 || i/s: 1231.53

Testing 256x256 image:
BufferCreateINT: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateINT16: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateFP16: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateFP32: msecs: 766 || ms/i: 127.667 || i/s: 7.8329
JustCopy: msecs: 922 || ms/i: 0.461 || i/s: 2169.2
SimpleSmooth: msecs: 922 || ms/i: 0.461 || i/s: 2169.2
TexNoise: msecs: 922 || ms/i: 0.461 || i/s: 2169.2
3x3Conv: msecs: 1469 || ms/i: 1.469 || i/s: 680.735
TEncode: msecs: 437 || ms/i: 0.437 || i/s: 2288.33
TDecode: msecs: 844 || ms/i: 0.844 || i/s: 1184.83
LinDiffINT: msecs: 1109 || ms/i: 0.5545 || i/s: 1803.43
LinDiffINT16: msecs: 1094 || ms/i: 0.547 || i/s: 1828.15
LinDiffFP16: msecs: 1110 || ms/i: 0.555 || i/s: 1801.8
LinDiffFP32: msecs: 1313 || ms/i: 0.6565 || i/s: 1523.23
PMTEncoded: msecs: 1594 || ms/i: 1.594 || i/s: 627.353
PMStandard: msecs: 2172 || ms/i: 2.172 || i/s: 460.405
PMBuffered: msecs: 1531 || ms/i: 3.062 || i/s: 326.584

Testing 512x512 image:
BufferCreateINT: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateINT16: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateFP16: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateFP32: msecs: 766 || ms/i: 127.667 || i/s: 7.8329
JustCopy: msecs: 797 || ms/i: 0.797 || i/s: 1254.71
SimpleSmooth: msecs: 1516 || ms/i: 1.516 || i/s: 659.631
TexNoise: msecs: 1172 || ms/i: 1.172 || i/s: 853.242
3x3Conv: msecs: 2859 || ms/i: 5.718 || i/s: 174.886
TEncode: msecs: 219 || ms/i: 0.438 || i/s: 2283.11
TDecode: msecs: 1156 || ms/i: 2.312 || i/s: 432.526
LinDiffINT: msecs: 2109 || ms/i: 2.109 || i/s: 474.158
LinDiffINT16: msecs: 2109 || ms/i: 2.109 || i/s: 474.158
LinDiffFP16: msecs: 2094 || ms/i: 2.094 || i/s: 477.555
LinDiffFP32: msecs: 2516 || ms/i: 2.516 || i/s: 397.456
PMTEncoded: msecs: 1734 || ms/i: 3.468 || i/s: 288.351
PMStandard: msecs: 4203 || ms/i: 8.406 || i/s: 118.963
PMBuffered: msecs: 2890 || ms/i: 11.56 || i/s: 86.5052

Testing 1024x1024 image:
BufferCreateINT: msecs: 765 || ms/i: 127.5 || i/s: 7.84314
BufferCreateINT16: msecs: 766 || ms/i: 127.667 || i/s: 7.8329
BufferCreateFP16: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateFP32: msecs: 766 || ms/i: 127.667 || i/s: 7.8329
JustCopy: msecs: 3141 || ms/i: 3.141 || i/s: 318.37
SimpleSmooth: msecs: 5953 || ms/i: 5.953 || i/s: 167.983
TexNoise: msecs: 3281 || ms/i: 3.281 || i/s: 304.785
3x3Conv: msecs: 11390 || ms/i: 22.78 || i/s: 43.8982
TEncode: msecs: 610 || ms/i: 1.22 || i/s: 819.672
TDecode: msecs: 4594 || ms/i: 9.188 || i/s: 108.838
LinDiffINT: msecs: 8344 || ms/i: 8.344 || i/s: 119.847
LinDiffINT16: msecs: 8344 || ms/i: 8.344 || i/s: 119.847
LinDiffFP16: msecs: 8344 || ms/i: 8.344 || i/s: 119.847
LinDiffFP32: msecs: 10000 || ms/i: 10 || i/s: 100
PMTEncoded: msecs: 6734 || ms/i: 13.468 || i/s: 74.2501
PMStandard: msecs: 16609 || ms/i: 33.218 || i/s: 30.1042
PMBuffered: msecs: 12547 || ms/i: 50.188 || i/s: 19.9251
 
Pentium-M 1.5GHz, Mobility Radeon 9600 Pro Turbo 128 MiB (337/245), Omega based on Catalyst 5.1

Code:
GL filter framework 1.3 test application by Peter Thoman 2004-2005

Gui initialized successfully.
DevIL initialized successfully.
 - DevIL Version: 167
OpenGL initialized successfully.
ILUT OpenGL mode set successfully.
Loaded required OpenGL extensions for GLPixelShader.
Loaded required OpenGL extensions for GLRenderTexture.
Loaded required OpenGL extensions for GLFilterStep.
Initialization complete.

Press return key to start benchmark...



Testing 32x32 image:
BufferCreateINT: msecs: 801 || ms/i: 133.5 || i/s: 7.49064
BufferCreateINT16: msecs: 741 || ms/i: 123.5 || i/s: 8.09717
BufferCreateFP16: msecs: 741 || ms/i: 123.5 || i/s: 8.09717
BufferCreateFP32: msecs: 781 || ms/i: 130.167 || i/s: 7.68246
JustCopy: msecs: 411 || ms/i: 0.2055 || i/s: 4866.18
SimpleSmooth: msecs: 431 || ms/i: 0.2155 || i/s: 4640.37
TexNoise: msecs: 470 || ms/i: 0.235 || i/s: 4255.32
3x3Conv: msecs: 281 || ms/i: 0.281 || i/s: 3558.72
TEncode: msecs: 220 || ms/i: 0.22 || i/s: 4545.45
TDecode: msecs: 340 || ms/i: 0.34 || i/s: 2941.18
LinDiffINT: msecs: 440 || ms/i: 0.22 || i/s: 4545.45
LinDiffINT16: msecs: 440 || ms/i: 0.22 || i/s: 4545.45
LinDiffFP16: msecs: 441 || ms/i: 0.2205 || i/s: 4535.15
LinDiffFP32: msecs: 451 || ms/i: 0.2255 || i/s: 4434.59
PMTEncoded: msecs: 731 || ms/i: 0.731 || i/s: 1367.99
PMStandard: msecs: 681 || ms/i: 0.681 || i/s: 1468.43
PMBuffered: msecs: 120 || ms/i: 0.24 || i/s: 4166.67

Testing 64x64 image:
BufferCreateINT: msecs: 791 || ms/i: 131.833 || i/s: 7.58534
BufferCreateINT16: msecs: 761 || ms/i: 126.833 || i/s: 7.88436
BufferCreateFP16: msecs: 751 || ms/i: 125.167 || i/s: 7.98935
BufferCreateFP32: msecs: 842 || ms/i: 140.333 || i/s: 7.12589
JustCopy: msecs: 420 || ms/i: 0.21 || i/s: 4761.9
SimpleSmooth: msecs: 461 || ms/i: 0.2305 || i/s: 4338.39
TexNoise: msecs: 451 || ms/i: 0.2255 || i/s: 4434.59
3x3Conv: msecs: 230 || ms/i: 0.23 || i/s: 4347.83
TEncode: msecs: 220 || ms/i: 0.22 || i/s: 4545.45
TDecode: msecs: 341 || ms/i: 0.341 || i/s: 2932.55
LinDiffINT: msecs: 471 || ms/i: 0.2355 || i/s: 4246.28
LinDiffINT16: msecs: 471 || ms/i: 0.2355 || i/s: 4246.28
LinDiffFP16: msecs: 450 || ms/i: 0.225 || i/s: 4444.44
LinDiffFP32: msecs: 450 || ms/i: 0.225 || i/s: 4444.44
PMTEncoded: msecs: 711 || ms/i: 0.711 || i/s: 1406.47
PMStandard: msecs: 701 || ms/i: 0.701 || i/s: 1426.53
PMBuffered: msecs: 170 || ms/i: 0.34 || i/s: 2941.18

Testing 128x128 image:
BufferCreateINT: msecs: 791 || ms/i: 131.833 || i/s: 7.58534
BufferCreateINT16: msecs: 771 || ms/i: 128.5 || i/s: 7.7821
BufferCreateFP16: msecs: 761 || ms/i: 126.833 || i/s: 7.88436
BufferCreateFP32: msecs: 771 || ms/i: 128.5 || i/s: 7.7821
JustCopy: msecs: 431 || ms/i: 0.2155 || i/s: 4640.37
SimpleSmooth: msecs: 450 || ms/i: 0.225 || i/s: 4444.44
TexNoise: msecs: 471 || ms/i: 0.2355 || i/s: 4246.28
3x3Conv: msecs: 601 || ms/i: 0.601 || i/s: 1663.89
TEncode: msecs: 210 || ms/i: 0.21 || i/s: 4761.9
TDecode: msecs: 341 || ms/i: 0.341 || i/s: 2932.55
LinDiffINT: msecs: 481 || ms/i: 0.2405 || i/s: 4158
LinDiffINT16: msecs: 501 || ms/i: 0.2505 || i/s: 3992.02
LinDiffFP16: msecs: 501 || ms/i: 0.2505 || i/s: 3992.02
LinDiffFP32: msecs: 551 || ms/i: 0.2755 || i/s: 3629.76
PMTEncoded: msecs: 721 || ms/i: 0.721 || i/s: 1386.96
PMStandard: msecs: 832 || ms/i: 0.832 || i/s: 1201.92
PMBuffered: msecs: 581 || ms/i: 1.162 || i/s: 860.585

Testing 256x256 image:
BufferCreateINT: msecs: 771 || ms/i: 128.5 || i/s: 7.7821
BufferCreateINT16: msecs: 761 || ms/i: 126.833 || i/s: 7.88436
BufferCreateFP16: msecs: 771 || ms/i: 128.5 || i/s: 7.7821
BufferCreateFP32: msecs: 761 || ms/i: 126.833 || i/s: 7.88436
JustCopy: msecs: 571 || ms/i: 0.2855 || i/s: 3502.63
SimpleSmooth: msecs: 1212 || ms/i: 0.606 || i/s: 1650.17
TexNoise: msecs: 841 || ms/i: 0.4205 || i/s: 2378.12
3x3Conv: msecs: 2303 || ms/i: 2.303 || i/s: 434.216
TEncode: msecs: 210 || ms/i: 0.21 || i/s: 4761.9
TDecode: msecs: 941 || ms/i: 0.941 || i/s: 1062.7
LinDiffINT: msecs: 1703 || ms/i: 0.8515 || i/s: 1174.4
LinDiffINT16: msecs: 1703 || ms/i: 0.8515 || i/s: 1174.4
LinDiffFP16: msecs: 1702 || ms/i: 0.851 || i/s: 1175.09
LinDiffFP32: msecs: 2002 || ms/i: 1.001 || i/s: 999.001
PMTEncoded: msecs: 1432 || ms/i: 1.432 || i/s: 698.324
PMStandard: msecs: 3024 || ms/i: 3.024 || i/s: 330.688
PMBuffered: msecs: 2203 || ms/i: 4.406 || i/s: 226.963

Testing 512x512 image:
BufferCreateINT: msecs: 791 || ms/i: 131.833 || i/s: 7.58534
BufferCreateINT16: msecs: 802 || ms/i: 133.667 || i/s: 7.4813
BufferCreateFP16: msecs: 761 || ms/i: 126.833 || i/s: 7.88436
BufferCreateFP32: msecs: 781 || ms/i: 130.167 || i/s: 7.68246
JustCopy: msecs: 1071 || ms/i: 1.071 || i/s: 933.707
SimpleSmooth: msecs: 2344 || ms/i: 2.344 || i/s: 426.621
TexNoise: msecs: 1462 || ms/i: 1.462 || i/s: 683.995
3x3Conv: msecs: 4486 || ms/i: 8.972 || i/s: 111.458
TEncode: msecs: 250 || ms/i: 0.5 || i/s: 2000
TDecode: msecs: 1813 || ms/i: 3.626 || i/s: 275.786
LinDiffINT: msecs: 3305 || ms/i: 3.305 || i/s: 302.572
LinDiffINT16: msecs: 3305 || ms/i: 3.305 || i/s: 302.572
LinDiffFP16: msecs: 3305 || ms/i: 3.305 || i/s: 302.572
LinDiffFP32: msecs: 3895 || ms/i: 3.895 || i/s: 256.739
PMTEncoded: msecs: 2694 || ms/i: 5.388 || i/s: 185.598
PMStandard: msecs: 5868 || ms/i: 11.736 || i/s: 85.2079
PMBuffered: msecs: 4176 || ms/i: 16.704 || i/s: 59.8659

Testing 1024x1024 image:
BufferCreateINT: msecs: 781 || ms/i: 130.167 || i/s: 7.68246
BufferCreateINT16: msecs: 772 || ms/i: 128.667 || i/s: 7.77202
BufferCreateFP16: msecs: 761 || ms/i: 126.833 || i/s: 7.88436
BufferCreateFP32: msecs: 741 || ms/i: 123.5 || i/s: 8.09717
JustCopy: msecs: 4246 || ms/i: 4.246 || i/s: 235.516
SimpleSmooth: msecs: 9323 || ms/i: 9.323 || i/s: 107.262
TexNoise: msecs: 4497 || ms/i: 4.497 || i/s: 222.37
3x3Conv: msecs: 17845 || ms/i: 35.69 || i/s: 28.0191
TEncode: msecs: 951 || ms/i: 1.902 || i/s: 525.762
TDecode: msecs: 7210 || ms/i: 14.42 || i/s: 69.3481
LinDiffINT: msecs: 13099 || ms/i: 13.099 || i/s: 76.3417
LinDiffINT16: msecs: 13099 || ms/i: 13.099 || i/s: 76.3417
LinDiffFP16: msecs: 13099 || ms/i: 13.099 || i/s: 76.3417
LinDiffFP32: msecs: 15472 || ms/i: 15.472 || i/s: 64.6329
PMTEncoded: msecs: 10516 || ms/i: 21.032 || i/s: 47.5466
PMStandard: msecs: 23323 || ms/i: 46.646 || i/s: 21.4381
PMBuffered: msecs: 18016 || ms/i: 72.064 || i/s: 13.8766
 
Athlon XP 3000, GeForce 6600GT

Code:
GL filter framework 1.3 test application by Peter Thoman 2004-2005

Gui initialized successfully.
DevIL initialized successfully.
 - DevIL Version: 167
OpenGL initialized successfully.
ILUT OpenGL mode set successfully.
Loaded required OpenGL extensions for GLPixelShader.
Loaded required OpenGL extensions for GLRenderTexture.
Loaded required OpenGL extensions for GLFilterStep.
Initialization complete.

Press return key to start benchmark...



Testing 32x32 image:
BufferCreateINT: msecs: 125 || ms/i: 20.8333 || i/s: 48
No suitable INT format found. Trying FP... (Flaky 6x00 workaround)

BufferCreateINT16: msecs: 187 || ms/i: 31.1667 || i/s: 32.0856
BufferCreateFP16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
BufferCreateFP32: msecs: 110 || ms/i: 18.3333 || i/s: 54.5455
JustCopy: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42
SimpleSmooth: msecs: 297 || ms/i: 0.1485 || i/s: 6734.01
TexNoise: msecs: 328 || ms/i: 0.164 || i/s: 6097.56
3x3Conv: msecs: 219 || ms/i: 0.219 || i/s: 4566.21
TEncode: msecs: 204 || ms/i: 0.204 || i/s: 4901.96
TDecode: msecs: 218 || ms/i: 0.218 || i/s: 4587.16
LinDiffINT: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
LinDiffINT16: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
LinDiffFP16: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
LinDiffFP32: msecs: 265 || ms/i: 0.1325 || i/s: 7547.17
PMTEncoded: msecs: 500 || ms/i: 0.5 || i/s: 2000
PMStandard: msecs: 422 || ms/i: 0.422 || i/s: 2369.67
PMBuffered: msecs: 46 || ms/i: 0.092 || i/s: 10869.6

Testing 64x64 image:
BufferCreateINT: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateINT16: msecs: 156 || ms/i: 26 || i/s: 38.4615
BufferCreateFP16: msecs: 110 || ms/i: 18.3333 || i/s: 54.5455
BufferCreateFP32: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
JustCopy: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
SimpleSmooth: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42
TexNoise: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
3x3Conv: msecs: 109 || ms/i: 0.109 || i/s: 9174.31
TEncode: msecs: 109 || ms/i: 0.109 || i/s: 9174.31
TDecode: msecs: 110 || ms/i: 0.11 || i/s: 9090.91
LinDiffINT: msecs: 297 || ms/i: 0.1485 || i/s: 6734.01
LinDiffINT16: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
LinDiffFP16: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
LinDiffFP32: msecs: 312 || ms/i: 0.156 || i/s: 6410.26
PMTEncoded: msecs: 454 || ms/i: 0.454 || i/s: 2202.64
PMStandard: msecs: 422 || ms/i: 0.422 || i/s: 2369.67
PMBuffered: msecs: 47 || ms/i: 0.094 || i/s: 10638.3

Testing 128x128 image:
BufferCreateINT: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateINT16: msecs: 203 || ms/i: 33.8333 || i/s: 29.5567
BufferCreateFP16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
BufferCreateFP32: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
JustCopy: msecs: 235 || ms/i: 0.1175 || i/s: 8510.64
SimpleSmooth: msecs: 218 || ms/i: 0.109 || i/s: 9174.31
TexNoise: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
3x3Conv: msecs: 109 || ms/i: 0.109 || i/s: 9174.31
TEncode: msecs: 109 || ms/i: 0.109 || i/s: 9174.31
TDecode: msecs: 125 || ms/i: 0.125 || i/s: 8000
LinDiffINT: msecs: 250 || ms/i: 0.125 || i/s: 8000
LinDiffINT16: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
LinDiffFP16: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
LinDiffFP32: msecs: 750 || ms/i: 0.375 || i/s: 2666.67
PMTEncoded: msecs: 485 || ms/i: 0.485 || i/s: 2061.86
PMStandard: msecs: 1062 || ms/i: 1.062 || i/s: 941.62
PMBuffered: msecs: 172 || ms/i: 0.344 || i/s: 2906.98

Testing 256x256 image:
BufferCreateINT: msecs: 93 || ms/i: 15.5 || i/s: 64.5161
BufferCreateINT16: msecs: 141 || ms/i: 23.5 || i/s: 42.5532
BufferCreateFP16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateFP32: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
JustCopy: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
SimpleSmooth: msecs: 328 || ms/i: 0.164 || i/s: 6097.56
TexNoise: msecs: 390 || ms/i: 0.195 || i/s: 5128.21
3x3Conv: msecs: 344 || ms/i: 0.344 || i/s: 2906.98
TEncode: msecs: 109 || ms/i: 0.109 || i/s: 9174.31
TDecode: msecs: 234 || ms/i: 0.234 || i/s: 4273.5
LinDiffINT: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
LinDiffINT16: msecs: 813 || ms/i: 0.4065 || i/s: 2460.02
LinDiffFP16: msecs: 812 || ms/i: 0.406 || i/s: 2463.05
LinDiffFP32: msecs: 2797 || ms/i: 1.3985 || i/s: 715.052
PMTEncoded: msecs: 1266 || ms/i: 1.266 || i/s: 789.889
PMStandard: msecs: 4203 || ms/i: 4.203 || i/s: 237.925
PMBuffered: msecs: 1000 || ms/i: 2 || i/s: 500

Testing 512x512 image:
BufferCreateINT: msecs: 125 || ms/i: 20.8333 || i/s: 48
BufferCreateINT16: msecs: 140 || ms/i: 23.3333 || i/s: 42.8571
BufferCreateFP16: msecs: 110 || ms/i: 18.3333 || i/s: 54.5455
BufferCreateFP32: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
JustCopy: msecs: 453 || ms/i: 0.453 || i/s: 2207.51
SimpleSmooth: msecs: 578 || ms/i: 0.578 || i/s: 1730.1
TexNoise: msecs: 578 || ms/i: 0.578 || i/s: 1730.1
3x3Conv: msecs: 610 || ms/i: 1.22 || i/s: 819.672
TEncode: msecs: 141 || ms/i: 0.282 || i/s: 3546.1
TDecode: msecs: 422 || ms/i: 0.844 || i/s: 1184.83
LinDiffINT: msecs: 469 || ms/i: 0.469 || i/s: 2132.2
LinDiffINT16: msecs: 1344 || ms/i: 1.344 || i/s: 744.048
LinDiffFP16: msecs: 1344 || ms/i: 1.344 || i/s: 744.048
LinDiffFP32: msecs: 5110 || ms/i: 5.11 || i/s: 195.695
PMTEncoded: msecs: 2297 || ms/i: 4.594 || i/s: 217.675
PMStandard: msecs: 7719 || ms/i: 15.438 || i/s: 64.7752
PMBuffered: msecs: 1031 || ms/i: 4.124 || i/s: 242.483

Testing 1024x1024 image:
BufferCreateINT: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
BufferCreateINT16: msecs: 156 || ms/i: 26 || i/s: 38.4615
BufferCreateFP16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateFP32: msecs: 110 || ms/i: 18.3333 || i/s: 54.5455
JustCopy: msecs: 1781 || ms/i: 1.781 || i/s: 561.482
SimpleSmooth: msecs: 2250 || ms/i: 2.25 || i/s: 444.444
TexNoise: msecs: 2078 || ms/i: 2.078 || i/s: 481.232
3x3Conv: msecs: 2359 || ms/i: 4.718 || i/s: 211.954
TEncode: msecs: 500 || ms/i: 1 || i/s: 1000
TDecode: msecs: 1656 || ms/i: 3.312 || i/s: 301.932
LinDiffINT: msecs: 1797 || ms/i: 1.797 || i/s: 556.483
LinDiffINT16: msecs: 5172 || ms/i: 5.172 || i/s: 193.349
LinDiffFP16: msecs: 5172 || ms/i: 5.172 || i/s: 193.349
LinDiffFP32: msecs: 20235 || ms/i: 20.235 || i/s: 49.4193
PMTEncoded: msecs: 9032 || ms/i: 18.064 || i/s: 55.3587
PMStandard: msecs: 30563 || ms/i: 61.126 || i/s: 16.3597
 
Athlon XP @ 2.4 GHz, Abit NF7-S, 512 MB, X800 XT @ 515/570 MHz
Code:
GL filter framework 1.3 test application by Peter Thoman 2004-2005

Gui initialized successfully.
DevIL initialized successfully.
 - DevIL Version: 167
OpenGL initialized successfully.
ILUT OpenGL mode set successfully.
Loaded required OpenGL extensions for GLPixelShader.
Loaded required OpenGL extensions for GLRenderTexture.
Loaded required OpenGL extensions for GLFilterStep.
Initialization complete.

Press return key to start benchmark...



Testing 32x32 image:
BufferCreateINT: msecs: 813 || ms/i: 135.5 || i/s: 7.38007
BufferCreateINT16: msecs: 672 || ms/i: 112 || i/s: 8.92857
BufferCreateFP16: msecs: 796 || ms/i: 132.667 || i/s: 7.53769
BufferCreateFP32: msecs: 782 || ms/i: 130.333 || i/s: 7.67263
JustCopy: msecs: 453 || ms/i: 0.2265 || i/s: 4415.01
SimpleSmooth: msecs: 469 || ms/i: 0.2345 || i/s: 4264.39
TexNoise: msecs: 500 || ms/i: 0.25 || i/s: 4000
3x3Conv: msecs: 281 || ms/i: 0.281 || i/s: 3558.72
TEncode: msecs: 234 || ms/i: 0.234 || i/s: 4273.5
TDecode: msecs: 359 || ms/i: 0.359 || i/s: 2785.52
LinDiffINT: msecs: 484 || ms/i: 0.242 || i/s: 4132.23
LinDiffINT16: msecs: 531 || ms/i: 0.2655 || i/s: 3766.48
LinDiffFP16: msecs: 531 || ms/i: 0.2655 || i/s: 3766.48
LinDiffFP32: msecs: 515 || ms/i: 0.2575 || i/s: 3883.5
PMTEncoded: msecs: 781 || ms/i: 0.781 || i/s: 1280.41
PMStandard: msecs: 828 || ms/i: 0.828 || i/s: 1207.73
PMBuffered: msecs: 125 || ms/i: 0.25 || i/s: 4000

Testing 64x64 image:
BufferCreateINT: msecs: 781 || ms/i: 130.167 || i/s: 7.68246
BufferCreateINT16: msecs: 797 || ms/i: 132.833 || i/s: 7.52823
BufferCreateFP16: msecs: 687 || ms/i: 114.5 || i/s: 8.73362
BufferCreateFP32: msecs: 782 || ms/i: 130.333 || i/s: 7.67263
JustCopy: msecs: 500 || ms/i: 0.25 || i/s: 4000
SimpleSmooth: msecs: 515 || ms/i: 0.2575 || i/s: 3883.5
TexNoise: msecs: 547 || ms/i: 0.2735 || i/s: 3656.31
3x3Conv: msecs: 281 || ms/i: 0.281 || i/s: 3558.72
TEncode: msecs: 250 || ms/i: 0.25 || i/s: 4000
TDecode: msecs: 390 || ms/i: 0.39 || i/s: 2564.1
LinDiffINT: msecs: 531 || ms/i: 0.2655 || i/s: 3766.48
LinDiffINT16: msecs: 547 || ms/i: 0.2735 || i/s: 3656.31
LinDiffFP16: msecs: 531 || ms/i: 0.2655 || i/s: 3766.48
LinDiffFP32: msecs: 484 || ms/i: 0.242 || i/s: 4132.23
PMTEncoded: msecs: 766 || ms/i: 0.766 || i/s: 1305.48
PMStandard: msecs: 765 || ms/i: 0.765 || i/s: 1307.19
PMBuffered: msecs: 109 || ms/i: 0.218 || i/s: 4587.16

Testing 128x128 image:
BufferCreateINT: msecs: 797 || ms/i: 132.833 || i/s: 7.52823
BufferCreateINT16: msecs: 703 || ms/i: 117.167 || i/s: 8.53485
BufferCreateFP16: msecs: 672 || ms/i: 112 || i/s: 8.92857
BufferCreateFP32: msecs: 797 || ms/i: 132.833 || i/s: 7.52823
JustCopy: msecs: 469 || ms/i: 0.2345 || i/s: 4264.39
SimpleSmooth: msecs: 484 || ms/i: 0.242 || i/s: 4132.23
TexNoise: msecs: 484 || ms/i: 0.242 || i/s: 4132.23
3x3Conv: msecs: 266 || ms/i: 0.266 || i/s: 3759.4
TEncode: msecs: 234 || ms/i: 0.234 || i/s: 4273.5
TDecode: msecs: 359 || ms/i: 0.359 || i/s: 2785.52
LinDiffINT: msecs: 547 || ms/i: 0.2735 || i/s: 3656.31
LinDiffINT16: msecs: 485 || ms/i: 0.2425 || i/s: 4123.71
LinDiffFP16: msecs: 547 || ms/i: 0.2735 || i/s: 3656.31
LinDiffFP32: msecs: 500 || ms/i: 0.25 || i/s: 4000
PMTEncoded: msecs: 781 || ms/i: 0.781 || i/s: 1280.41
PMStandard: msecs: 828 || ms/i: 0.828 || i/s: 1207.73
PMBuffered: msecs: 125 || ms/i: 0.25 || i/s: 4000

Testing 256x256 image:
BufferCreateINT: msecs: 688 || ms/i: 114.667 || i/s: 8.72093
BufferCreateINT16: msecs: 796 || ms/i: 132.667 || i/s: 7.53769
BufferCreateFP16: msecs: 782 || ms/i: 130.333 || i/s: 7.67263
BufferCreateFP32: msecs: 797 || ms/i: 132.833 || i/s: 7.52823
JustCopy: msecs: 515 || ms/i: 0.2575 || i/s: 3883.5
SimpleSmooth: msecs: 532 || ms/i: 0.266 || i/s: 3759.4
TexNoise: msecs: 547 || ms/i: 0.2735 || i/s: 3656.31
3x3Conv: msecs: 312 || ms/i: 0.312 || i/s: 3205.13
TEncode: msecs: 250 || ms/i: 0.25 || i/s: 4000
TDecode: msecs: 407 || ms/i: 0.407 || i/s: 2457
LinDiffINT: msecs: 563 || ms/i: 0.2815 || i/s: 3552.4
LinDiffINT16: msecs: 500 || ms/i: 0.25 || i/s: 4000
LinDiffFP16: msecs: 500 || ms/i: 0.25 || i/s: 4000
LinDiffFP32: msecs: 500 || ms/i: 0.25 || i/s: 4000
PMTEncoded: msecs: 781 || ms/i: 0.781 || i/s: 1280.41
PMStandard: msecs: 797 || ms/i: 0.797 || i/s: 1254.71
PMBuffered: msecs: 312 || ms/i: 0.624 || i/s: 1602.56

Testing 512x512 image:
BufferCreateINT: msecs: 797 || ms/i: 132.833 || i/s: 7.52823
BufferCreateINT16: msecs: 781 || ms/i: 130.167 || i/s: 7.68246
BufferCreateFP16: msecs: 797 || ms/i: 132.833 || i/s: 7.52823
BufferCreateFP32: msecs: 687 || ms/i: 114.5 || i/s: 8.73362
JustCopy: msecs: 265 || ms/i: 0.265 || i/s: 3773.58
SimpleSmooth: msecs: 297 || ms/i: 0.297 || i/s: 3367
TexNoise: msecs: 344 || ms/i: 0.344 || i/s: 2906.98
3x3Conv: msecs: 484 || ms/i: 0.968 || i/s: 1033.06
TEncode: msecs: 109 || ms/i: 0.218 || i/s: 4587.16
TDecode: msecs: 187 || ms/i: 0.374 || i/s: 2673.8
LinDiffINT: msecs: 422 || ms/i: 0.422 || i/s: 2369.67
LinDiffINT16: msecs: 422 || ms/i: 0.422 || i/s: 2369.67
LinDiffFP16: msecs: 406 || ms/i: 0.406 || i/s: 2463.05
LinDiffFP32: msecs: 578 || ms/i: 0.578 || i/s: 1730.1
PMTEncoded: msecs: 407 || ms/i: 0.814 || i/s: 1228.5
PMStandard: msecs: 1031 || ms/i: 2.062 || i/s: 484.966
PMBuffered: msecs: 266 || ms/i: 1.064 || i/s: 939.85

Testing 1024x1024 image:
BufferCreateINT: msecs: 688 || ms/i: 114.667 || i/s: 8.72093
BufferCreateINT16: msecs: 781 || ms/i: 130.167 || i/s: 7.68246
BufferCreateFP16: msecs: 797 || ms/i: 132.833 || i/s: 7.52823
BufferCreateFP32: msecs: 687 || ms/i: 114.5 || i/s: 8.73362
JustCopy: msecs: 704 || ms/i: 0.704 || i/s: 1420.45
SimpleSmooth: msecs: 1390 || ms/i: 1.39 || i/s: 719.424
TexNoise: msecs: 1172 || ms/i: 1.172 || i/s: 853.242
3x3Conv: msecs: 1844 || ms/i: 3.688 || i/s: 271.15
TEncode: msecs: 140 || ms/i: 0.28 || i/s: 3571.43
TDecode: msecs: 579 || ms/i: 1.158 || i/s: 863.558
LinDiffINT: msecs: 1578 || ms/i: 1.578 || i/s: 633.714
LinDiffINT16: msecs: 1609 || ms/i: 1.609 || i/s: 621.504
LinDiffFP16: msecs: 1609 || ms/i: 1.609 || i/s: 621.504
LinDiffFP32: msecs: 2359 || ms/i: 2.359 || i/s: 423.908
PMTEncoded: msecs: 1469 || ms/i: 2.938 || i/s: 340.368
PMStandard: msecs: 4141 || ms/i: 8.282 || i/s: 120.744
PMBuffered: msecs: 2812 || ms/i: 11.248 || i/s: 88.9047
 
I'm curious why my system is almost an order of magnitude slower than Mordenkainen's with the BufferCreates, given that we have the same video card and drivers. Are those tests CPU-dependent? I've got a 2GHz XP with 2x512MB PC2100 (2.5-3-3-6), and he's got a 3.2GHz P4 probably paired with PC3200.

Your reference 9700/1800+ BufferCreate scores are also closer to Mord's than mine.
 
Yes, BufferCreates are mostly CPU / driver bound, especially for small textures. However, the differences between all the scores posted so far also seem to suggest some "random" variance. Since BufferCreate speeds are also nearly irrelevant for anything vaguely resembling game performance, and even for most GPGPU tasks, it isn't really necessary to explore that further.

I mostly added these tests because my PC and laptop differed so much on that singular aspect. (Without any plausible explanation in gpu/cpu/ram speed)
 
This one should be a nice mix up:

2.0GHz Pentium M 760, 6200 TC (unknown local RAM, presumably 16MB) [edit] the 6200 TC is 300/300MHz with 32MB local RAM

Code:
Testing 32x32 image:
BufferCreateINT: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
No suitable INT format found. Trying FP... (Flaky 6x00 workaround)

BufferCreateINT16: msecs: 125 || ms/i: 20.8333 || i/s: 48
BufferCreateFP16: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateFP32: msecs: 78 || ms/i: 13 || i/s: 76.9231
JustCopy: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
SimpleSmooth: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
TexNoise: msecs: 235 || ms/i: 0.1175 || i/s: 8510.64
3x3Conv: msecs: 265 || ms/i: 0.265 || i/s: 3773.58
TEncode: msecs: 140 || ms/i: 0.14 || i/s: 7142.86
TDecode: msecs: 172 || ms/i: 0.172 || i/s: 5813.95
LinDiffINT: msecs: 188 || ms/i: 0.094 || i/s: 10638.3
LinDiffINT16: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
LinDiffFP16: msecs: 204 || ms/i: 0.102 || i/s: 9803.92
LinDiffFP32: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
PMTEncoded: msecs: 422 || ms/i: 0.422 || i/s: 2369.67
PMStandard: msecs: 312 || ms/i: 0.312 || i/s: 3205.13
PMBuffered: msecs: 63 || ms/i: 0.126 || i/s: 7936.51

Testing 64x64 image:
BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateINT16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateFP16: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateFP32: msecs: 78 || ms/i: 13 || i/s: 76.9231
JustCopy: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
SimpleSmooth: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
TexNoise: msecs: 250 || ms/i: 0.125 || i/s: 8000
3x3Conv: msecs: 125 || ms/i: 0.125 || i/s: 8000
TEncode: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
TDecode: msecs: 93 || ms/i: 0.093 || i/s: 10752.7
LinDiffINT: msecs: 187 || ms/i: 0.0935 || i/s: 10695.2
LinDiffINT16: msecs: 250 || ms/i: 0.125 || i/s: 8000
LinDiffFP16: msecs: 250 || ms/i: 0.125 || i/s: 8000
LinDiffFP32: msecs: 469 || ms/i: 0.2345 || i/s: 4264.39
PMTEncoded: msecs: 328 || ms/i: 0.328 || i/s: 3048.78
PMStandard: msecs: 735 || ms/i: 0.735 || i/s: 1360.54
PMBuffered: msecs: 203 || ms/i: 0.406 || i/s: 2463.05

Testing 128x128 image:
BufferCreateINT: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateINT16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateFP32: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
JustCopy: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
SimpleSmooth: msecs: 390 || ms/i: 0.195 || i/s: 5128.21
TexNoise: msecs: 766 || ms/i: 0.383 || i/s: 2610.97
3x3Conv: msecs: 359 || ms/i: 0.359 || i/s: 2785.52
TEncode: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
TDecode: msecs: 250 || ms/i: 0.25 || i/s: 4000
LinDiffINT: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
LinDiffINT16: msecs: 813 || ms/i: 0.4065 || i/s: 2460.02
LinDiffFP16: msecs: 812 || ms/i: 0.406 || i/s: 2463.05
LinDiffFP32: msecs: 1656 || ms/i: 0.828 || i/s: 1207.73
PMTEncoded: msecs: 797 || ms/i: 0.797 || i/s: 1254.71
PMStandard: msecs: 2641 || ms/i: 2.641 || i/s: 378.644
PMBuffered: msecs: 953 || ms/i: 1.906 || i/s: 524.659

Testing 256x256 image:
BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateINT16: msecs: 110 || ms/i: 18.3333 || i/s: 54.5455
BufferCreateFP16: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
BufferCreateFP32: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
JustCopy: msecs: 906 || ms/i: 0.453 || i/s: 2207.51
SimpleSmooth: msecs: 1328 || ms/i: 0.664 || i/s: 1506.02
TexNoise: msecs: 1485 || ms/i: 0.7425 || i/s: 1346.8
3x3Conv: msecs: 1203 || ms/i: 1.203 || i/s: 831.255
TEncode: msecs: 328 || ms/i: 0.328 || i/s: 3048.78
TDecode: msecs: 875 || ms/i: 0.875 || i/s: 1142.86
LinDiffINT: msecs: 937 || ms/i: 0.4685 || i/s: 2134.47
LinDiffINT16: msecs: 3141 || ms/i: 1.5705 || i/s: 636.74
LinDiffFP16: msecs: 3141 || ms/i: 1.5705 || i/s: 636.74
LinDiffFP32: msecs: 6031 || ms/i: 3.0155 || i/s: 331.62
PMTEncoded: msecs: 2828 || ms/i: 2.828 || i/s: 353.607
PMStandard: msecs: 9813 || ms/i: 9.813 || i/s: 101.906
PMBuffered: msecs: 4703 || ms/i: 9.406 || i/s: 106.315

Testing 512x512 image:
BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateINT16: msecs: 110 || ms/i: 18.3333 || i/s: 54.5455
BufferCreateFP16: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
BufferCreateFP32: msecs: 78 || ms/i: 13 || i/s: 76.9231
JustCopy: msecs: 1641 || ms/i: 1.641 || i/s: 609.385
SimpleSmooth: msecs: 2484 || ms/i: 2.484 || i/s: 402.576
TexNoise: msecs: 2093 || ms/i: 2.093 || i/s: 477.783
3x3Conv: msecs: 2219 || ms/i: 4.438 || i/s: 225.327
TEncode: msecs: 547 || ms/i: 1.094 || i/s: 914.077
TDecode: msecs: 1578 || ms/i: 3.156 || i/s: 316.857
LinDiffINT: msecs: 1640 || ms/i: 1.64 || i/s: 609.756
LinDiffINT16: msecs: 5688 || ms/i: 5.688 || i/s: 175.809
LinDiffFP16: msecs: 5609 || ms/i: 5.609 || i/s: 178.285
LinDiffFP32: msecs: 11235 || ms/i: 11.235 || i/s: 89.0076
PMTEncoded: msecs: 5125 || ms/i: 10.25 || i/s: 97.561
PMStandard: msecs: 18468 || ms/i: 36.936 || i/s: 27.0739
PMBuffered: msecs: 6985 || ms/i: 27.94 || i/s: 35.791

Testing 1024x1024 image:
BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateINT16: msecs: 125 || ms/i: 20.8333 || i/s: 48
BufferCreateFP16: msecs: 79 || ms/i: 13.1667 || i/s: 75.9494
BufferCreateFP32: msecs: 93 || ms/i: 15.5 || i/s: 64.5161
JustCopy: msecs: 6703 || ms/i: 6.703 || i/s: 149.187
SimpleSmooth: msecs: 10094 || ms/i: 10.094 || i/s: 99.0688
TexNoise: msecs: 7094 || ms/i: 7.094 || i/s: 140.964
3x3Conv: msecs: 8609 || ms/i: 17.218 || i/s: 58.0788
TEncode: msecs: 2078 || ms/i: 4.156 || i/s: 240.616
TDecode: msecs: 7281 || ms/i: 14.562 || i/s: 68.6719
LinDiffINT: msecs: 6562 || ms/i: 6.562 || i/s: 152.393
LinDiffINT16: msecs: 19579 || ms/i: 19.579 || i/s: 51.0751
LinDiffFP16: msecs: 19563 || ms/i: 19.563 || i/s: 51.1169
LinDiffFP32: msecs: 123375 || ms/i: 123.375 || i/s: 8.10537
PMTEncoded: msecs: 19781 || ms/i: 39.562 || i/s: 25.2768
PMStandard: msecs: 211907 || ms/i: 423.814 || i/s: 2.35953
PMBuffered: msecs: 4547 || ms/i: 18.188 || i/s: 54.9813
 
PMTEncoded: msecs: 19781 || ms/i: 39.562 || i/s: 25.2768
PMStandard: msecs: 211907 || ms/i: 423.814 || i/s: 2.35953
PMBuffered: msecs: 4547 || ms/i: 18.188 || i/s: 54.9813
:LOL: Now that seems exceedingly strange, but is actually somehow understandable. Thanks for these results!
 
Only what, 14 or so times as slow as my setup for PMStandard, LOL. PMBuffered scored quite well, though.
 
Is there some other way to compare, besides opening up two browsers and comparing side by side? I'm not asking for something like the ORB.. just curious as to how others are doing it.
 
That's how I compared my score to others': separate Firefox tabs for each score. Just middle-click on the little paper icon left of "Posted: <date and time>".

Or you could use a photographic memory. I've heard those come in handy.
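
For anyone who would rather script the comparison, here is a minimal Python sketch. It assumes the results are pasted as plain text in the "Name: msecs: ... || ms/i: ... || i/s: ..." layout used in this thread. The names parse_results and compare are made up for illustration and are not part of the benchmark itself.

```python
import re

# Matches lines like "JustCopy: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42"
LINE_RE = re.compile(
    r"^(?P<test>\w+): msecs: (?P<msecs>[\d.]+) \|\| "
    r"ms/i: (?P<ms_per_iter>[\d.]+) \|\| i/s: (?P<iter_per_sec>[\d.]+)"
)
# Matches section headers like "Testing 512x512 image:"
SIZE_RE = re.compile(r"^Testing (\d+)x(\d+) image:")


def parse_results(text):
    """Return {(size, test_name): iterations_per_second} for one pasted log."""
    results = {}
    size = None
    for line in text.splitlines():
        line = line.strip()
        header = SIZE_RE.match(line)
        if header:
            size = int(header.group(1))
            continue
        row = LINE_RE.match(line)
        if row and size is not None:
            results[(size, row.group("test"))] = float(row.group("iter_per_sec"))
    return results


def compare(a, b):
    """Return {(size, test): a_ips / b_ips} for tests present in both logs."""
    return {key: a[key] / b[key] for key in a if key in b and b[key] > 0}


if __name__ == "__main__":
    # Two PMStandard lines copied from the 6800 and 6200 TC posts above.
    log_6800 = "Testing 1024x1024 image:\nPMStandard: msecs: 4141 || ms/i: 8.282 || i/s: 120.744"
    log_6200 = "Testing 1024x1024 image:\nPMStandard: msecs: 211907 || ms/i: 423.814 || i/s: 2.35953"
    for key, ratio in compare(parse_results(log_6800), parse_results(log_6200)).items():
        print(key, f"{ratio:.1f}x")
```

A ratio above 1 means the first log is faster on that test. Lines that don't match the pattern (such as the "No suitable INT format found" notice) are simply skipped.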
 
ASUS A8N-SLI
AMD64 3200+
1GB DDR400
GeForce 6600GT 128MB PCI-E
ForceWare 71.84

(Was rather slow at the end)

Code:
Testing 32x32 image:
BufferCreateINT: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
No suitable INT format found. Trying FP... (Flaky 6600 workaround)

BufferCreateINT16: msecs: 406 || ms/i: 67.6667 || i/s: 14.7783
BufferCreateFP16: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateFP32: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
JustCopy: msecs: 156 || ms/i: 0.078 || i/s: 12820.5
SimpleSmooth: msecs: 234 || ms/i: 0.117 || i/s: 8547.01
TexNoise: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42
3x3Conv: msecs: 234 || ms/i: 0.234 || i/s: 4273.5
TEncode: msecs: 125 || ms/i: 0.125 || i/s: 8000
TDecode: msecs: 156 || ms/i: 0.156 || i/s: 6410.26
LinDiffINT: msecs: 188 || ms/i: 0.094 || i/s: 10638.3
LinDiffINT16: msecs: 188 || ms/i: 0.094 || i/s: 10638.3
LinDiffFP16: msecs: 187 || ms/i: 0.0935 || i/s: 10695.2
LinDiffFP32: msecs: 187 || ms/i: 0.0935 || i/s: 10695.2
PMTEncoded: msecs: 406 || ms/i: 0.406 || i/s: 2463.05
PMStandard: msecs: 297 || ms/i: 0.297 || i/s: 3367
PMBuffered: msecs: 16 || ms/i: 0.032 || i/s: 31250

Testing 64x64 image:
BufferCreateINT: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateINT16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateFP32: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
JustCopy: msecs: 156 || ms/i: 0.078 || i/s: 12820.5
SimpleSmooth: msecs: 156 || ms/i: 0.078 || i/s: 12820.5
TexNoise: msecs: 157 || ms/i: 0.0785 || i/s: 12738.9
3x3Conv: msecs: 93 || ms/i: 0.093 || i/s: 10752.7
TEncode: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
TDecode: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
LinDiffINT: msecs: 188 || ms/i: 0.094 || i/s: 10638.3
LinDiffINT16: msecs: 187 || ms/i: 0.0935 || i/s: 10695.2
LinDiffFP16: msecs: 188 || ms/i: 0.094 || i/s: 10638.3
LinDiffFP32: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42
PMTEncoded: msecs: 313 || ms/i: 0.313 || i/s: 3194.89
PMStandard: msecs: 344 || ms/i: 0.344 || i/s: 2906.98
PMBuffered: msecs: 31 || ms/i: 0.062 || i/s: 16129

Testing 128x128 image:
BufferCreateINT: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
BufferCreateINT16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
BufferCreateFP16: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
BufferCreateFP32: msecs: 79 || ms/i: 13.1667 || i/s: 75.9494
JustCopy: msecs: 156 || ms/i: 0.078 || i/s: 12820.5
SimpleSmooth: msecs: 157 || ms/i: 0.0785 || i/s: 12738.9
TexNoise: msecs: 234 || ms/i: 0.117 || i/s: 8547.01
3x3Conv: msecs: 109 || ms/i: 0.109 || i/s: 9174.31
TEncode: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
TDecode: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
LinDiffINT: msecs: 188 || ms/i: 0.094 || i/s: 10638.3
LinDiffINT16: msecs: 250 || ms/i: 0.125 || i/s: 8000
LinDiffFP16: msecs: 250 || ms/i: 0.125 || i/s: 8000
LinDiffFP32: msecs: 828 || ms/i: 0.414 || i/s: 2415.46
PMTEncoded: msecs: 390 || ms/i: 0.39 || i/s: 2564.1
PMStandard: msecs: 1219 || ms/i: 1.219 || i/s: 820.345
PMBuffered: msecs: 172 || ms/i: 0.344 || i/s: 2906.98

Testing 256x256 image:
BufferCreateINT: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateINT16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateFP16: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateFP32: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
JustCopy: msecs: 282 || ms/i: 0.141 || i/s: 7092.2
SimpleSmooth: msecs: 437 || ms/i: 0.2185 || i/s: 4576.66
TexNoise: msecs: 438 || ms/i: 0.219 || i/s: 4566.21
3x3Conv: msecs: 390 || ms/i: 0.39 || i/s: 2564.1
TEncode: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
TDecode: msecs: 265 || ms/i: 0.265 || i/s: 3773.58
LinDiffINT: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
LinDiffINT16: msecs: 907 || ms/i: 0.4535 || i/s: 2205.07
LinDiffFP16: msecs: 906 || ms/i: 0.453 || i/s: 2207.51
LinDiffFP32: msecs: 3250 || ms/i: 1.625 || i/s: 615.385
PMTEncoded: msecs: 1375 || ms/i: 1.375 || i/s: 727.273
PMStandard: msecs: 4828 || ms/i: 4.828 || i/s: 207.125
PMBuffered: msecs: 1187 || ms/i: 2.374 || i/s: 421.23

Testing 512x512 image:
BufferCreateINT: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateINT16: msecs: 140 || ms/i: 23.3333 || i/s: 42.8571
BufferCreateFP16: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateFP32: msecs: 78 || ms/i: 13 || i/s: 76.9231
JustCopy: msecs: 500 || ms/i: 0.5 || i/s: 2000
SimpleSmooth: msecs: 813 || ms/i: 0.813 || i/s: 1230.01
TexNoise: msecs: 640 || ms/i: 0.64 || i/s: 1562.5
3x3Conv: msecs: 688 || ms/i: 1.376 || i/s: 726.744
TEncode: msecs: 141 || ms/i: 0.282 || i/s: 3546.1
TDecode: msecs: 484 || ms/i: 0.968 || i/s: 1033.06
LinDiffINT: msecs: 500 || ms/i: 0.5 || i/s: 2000
LinDiffINT16: msecs: 1500 || ms/i: 1.5 || i/s: 666.667
LinDiffFP16: msecs: 1516 || ms/i: 1.516 || i/s: 659.631
LinDiffFP32: msecs: 5875 || ms/i: 5.875 || i/s: 170.213
PMTEncoded: msecs: 2656 || ms/i: 5.312 || i/s: 188.253
PMStandard: msecs: 9047 || ms/i: 18.094 || i/s: 55.2669
PMBuffered: msecs: 1297 || ms/i: 5.188 || i/s: 192.753

Testing 1024x1024 image:
BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateINT16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateFP32: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
JustCopy: msecs: 2000 || ms/i: 2 || i/s: 500
SimpleSmooth: msecs: 3125 || ms/i: 3.125 || i/s: 320
TexNoise: msecs: 2406 || ms/i: 2.406 || i/s: 415.628
3x3Conv: msecs: 2625 || ms/i: 5.25 || i/s: 190.476
TEncode: msecs: 578 || ms/i: 1.156 || i/s: 865.052
TDecode: msecs: 1844 || ms/i: 3.688 || i/s: 271.15
LinDiffINT: msecs: 2000 || ms/i: 2 || i/s: 500
LinDiffINT16: msecs: 6109 || ms/i: 6.109 || i/s: 163.693
LinDiffFP16: msecs: 5922 || ms/i: 5.922 || i/s: 168.862
LinDiffFP32: msecs: 23500 || ms/i: 23.5 || i/s: 42.5532
PMTEncoded: msecs: 10141 || ms/i: 20.282 || i/s: 49.3048
PMStandard: msecs: 35547 || ms/i: 71.094 || i/s: 14.0659
PMBuffered: msecs: 316953 || ms/i: 1267.81 || i/s: 0.78876
 
fallguy said:
Is there some other way to compare, besides opening up two browsers and comparing side by side? I'm not asking for something like the ORB.. just curious as to how others are doing it.

Yeah, I didn't think that I'd get so many responses, so the results are less than ideal to compare. But fret not, ORC comes to the rescue!

orc01.png


(I've wanted to write a small C# app for some time, so this was a good opportunity)

Release: very soon.
 
Looks nice. Maybe you could somehow add a "reference value" that is proportional to 1/# of pixels, indicating how an ideal GPU would behave.
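
One way to read that suggestion, as a rough sketch: take a measured i/s at one size as the baseline and scale it by the pixel-count ratio. The function name ideal_ips and the choice of the 32x32 score as the baseline are assumptions for illustration only, not a description of how ORC works.

```python
def ideal_ips(base_ips, base_size, size):
    """Iterations/s expected at size x size if cost scaled purely with pixel count."""
    return base_ips * (base_size * base_size) / (size * size)


if __name__ == "__main__":
    # Baseline: JustCopy at 32x32 from the 6600GT result above (12820.5 i/s).
    for size in (32, 64, 128, 256, 512, 1024):
        print(f"{size}x{size}: {ideal_ips(12820.5, 32, size):.2f} i/s")
```

With this baseline the 1024x1024 reference comes out far below the 500 i/s actually measured for JustCopy on that card, which suggests the small sizes are limited by per-call overhead rather than pixel work. Anchoring the curve at the largest size instead may give a more useful reference line.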
 
ORC (Offline Result Comparator) 0.1 is now available:

http://landesjugendtheater.at/misc/orc01_release.zip
(99kB - Now even more 56k friendly!)

Requires the .Net 2.0 beta framework.

How to Use:
  • Click "Load" to load results from the db file.
  • Right click on results in list to view details / edit / remove
  • Select (a) result(s), (a) test(s), and resolutions that interest you
  • Click on chart Window to draw a new chart
That's it. One more shot:
orc02.png


Have fun! The db currently contains only results from the first page; if you add more, please save and send me the results.db file.
 