New GLSL / Pbuffer benchmark [Update: version 1.4 / ORCv0.4]

Ostsol said:
Is 2.0 just what you have installed or is there functionality from it that the program requires?
I developed it with VCS Express 2005, so it requires it by default. I don't think I used any 2.0-only functionality...
 
S754 AMD A64 3000+ at 2.4 GHz, RAM at 240 HTT (2.5-2-2-5, 1T)
9800 Pro, core at 380 MHz and memory at 190 MHz (something is wrong with the memory; it isn't stable above 200)

Code:
GL filter framework 1.3 test application by Peter Thoman 2004-2005

Gui initialized successfully.
DevIL initialized successfully.
 - DevIL Version: 167
OpenGL initialized successfully.
ILUT OpenGL mode set successfully.
Loaded required OpenGL extensions for GLPixelShader.
Loaded required OpenGL extensions for GLRenderTexture.
Loaded required OpenGL extensions for GLFilterStep.
Initialization complete.

Press return key to start benchmark...



Testing 32x32 image:
BufferCreateINT: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
BufferCreateINT16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP32: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
JustCopy: msecs: 297 || ms/i: 0.1485 || i/s: 6734.01
SimpleSmooth: msecs: 312 || ms/i: 0.156 || i/s: 6410.26
TexNoise: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
3x3Conv: msecs: 172 || ms/i: 0.172 || i/s: 5813.95
TEncode: msecs: 157 || ms/i: 0.157 || i/s: 6369.43
TDecode: msecs: 203 || ms/i: 0.203 || i/s: 4926.11
LinDiffINT: msecs: 360 || ms/i: 0.18 || i/s: 5555.56
LinDiffINT16: msecs: 359 || ms/i: 0.1795 || i/s: 5571.03
LinDiffFP16: msecs: 359 || ms/i: 0.1795 || i/s: 5571.03
LinDiffFP32: msecs: 360 || ms/i: 0.18 || i/s: 5555.56
PMTEncoded: msecs: 484 || ms/i: 0.484 || i/s: 2066.12
PMStandard: msecs: 547 || ms/i: 0.547 || i/s: 1828.15
PMBuffered: msecs: 78 || ms/i: 0.156 || i/s: 6410.26

Testing 64x64 image:
BufferCreateINT: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateINT16: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
BufferCreateFP16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP32: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
JustCopy: msecs: 297 || ms/i: 0.1485 || i/s: 6734.01
SimpleSmooth: msecs: 296 || ms/i: 0.148 || i/s: 6756.76
TexNoise: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
3x3Conv: msecs: 156 || ms/i: 0.156 || i/s: 6410.26
TEncode: msecs: 156 || ms/i: 0.156 || i/s: 6410.26
TDecode: msecs: 219 || ms/i: 0.219 || i/s: 4566.21
LinDiffINT: msecs: 359 || ms/i: 0.1795 || i/s: 5571.03
LinDiffINT16: msecs: 360 || ms/i: 0.18 || i/s: 5555.56
LinDiffFP16: msecs: 359 || ms/i: 0.1795 || i/s: 5571.03
LinDiffFP32: msecs: 359 || ms/i: 0.1795 || i/s: 5571.03
PMTEncoded: msecs: 484 || ms/i: 0.484 || i/s: 2066.12
PMStandard: msecs: 484 || ms/i: 0.484 || i/s: 2066.12
PMBuffered: msecs: 125 || ms/i: 0.25 || i/s: 4000

Testing 128x128 image:
BufferCreateINT: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
BufferCreateINT16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP32: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
JustCopy: msecs: 297 || ms/i: 0.1485 || i/s: 6734.01
SimpleSmooth: msecs: 312 || ms/i: 0.156 || i/s: 6410.26
TexNoise: msecs: 344 || ms/i: 0.172 || i/s: 5813.95
3x3Conv: msecs: 281 || ms/i: 0.281 || i/s: 3558.72
TEncode: msecs: 141 || ms/i: 0.141 || i/s: 7092.2
TDecode: msecs: 203 || ms/i: 0.203 || i/s: 4926.11
LinDiffINT: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
LinDiffINT16: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
LinDiffFP16: msecs: 312 || ms/i: 0.156 || i/s: 6410.26
LinDiffFP32: msecs: 344 || ms/i: 0.172 || i/s: 5813.95
PMTEncoded: msecs: 500 || ms/i: 0.5 || i/s: 2000
PMStandard: msecs: 563 || ms/i: 0.563 || i/s: 1776.2
PMBuffered: msecs: 421 || ms/i: 0.842 || i/s: 1187.65

Testing 256x256 image:
BufferCreateINT: msecs: 672 || ms/i: 112 || i/s: 8.92857
BufferCreateINT16: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
BufferCreateFP16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP32: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
JustCopy: msecs: 468 || ms/i: 0.234 || i/s: 4273.5
SimpleSmooth: msecs: 579 || ms/i: 0.2895 || i/s: 3454.23
TexNoise: msecs: 781 || ms/i: 0.3905 || i/s: 2560.82
3x3Conv: msecs: 1047 || ms/i: 1.047 || i/s: 955.11
TEncode: msecs: 172 || ms/i: 0.172 || i/s: 5813.95
TDecode: msecs: 437 || ms/i: 0.437 || i/s: 2288.33
LinDiffINT: msecs: 782 || ms/i: 0.391 || i/s: 2557.54
LinDiffINT16: msecs: 781 || ms/i: 0.3905 || i/s: 2560.82
LinDiffFP16: msecs: 797 || ms/i: 0.3985 || i/s: 2509.41
LinDiffFP32: msecs: 1266 || ms/i: 0.633 || i/s: 1579.78
PMTEncoded: msecs: 687 || ms/i: 0.687 || i/s: 1455.6
PMStandard: msecs: 1937 || ms/i: 1.937 || i/s: 516.262
PMBuffered: msecs: 1563 || ms/i: 3.126 || i/s: 319.898

Testing 512x512 image:
BufferCreateINT: msecs: 672 || ms/i: 112 || i/s: 8.92857
BufferCreateINT16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP32: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
JustCopy: msecs: 828 || ms/i: 0.828 || i/s: 1207.73
SimpleSmooth: msecs: 1094 || ms/i: 1.094 || i/s: 914.077
TexNoise: msecs: 875 || ms/i: 0.875 || i/s: 1142.86
3x3Conv: msecs: 2000 || ms/i: 4 || i/s: 250
TEncode: msecs: 125 || ms/i: 0.25 || i/s: 4000
TDecode: msecs: 812 || ms/i: 1.624 || i/s: 615.764
LinDiffINT: msecs: 1485 || ms/i: 1.485 || i/s: 673.401
LinDiffINT16: msecs: 1484 || ms/i: 1.484 || i/s: 673.854
LinDiffFP16: msecs: 1484 || ms/i: 1.484 || i/s: 673.854
LinDiffFP32: msecs: 2422 || ms/i: 2.422 || i/s: 412.882
PMTEncoded: msecs: 1250 || ms/i: 2.5 || i/s: 400
PMStandard: msecs: 3719 || ms/i: 7.438 || i/s: 134.445
PMBuffered: msecs: 2906 || ms/i: 11.624 || i/s: 86.0289

Testing 1024x1024 image:
BufferCreateINT: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateINT16: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
BufferCreateFP16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP32: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
JustCopy: msecs: 3187 || ms/i: 3.187 || i/s: 313.775
SimpleSmooth: msecs: 4265 || ms/i: 4.265 || i/s: 234.467
TexNoise: msecs: 3313 || ms/i: 3.313 || i/s: 301.841
3x3Conv: msecs: 7937 || ms/i: 15.874 || i/s: 62.9961
TEncode: msecs: 437 || ms/i: 0.874 || i/s: 1144.16
TDecode: msecs: 3203 || ms/i: 6.406 || i/s: 156.104
LinDiffINT: msecs: 5828 || ms/i: 5.828 || i/s: 171.585
LinDiffINT16: msecs: 5875 || ms/i: 5.875 || i/s: 170.213
LinDiffFP16: msecs: 5875 || ms/i: 5.875 || i/s: 170.213
LinDiffFP32: msecs: 9688 || ms/i: 9.688 || i/s: 103.22
PMTEncoded: msecs: 4828 || ms/i: 9.656 || i/s: 103.563
PMStandard: msecs: 14688 || ms/i: 29.376 || i/s: 34.0414
PMBuffered: msecs: 12266 || ms/i: 49.064 || i/s: 20.3815
 
Hmm, interesting: comparing those results with some obtained with a "standard" 9800 Pro nicely shows which tests are bandwidth-limited and which are GPU-limited. Thanks!
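Judging from the numbers in the logs, the three columns are related: msecs / (ms/i) gives the iteration count for that test (e.g. 3187 / 3.187 = 1000), and i/s is simply 1000 / (ms/i). To compare cards or clock settings the way the post above does, it can help to convert i/s into a pixel throughput. The sketch below is a hypothetical helper (not part of the benchmark app) that parses one result line and computes megapixels per second; it assumes every pass touches all width x height pixels of the test image, which may not hold for every test.

```python
import re
from typing import NamedTuple, Optional

# Matches result lines such as "JustCopy: msecs: 297 || ms/i: 0.1485 || i/s: 6734.01"
LINE_RE = re.compile(
    r"^(?P<test>\w+): msecs: (?P<msecs>[\d.]+) "
    r"\|\| ms/i: (?P<ms_per_iter>[\d.]+) \|\| i/s: (?P<iters_per_sec>[\d.]+)$"
)


class Result(NamedTuple):
    test: str
    msecs: float          # total wall time for the test
    ms_per_iter: float    # average time per iteration
    iters_per_sec: float  # iterations per second

    @property
    def iterations(self) -> int:
        # Total time divided by time per iteration recovers the loop count.
        return round(self.msecs / self.ms_per_iter)


def parse_line(line: str) -> Optional[Result]:
    """Return a Result for a benchmark line, or None for headers/other text."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return Result(m["test"], float(m["msecs"]),
                  float(m["ms_per_iter"]), float(m["iters_per_sec"]))


def mpixels_per_sec(result: Result, width: int, height: int) -> float:
    # Assumption: one iteration processes every pixel of the image once.
    return width * height * result.iters_per_sec / 1e6
```

With this, a test whose Mpixel/s tracks the memory clock between two runs of the same card is probably bandwidth-limited, while one that tracks the core clock is more likely GPU-limited.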
 
No mere GF6600?

GPU: GF6600, 128-bit bus, 256 MB
System: nF4-SLI A02 with A64 3200+
Driver: 71.84

Code:
GL filter framework 1.3 test application by Peter Thoman 2004-2005

Gui initialized successfully.
DevIL initialized successfully.
 - DevIL Version: 167
OpenGL initialized successfully.
ILUT OpenGL mode set successfully.
Loaded required OpenGL extensions for GLPixelShader.
Loaded required OpenGL extensions for GLRenderTexture.
Loaded required OpenGL extensions for GLFilterStep.
Initialization complete.

Press return key to start benchmark...



Testing 32x32 image:
BufferCreateINT: msecs: 250 || ms/i: 41.6667 || i/s: 24
No suitable INT format found. Trying FP... (Flaky 6x00 workaround)

BufferCreateINT16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateFP32: msecs: 79 || ms/i: 13.1667 || i/s: 75.9494
JustCopy: msecs: 188 || ms/i: 0.094 || i/s: 10638.3
SimpleSmooth: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
TexNoise: msecs: 375 || ms/i: 0.1875 || i/s: 5333.33
3x3Conv: msecs: 281 || ms/i: 0.281 || i/s: 3558.72
TEncode: msecs: 172 || ms/i: 0.172 || i/s: 5813.95
TDecode: msecs: 187 || ms/i: 0.187 || i/s: 5347.59
LinDiffINT: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42
LinDiffINT16: msecs: 235 || ms/i: 0.1175 || i/s: 8510.64
LinDiffFP16: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42
LinDiffFP32: msecs: 234 || ms/i: 0.117 || i/s: 8547.01
PMTEncoded: msecs: 469 || ms/i: 0.469 || i/s: 2132.2
PMStandard: msecs: 360 || ms/i: 0.36 || i/s: 2777.78
PMBuffered: msecs: 31 || ms/i: 0.062 || i/s: 16129

Testing 64x64 image:
BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateINT16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateFP32: msecs: 78 || ms/i: 13 || i/s: 76.9231
JustCopy: msecs: 171 || ms/i: 0.0855 || i/s: 11695.9
SimpleSmooth: msecs: 188 || ms/i: 0.094 || i/s: 10638.3
TexNoise: msecs: 187 || ms/i: 0.0935 || i/s: 10695.2
3x3Conv: msecs: 110 || ms/i: 0.11 || i/s: 9090.91
TEncode: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
TDecode: msecs: 110 || ms/i: 0.11 || i/s: 9090.91
LinDiffINT: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42
LinDiffINT16: msecs: 234 || ms/i: 0.117 || i/s: 8547.01
LinDiffFP16: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42
LinDiffFP32: msecs: 390 || ms/i: 0.195 || i/s: 5128.21
PMTEncoded: msecs: 360 || ms/i: 0.36 || i/s: 2777.78
PMStandard: msecs: 594 || ms/i: 0.594 || i/s: 1683.5
PMBuffered: msecs: 31 || ms/i: 0.062 || i/s: 16129

Testing 128x128 image:
BufferCreateINT: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
BufferCreateINT16: msecs: 125 || ms/i: 20.8333 || i/s: 48
BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateFP32: msecs: 78 || ms/i: 13 || i/s: 76.9231
JustCopy: msecs: 188 || ms/i: 0.094 || i/s: 10638.3
SimpleSmooth: msecs: 218 || ms/i: 0.109 || i/s: 9174.31
TexNoise: msecs: 391 || ms/i: 0.1955 || i/s: 5115.09
3x3Conv: msecs: 187 || ms/i: 0.187 || i/s: 5347.59
TEncode: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
TDecode: msecs: 109 || ms/i: 0.109 || i/s: 9174.31
LinDiffINT: msecs: 218 || ms/i: 0.109 || i/s: 9174.31
LinDiffINT16: msecs: 407 || ms/i: 0.2035 || i/s: 4914
LinDiffFP16: msecs: 407 || ms/i: 0.2035 || i/s: 4914
LinDiffFP32: msecs: 1484 || ms/i: 0.742 || i/s: 1347.71
PMTEncoded: msecs: 656 || ms/i: 0.656 || i/s: 1524.39
PMStandard: msecs: 2156 || ms/i: 2.156 || i/s: 463.822
PMBuffered: msecs: 312 || ms/i: 0.624 || i/s: 1602.56

Testing 256x256 image:
BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateINT16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateFP32: msecs: 78 || ms/i: 13 || i/s: 76.9231
JustCopy: msecs: 484 || ms/i: 0.242 || i/s: 4132.23
SimpleSmooth: msecs: 719 || ms/i: 0.3595 || i/s: 2781.64
TexNoise: msecs: 750 || ms/i: 0.375 || i/s: 2666.67
3x3Conv: msecs: 625 || ms/i: 0.625 || i/s: 1600
TEncode: msecs: 157 || ms/i: 0.157 || i/s: 6369.43
TDecode: msecs: 453 || ms/i: 0.453 || i/s: 2207.51
LinDiffINT: msecs: 500 || ms/i: 0.25 || i/s: 4000
LinDiffINT16: msecs: 1578 || ms/i: 0.789 || i/s: 1267.43
LinDiffFP16: msecs: 1500 || ms/i: 0.75 || i/s: 1333.33
LinDiffFP32: msecs: 5812 || ms/i: 2.906 || i/s: 344.116
PMTEncoded: msecs: 2453 || ms/i: 2.453 || i/s: 407.664
PMStandard: msecs: 8594 || ms/i: 8.594 || i/s: 116.36
PMBuffered: msecs: 2079 || ms/i: 4.158 || i/s: 240.5

Testing 512x512 image:
BufferCreateINT: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateINT16: msecs: 125 || ms/i: 20.8333 || i/s: 48
BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateFP32: msecs: 78 || ms/i: 13 || i/s: 76.9231
JustCopy: msecs: 843 || ms/i: 0.843 || i/s: 1186.24
SimpleSmooth: msecs: 1329 || ms/i: 1.329 || i/s: 752.445
TexNoise: msecs: 1109 || ms/i: 1.109 || i/s: 901.713
3x3Conv: msecs: 1125 || ms/i: 2.25 || i/s: 444.444
TEncode: msecs: 266 || ms/i: 0.532 || i/s: 1879.7
TDecode: msecs: 796 || ms/i: 1.592 || i/s: 628.141
LinDiffINT: msecs: 860 || ms/i: 0.86 || i/s: 1162.79
LinDiffINT16: msecs: 2672 || ms/i: 2.672 || i/s: 374.251
LinDiffFP16: msecs: 2671 || ms/i: 2.671 || i/s: 374.392
LinDiffFP32: msecs: 10547 || ms/i: 10.547 || i/s: 94.8137
PMTEncoded: msecs: 4734 || ms/i: 9.468 || i/s: 105.619
PMStandard: msecs: 16172 || ms/i: 32.344 || i/s: 30.9176
PMBuffered: msecs: 2281 || ms/i: 9.124 || i/s: 109.601

Testing 1024x1024 image:
BufferCreateINT: msecs: 93 || ms/i: 15.5 || i/s: 64.5161
BufferCreateINT16: msecs: 110 || ms/i: 18.3333 || i/s: 54.5455
BufferCreateFP16: msecs: 156 || ms/i: 26 || i/s: 38.4615
BufferCreateFP32: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
JustCopy: msecs: 3344 || ms/i: 3.344 || i/s: 299.043
SimpleSmooth: msecs: 5235 || ms/i: 5.235 || i/s: 191.022
TexNoise: msecs: 4046 || ms/i: 4.046 || i/s: 247.158
3x3Conv: msecs: 4391 || ms/i: 8.782 || i/s: 113.869
TEncode: msecs: 1016 || ms/i: 2.032 || i/s: 492.126
TDecode: msecs: 3078 || ms/i: 6.156 || i/s: 162.443
LinDiffINT: msecs: 3360 || ms/i: 3.36 || i/s: 297.619
LinDiffINT16: msecs: 10531 || ms/i: 10.531 || i/s: 94.9577
LinDiffFP16: msecs: 10531 || ms/i: 10.531 || i/s: 94.9577
LinDiffFP32: msecs: 42047 || ms/i: 42.047 || i/s: 23.7829
PMTEncoded: msecs: 17828 || ms/i: 35.656 || i/s: 28.0458
PMStandard: msecs: 62391 || ms/i: 124.782 || i/s: 8.01398
PMBuffered: msecs: 356016 || ms/i: 1424.06 || i/s: 0.702216

Finished. Press return key to close...
                Don't forget to copy the results!
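The "No suitable INT format found. Trying FP... (Flaky 6x00 workaround)" line in the log above suggests that the app walks an ordered list of render-target formats and falls back when one fails. A minimal sketch of that kind of fallback follows; `pick_format` and the `is_supported` callback are hypothetical (in real code the check would be something like attempting to create the buffer), and the format names are placeholders rather than the app's actual internals.

```python
from typing import Callable, Optional, Sequence


def pick_format(candidates: Sequence[str],
                is_supported: Callable[[str], bool],
                log: Callable[[str], None] = print) -> Optional[str]:
    """Return the first supported format in preference order, or None."""
    for i, fmt in enumerate(candidates):
        if is_supported(fmt):
            return fmt
        # Report the fallback only if there is another candidate to try.
        if i + 1 < len(candidates):
            log(f"No suitable {fmt} format found. Trying {candidates[i + 1]}...")
    return None
```

Ordering the candidates from cheapest (8-bit integer) to most expensive (FP32) keeps the fast path whenever the driver allows it.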
 
991060 said:
I'm expecting adding support for FBO in the next release. ;)

Sorry for not responding earlier; I didn't realize that you meant Framebuffer Objects at first, and then I just forgot about your reply for a while. I plan to add FBO support as soon as the ATI drivers support them, and use them exclusively whenever possible. In fact I have been waiting (and asking) for an extension like that for over a year now, and - while it's not superbuffers - it's nice to finally see an improvement on the (previously horrible) OpenGL RtT (render-to-texture) situation.

My code will be simpler, more concise, platform independent and faster because of the introduction of this extension. I'm really looking forward to it. Does anyone know when GL_EXT_framebuffer_object will be supported in Catalyst?
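Using FBOs "exclusively whenever possible" implies a runtime check with a pbuffer fallback. A minimal sketch of that decision, assuming the caller passes in the space-separated GL extension string (from `glGetString(GL_EXTENSIONS)`) and, on Windows, the WGL extension string where `WGL_ARB_pbuffer` and `WGL_ARB_render_texture` are reported. The function names are hypothetical. The point worth noting is matching whole tokens: a plain substring test would wrongly report `GL_EXT_texture` as present when only `GL_EXT_texture3D` is advertised.

```python
from typing import Optional


def has_extension(ext_string: str, name: str) -> bool:
    # Compare whole space-separated tokens, never substrings.
    return name in ext_string.split()


def choose_render_target_path(gl_exts: str, wgl_exts: str) -> Optional[str]:
    """Prefer FBOs; fall back to pbuffer render-to-texture; else None."""
    if has_extension(gl_exts, "GL_EXT_framebuffer_object"):
        return "fbo"
    if (has_extension(wgl_exts, "WGL_ARB_pbuffer")
            and has_extension(wgl_exts, "WGL_ARB_render_texture")):
        return "pbuffer"
    return None
```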
 
PeterT said:
... and - while it's not superbuffers - it's nice to ...
:LOL:
I'm currently reading the complete renderbuffer spec, and I found this gem which perfectly expresses my own sentiments:
framebuffer_object.txt said:
We obviously won't call this "ARB_compromise_buffers", so
what name should we use?
And I thought that people writing extension specs were mostly devoid of any sense of humor.

[edit]
I knew there was something supernatural about all of this...
(53) When supporting ARB_draw_buffers, do we need the level of
indirection between fragment color outputs and attached
mages provided in that API?
 
Here are some high-end results on a common platform. Both cards are AGP, running on an FX-55 system.

X800 XT PE (Cat 5.3)

Code:
Testing 32x32 image:
BufferCreateINT: msecs: 687 || ms/i: 114.5 || i/s: 8.73362
BufferCreateINT16: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
BufferCreateFP16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP32: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
JustCopy: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
SimpleSmooth: msecs: 265 || ms/i: 0.1325 || i/s: 7547.17
TexNoise: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
3x3Conv: msecs: 157 || ms/i: 0.157 || i/s: 6369.43
TEncode: msecs: 141 || ms/i: 0.141 || i/s: 7092.2
TDecode: msecs: 187 || ms/i: 0.187 || i/s: 5347.59
LinDiffINT: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
LinDiffINT16: msecs: 312 || ms/i: 0.156 || i/s: 6410.26
LinDiffFP16: msecs: 282 || ms/i: 0.141 || i/s: 7092.2
LinDiffFP32: msecs: 312 || ms/i: 0.156 || i/s: 6410.26
PMTEncoded: msecs: 500 || ms/i: 0.5 || i/s: 2000
PMStandard: msecs: 485 || ms/i: 0.485 || i/s: 2061.86
PMBuffered: msecs: 78 || ms/i: 0.156 || i/s: 6410.26

Testing 64x64 image:
BufferCreateINT: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
BufferCreateINT16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP32: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
JustCopy: msecs: 297 || ms/i: 0.1485 || i/s: 6734.01
SimpleSmooth: msecs: 328 || ms/i: 0.164 || i/s: 6097.56
TexNoise: msecs: 328 || ms/i: 0.164 || i/s: 6097.56
3x3Conv: msecs: 156 || ms/i: 0.156 || i/s: 6410.26
TEncode: msecs: 141 || ms/i: 0.141 || i/s: 7092.2
TDecode: msecs: 218 || ms/i: 0.218 || i/s: 4587.16
LinDiffINT: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
LinDiffINT16: msecs: 312 || ms/i: 0.156 || i/s: 6410.26
LinDiffFP16: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
LinDiffFP32: msecs: 312 || ms/i: 0.156 || i/s: 6410.26
PMTEncoded: msecs: 437 || ms/i: 0.437 || i/s: 2288.33
PMStandard: msecs: 422 || ms/i: 0.422 || i/s: 2369.67
PMBuffered: msecs: 62 || ms/i: 0.124 || i/s: 8064.52

Testing 128x128 image:
BufferCreateINT: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateINT16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP16: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
BufferCreateFP32: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
JustCopy: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
SimpleSmooth: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
TexNoise: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
3x3Conv: msecs: 141 || ms/i: 0.141 || i/s: 7092.2
TEncode: msecs: 125 || ms/i: 0.125 || i/s: 8000
TDecode: msecs: 188 || ms/i: 0.188 || i/s: 5319.15
LinDiffINT: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
LinDiffINT16: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
LinDiffFP16: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
LinDiffFP32: msecs: 282 || ms/i: 0.141 || i/s: 7092.2
PMTEncoded: msecs: 438 || ms/i: 0.438 || i/s: 2283.11
PMStandard: msecs: 485 || ms/i: 0.485 || i/s: 2061.86
PMBuffered: msecs: 78 || ms/i: 0.156 || i/s: 6410.26

Testing 256x256 image:
BufferCreateINT: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateINT16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP16: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
BufferCreateFP32: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
JustCopy: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
SimpleSmooth: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
TexNoise: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
3x3Conv: msecs: 297 || ms/i: 0.297 || i/s: 3367
TEncode: msecs: 141 || ms/i: 0.141 || i/s: 7092.2
TDecode: msecs: 187 || ms/i: 0.187 || i/s: 5347.59
LinDiffINT: msecs: 282 || ms/i: 0.141 || i/s: 7092.2
LinDiffINT16: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
LinDiffFP16: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
LinDiffFP32: msecs: 328 || ms/i: 0.164 || i/s: 6097.56
PMTEncoded: msecs: 437 || ms/i: 0.437 || i/s: 2288.33
PMStandard: msecs: 562 || ms/i: 0.562 || i/s: 1779.36
PMBuffered: msecs: 282 || ms/i: 0.564 || i/s: 1773.05

Testing 512x512 image:
BufferCreateINT: msecs: 671 || ms/i: 111.833 || i/s: 8.94188
BufferCreateINT16: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
BufferCreateFP16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP32: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
JustCopy: msecs: 172 || ms/i: 0.172 || i/s: 5813.95
SimpleSmooth: msecs: 359 || ms/i: 0.359 || i/s: 2785.52
TexNoise: msecs: 344 || ms/i: 0.344 || i/s: 2906.98
3x3Conv: msecs: 469 || ms/i: 0.938 || i/s: 1066.1
TEncode: msecs: 63 || ms/i: 0.126 || i/s: 7936.51
TDecode: msecs: 140 || ms/i: 0.28 || i/s: 3571.43
LinDiffINT: msecs: 391 || ms/i: 0.391 || i/s: 2557.54
LinDiffINT16: msecs: 390 || ms/i: 0.39 || i/s: 2564.1
LinDiffFP16: msecs: 406 || ms/i: 0.406 || i/s: 2463.05
LinDiffFP32: msecs: 578 || ms/i: 0.578 || i/s: 1730.1
PMTEncoded: msecs: 390 || ms/i: 0.78 || i/s: 1282.05
PMStandard: msecs: 984 || ms/i: 1.968 || i/s: 508.13
PMBuffered: msecs: 234 || ms/i: 0.936 || i/s: 1068.38

Testing 1024x1024 image:
BufferCreateINT: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateINT16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP16: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
BufferCreateFP32: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
JustCopy: msecs: 656 || ms/i: 0.656 || i/s: 1524.39
SimpleSmooth: msecs: 1359 || ms/i: 1.359 || i/s: 735.835
TexNoise: msecs: 1125 || ms/i: 1.125 || i/s: 888.889
3x3Conv: msecs: 1797 || ms/i: 3.594 || i/s: 278.242
TEncode: msecs: 63 || ms/i: 0.126 || i/s: 7936.51
TDecode: msecs: 609 || ms/i: 1.218 || i/s: 821.018
LinDiffINT: msecs: 1547 || ms/i: 1.547 || i/s: 646.412
LinDiffINT16: msecs: 1563 || ms/i: 1.563 || i/s: 639.795
LinDiffFP16: msecs: 1562 || ms/i: 1.562 || i/s: 640.205
LinDiffFP32: msecs: 2297 || ms/i: 2.297 || i/s: 435.35
PMTEncoded: msecs: 1438 || ms/i: 2.876 || i/s: 347.705
PMStandard: msecs: 4063 || ms/i: 8.126 || i/s: 123.062
PMBuffered: msecs: 1937 || ms/i: 7.748 || i/s: 129.066

6800 Ultra (71.84)
Code:
Testing 32x32 image:
BufferCreateINT: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
No suitable INT format found. Trying FP... (Flaky 6x00 workaround)

BufferCreateINT16: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateFP16: msecs: 46 || ms/i: 7.66667 || i/s: 130.435
BufferCreateFP32: msecs: 32 || ms/i: 5.33333 || i/s: 187.5
JustCopy: msecs: 110 || ms/i: 0.055 || i/s: 18181.8
SimpleSmooth: msecs: 187 || ms/i: 0.0935 || i/s: 10695.2
TexNoise: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
3x3Conv: msecs: 172 || ms/i: 0.172 || i/s: 5813.95
TEncode: msecs: 109 || ms/i: 0.109 || i/s: 9174.31
TDecode: msecs: 109 || ms/i: 0.109 || i/s: 9174.31
LinDiffINT: msecs: 125 || ms/i: 0.0625 || i/s: 16000
LinDiffINT16: msecs: 125 || ms/i: 0.0625 || i/s: 16000
LinDiffFP16: msecs: 141 || ms/i: 0.0705 || i/s: 14184.4
LinDiffFP32: msecs: 141 || ms/i: 0.0705 || i/s: 14184.4
PMTEncoded: msecs: 297 || ms/i: 0.297 || i/s: 3367
PMStandard: msecs: 219 || ms/i: 0.219 || i/s: 4566.21
PMBuffered: msecs: 16 || ms/i: 0.032 || i/s: 31250

Testing 64x64 image:
BufferCreateINT: msecs: 47 || ms/i: 7.83333 || i/s: 127.66
BufferCreateINT16: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
BufferCreateFP16: msecs: 47 || ms/i: 7.83333 || i/s: 127.66
BufferCreateFP32: msecs: 31 || ms/i: 5.16667 || i/s: 193.548
JustCopy: msecs: 109 || ms/i: 0.0545 || i/s: 18348.6
SimpleSmooth: msecs: 110 || ms/i: 0.055 || i/s: 18181.8
TexNoise: msecs: 110 || ms/i: 0.055 || i/s: 18181.8
3x3Conv: msecs: 62 || ms/i: 0.062 || i/s: 16129
TEncode: msecs: 62 || ms/i: 0.062 || i/s: 16129
TDecode: msecs: 63 || ms/i: 0.063 || i/s: 15873
LinDiffINT: msecs: 141 || ms/i: 0.0705 || i/s: 14184.4
LinDiffINT16: msecs: 140 || ms/i: 0.07 || i/s: 14285.7
LinDiffFP16: msecs: 125 || ms/i: 0.0625 || i/s: 16000
LinDiffFP32: msecs: 141 || ms/i: 0.0705 || i/s: 14184.4
PMTEncoded: msecs: 235 || ms/i: 0.235 || i/s: 4255.32
PMStandard: msecs: 219 || ms/i: 0.219 || i/s: 4566.21
PMBuffered: msecs: 31 || ms/i: 0.062 || i/s: 16129

Testing 128x128 image:
BufferCreateINT: msecs: 47 || ms/i: 7.83333 || i/s: 127.66
BufferCreateINT16: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
BufferCreateFP16: msecs: 47 || ms/i: 7.83333 || i/s: 127.66
BufferCreateFP32: msecs: 47 || ms/i: 7.83333 || i/s: 127.66
JustCopy: msecs: 109 || ms/i: 0.0545 || i/s: 18348.6
SimpleSmooth: msecs: 109 || ms/i: 0.0545 || i/s: 18348.6
TexNoise: msecs: 141 || ms/i: 0.0705 || i/s: 14184.4
3x3Conv: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
TEncode: msecs: 62 || ms/i: 0.062 || i/s: 16129
TDecode: msecs: 63 || ms/i: 0.063 || i/s: 15873
LinDiffINT: msecs: 141 || ms/i: 0.0705 || i/s: 14184.4
LinDiffINT16: msecs: 157 || ms/i: 0.0785 || i/s: 12738.9
LinDiffFP16: msecs: 140 || ms/i: 0.07 || i/s: 14285.7
LinDiffFP32: msecs: 407 || ms/i: 0.2035 || i/s: 4914
PMTEncoded: msecs: 234 || ms/i: 0.234 || i/s: 4273.5
PMStandard: msecs: 640 || ms/i: 0.64 || i/s: 1562.5
PMBuffered: msecs: 109 || ms/i: 0.218 || i/s: 4587.16

Testing 256x256 image:
BufferCreateINT: msecs: 47 || ms/i: 7.83333 || i/s: 127.66
BufferCreateINT16: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
BufferCreateFP16: msecs: 47 || ms/i: 7.83333 || i/s: 127.66
BufferCreateFP32: msecs: 32 || ms/i: 5.33333 || i/s: 187.5
JustCopy: msecs: 125 || ms/i: 0.0625 || i/s: 16000
SimpleSmooth: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42
TexNoise: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
3x3Conv: msecs: 203 || ms/i: 0.203 || i/s: 4926.11
TEncode: msecs: 63 || ms/i: 0.063 || i/s: 15873
TDecode: msecs: 141 || ms/i: 0.141 || i/s: 7092.2
LinDiffINT: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
LinDiffINT16: msecs: 500 || ms/i: 0.25 || i/s: 4000
LinDiffFP16: msecs: 500 || ms/i: 0.25 || i/s: 4000
LinDiffFP32: msecs: 1594 || ms/i: 0.797 || i/s: 1254.71
PMTEncoded: msecs: 734 || ms/i: 0.734 || i/s: 1362.4
PMStandard: msecs: 2406 || ms/i: 2.406 || i/s: 415.628
PMBuffered: msecs: 735 || ms/i: 1.47 || i/s: 680.272

Testing 512x512 image:
BufferCreateINT: msecs: 47 || ms/i: 7.83333 || i/s: 127.66
BufferCreateINT16: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateFP16: msecs: 32 || ms/i: 5.33333 || i/s: 187.5
BufferCreateFP32: msecs: 46 || ms/i: 7.66667 || i/s: 130.435
JustCopy: msecs: 188 || ms/i: 0.188 || i/s: 5319.15
SimpleSmooth: msecs: 343 || ms/i: 0.343 || i/s: 2915.45
TexNoise: msecs: 266 || ms/i: 0.266 || i/s: 3759.4
3x3Conv: msecs: 359 || ms/i: 0.718 || i/s: 1392.76
TEncode: msecs: 78 || ms/i: 0.156 || i/s: 6410.26
TDecode: msecs: 250 || ms/i: 0.5 || i/s: 2000
LinDiffINT: msecs: 328 || ms/i: 0.328 || i/s: 3048.78
LinDiffINT16: msecs: 859 || ms/i: 0.859 || i/s: 1164.14
LinDiffFP16: msecs: 813 || ms/i: 0.813 || i/s: 1230.01
LinDiffFP32: msecs: 2828 || ms/i: 2.828 || i/s: 353.607
PMTEncoded: msecs: 1328 || ms/i: 2.656 || i/s: 376.506
PMStandard: msecs: 4484 || ms/i: 8.968 || i/s: 111.508
PMBuffered: msecs: 828 || ms/i: 3.312 || i/s: 301.932

Testing 1024x1024 image:
BufferCreateINT: msecs: 47 || ms/i: 7.83333 || i/s: 127.66
BufferCreateINT16: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
BufferCreateFP16: msecs: 32 || ms/i: 5.33333 || i/s: 187.5
BufferCreateFP32: msecs: 47 || ms/i: 7.83333 || i/s: 127.66
JustCopy: msecs: 703 || ms/i: 0.703 || i/s: 1422.48
SimpleSmooth: msecs: 1328 || ms/i: 1.328 || i/s: 753.012
TexNoise: msecs: 922 || ms/i: 0.922 || i/s: 1084.6
3x3Conv: msecs: 1360 || ms/i: 2.72 || i/s: 367.647
TEncode: msecs: 297 || ms/i: 0.594 || i/s: 1683.5
TDecode: msecs: 890 || ms/i: 1.78 || i/s: 561.798
LinDiffINT: msecs: 1250 || ms/i: 1.25 || i/s: 800
LinDiffINT16: msecs: 3328 || ms/i: 3.328 || i/s: 300.481
LinDiffFP16: msecs: 3343 || ms/i: 3.343 || i/s: 299.133
LinDiffFP32: msecs: 11313 || ms/i: 11.313 || i/s: 88.3939
PMTEncoded: msecs: 5063 || ms/i: 10.126 || i/s: 98.7557
PMStandard: msecs: 17656 || ms/i: 35.312 || i/s: 28.319
PMBuffered: msecs: 213594 || ms/i: 854.376 || i/s: 1.17044

[Edit] - Forgot I just got this one in:

AGP X850 XT PE (Cat5.3)

Code:
Testing 32x32 image:
BufferCreateINT: msecs: 703 || ms/i: 117.167 || i/s: 8.53485
BufferCreateINT16: msecs: 672 || ms/i: 112 || i/s: 8.92857
BufferCreateFP16: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
BufferCreateFP32: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
JustCopy: msecs: 265 || ms/i: 0.1325 || i/s: 7547.17
SimpleSmooth: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
TexNoise: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
3x3Conv: msecs: 172 || ms/i: 0.172 || i/s: 5813.95
TEncode: msecs: 141 || ms/i: 0.141 || i/s: 7092.2
TDecode: msecs: 203 || ms/i: 0.203 || i/s: 4926.11
LinDiffINT: msecs: 312 || ms/i: 0.156 || i/s: 6410.26
LinDiffINT16: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
LinDiffFP16: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
LinDiffFP32: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
PMTEncoded: msecs: 500 || ms/i: 0.5 || i/s: 2000
PMStandard: msecs: 485 || ms/i: 0.485 || i/s: 2061.86
PMBuffered: msecs: 78 || ms/i: 0.156 || i/s: 6410.26

Testing 64x64 image:
BufferCreateINT: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateINT16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP16: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
BufferCreateFP32: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
JustCopy: msecs: 296 || ms/i: 0.148 || i/s: 6756.76
SimpleSmooth: msecs: 329 || ms/i: 0.1645 || i/s: 6079.03
TexNoise: msecs: 312 || ms/i: 0.156 || i/s: 6410.26
3x3Conv: msecs: 172 || ms/i: 0.172 || i/s: 5813.95
TEncode: msecs: 140 || ms/i: 0.14 || i/s: 7142.86
TDecode: msecs: 219 || ms/i: 0.219 || i/s: 4566.21
LinDiffINT: msecs: 312 || ms/i: 0.156 || i/s: 6410.26
LinDiffINT16: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
LinDiffFP16: msecs: 312 || ms/i: 0.156 || i/s: 6410.26
LinDiffFP32: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
PMTEncoded: msecs: 437 || ms/i: 0.437 || i/s: 2288.33
PMStandard: msecs: 422 || ms/i: 0.422 || i/s: 2369.67
PMBuffered: msecs: 62 || ms/i: 0.124 || i/s: 8064.52

Testing 128x128 image:
BufferCreateINT: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
BufferCreateINT16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP32: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
JustCopy: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
SimpleSmooth: msecs: 265 || ms/i: 0.1325 || i/s: 7547.17
TexNoise: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
3x3Conv: msecs: 141 || ms/i: 0.141 || i/s: 7092.2
TEncode: msecs: 140 || ms/i: 0.14 || i/s: 7142.86
TDecode: msecs: 188 || ms/i: 0.188 || i/s: 5319.15
LinDiffINT: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
LinDiffINT16: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
LinDiffFP16: msecs: 282 || ms/i: 0.141 || i/s: 7092.2
LinDiffFP32: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
PMTEncoded: msecs: 438 || ms/i: 0.438 || i/s: 2283.11
PMStandard: msecs: 484 || ms/i: 0.484 || i/s: 2066.12
PMBuffered: msecs: 78 || ms/i: 0.156 || i/s: 6410.26

Testing 256x256 image:
BufferCreateINT: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateINT16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP32: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
JustCopy: msecs: 266 || ms/i: 0.133 || i/s: 7518.8
SimpleSmooth: msecs: 265 || ms/i: 0.1325 || i/s: 7547.17
TexNoise: msecs: 282 || ms/i: 0.141 || i/s: 7092.2
3x3Conv: msecs: 297 || ms/i: 0.297 || i/s: 3367
TEncode: msecs: 141 || ms/i: 0.141 || i/s: 7092.2
TDecode: msecs: 187 || ms/i: 0.187 || i/s: 5347.59
LinDiffINT: msecs: 282 || ms/i: 0.141 || i/s: 7092.2
LinDiffINT16: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
LinDiffFP16: msecs: 281 || ms/i: 0.1405 || i/s: 7117.44
LinDiffFP32: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
PMTEncoded: msecs: 453 || ms/i: 0.453 || i/s: 2207.51
PMStandard: msecs: 531 || ms/i: 0.531 || i/s: 1883.24
PMBuffered: msecs: 282 || ms/i: 0.564 || i/s: 1773.05

Testing 512x512 image:
BufferCreateINT: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateINT16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP32: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
JustCopy: msecs: 172 || ms/i: 0.172 || i/s: 5813.95
SimpleSmooth: msecs: 344 || ms/i: 0.344 || i/s: 2906.98
TexNoise: msecs: 344 || ms/i: 0.344 || i/s: 2906.98
3x3Conv: msecs: 453 || ms/i: 0.906 || i/s: 1103.75
TEncode: msecs: 62 || ms/i: 0.124 || i/s: 8064.52
TDecode: msecs: 125 || ms/i: 0.25 || i/s: 4000
LinDiffINT: msecs: 375 || ms/i: 0.375 || i/s: 2666.67
LinDiffINT16: msecs: 375 || ms/i: 0.375 || i/s: 2666.67
LinDiffFP16: msecs: 375 || ms/i: 0.375 || i/s: 2666.67
LinDiffFP32: msecs: 547 || ms/i: 0.547 || i/s: 1828.15
PMTEncoded: msecs: 375 || ms/i: 0.75 || i/s: 1333.33
PMStandard: msecs: 953 || ms/i: 1.906 || i/s: 524.659
PMBuffered: msecs: 219 || ms/i: 0.876 || i/s: 1141.55

Testing 1024x1024 image:
BufferCreateINT: msecs: 657 || ms/i: 109.5 || i/s: 9.13242
BufferCreateINT16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP16: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
BufferCreateFP32: msecs: 656 || ms/i: 109.333 || i/s: 9.14634
JustCopy: msecs: 672 || ms/i: 0.672 || i/s: 1488.1
SimpleSmooth: msecs: 1312 || ms/i: 1.312 || i/s: 762.195
TexNoise: msecs: 1109 || ms/i: 1.109 || i/s: 901.713
3x3Conv: msecs: 1734 || ms/i: 3.468 || i/s: 288.351
TEncode: msecs: 63 || ms/i: 0.126 || i/s: 7936.51
TDecode: msecs: 594 || ms/i: 1.188 || i/s: 841.751
LinDiffINT: msecs: 1500 || ms/i: 1.5 || i/s: 666.667
LinDiffINT16: msecs: 1500 || ms/i: 1.5 || i/s: 666.667
LinDiffFP16: msecs: 1500 || ms/i: 1.5 || i/s: 666.667
LinDiffFP32: msecs: 2203 || ms/i: 2.203 || i/s: 453.926
PMTEncoded: msecs: 1375 || ms/i: 2.75 || i/s: 363.636
PMStandard: msecs: 3890 || ms/i: 7.78 || i/s: 128.535
PMBuffered: msecs: 1985 || ms/i: 7.94 || i/s: 125.945
 
Thanks! Now I just wish I had such gaming^H^H^H^H *ahem* scientific testing capabilities here as well ;)

Note that I'll now bother you to run another .exe once I put in FBO support - in fact I may do so at every major framework / benchmark app update :D
 
For grins:
P4 3.0/800fsb/Northwood, 1 gig = 512x2 DDR400/dual channel
ATI AIW X800 XT/AGP, Catalyst 5.3's

Code:
Testing 32x32 image:
BufferCreateINT: msecs: 703 || ms/i: 117.167 || i/s: 8.53485
BufferCreateINT16: msecs: 672 || ms/i: 112 || i/s: 8.92857
BufferCreateFP16: msecs: 672 || ms/i: 112 || i/s: 8.92857
BufferCreateFP32: msecs: 687 || ms/i: 114.5 || i/s: 8.73362
JustCopy: msecs: 593 || ms/i: 0.2965 || i/s: 3372.68
SimpleSmooth: msecs: 610 || ms/i: 0.305 || i/s: 3278.69
TexNoise: msecs: 640 || ms/i: 0.32 || i/s: 3125
3x3Conv: msecs: 344 || ms/i: 0.344 || i/s: 2906.98
TEncode: msecs: 312 || ms/i: 0.312 || i/s: 3205.13
TDecode: msecs: 516 || ms/i: 0.516 || i/s: 1937.98
LinDiffINT: msecs: 750 || ms/i: 0.375 || i/s: 2666.67
LinDiffINT16: msecs: 750 || ms/i: 0.375 || i/s: 2666.67
LinDiffFP16: msecs: 719 || ms/i: 0.3595 || i/s: 2781.64
LinDiffFP32: msecs: 734 || ms/i: 0.367 || i/s: 2724.8
PMTEncoded: msecs: 1187 || ms/i: 1.187 || i/s: 842.46
PMStandard: msecs: 1172 || ms/i: 1.172 || i/s: 853.242
PMBuffered: msecs: 125 || ms/i: 0.25 || i/s: 4000

Testing 64x64 image:
BufferCreateINT: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateINT16: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateFP16: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateFP32: msecs: 750 || ms/i: 125 || i/s: 8
JustCopy: msecs: 703 || ms/i: 0.3515 || i/s: 2844.95
SimpleSmooth: msecs: 734 || ms/i: 0.367 || i/s: 2724.8
TexNoise: msecs: 750 || ms/i: 0.375 || i/s: 2666.67
3x3Conv: msecs: 391 || ms/i: 0.391 || i/s: 2557.54
TEncode: msecs: 344 || ms/i: 0.344 || i/s: 2906.98
TDecode: msecs: 593 || ms/i: 0.593 || i/s: 1686.34
LinDiffINT: msecs: 750 || ms/i: 0.375 || i/s: 2666.67
LinDiffINT16: msecs: 765 || ms/i: 0.3825 || i/s: 2614.38
LinDiffFP16: msecs: 766 || ms/i: 0.383 || i/s: 2610.97
LinDiffFP32: msecs: 766 || ms/i: 0.383 || i/s: 2610.97
PMTEncoded: msecs: 1062 || ms/i: 1.062 || i/s: 941.62
PMStandard: msecs: 1031 || ms/i: 1.031 || i/s: 969.932
PMBuffered: msecs: 125 || ms/i: 0.25 || i/s: 4000

Testing 128x128 image:
BufferCreateINT: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateINT16: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateFP16: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateFP32: msecs: 750 || ms/i: 125 || i/s: 8
JustCopy: msecs: 625 || ms/i: 0.3125 || i/s: 3200
SimpleSmooth: msecs: 641 || ms/i: 0.3205 || i/s: 3120.12
TexNoise: msecs: 671 || ms/i: 0.3355 || i/s: 2980.63
3x3Conv: msecs: 344 || ms/i: 0.344 || i/s: 2906.98
TEncode: msecs: 312 || ms/i: 0.312 || i/s: 3205.13
TDecode: msecs: 531 || ms/i: 0.531 || i/s: 1883.24
LinDiffINT: msecs: 781 || ms/i: 0.3905 || i/s: 2560.82
LinDiffINT16: msecs: 797 || ms/i: 0.3985 || i/s: 2509.41
LinDiffFP16: msecs: 688 || ms/i: 0.344 || i/s: 2906.98
LinDiffFP32: msecs: 766 || ms/i: 0.383 || i/s: 2610.97
PMTEncoded: msecs: 1078 || ms/i: 1.078 || i/s: 927.644
PMStandard: msecs: 1047 || ms/i: 1.047 || i/s: 955.11
PMBuffered: msecs: 141 || ms/i: 0.282 || i/s: 3546.1

Testing 256x256 image:
BufferCreateINT: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateINT16: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateFP16: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateFP32: msecs: 750 || ms/i: 125 || i/s: 8
JustCopy: msecs: 625 || ms/i: 0.3125 || i/s: 3200
SimpleSmooth: msecs: 640 || ms/i: 0.32 || i/s: 3125
TexNoise: msecs: 672 || ms/i: 0.336 || i/s: 2976.19
3x3Conv: msecs: 344 || ms/i: 0.344 || i/s: 2906.98
TEncode: msecs: 328 || ms/i: 0.328 || i/s: 3048.78
TDecode: msecs: 547 || ms/i: 0.547 || i/s: 1828.15
LinDiffINT: msecs: 703 || ms/i: 0.3515 || i/s: 2844.95
LinDiffINT16: msecs: 687 || ms/i: 0.3435 || i/s: 2911.21
LinDiffFP16: msecs: 687 || ms/i: 0.3435 || i/s: 2911.21
LinDiffFP32: msecs: 672 || ms/i: 0.336 || i/s: 2976.19
PMTEncoded: msecs: 1078 || ms/i: 1.078 || i/s: 927.644
PMStandard: msecs: 1047 || ms/i: 1.047 || i/s: 955.11
PMBuffered: msecs: 312 || ms/i: 0.624 || i/s: 1602.56

Testing 512x512 image:
BufferCreateINT: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateINT16: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateFP16: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateFP32: msecs: 750 || ms/i: 125 || i/s: 8
JustCopy: msecs: 359 || ms/i: 0.359 || i/s: 2785.52
SimpleSmooth: msecs: 391 || ms/i: 0.391 || i/s: 2557.54
TexNoise: msecs: 390 || ms/i: 0.39 || i/s: 2564.1
3x3Conv: msecs: 390 || ms/i: 0.78 || i/s: 1282.05
TEncode: msecs: 172 || ms/i: 0.344 || i/s: 2906.98
TDecode: msecs: 265 || ms/i: 0.53 || i/s: 1886.79
LinDiffINT: msecs: 422 || ms/i: 0.422 || i/s: 2369.67
LinDiffINT16: msecs: 407 || ms/i: 0.407 || i/s: 2457
LinDiffFP16: msecs: 422 || ms/i: 0.422 || i/s: 2369.67
LinDiffFP32: msecs: 625 || ms/i: 0.625 || i/s: 1600
PMTEncoded: msecs: 562 || ms/i: 1.124 || i/s: 889.68
PMStandard: msecs: 1078 || ms/i: 2.156 || i/s: 463.822
PMBuffered: msecs: 250 || ms/i: 1 || i/s: 1000

Testing 1024x1024 image:
BufferCreateINT: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateINT16: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateFP16: msecs: 750 || ms/i: 125 || i/s: 8
BufferCreateFP32: msecs: 750 || ms/i: 125 || i/s: 8
JustCopy: msecs: 672 || ms/i: 0.672 || i/s: 1488.1
SimpleSmooth: msecs: 1406 || ms/i: 1.406 || i/s: 711.238
TexNoise: msecs: 1156 || ms/i: 1.156 || i/s: 865.052
3x3Conv: msecs: 1844 || ms/i: 3.688 || i/s: 271.15
TEncode: msecs: 172 || ms/i: 0.344 || i/s: 2906.98
TDecode: msecs: 562 || ms/i: 1.124 || i/s: 889.68
LinDiffINT: msecs: 1625 || ms/i: 1.625 || i/s: 615.385
LinDiffINT16: msecs: 1641 || ms/i: 1.641 || i/s: 609.385
LinDiffFP16: msecs: 1640 || ms/i: 1.64 || i/s: 609.756
LinDiffFP32: msecs: 2500 || ms/i: 2.5 || i/s: 400
PMTEncoded: msecs: 1516 || ms/i: 3.032 || i/s: 329.815
PMStandard: msecs: 4375 || ms/i: 8.75 || i/s: 114.286
PMBuffered: msecs: 2031 || ms/i: 8.124 || i/s: 123.092

Neat little toy. Thanks much for this thing as it does yield some interesting insight!

I should also add, this benchmark is very consistent. I can run the thing 2-5 times and get damn near identical results every time.
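To put a number on "damn near identical", you could collect the ms/i each test reports over several runs and compute its coefficient of variation (population standard deviation divided by the mean). This is a minimal sketch, not something the benchmark reports itself; the function name and the input layout (test name mapped to a list of ms/i values) are made up for illustration.

```python
from statistics import mean, pstdev


def run_to_run_spread(runs: dict[str, list[float]]) -> dict[str, float]:
    """Coefficient of variation of ms/i per test across repeated runs.

    `runs` maps a test name (e.g. "JustCopy") to the ms/i values it
    reported in each run. 0.0 means every run gave the same number.
    """
    spread = {}
    for test, values in runs.items():
        if len(values) < 2:
            raise ValueError(f"{test}: need at least two runs")
        avg = mean(values)
        # Population std dev relative to the mean; guard against a zero mean.
        spread[test] = pstdev(values) / avg if avg else 0.0
    return spread
```

A value of 0.01 for a test would mean its runs typically differ by about 1% from their average.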
 
Sharkfood said:
I should also add, this benchmark is very consistent. I can run the thing 2-5 times and get damn near identical results every time.
That's because it doesn't bother itself with trivialities like actually outputting any fancy graphics ;)
But seriously, I spent some time trying to optimize benchmark runtime vs. consistency, and I believe I did quite well at that.
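The posted numbers show how that trade-off plays out: msecs / (ms/i) gives the iteration count each test ran, and i/s is simply 1000 / (ms/i). In the X800 XT log above, JustCopy runs 2000 iterations at 32x32 but 1000 at 512x512, and 3x3Conv drops from 1000 to 500. The helper names below are hypothetical; this only recomputes what the log already implies and makes no claim about how FBench picks its counts.

```python
def implied_iterations(msecs: float, ms_per_iter: float) -> int:
    """Iteration count implied by a result line: total time / time per iteration."""
    # The log rounds ms/i, so round the quotient back to a whole count.
    return round(msecs / ms_per_iter)


def iterations_per_second(ms_per_iter: float) -> float:
    """The i/s column: iterations completed per 1000 ms."""
    return 1000.0 / ms_per_iter


# X800 XT, JustCopy: 32x32 vs 512x512
print(implied_iterations(593, 0.2965), implied_iterations(359, 0.359))
```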
 
I just updated the zip linked on the first page to version 1.4 of the benchmark, which includes four new tests designed to clarify NV40 fp performance. If you have an NV40-based card, please test this updated version!
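In the 1.4 logs posted in this thread, the new tests show up as LD_INT->FP16, LD_INT->FP32, LD_FP16->INT and LD_FP32->INT. One simple way to read them alongside the LinDiff* results is to express each test's ms/i as a multiple of the INT variant. A minimal sketch with a hypothetical helper, fed with the 1024x1024 ms/i values from the 6800GT run posted further down:

```python
def relative_cost(ms_per_iter: dict[str, float], baseline: str) -> dict[str, float]:
    """Express every test's ms/i as a multiple of the baseline test's ms/i."""
    base = ms_per_iter[baseline]
    if base <= 0:
        raise ValueError("baseline ms/i must be positive")
    return {name: ms / base for name, ms in ms_per_iter.items() if name != baseline}


# 1024x1024 ms/i values from the 6800GT (ForceWare 71.84) results in this thread.
gt_1024 = {
    "LinDiffINT": 1.422,
    "LinDiffINT16": 3.75,
    "LinDiffFP16": 3.75,
    "LinDiffFP32": 12.344,
}
ratios = relative_cost(gt_1024, "LinDiffINT")
```

With those numbers LinDiffFP32 comes out at roughly 8.7x the cost of LinDiffINT, and the two 16-bit formats at about 2.6x.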

If you already downloaded the zip of an older version, you can just get the updated exe.
 
CPU: Athlon 64 3000+
RAM: 1.5 GB
GPU: 6800GT
FW: 71.84

Code:
Results for BufferCreateINT16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
Results for BufferCreateFP16: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
Results for BufferCreateFP32: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
Results for JustCopy: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
Results for SimpleSmooth: msecs: 250 || ms/i: 0.125 || i/s: 8000
Results for TexNoise: msecs: 250 || ms/i: 0.125 || i/s: 8000
Results for 3x3Conv: msecs: 250 || ms/i: 0.25 || i/s: 4000
Results for TEncode: msecs: 156 || ms/i: 0.156 || i/s: 6410.26
Results for TDecode: msecs: 172 || ms/i: 0.172 || i/s: 5813.95
Results for LinDiffINT: msecs: 188 || ms/i: 0.094 || i/s: 10638.3
Results for LinDiffINT16: msecs: 187 || ms/i: 0.0935 || i/s: 10695.2
Results for LinDiffFP16: msecs: 188 || ms/i: 0.094 || i/s: 10638.3
Results for LinDiffFP32: msecs: 187 || ms/i: 0.0935 || i/s: 10695.2
Results for LD_INT->FP16: msecs: 109 || ms/i: 0.109 || i/s: 9174.31
Results for LD_INT->FP32: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
Results for LD_FP16->INT: msecs: 109 || ms/i: 0.109 || i/s: 9174.31
Results for LD_FP32->INT: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
Results for PMTEncoded: msecs: 406 || ms/i: 0.406 || i/s: 2463.05
Results for PMStandard: msecs: 312 || ms/i: 0.312 || i/s: 3205.13
Results for PMBuffered: msecs: 16 || ms/i: 0.032 || i/s: 31250

Testing 64x64 image:
Results for BufferCreateINT: msecs: 79 || ms/i: 13.1667 || i/s: 75.9494
Results for BufferCreateINT16: msecs: 78 || ms/i: 13 || i/s: 76.9231
Results for BufferCreateFP16: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
Results for BufferCreateFP32: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
Results for JustCopy: msecs: 157 || ms/i: 0.0785 || i/s: 12738.9
Results for SimpleSmooth: msecs: 171 || ms/i: 0.0855 || i/s: 11695.9
Results for TexNoise: msecs: 157 || ms/i: 0.0785 || i/s: 12738.9
Results for 3x3Conv: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
Results for TEncode: msecs: 79 || ms/i: 0.079 || i/s: 12658.2
Results for TDecode: msecs: 93 || ms/i: 0.093 || i/s: 10752.7
Results for LinDiffINT: msecs: 187 || ms/i: 0.0935 || i/s: 10695.2
Results for LinDiffINT16: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
Results for LinDiffFP16: msecs: 188 || ms/i: 0.094 || i/s: 10638.3
Results for LinDiffFP32: msecs: 187 || ms/i: 0.0935 || i/s: 10695.2
Results for LD_INT->FP16: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
Results for LD_INT->FP32: msecs: 109 || ms/i: 0.109 || i/s: 9174.31
Results for LD_FP16->INT: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
Results for LD_FP32->INT: msecs: 93 || ms/i: 0.093 || i/s: 10752.7
Results for PMTEncoded: msecs: 313 || ms/i: 0.313 || i/s: 3194.89
Results for PMStandard: msecs: 297 || ms/i: 0.297 || i/s: 3367
Results for PMBuffered: msecs: 31 || ms/i: 0.062 || i/s: 16129

Testing 128x128 image:
Results for BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
Results for BufferCreateINT16: msecs: 93 || ms/i: 15.5 || i/s: 64.5161
Results for BufferCreateFP16: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
Results for BufferCreateFP32: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
Results for JustCopy: msecs: 156 || ms/i: 0.078 || i/s: 12820.5
Results for SimpleSmooth: msecs: 157 || ms/i: 0.0785 || i/s: 12738.9
Results for TexNoise: msecs: 156 || ms/i: 0.078 || i/s: 12820.5
Results for 3x3Conv: msecs: 79 || ms/i: 0.079 || i/s: 12658.2
Results for TEncode: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
Results for TDecode: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
Results for LinDiffINT: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
Results for LinDiffINT16: msecs: 187 || ms/i: 0.0935 || i/s: 10695.2
Results for LinDiffFP16: msecs: 187 || ms/i: 0.0935 || i/s: 10695.2
Results for LinDiffFP32: msecs: 453 || ms/i: 0.2265 || i/s: 4415.01
Results for LD_INT->FP16: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
Results for LD_INT->FP32: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
Results for LD_FP16->INT: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
Results for LD_FP32->INT: msecs: 203 || ms/i: 0.203 || i/s: 4926.11
Results for PMTEncoded: msecs: 312 || ms/i: 0.312 || i/s: 3205.13
Results for PMStandard: msecs: 703 || ms/i: 0.703 || i/s: 1422.48
Results for PMBuffered: msecs: 125 || ms/i: 0.25 || i/s: 4000

Testing 256x256 image:
Results for BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
Results for BufferCreateINT16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
Results for BufferCreateFP16: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
Results for BufferCreateFP32: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
Results for JustCopy: msecs: 157 || ms/i: 0.0785 || i/s: 12738.9
Results for SimpleSmooth: msecs: 234 || ms/i: 0.117 || i/s: 8547.01
Results for TexNoise: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42
Results for 3x3Conv: msecs: 250 || ms/i: 0.25 || i/s: 4000
Results for TEncode: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
Results for TDecode: msecs: 156 || ms/i: 0.156 || i/s: 6410.26
Results for LinDiffINT: msecs: 235 || ms/i: 0.1175 || i/s: 8510.64
Results for LinDiffINT16: msecs: 547 || ms/i: 0.2735 || i/s: 3656.31
Results for LinDiffFP16: msecs: 547 || ms/i: 0.2735 || i/s: 3656.31
Results for LinDiffFP32: msecs: 1703 || ms/i: 0.8515 || i/s: 1174.4
Results for LD_INT->FP16: msecs: 125 || ms/i: 0.125 || i/s: 8000
Results for LD_INT->FP32: msecs: 125 || ms/i: 0.125 || i/s: 8000
Results for LD_FP16->INT: msecs: 250 || ms/i: 0.25 || i/s: 4000
Results for LD_FP32->INT: msecs: 813 || ms/i: 0.813 || i/s: 1230.01
Results for PMTEncoded: msecs: 813 || ms/i: 0.813 || i/s: 1230.01
Results for PMStandard: msecs: 2672 || ms/i: 2.672 || i/s: 374.251
Results for PMBuffered: msecs: 765 || ms/i: 1.53 || i/s: 653.595

Testing 512x512 image:
Results for BufferCreateINT: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
Results for BufferCreateINT16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
Results for BufferCreateFP16: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
Results for BufferCreateFP32: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
Results for JustCopy: msecs: 203 || ms/i: 0.203 || i/s: 4926.11
Results for SimpleSmooth: msecs: 390 || ms/i: 0.39 || i/s: 2564.1
Results for TexNoise: msecs: 297 || ms/i: 0.297 || i/s: 3367
Results for 3x3Conv: msecs: 406 || ms/i: 0.812 || i/s: 1231.53
Results for TEncode: msecs: 94 || ms/i: 0.188 || i/s: 5319.15
Results for TDecode: msecs: 281 || ms/i: 0.562 || i/s: 1779.36
Results for LinDiffINT: msecs: 375 || ms/i: 0.375 || i/s: 2666.67
Results for LinDiffINT16: msecs: 968 || ms/i: 0.968 || i/s: 1033.06
Results for LinDiffFP16: msecs: 969 || ms/i: 0.969 || i/s: 1031.99
Results for LinDiffFP32: msecs: 3109 || ms/i: 3.109 || i/s: 321.647
Results for LD_INT->FP16: msecs: 187 || ms/i: 0.374 || i/s: 2673.8
Results for LD_INT->FP32: msecs: 219 || ms/i: 0.438 || i/s: 2283.11
Results for LD_FP16->INT: msecs: 437 || ms/i: 0.874 || i/s: 1144.16
Results for LD_FP32->INT: msecs: 1453 || ms/i: 2.906 || i/s: 344.116
Results for PMTEncoded: msecs: 1468 || ms/i: 2.936 || i/s: 340.599
Results for PMStandard: msecs: 4922 || ms/i: 9.844 || i/s: 101.585
Results for PMBuffered: msecs: 891 || ms/i: 3.564 || i/s: 280.584

Testing 1024x1024 image:
Results for BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
Results for BufferCreateINT16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
Results for BufferCreateFP16: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
Results for BufferCreateFP32: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
Results for JustCopy: msecs: 781 || ms/i: 0.781 || i/s: 1280.41
Results for SimpleSmooth: msecs: 1500 || ms/i: 1.5 || i/s: 666.667
Results for TexNoise: msecs: 1000 || ms/i: 1 || i/s: 1000
Results for 3x3Conv: msecs: 1531 || ms/i: 3.062 || i/s: 326.584
Results for TEncode: msecs: 328 || ms/i: 0.656 || i/s: 1524.39
Results for TDecode: msecs: 1000 || ms/i: 2 || i/s: 500
Results for LinDiffINT: msecs: 1422 || ms/i: 1.422 || i/s: 703.235
Results for LinDiffINT16: msecs: 3750 || ms/i: 3.75 || i/s: 266.667
Results for LinDiffFP16: msecs: 3750 || ms/i: 3.75 || i/s: 266.667
Results for LinDiffFP32: msecs: 12344 || ms/i: 12.344 || i/s: 81.011
Results for LD_INT->FP16: msecs: 703 || ms/i: 1.406 || i/s: 711.238
Results for LD_INT->FP32: msecs: 860 || ms/i: 1.72 || i/s: 581.395
Results for LD_FP16->INT: msecs: 1750 || ms/i: 3.5 || i/s: 285.714
Results for LD_FP32->INT: msecs: 5781 || ms/i: 11.562 || i/s: 86.4902
Results for PMTEncoded: msecs: 5547 || ms/i: 11.094 || i/s: 90.1388
Results for PMStandard: msecs: 19437 || ms/i: 38.874 || i/s: 25.7241
Results for PMBuffered: msecs: 343297 || ms/i: 1373.19 || i/s: 0.728232
 
PeterT said:
I just updated the zip linked on the first page to version 1.4 of the benchmark, which includes four new tests designed to clarify NV40 fp performance. If you have an NV40-based card, please test this updated version!

If you already downloaded the zip of an older version, you can just get the updated exe.

I still see version 1.3 linked on the OP, but that's probably just because I am insane and refuse to see things the way they are :p. Anyways, here are the results using the linked "updated exe" on version 1.3. (PS, I even tried editing the link to read 1_4 instead of 1_3 but for some reason it still wants to download 1_3).

Geforce 6800 Ultra 425/1200
ForceWare 71.84, AGP x8, FastWrites, SideBand Addressing, AGP Aperture=256MB
K8T800Pro (Asus A8V Deluxe rev 2.0), Athlon 64 3500+
Code:
GL filter framework 1.4 test application by Peter Thoman 2004-2005

Gui initialized successfully.
DevIL initialized successfully.
 - DevIL Version: 167
OpenGL initialized successfully.
ILUT OpenGL mode set successfully.
Loaded required OpenGL extensions for GLPixelShader.
Loaded required OpenGL extensions for GLRenderTexture.
Loaded required OpenGL extensions for GLFilterStep.
Initialization complete.

Press return key to start benchmark...



Testing 32x32 image:
Results for BufferCreateINT: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
No suitable INT format found. Trying FP... (Flaky 6x00 workaround)

Results for BufferCreateINT16: msecs: 110 || ms/i: 18.3333 || i/s: 54.5455
Results for BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
Results for BufferCreateFP32: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
Results for JustCopy: msecs: 141 || ms/i: 0.0705 || i/s: 14184.4
Results for SimpleSmooth: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42
Results for TexNoise: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
Results for 3x3Conv: msecs: 203 || ms/i: 0.203 || i/s: 4926.11
Results for TEncode: msecs: 125 || ms/i: 0.125 || i/s: 8000
Results for TDecode: msecs: 140 || ms/i: 0.14 || i/s: 7142.86
Results for LinDiffINT: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
Results for LinDiffINT16: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
Results for LinDiffFP16: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
Results for LinDiffFP32: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
Results for LD_INT->FP16: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
Results for LD_INT->FP32: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
Results for LD_FP16->INT: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
Results for LD_FP32->INT: msecs: 93 || ms/i: 0.093 || i/s: 10752.7
Results for PMTEncoded: msecs: 344 || ms/i: 0.344 || i/s: 2906.98
Results for PMStandard: msecs: 266 || ms/i: 0.266 || i/s: 3759.4
Results for PMBuffered: msecs: 16 || ms/i: 0.032 || i/s: 31250

Testing 64x64 image:
Results for BufferCreateINT: msecs: 79 || ms/i: 13.1667 || i/s: 75.9494
Results for BufferCreateINT16: msecs: 93 || ms/i: 15.5 || i/s: 64.5161
Results for BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
Results for BufferCreateFP32: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
Results for JustCopy: msecs: 125 || ms/i: 0.0625 || i/s: 16000
Results for SimpleSmooth: msecs: 140 || ms/i: 0.07 || i/s: 14285.7
Results for TexNoise: msecs: 141 || ms/i: 0.0705 || i/s: 14184.4
Results for 3x3Conv: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
Results for TEncode: msecs: 62 || ms/i: 0.062 || i/s: 16129
Results for TDecode: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
Results for LinDiffINT: msecs: 171 || ms/i: 0.0855 || i/s: 11695.9
Results for LinDiffINT16: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
Results for LinDiffFP16: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
Results for LinDiffFP32: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
Results for LD_INT->FP16: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
Results for LD_INT->FP32: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
Results for LD_FP16->INT: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
Results for LD_FP32->INT: msecs: 79 || ms/i: 0.079 || i/s: 12658.2
Results for PMTEncoded: msecs: 281 || ms/i: 0.281 || i/s: 3558.72
Results for PMStandard: msecs: 281 || ms/i: 0.281 || i/s: 3558.72
Results for PMBuffered: msecs: 31 || ms/i: 0.062 || i/s: 16129

Testing 128x128 image:
Results for BufferCreateINT: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
Results for BufferCreateINT16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
Results for BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
Results for BufferCreateFP32: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
Results for JustCopy: msecs: 140 || ms/i: 0.07 || i/s: 14285.7
Results for SimpleSmooth: msecs: 141 || ms/i: 0.0705 || i/s: 14184.4
Results for TexNoise: msecs: 141 || ms/i: 0.0705 || i/s: 14184.4
Results for 3x3Conv: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
Results for TEncode: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
Results for TDecode: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
Results for LinDiffINT: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
Results for LinDiffINT16: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
Results for LinDiffFP16: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
Results for LinDiffFP32: msecs: 390 || ms/i: 0.195 || i/s: 5128.21
Results for LD_INT->FP16: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
Results for LD_INT->FP32: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
Results for LD_FP16->INT: msecs: 93 || ms/i: 0.093 || i/s: 10752.7
Results for LD_FP32->INT: msecs: 188 || ms/i: 0.188 || i/s: 5319.15
Results for PMTEncoded: msecs: 281 || ms/i: 0.281 || i/s: 3558.72
Results for PMStandard: msecs: 609 || ms/i: 0.609 || i/s: 1642.04
Results for PMBuffered: msecs: 109 || ms/i: 0.218 || i/s: 4587.16

Testing 256x256 image:
Results for BufferCreateINT: msecs: 79 || ms/i: 13.1667 || i/s: 75.9494
Results for BufferCreateINT16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
Results for BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
Results for BufferCreateFP32: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
Results for JustCopy: msecs: 125 || ms/i: 0.0625 || i/s: 16000
Results for SimpleSmooth: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
Results for TexNoise: msecs: 187 || ms/i: 0.0935 || i/s: 10695.2
Results for 3x3Conv: msecs: 204 || ms/i: 0.204 || i/s: 4901.96
Results for TEncode: msecs: 62 || ms/i: 0.062 || i/s: 16129
Results for TDecode: msecs: 125 || ms/i: 0.125 || i/s: 8000
Results for LinDiffINT: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
Results for LinDiffINT16: msecs: 453 || ms/i: 0.2265 || i/s: 4415.01
Results for LinDiffFP16: msecs: 484 || ms/i: 0.242 || i/s: 4132.23
Results for LinDiffFP32: msecs: 1500 || ms/i: 0.75 || i/s: 1333.33
Results for LD_INT->FP16: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
Results for LD_INT->FP32: msecs: 109 || ms/i: 0.109 || i/s: 9174.31
Results for LD_FP16->INT: msecs: 203 || ms/i: 0.203 || i/s: 4926.11
Results for LD_FP32->INT: msecs: 719 || ms/i: 0.719 || i/s: 1390.82
Results for PMTEncoded: msecs: 704 || ms/i: 0.704 || i/s: 1420.45
Results for PMStandard: msecs: 2328 || ms/i: 2.328 || i/s: 429.553
Results for PMBuffered: msecs: 719 || ms/i: 1.438 || i/s: 695.41

Testing 512x512 image:
Results for BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
Results for BufferCreateINT16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
Results for BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
Results for BufferCreateFP32: msecs: 78 || ms/i: 13 || i/s: 76.9231
Results for JustCopy: msecs: 188 || ms/i: 0.188 || i/s: 5319.15
Results for SimpleSmooth: msecs: 328 || ms/i: 0.328 || i/s: 3048.78
Results for TexNoise: msecs: 250 || ms/i: 0.25 || i/s: 4000
Results for 3x3Conv: msecs: 328 || ms/i: 0.656 || i/s: 1524.39
Results for TEncode: msecs: 94 || ms/i: 0.188 || i/s: 5319.15
Results for TDecode: msecs: 203 || ms/i: 0.406 || i/s: 2463.05
Results for LinDiffINT: msecs: 312 || ms/i: 0.312 || i/s: 3205.13
Results for LinDiffINT16: msecs: 797 || ms/i: 0.797 || i/s: 1254.71
Results for LinDiffFP16: msecs: 813 || ms/i: 0.813 || i/s: 1230.01
Results for LinDiffFP32: msecs: 2672 || ms/i: 2.672 || i/s: 374.251
Results for LD_INT->FP16: msecs: 157 || ms/i: 0.314 || i/s: 3184.71
Results for LD_INT->FP32: msecs: 172 || ms/i: 0.344 || i/s: 2906.98
Results for LD_FP16->INT: msecs: 375 || ms/i: 0.75 || i/s: 1333.33
Results for LD_FP32->INT: msecs: 1250 || ms/i: 2.5 || i/s: 400
Results for PMTEncoded: msecs: 1282 || ms/i: 2.564 || i/s: 390.016
Results for PMStandard: msecs: 4250 || ms/i: 8.5 || i/s: 117.647
Results for PMBuffered: msecs: 797 || ms/i: 3.188 || i/s: 313.676

Testing 1024x1024 image:
Results for BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
Results for BufferCreateINT16: msecs: 110 || ms/i: 18.3333 || i/s: 54.5455
Results for BufferCreateFP16: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
Results for BufferCreateFP32: msecs: 78 || ms/i: 13 || i/s: 76.9231
Results for JustCopy: msecs: 687 || ms/i: 0.687 || i/s: 1455.6
Results for SimpleSmooth: msecs: 1250 || ms/i: 1.25 || i/s: 800
Results for TexNoise: msecs: 875 || ms/i: 0.875 || i/s: 1142.86
Results for 3x3Conv: msecs: 1266 || ms/i: 2.532 || i/s: 394.945
Results for TEncode: msecs: 281 || ms/i: 0.562 || i/s: 1779.36
Results for TDecode: msecs: 828 || ms/i: 1.656 || i/s: 603.865
Results for LinDiffINT: msecs: 1171 || ms/i: 1.171 || i/s: 853.971
Results for LinDiffINT16: msecs: 3125 || ms/i: 3.125 || i/s: 320
Results for LinDiffFP16: msecs: 3125 || ms/i: 3.125 || i/s: 320
Results for LinDiffFP32: msecs: 10719 || ms/i: 10.719 || i/s: 93.2923
Results for LD_INT->FP16: msecs: 578 || ms/i: 1.156 || i/s: 865.052
Results for LD_INT->FP32: msecs: 718 || ms/i: 1.436 || i/s: 696.379
Results for LD_FP16->INT: msecs: 1438 || ms/i: 2.876 || i/s: 347.705
Results for LD_FP32->INT: msecs: 5000 || ms/i: 10 || i/s: 100
Results for PMTEncoded: msecs: 4812 || ms/i: 9.624 || i/s: 103.907
Results for PMStandard: msecs: 16843 || ms/i: 33.686 || i/s: 29.6859
Results for PMBuffered: msecs: 280641 || ms/i: 1122.56 || i/s: 0.890818

Finished. Press return key to close...
                Don't forget to copy the results!
 
wireframe said:
I still see version 1.3 linked on the OP, but that's probably just because I am insane and refuse to see things the way they are :p. Anyways, here are the results using the linked "updated exe" on version 1.3. (PS, I even tried editing the link to read 1_4 instead of 1_3 but for some reason it still wants to download 1_3).

ZOMG you're right. Fixed it now. Thanks for pointing that out (and for your testing)!
 
I updated the OP with a new version of ORC (0.4).

Some of the new stuff:
- Multiple graphs (tabs)
- More results in the db
- Help (Well, a bit)
- Updated to reflect all test types of FBench 1.4
- Better colors (alpha fadeout for same test on different models)
- Image export
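ORC's import code isn't shown in this thread, so the following is only a guess at the kind of parsing needed to handle both log formats: FBench 1.3 prints lines like `JustCopy: msecs: ...`, while 1.4 prefixes them with `Results for `, and both group results under `Testing NxN image:` headers. A minimal Python sketch (function and pattern names are hypothetical) that accepts either format and skips other output such as the "No suitable INT format found" warning:

```python
import re

SIZE = re.compile(r"Testing (\d+)x(\d+) image:")
# The "Results for " prefix is optional so 1.3 and 1.4 logs both match.
RESULT = re.compile(
    r"(?:Results for )?(?P<test>\S+): msecs: (?P<msecs>[\d.]+) \|\| "
    r"ms/i: (?P<ms>[\d.]+) \|\| i/s: (?P<ips>[\d.]+)"
)


def parse_fbench_log(text: str) -> dict[int, dict[str, float]]:
    """Map image edge length -> {test name: ms per iteration}."""
    results: dict[int, dict[str, float]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        size = SIZE.match(line)
        if size:
            current = int(size.group(1))
            results.setdefault(current, {})
            continue
        hit = RESULT.match(line)
        # Ignore result-like lines that appear before any size header.
        if hit and current is not None:
            results[current][hit["test"]] = float(hit["ms"])
    return results
```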

[Screenshot: 04docuscreen.png]
 
I don't mean to fuzz up your thread in case my machine is just fubar, but has anyone tried running the 1.3 or 1.4 version of your tool on an ATI card with the new Catalyst 5.4s? I'm seeing... really different results... and hoping it's just that I futzed up the CCC + install.
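One way to check whether the driver really changed things is to run the same card on both Catalyst versions and diff the ms/i values per test. A rough sketch, with a hypothetical helper and a 10% threshold picked arbitrarily; it expects one image size's results as a {test name: ms/i} mapping:

```python
def changed_tests(old: dict[str, float], new: dict[str, float],
                  threshold: float = 0.10) -> dict[str, float]:
    """Tests whose ms/i moved by more than `threshold` (as a fraction).

    Positive values mean the new run is slower, negative means faster.
    Only tests present in both runs are compared.
    """
    changes = {}
    for test in old.keys() & new.keys():
        delta = (new[test] - old[test]) / old[test]
        if abs(delta) > threshold:
            changes[test] = delta
    return changes
```

A result like `{"JustCopy": 1.0}` would mean JustCopy took twice as long per iteration on the new driver.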
 
One more set of results for you.

Single GeForce 6800 Ultra (425/1100) 71.84 driver, FX-55, nForce4 SLI, 2GB system memory.

Code:
GL filter framework 1.4 test application by Peter Thoman 2004-2005

Gui initialized successfully.
DevIL initialized successfully.
 - DevIL Version: 167
OpenGL initialized successfully.
ILUT OpenGL mode set successfully.
Loaded required OpenGL extensions for GLPixelShader.
Loaded required OpenGL extensions for GLRenderTexture.
Loaded required OpenGL extensions for GLFilterStep.
Initialization complete.

Press return key to start benchmark...



Testing 32x32 image:
Results for BufferCreateINT: msecs: 172 || ms/i: 28.6667 || i/s: 34.8837
No suitable INT format found. Trying FP... (Flaky 6x00 workaround)

Results for BufferCreateINT16: msecs: 172 || ms/i: 28.6667 || i/s: 34.8837
Results for BufferCreateFP16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
Results for BufferCreateFP32: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
Results for JustCopy: msecs: 3906 || ms/i: 1.953 || i/s: 512.033
Results for SimpleSmooth: msecs: 4015 || ms/i: 2.0075 || i/s: 498.132
Results for TexNoise: msecs: 4000 || ms/i: 2 || i/s: 500
Results for 3x3Conv: msecs: 2203 || ms/i: 2.203 || i/s: 453.926
Results for TEncode: msecs: 2031 || ms/i: 2.031 || i/s: 492.368
Results for TDecode: msecs: 2079 || ms/i: 2.079 || i/s: 481
Results for LinDiffINT: msecs: 3984 || ms/i: 1.992 || i/s: 502.008
Results for LinDiffINT16: msecs: 4015 || ms/i: 2.0075 || i/s: 498.132
Results for LinDiffFP16: msecs: 4000 || ms/i: 2 || i/s: 500
Results for LinDiffFP32: msecs: 4032 || ms/i: 2.016 || i/s: 496.032
Results for LD_INT->FP16: msecs: 2000 || ms/i: 2 || i/s: 500
Results for LD_INT->FP32: msecs: 2015 || ms/i: 2.015 || i/s: 496.278
Results for LD_FP16->INT: msecs: 2000 || ms/i: 2 || i/s: 500
Results for LD_FP32->INT: msecs: 2016 || ms/i: 2.016 || i/s: 496.032
Results for PMTEncoded: msecs: 6171 || ms/i: 6.171 || i/s: 162.048
Results for PMStandard: msecs: 6047 || ms/i: 6.047 || i/s: 165.371
Results for PMBuffered: msecs: 125 || ms/i: 0.25 || i/s: 4000

Testing 64x64 image:
Results for BufferCreateINT: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
Results for BufferCreateINT16: msecs: 125 || ms/i: 20.8333 || i/s: 48
Results for BufferCreateFP16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
Results for BufferCreateFP32: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
Results for JustCopy: msecs: 3890 || ms/i: 1.945 || i/s: 514.139
Results for SimpleSmooth: msecs: 3907 || ms/i: 1.9535 || i/s: 511.902
Results for TexNoise: msecs: 3953 || ms/i: 1.9765 || i/s: 505.945
Results for 3x3Conv: msecs: 1969 || ms/i: 1.969 || i/s: 507.872
Results for TEncode: msecs: 1953 || ms/i: 1.953 || i/s: 512.033
Results for TDecode: msecs: 2000 || ms/i: 2 || i/s: 500
Results for LinDiffINT: msecs: 3969 || ms/i: 1.9845 || i/s: 503.905
Results for LinDiffINT16: msecs: 3984 || ms/i: 1.992 || i/s: 502.008
Results for LinDiffFP16: msecs: 4016 || ms/i: 2.008 || i/s: 498.008
Results for LinDiffFP32: msecs: 4046 || ms/i: 2.023 || i/s: 494.315
Results for LD_INT->FP16: msecs: 2016 || ms/i: 2.016 || i/s: 496.032
Results for LD_INT->FP32: msecs: 1984 || ms/i: 1.984 || i/s: 504.032
Results for LD_FP16->INT: msecs: 2016 || ms/i: 2.016 || i/s: 496.032
Results for LD_FP32->INT: msecs: 2046 || ms/i: 2.046 || i/s: 488.759
Results for PMTEncoded: msecs: 6218 || ms/i: 6.218 || i/s: 160.823
Results for PMStandard: msecs: 6157 || ms/i: 6.157 || i/s: 162.417
Results for PMBuffered: msecs: 187 || ms/i: 0.374 || i/s: 2673.8

Testing 128x128 image:
Results for BufferCreateINT: msecs: 110 || ms/i: 18.3333 || i/s: 54.5455
Results for BufferCreateINT16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
Results for BufferCreateFP16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
Results for BufferCreateFP32: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
Results for JustCopy: msecs: 3922 || ms/i: 1.961 || i/s: 509.944
Results for SimpleSmooth: msecs: 4016 || ms/i: 2.008 || i/s: 498.008
Results for TexNoise: msecs: 4094 || ms/i: 2.047 || i/s: 488.52
Results for 3x3Conv: msecs: 2015 || ms/i: 2.015 || i/s: 496.278
Results for TEncode: msecs: 1968 || ms/i: 1.968 || i/s: 508.13
Results for TDecode: msecs: 2016 || ms/i: 2.016 || i/s: 496.032
Results for LinDiffINT: msecs: 4031 || ms/i: 2.0155 || i/s: 496.155
Results for LinDiffINT16: msecs: 4344 || ms/i: 2.172 || i/s: 460.405
Results for LinDiffFP16: msecs: 4203 || ms/i: 2.1015 || i/s: 475.851
Results for LinDiffFP32: msecs: 4281 || ms/i: 2.1405 || i/s: 467.181
Results for LD_INT->FP16: msecs: 2015 || ms/i: 2.015 || i/s: 496.278
Results for LD_INT->FP32: msecs: 2031 || ms/i: 2.031 || i/s: 492.368
Results for LD_FP16->INT: msecs: 2281 || ms/i: 2.281 || i/s: 438.404
Results for LD_FP32->INT: msecs: 2141 || ms/i: 2.141 || i/s: 467.071
Results for PMTEncoded: msecs: 6140 || ms/i: 6.14 || i/s: 162.866
Results for PMStandard: msecs: 6438 || ms/i: 6.438 || i/s: 155.328
Results for PMBuffered: msecs: 531 || ms/i: 1.062 || i/s: 941.62

Testing 256x256 image:
Results for BufferCreateINT: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
Results for BufferCreateINT16: msecs: 125 || ms/i: 20.8333 || i/s: 48
Results for BufferCreateFP16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
Results for BufferCreateFP32: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
Results for JustCopy: msecs: 4047 || ms/i: 2.0235 || i/s: 494.193
Results for SimpleSmooth: msecs: 4141 || ms/i: 2.0705 || i/s: 482.975
Results for TexNoise: msecs: 4265 || ms/i: 2.1325 || i/s: 468.933
Results for 3x3Conv: msecs: 3203 || ms/i: 3.203 || i/s: 312.207
Results for TEncode: msecs: 2109 || ms/i: 2.109 || i/s: 474.158
Results for TDecode: msecs: 2187 || ms/i: 2.187 || i/s: 457.247
Results for LinDiffINT: msecs: 4203 || ms/i: 2.1015 || i/s: 475.851
Results for LinDiffINT16: msecs: 4563 || ms/i: 2.2815 || i/s: 438.308
Results for LinDiffFP16: msecs: 4594 || ms/i: 2.297 || i/s: 435.35
Results for LinDiffFP32: msecs: 5281 || ms/i: 2.6405 || i/s: 378.716
Results for LD_INT->FP16: msecs: 2110 || ms/i: 2.11 || i/s: 473.934
Results for LD_INT->FP32: msecs: 2094 || ms/i: 2.094 || i/s: 477.555
Results for LD_FP16->INT: msecs: 2234 || ms/i: 2.234 || i/s: 447.628
Results for LD_FP32->INT: msecs: 2594 || ms/i: 2.594 || i/s: 385.505
Results for PMTEncoded: msecs: 6687 || ms/i: 6.687 || i/s: 149.544
Results for PMStandard: msecs: 8235 || ms/i: 8.235 || i/s: 121.433
Results for PMBuffered: msecs: 2578 || ms/i: 5.156 || i/s: 193.949

Testing 512x512 image:
Results for BufferCreateINT: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
Results for BufferCreateINT16: msecs: 125 || ms/i: 20.8333 || i/s: 48
Results for BufferCreateFP16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
Results for BufferCreateFP32: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
Results for JustCopy: msecs: 2234 || ms/i: 2.234 || i/s: 447.628
Results for SimpleSmooth: msecs: 2438 || ms/i: 2.438 || i/s: 410.172
Results for TexNoise: msecs: 2406 || ms/i: 2.406 || i/s: 415.628
Results for 3x3Conv: msecs: 2016 || ms/i: 4.032 || i/s: 248.016
Results for TEncode: msecs: 1141 || ms/i: 2.282 || i/s: 438.212
Results for TDecode: msecs: 1328 || ms/i: 2.656 || i/s: 376.506
Results for LinDiffINT: msecs: 2343 || ms/i: 2.343 || i/s: 426.803
Results for LinDiffINT16: msecs: 4172 || ms/i: 4.172 || i/s: 239.693
Results for LinDiffFP16: msecs: 3500 || ms/i: 3.5 || i/s: 285.714
Results for LinDiffFP32: msecs: 4813 || ms/i: 4.813 || i/s: 207.771
Results for LD_INT->FP16: msecs: 1360 || ms/i: 2.72 || i/s: 367.647
Results for LD_INT->FP32: msecs: 1265 || ms/i: 2.53 || i/s: 395.257
Results for LD_FP16->INT: msecs: 2469 || ms/i: 4.938 || i/s: 202.511
Results for LD_FP32->INT: msecs: 2453 || ms/i: 4.906 || i/s: 203.832
Results for PMTEncoded: msecs: 4266 || ms/i: 8.532 || i/s: 117.206
Results for PMStandard: msecs: 7734 || ms/i: 15.468 || i/s: 64.6496
Results for PMBuffered: msecs: 4172 || ms/i: 16.688 || i/s: 59.9233

Testing 1024x1024 image:
Results for BufferCreateINT: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
Results for BufferCreateINT16: msecs: 125 || ms/i: 20.8333 || i/s: 48
Results for BufferCreateFP16: msecs: 93 || ms/i: 15.5 || i/s: 64.5161
Results for BufferCreateFP32: msecs: 110 || ms/i: 18.3333 || i/s: 54.5455
Results for JustCopy: msecs: 5296 || ms/i: 5.296 || i/s: 188.822
Results for SimpleSmooth: msecs: 5094 || ms/i: 5.094 || i/s: 196.309
Results for TexNoise: msecs: 5563 || ms/i: 5.563 || i/s: 179.759
Results for 3x3Conv: msecs: 3718 || ms/i: 7.436 || i/s: 134.481
Results for TEncode: msecs: 2344 || ms/i: 4.688 || i/s: 213.311
Results for TDecode: msecs: 2250 || ms/i: 4.5 || i/s: 222.222
Results for LinDiffINT: msecs: 3453 || ms/i: 3.453 || i/s: 289.603
Results for LinDiffINT16: msecs: 10125 || ms/i: 10.125 || i/s: 98.7654
Results for LinDiffFP16: msecs: 10093 || ms/i: 10.093 || i/s: 99.0786
Results for LinDiffFP32: msecs: 17203 || ms/i: 17.203 || i/s: 58.1294
Results for LD_INT->FP16: msecs: 1875 || ms/i: 3.75 || i/s: 266.667
Results for LD_INT->FP32: msecs: 2156 || ms/i: 4.312 || i/s: 231.911
Results for LD_FP16->INT: msecs: 4750 || ms/i: 9.5 || i/s: 105.263
Results for LD_FP32->INT: msecs: 8469 || ms/i: 16.938 || i/s: 59.0388
Results for PMTEncoded: msecs: 7797 || ms/i: 15.594 || i/s: 64.1272
Results for PMStandard: msecs: 26656 || ms/i: 53.312 || i/s: 18.7575
Results for PMBuffered: msecs: 440187 || ms/i: 1760.75 || i/s: 0.56794

Finished. Press return key to close...
                Don't forget to copy the results!
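Each result line reports one measurement three ways. From the numbers above, ms/i is the total msecs divided by the iteration count, and i/s is 1000 divided by ms/i. For example, JustCopy at 128x128 took 3922 ms at 1.961 ms/i, which implies 2000 iterations. The iteration count itself is not printed, so the minimal sketch below infers it. The regex assumes the exact line format shown in this thread, with or without the leading "Results for " (both forms appear here). The helper names are hypothetical and not part of the benchmark tool.

```python
import re

# Matches result lines such as:
#   "Results for JustCopy: msecs: 3922 || ms/i: 1.961 || i/s: 509.944"
#   "JustCopy: msecs: 297 || ms/i: 0.1485 || i/s: 6734.01"
RESULT_RE = re.compile(
    r"^(?:Results for )?(?P<name>[^:]+): msecs: (?P<msecs>[\d.]+)"
    r" \|\| ms/i: (?P<ms_per_iter>[\d.]+) \|\| i/s: (?P<iter_per_s>[\d.]+)$"
)


def parse_result(line):
    """Return (name, msecs, ms_per_iter, iter_per_s), or None for other lines."""
    m = RESULT_RE.match(line.strip())
    if m is None:
        return None
    return (m["name"], float(m["msecs"]),
            float(m["ms_per_iter"]), float(m["iter_per_s"]))


def implied_iterations(msecs, ms_per_iter):
    """ms/i = msecs / iterations, so iterations = msecs / (ms/i)."""
    return round(msecs / ms_per_iter)


def is_consistent(ms_per_iter, iter_per_s, rel_tol=1e-3):
    """Check that i/s equals 1000 / (ms/i) up to the printed precision."""
    return abs(1000.0 / ms_per_iter - iter_per_s) <= rel_tol * iter_per_s
```

Applied to the whole log, this shows that the iteration count varies by test and by image size (JustCopy runs 2000 iterations at 128x128 but 1000 at 512x512). Raw msecs are therefore only comparable within one test at one size; ms/i or i/s is the figure to compare across sizes.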
 
Sharkfood said:
I don't mean to fuzz up your thread in case my machine is just fubar, but has anyone tried running either the 1.3 or 1.4 version of your tool on the new Catalyst 5.4s with an ATI card? I'm seeing... really different results, and I'm hoping it's just that I futzed up the CCC install.
You are not fuzzing; you actually informed me that there are new Catalysts out there! I've installed them now, but they still don't seem to support the FBO extension, which is disappointing but not entirely unexpected. And I'm also seeing "different" results. I smell a major fuckup on either our part or ATI's :oops:
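The reply above notes that the new drivers still don't expose the FBO extension. A support check usually looks something like this minimal sketch. It assumes the extension strings have already been fetched (glGetString(GL_EXTENSIONS) for GL; on Windows, WGL extensions such as WGL_ARB_pbuffer are normally listed by wglGetExtensionsStringARB instead). `choose_render_path` is a hypothetical helper, not the benchmark's actual logic.

```python
def has_extension(extensions, name):
    """True if `name` appears as a whole token in a space-separated extension list.

    A plain substring test is not enough: searching for "GL_EXT_framebuffer"
    would wrongly match "GL_EXT_framebuffer_object".
    """
    return name in extensions.split()


def choose_render_path(gl_extensions, wgl_extensions):
    """Hypothetical helper: prefer FBOs, fall back to pbuffers, else None."""
    if has_extension(gl_extensions, "GL_EXT_framebuffer_object"):
        return "fbo"
    if has_extension(wgl_extensions, "WGL_ARB_pbuffer"):
        return "pbuffer"
    return None
```

Splitting on whitespace and comparing whole tokens avoids the classic `strstr`-style bug where one extension name is a prefix of another.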
 
Cool. So I'm not crazy then. :)

I posted the msec and i/s deltas in the 5.4 thread, but picking 512x512 arbitrarily, here is the change:
Test           5.3 msec 5.4 msec   5.3 i/s   5.4 i/s
JustCopy            359     1031   2785.52   969.932
SimpleSmooth        391     1954   2557.54   511.771
TexNoise            390     1171    2564.1   853.971
3x3Conv             390     4110   1282.05    121.65
LinDiffINT          422     2031   2369.67   492.368
LinDiffINT16        407     2000      2457       500
LinDiffFP16         422     2032   2369.67
LinDiffFP32         625     2047      1600    488.52
PMTEncoded          562    10172    889.68   49.1545
PMStandard         1078     4125   463.822   121.212
PMBuffered          250     1844      1000   135.575

Most tests take roughly 3 to 5 times as long, and a few are far worse (PMBuffered about 7x, 3x3Conv about 10x, PMTEncoded about 18x). Something is definitely fubar.
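The i/s figures in the comparison show that each test ran the same number of iterations under both drivers (JustCopy, for example, does 1000 iterations in 359 ms versus 1031 ms). The slowdown is therefore simply the ratio of the two msec values. Here is a minimal sketch that computes it from the quoted 512x512 numbers; the helper names are illustrative only.

```python
# 512x512 results quoted above, in msec: (Catalyst 5.3, Catalyst 5.4).
TIMINGS_MS = {
    "JustCopy": (359, 1031),
    "SimpleSmooth": (391, 1954),
    "TexNoise": (390, 1171),
    "3x3Conv": (390, 4110),
    "LinDiffINT": (422, 2031),
    "LinDiffINT16": (407, 2000),
    "LinDiffFP16": (422, 2032),
    "LinDiffFP32": (625, 2047),
    "PMTEncoded": (562, 10172),
    "PMStandard": (1078, 4125),
    "PMBuffered": (250, 1844),
}


def slowdown(old_ms, new_ms):
    """Factor by which a test got slower: new time / old time.

    Valid here because each test ran the same iteration count on both drivers.
    """
    return new_ms / old_ms


def remaining_throughput_pct(old_ms, new_ms):
    """New i/s as a percentage of old i/s (equal to 100 * old / new)."""
    return 100.0 * old_ms / new_ms


if __name__ == "__main__":
    # Print tests from least to most affected.
    for name, (old, new) in sorted(TIMINGS_MS.items(), key=lambda kv: slowdown(*kv[1])):
        print(f"{name:<14} {slowdown(old, new):5.1f}x slower, "
              f"{remaining_throughput_pct(old, new):5.1f}% of 5.3 throughput")
```

On these numbers, the factors range from about 2.9x (JustCopy) to about 18x (PMTEncoded). Expressing the change as a factor avoids percentages above 100%, which a throughput reduction cannot reach.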

I also tried copying the 4.12 atioglxxx.dll (didn't know this would work with an X800 XT!) into the directory where your tool lives and then running it again. Performance was instantly back to the fast numbers, so the problem seems to be isolated to the OpenGL DLL and/or its handling of new or fubarred registry settings; that's all I can fathom. Something is definitely... different. :)
 