New GLSL / Pbuffer benchmark [Update: version 1.4 / ORCv0.4]

GeForce 6600GT, Athlon64 3400+, using Forceware 71.84
Code:
GL filter framework 1.3 test application by Peter Thoman 2004-2005

Gui initialized successfully.
DevIL initialized successfully.
 - DevIL Version: 167
OpenGL initialized successfully.
ILUT OpenGL mode set successfully.
Loaded required OpenGL extensions for GLPixelShader.
Loaded required OpenGL extensions for GLRenderTexture.
Loaded required OpenGL extensions for GLFilterStep.
Initialization complete.

Press return key to start benchmark...



Testing 32x32 image:
BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
No suitable INT format found. Trying FP... (Flaky 6x00 workaround)

BufferCreateINT16: msecs: 110 || ms/i: 18.3333 || i/s: 54.5455
BufferCreateFP16: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
BufferCreateFP32: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
JustCopy: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
SimpleSmooth: msecs: 250 || ms/i: 0.125 || i/s: 8000
TexNoise: msecs: 234 || ms/i: 0.117 || i/s: 8547.01
3x3Conv: msecs: 234 || ms/i: 0.234 || i/s: 4273.5
TEncode: msecs: 156 || ms/i: 0.156 || i/s: 6410.26
TDecode: msecs: 172 || ms/i: 0.172 || i/s: 5813.95
LinDiffINT: msecs: 188 || ms/i: 0.094 || i/s: 10638.3
LinDiffINT16: msecs: 187 || ms/i: 0.0935 || i/s: 10695.2
LinDiffFP16: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
LinDiffFP32: msecs: 187 || ms/i: 0.0935 || i/s: 10695.2
PMTEncoded: msecs: 406 || ms/i: 0.406 || i/s: 2463.05
PMStandard: msecs: 297 || ms/i: 0.297 || i/s: 3367
PMBuffered: msecs: 16 || ms/i: 0.032 || i/s: 31250

Testing 64x64 image:
BufferCreateINT: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
BufferCreateINT16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateFP16: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateFP32: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
JustCopy: msecs: 156 || ms/i: 0.078 || i/s: 12820.5
SimpleSmooth: msecs: 157 || ms/i: 0.0785 || i/s: 12738.9
TexNoise: msecs: 156 || ms/i: 0.078 || i/s: 12820.5
3x3Conv: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
TEncode: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
TDecode: msecs: 93 || ms/i: 0.093 || i/s: 10752.7
LinDiffINT: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
LinDiffINT16: msecs: 187 || ms/i: 0.0935 || i/s: 10695.2
LinDiffFP16: msecs: 203 || ms/i: 0.1015 || i/s: 9852.22
LinDiffFP32: msecs: 219 || ms/i: 0.1095 || i/s: 9132.42
PMTEncoded: msecs: 312 || ms/i: 0.312 || i/s: 3205.13
PMStandard: msecs: 343 || ms/i: 0.343 || i/s: 2915.45
PMBuffered: msecs: 31 || ms/i: 0.062 || i/s: 16129

Testing 128x128 image:
BufferCreateINT: msecs: 79 || ms/i: 13.1667 || i/s: 75.9494
BufferCreateINT16: msecs: 93 || ms/i: 15.5 || i/s: 64.5161
BufferCreateFP16: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
BufferCreateFP32: msecs: 63 || ms/i: 10.5 || i/s: 95.2381
JustCopy: msecs: 156 || ms/i: 0.078 || i/s: 12820.5
SimpleSmooth: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
TexNoise: msecs: 234 || ms/i: 0.117 || i/s: 8547.01
3x3Conv: msecs: 141 || ms/i: 0.141 || i/s: 7092.2
TEncode: msecs: 78 || ms/i: 0.078 || i/s: 12820.5
TDecode: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
LinDiffINT: msecs: 171 || ms/i: 0.0855 || i/s: 11695.9
LinDiffINT16: msecs: 235 || ms/i: 0.1175 || i/s: 8510.64
LinDiffFP16: msecs: 250 || ms/i: 0.125 || i/s: 8000
LinDiffFP32: msecs: 906 || ms/i: 0.453 || i/s: 2207.51
PMTEncoded: msecs: 688 || ms/i: 0.688 || i/s: 1453.49
PMStandard: msecs: 1234 || ms/i: 1.234 || i/s: 810.373
PMBuffered: msecs: 172 || ms/i: 0.344 || i/s: 2906.98

Testing 256x256 image:
BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateINT16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateFP32: msecs: 62 || ms/i: 10.3333 || i/s: 96.7742
JustCopy: msecs: 313 || ms/i: 0.1565 || i/s: 6389.78
SimpleSmooth: msecs: 453 || ms/i: 0.2265 || i/s: 4415.01
TexNoise: msecs: 453 || ms/i: 0.2265 || i/s: 4415.01
3x3Conv: msecs: 390 || ms/i: 0.39 || i/s: 2564.1
TEncode: msecs: 94 || ms/i: 0.094 || i/s: 10638.3
TDecode: msecs: 265 || ms/i: 0.265 || i/s: 3773.58
LinDiffINT: msecs: 328 || ms/i: 0.164 || i/s: 6097.56
LinDiffINT16: msecs: 875 || ms/i: 0.4375 || i/s: 2285.71
LinDiffFP16: msecs: 906 || ms/i: 0.453 || i/s: 2207.51
LinDiffFP32: msecs: 3156 || ms/i: 1.578 || i/s: 633.714
PMTEncoded: msecs: 1437 || ms/i: 1.437 || i/s: 695.894
PMStandard: msecs: 4735 || ms/i: 4.735 || i/s: 211.193
PMBuffered: msecs: 1110 || ms/i: 2.22 || i/s: 450.45

Testing 512x512 image:
BufferCreateINT: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
BufferCreateINT16: msecs: 93 || ms/i: 15.5 || i/s: 64.5161
BufferCreateFP16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
BufferCreateFP32: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
JustCopy: msecs: 531 || ms/i: 0.531 || i/s: 1883.24
SimpleSmooth: msecs: 844 || ms/i: 0.844 || i/s: 1184.83
TexNoise: msecs: 1265 || ms/i: 1.265 || i/s: 790.514
3x3Conv: msecs: 688 || ms/i: 1.376 || i/s: 726.744
TEncode: msecs: 140 || ms/i: 0.28 || i/s: 3571.43
TDecode: msecs: 500 || ms/i: 1 || i/s: 1000
LinDiffINT: msecs: 516 || ms/i: 0.516 || i/s: 1937.98
LinDiffINT16: msecs: 1500 || ms/i: 1.5 || i/s: 666.667
LinDiffFP16: msecs: 1500 || ms/i: 1.5 || i/s: 666.667
LinDiffFP32: msecs: 5875 || ms/i: 5.875 || i/s: 170.213
PMTEncoded: msecs: 2579 || ms/i: 5.158 || i/s: 193.874
PMStandard: msecs: 8890 || ms/i: 17.78 || i/s: 56.243
PMBuffered: msecs: 1281 || ms/i: 5.124 || i/s: 195.16

Testing 1024x1024 image:
BufferCreateINT: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
BufferCreateINT16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateFP32: msecs: 78 || ms/i: 13 || i/s: 76.9231
JustCopy: msecs: 2218 || ms/i: 2.218 || i/s: 450.857
SimpleSmooth: msecs: 3110 || ms/i: 3.11 || i/s: 321.543
TexNoise: msecs: 2453 || ms/i: 2.453 || i/s: 407.664
3x3Conv: msecs: 2640 || ms/i: 5.28 || i/s: 189.394
TEncode: msecs: 579 || ms/i: 1.158 || i/s: 863.558
TDecode: msecs: 1875 || ms/i: 3.75 || i/s: 266.667
LinDiffINT: msecs: 2000 || ms/i: 2 || i/s: 500
LinDiffINT16: msecs: 6047 || ms/i: 6.047 || i/s: 165.371
LinDiffFP16: msecs: 5891 || ms/i: 5.891 || i/s: 169.75
LinDiffFP32: msecs: 23234 || ms/i: 23.234 || i/s: 43.0404
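
For anyone comparing these dumps by hand: the three columns on each result line look to be related by simple arithmetic (ms/i = msecs / iterations, and i/s = 1000 / ms/i). Below is a minimal Python sketch that parses one result line and checks that relationship. It assumes the line layout seen in the logs above; `parse_result` and its field names are invented for illustration and are not part of the benchmark application.

```python
import re

# Matches lines such as:
#   JustCopy: msecs: 172 || ms/i: 0.086 || i/s: 11627.9
LINE_RE = re.compile(
    r"^(?P<name>\w+): msecs: (?P<msecs>[\d.]+) \|\| "
    r"ms/i: (?P<ms_i>[\d.]+) \|\| i/s: (?P<ips>[\d.]+)$"
)


def parse_result(line):
    """Parse one benchmark result line; return None for any other line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    msecs = float(m["msecs"])
    ms_i = float(m["ms_i"])
    return {
        "name": m["name"],
        "msecs": msecs,
        "ms_per_iter": ms_i,
        "iters_per_sec": float(m["ips"]),
        # Assumption: ms/i is total time divided by an integer iteration count.
        "iterations": round(msecs / ms_i),
        # Assumption: i/s is simply the reciprocal of ms/i, scaled to seconds.
        "derived_ips": 1000.0 / ms_i,
    }


result = parse_result("JustCopy: msecs: 172 || ms/i: 0.086 || i/s: 11627.9")
print(result["iterations"], result["derived_ips"])
```

Under these assumptions the iteration count differs between tests and image sizes, so the msecs column alone is not comparable across lines; ms/i or i/s is the safer column to compare.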
 
wireframe said:
Do you know which ForceWare revision your application needs as a minimum to run? I have a theory...maybe not so much a theory, but something that may be worth testing.
I don't know for sure, but I seem to remember the 6800 cards having ATI_float support from the very beginning (i.e. at release). What is your theory?
 
I've never been quite sure why, but I've often felt that NVIDIA caches more in local RAM than ATI does, and this is evidenced by some of the past tests with FSAA enabled.
 
DaveBaumann said:
I've never been quite sure why, but I've often felt that NVIDIA caches more in local RAM than ATI does, and this is evidenced by some of the past tests with FSAA enabled.
Yes, it certainly seems like they do. Thanks for the information.

Para's performance in the last benchmark is more "normal" than the other nv4x results so far... I wonder why?
 
PeterT said:
wireframe said:
Do you know which ForceWare revision your application needs as a minimum to run? I have a theory...maybe not so much a theory, but something that may be worth testing.
I don't know for sure, but I seem to remember the 6800 cards having ATI_float support from the very beginning (i.e. at release). What is your theory?

You mentioned swapping, but you don't reveal the footprint of your app. Like you said, a 128MB card was apparently running it without swapping, so I am puzzled why my 6800 Ultra and 6800GTs with 256MB of memory are running out. My theory is less of a theory and more of a hunch, to be honest. The 6800 used to chug along in a great many things, seeming to work very inefficiently. Maybe even running out of local memory when it shouldn't :?: . I was thinking it is possible that Nvidia has implemented some memory allocation clamp in the driver to prevent this. It's possible that this 'fix' is a quick-n-dirty one and is somehow forcing your app to run out of space. Reverting to older drivers may suddenly cure this problem.
 
wireframe said:
PS. Any predictions of how SLI will impact performance on your code? We need someone with SLI to post their numbers, but I'd like to read your prediction before such numbers are revealed :D

That's a hard question to answer without the driver code to look at...

3 things could happen:
- it is not faster at all: this could be the case if only rendering to the FB is accelerated by SLI
- it is sped up similarly to other OpenGL apps: if pBuffer rendering is accelerated, there could be no inherent problem using SLI
- SLI is slower: there could be unforeseen problems, like context switch synchronisation, buffer sharing troubles or similar stuff

So, in short, anything could happen :p
I'm happy that I'm not one of the guys who has to work on the multi-GPU code at either NV or ATI...
 
madmartyau said:
Would AGP aperture affect these scores?
Perhaps, if you set it very low it could prevent the driver from swapping.

[edit]
I did some fast calculations. The memory footprint for the final benchmark should be about 136 MB.
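
The post does not say how the 136 MB figure was reached. A rough estimate of this kind can be sketched as below, assuming 4-channel (RGBA) buffers with 1, 2, 2 and 4 bytes per channel for INT, INT16, FP16 and FP32. The INT size, the channel count and the example buffer list are assumptions for illustration only; the benchmark's real buffer counts are not given in this thread, so the sketch does not try to reproduce the 136 MB number.

```python
# Assumed bytes per channel for each buffer format named in the logs.
BYTES_PER_CHANNEL = {"INT": 1, "INT16": 2, "FP16": 2, "FP32": 4}


def buffer_bytes(width, height, fmt, channels=4):
    """Size in bytes of one uncompressed buffer (assumes no padding)."""
    return width * height * channels * BYTES_PER_CHANNEL[fmt]


def footprint_mib(buffers):
    """Total size in MiB for a list of (width, height, format, count) tuples."""
    total = sum(buffer_bytes(w, h, fmt) * count for (w, h, fmt, count) in buffers)
    return total / (1024 * 1024)


# Hypothetical mix of 1024x1024 buffers, purely for illustration.
example = [(1024, 1024, "FP32", 2), (1024, 1024, "INT", 1)]
print(footprint_mib(example))
```

Under these assumptions a single 1024x1024 FP32 buffer is 16 MiB, so a handful of full-size float buffers is already in the range where a 128MB card could start to swap.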
 
PeterT said:
madmartyau said:
Would AGP aperture affect these scores?
Perhaps, if you set it very low it could prevent the driver from swapping.

Hmm. I have 256MB on-board and a 256MB aperture. Fastwrites and sidebanding enabled BTW (sidebanding is hardwired so this is not something I can flip). This I will now test quickly because it will not require any software change. Then, I may go ahead and install ForceWare 61.77 or earlier and see what happens.
 
Athlon FX-55, Quadro FX 1400 (350/600, 128MB), nForce4 SLI, 71.84 Forceware

Code:
GL filter framework 1.3 test application by Peter Thoman 2004-2005

Gui initialized successfully.
DevIL initialized successfully.
 - DevIL Version: 167
OpenGL initialized successfully.
ILUT OpenGL mode set successfully.
Loaded required OpenGL extensions for GLPixelShader.
Loaded required OpenGL extensions for GLRenderTexture.
Loaded required OpenGL extensions for GLFilterStep.
Initialization complete.

Press return key to start benchmark...

Testing 32x32 image:
BufferCreateINT: msecs: 234 || ms/i: 39 || i/s: 25.641
No suitable INT format found. Trying FP... (Flaky 6x00 workaround)

BufferCreateINT16: msecs: 157 || ms/i: 26.1667 || i/s: 38.2166
BufferCreateFP16: msecs: 93 || ms/i: 15.5 || i/s: 64.5161
BufferCreateFP32: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
JustCopy: msecs: 2422 || ms/i: 1.211 || i/s: 825.764
SimpleSmooth: msecs: 2546 || ms/i: 1.273 || i/s: 785.546
TexNoise: msecs: 2516 || ms/i: 1.258 || i/s: 794.913
3x3Conv: msecs: 1453 || ms/i: 1.453 || i/s: 688.231
TEncode: msecs: 1282 || ms/i: 1.282 || i/s: 780.031
TDecode: msecs: 2532 || ms/i: 2.532 || i/s: 394.945
LinDiffINT: msecs: 4937 || ms/i: 2.4685 || i/s: 405.104
LinDiffINT16: msecs: 4891 || ms/i: 2.4455 || i/s: 408.914
LinDiffFP16: msecs: 4891 || ms/i: 2.4455 || i/s: 408.914
LinDiffFP32: msecs: 4906 || ms/i: 2.453 || i/s: 407.664
PMTEncoded: msecs: 7515 || ms/i: 7.515 || i/s: 133.067
PMStandard: msecs: 7407 || ms/i: 7.407 || i/s: 135.007
PMBuffered: msecs: 171 || ms/i: 0.342 || i/s: 2923.98

Testing 64x64 image:
BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateINT16: msecs: 125 || ms/i: 20.8333 || i/s: 48
BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateFP32: msecs: 110 || ms/i: 18.3333 || i/s: 54.5455
JustCopy: msecs: 2438 || ms/i: 1.219 || i/s: 820.345
SimpleSmooth: msecs: 2453 || ms/i: 1.2265 || i/s: 815.328
TexNoise: msecs: 2484 || ms/i: 1.242 || i/s: 805.153
3x3Conv: msecs: 1250 || ms/i: 1.25 || i/s: 800
TEncode: msecs: 1203 || ms/i: 1.203 || i/s: 831.255
TDecode: msecs: 2468 || ms/i: 2.468 || i/s: 405.186
LinDiffINT: msecs: 4907 || ms/i: 2.4535 || i/s: 407.581
LinDiffINT16: msecs: 4953 || ms/i: 2.4765 || i/s: 403.796
LinDiffFP16: msecs: 4953 || ms/i: 2.4765 || i/s: 403.796
LinDiffFP32: msecs: 5062 || ms/i: 2.531 || i/s: 395.101
PMTEncoded: msecs: 7438 || ms/i: 7.438 || i/s: 134.445
PMStandard: msecs: 7641 || ms/i: 7.641 || i/s: 130.873
PMBuffered: msecs: 234 || ms/i: 0.468 || i/s: 2136.75

Testing 128x128 image:
BufferCreateINT: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateINT16: msecs: 109 || ms/i: 18.1667 || i/s: 55.0459
BufferCreateFP16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
BufferCreateFP32: msecs: 78 || ms/i: 13 || i/s: 76.9231
JustCopy: msecs: 2453 || ms/i: 1.2265 || i/s: 815.328
SimpleSmooth: msecs: 2547 || ms/i: 1.2735 || i/s: 785.238
TexNoise: msecs: 2578 || ms/i: 1.289 || i/s: 775.795
3x3Conv: msecs: 1328 || ms/i: 1.328 || i/s: 753.012
TEncode: msecs: 1234 || ms/i: 1.234 || i/s: 810.373
TDecode: msecs: 2516 || ms/i: 2.516 || i/s: 397.456
LinDiffINT: msecs: 4984 || ms/i: 2.492 || i/s: 401.284
LinDiffINT16: msecs: 5094 || ms/i: 2.547 || i/s: 392.619
LinDiffFP16: msecs: 5094 || ms/i: 2.547 || i/s: 392.619
LinDiffFP32: msecs: 5640 || ms/i: 2.82 || i/s: 354.61
PMTEncoded: msecs: 7719 || ms/i: 7.719 || i/s: 129.55
PMStandard: msecs: 8469 || ms/i: 8.469 || i/s: 118.078
PMBuffered: msecs: 704 || ms/i: 1.408 || i/s: 710.227

Testing 256x256 image:
BufferCreateINT: msecs: 79 || ms/i: 13.1667 || i/s: 75.9494
BufferCreateINT16: msecs: 125 || ms/i: 20.8333 || i/s: 48
BufferCreateFP16: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateFP32: msecs: 125 || ms/i: 20.8333 || i/s: 48
JustCopy: msecs: 2641 || ms/i: 1.3205 || i/s: 757.289
SimpleSmooth: msecs: 2766 || ms/i: 1.383 || i/s: 723.066
TexNoise: msecs: 2735 || ms/i: 1.3675 || i/s: 731.261
3x3Conv: msecs: 1656 || ms/i: 1.656 || i/s: 603.865
TEncode: msecs: 1297 || ms/i: 1.297 || i/s: 771.01
TDecode: msecs: 2719 || ms/i: 2.719 || i/s: 367.782
LinDiffINT: msecs: 5265 || ms/i: 2.6325 || i/s: 379.867
LinDiffINT16: msecs: 5781 || ms/i: 2.8905 || i/s: 345.961
LinDiffFP16: msecs: 5656 || ms/i: 2.828 || i/s: 353.607
LinDiffFP32: msecs: 8079 || ms/i: 4.0395 || i/s: 247.555
PMTEncoded: msecs: 8719 || ms/i: 8.719 || i/s: 114.692
PMStandard: msecs: 11766 || ms/i: 11.766 || i/s: 84.9907
PMBuffered: msecs: 2438 || ms/i: 4.876 || i/s: 205.086

Testing 512x512 image:
BufferCreateINT: msecs: 78 || ms/i: 13 || i/s: 76.9231
BufferCreateINT16: msecs: 140 || ms/i: 23.3333 || i/s: 42.8571
BufferCreateFP16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
BufferCreateFP32: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
JustCopy: msecs: 1641 || ms/i: 1.641 || i/s: 609.385
SimpleSmooth: msecs: 1937 || ms/i: 1.937 || i/s: 516.262
TexNoise: msecs: 1703 || ms/i: 1.703 || i/s: 587.199
3x3Conv: msecs: 1516 || ms/i: 3.032 || i/s: 329.815
TEncode: msecs: 766 || ms/i: 1.532 || i/s: 652.742
TDecode: msecs: 1781 || ms/i: 3.562 || i/s: 280.741
LinDiffINT: msecs: 3235 || ms/i: 3.235 || i/s: 309.119
LinDiffINT16: msecs: 4078 || ms/i: 4.078 || i/s: 245.218
LinDiffFP16: msecs: 3921 || ms/i: 3.921 || i/s: 255.037
LinDiffFP32: msecs: 8704 || ms/i: 8.704 || i/s: 114.89
PMTEncoded: msecs: 6375 || ms/i: 12.75 || i/s: 78.4314
PMStandard: msecs: 13047 || ms/i: 26.094 || i/s: 38.323
PMBuffered: msecs: 4515 || ms/i: 18.06 || i/s: 55.371

Testing 1024x1024 image:
BufferCreateINT: msecs: 125 || ms/i: 20.8333 || i/s: 48
BufferCreateINT16: msecs: 125 || ms/i: 20.8333 || i/s: 48
BufferCreateFP16: msecs: 94 || ms/i: 15.6667 || i/s: 63.8298
BufferCreateFP32: msecs: 93 || ms/i: 15.5 || i/s: 64.5161
JustCopy: msecs: 2875 || ms/i: 2.875 || i/s: 347.826
SimpleSmooth: msecs: 4062 || ms/i: 4.062 || i/s: 246.184
TexNoise: msecs: 2938 || ms/i: 2.938 || i/s: 340.368
3x3Conv: msecs: 4093 || ms/i: 8.186 || i/s: 122.16
TEncode: msecs: 1297 || ms/i: 2.594 || i/s: 385.505
TDecode: msecs: 3328 || ms/i: 6.656 || i/s: 150.24
LinDiffINT: msecs: 5578 || ms/i: 5.578 || i/s: 179.276
LinDiffINT16: msecs: 8312 || ms/i: 8.312 || i/s: 120.308
LinDiffFP16: msecs: 8984 || ms/i: 8.984 || i/s: 111.309
LinDiffFP32: msecs: 24875 || ms/i: 24.875 || i/s: 40.201
PMTEncoded: msecs: 14171 || ms/i: 28.342 || i/s: 35.2833
PMStandard: msecs: 39359 || ms/i: 78.718 || i/s: 12.7036
PMBuffered: msecs: 313406 || ms/i: 1253.62 || i/s: 0.797687
 
DaveBaumann said:
I thought you couldn't post! ;)

What about if I quit and come and work for you instead? I could review a fan or something, then we'd be beyond 3D and the legal action would fall on its arse? :LOL: :LOL:
 
I just tested dropping the AGP aperture from 256 to 64. Status quo. Same abysmal performance. Left Fastwrites on. Would be nice to see some more 6800 scores on different platforms to see if one magically works.

PS. Thanks for catching and responding to my late edit regarding SLI.
 
I was going to post some SLI results... but using both SFR and AFR (modes 1 & 2 in nvapps.xml) causes a blue screen relating to nv4_disp - I think we can say it's driver related. ;)

It runs through and locks, followed by a BSOD after PMStandard in the 256x256 image batch tests. The other results seemed quite low, though, looking at them briefly. It didn't save the results on the fly, so I can't show the results dump up until the crash.

error technical info from BSOD:

*** STOP: 0x000000EA (0x861CB798, 0x85EE76E8, 0xF794BCB4, 0x00000001)

nv4_disp

Error report info:

FBencherror.jpg


I ran a test before forcing an SLI mode (it was running in single GPU mode)

FX-55, DFI nF4 SLI-DR, 2x GeForce 6800 Ultra 425/1100MHz, 1GB OCZ PC3500 EB @ 400MHz 2.5-2-2-6 1T

Edit: using ForceWare 71.84 WHQL & nForce 4 Standalone driver version 6.53 on Windows XP Pro Service Pack 2.

Code:
GL filter framework 1.2999 test application by Peter Thoman 2004-2005

Gui initialized successfully.
DevIL initialized successfully.
 - DevIL Version: 167
OpenGL initialized successfully.
ILUT OpenGL mode set successfully.
Loaded required OpenGL extensions for GLPixelShader.
Loaded required OpenGL extensions for GLRenderTexture.
Loaded required OpenGL extensions for GLFilterStep.
Initialization complete.

Press return key to start benchmark...



Testing 32x32 image:
BufferCreateINT: msecs: 1222 || ms/i: 203.667 || i/s: 4.90998
BufferCreateINT16: msecs: 1191 || ms/i: 198.5 || i/s: 5.03778
BufferCreateFP16: msecs: 1212 || ms/i: 202 || i/s: 4.9505
BufferCreateFP32: msecs: 1172 || ms/i: 195.333 || i/s: 5.11945
JustCopy: msecs: 1152 || ms/i: 0.576 || i/s: 1736.11
SimpleSmooth: msecs: 1161 || ms/i: 0.5805 || i/s: 1722.65
TexNoise: msecs: 842 || ms/i: 0.421 || i/s: 2375.3
3x3Conv: msecs: 731 || ms/i: 0.731 || i/s: 1367.99
TEncode: msecs: 711 || ms/i: 0.711 || i/s: 1406.47
TDecode: msecs: 501 || ms/i: 0.501 || i/s: 1996.01
LinDiffINT: msecs: 861 || ms/i: 0.4305 || i/s: 2322.88
LinDiffINT16: msecs: 851 || ms/i: 0.4255 || i/s: 2350.18
LinDiffFP16: msecs: 832 || ms/i: 0.416 || i/s: 2403.85
LinDiffFP32: msecs: 811 || ms/i: 0.4055 || i/s: 2466.09
PMTEncoded: msecs: 1262 || ms/i: 1.262 || i/s: 792.393
PMStandard: msecs: 1292 || ms/i: 1.292 || i/s: 773.994
PMBuffered: msecs: 131 || ms/i: 0.262 || i/s: 3816.79

Testing 64x64 image:
BufferCreateINT: msecs: 1212 || ms/i: 202 || i/s: 4.9505
BufferCreateINT16: msecs: 1191 || ms/i: 198.5 || i/s: 5.03778
BufferCreateFP16: msecs: 1232 || ms/i: 205.333 || i/s: 4.87013
BufferCreateFP32: msecs: 1212 || ms/i: 202 || i/s: 4.9505
JustCopy: msecs: 821 || ms/i: 0.4105 || i/s: 2436.05
SimpleSmooth: msecs: 801 || ms/i: 0.4005 || i/s: 2496.88
TexNoise: msecs: 1142 || ms/i: 0.571 || i/s: 1751.31
3x3Conv: msecs: 411 || ms/i: 0.411 || i/s: 2433.09
TEncode: msecs: 681 || ms/i: 0.681 || i/s: 1468.43
TDecode: msecs: 480 || ms/i: 0.48 || i/s: 2083.33
LinDiffINT: msecs: 831 || ms/i: 0.4155 || i/s: 2406.74
LinDiffINT16: msecs: 901 || ms/i: 0.4505 || i/s: 2219.76
LinDiffFP16: msecs: 862 || ms/i: 0.431 || i/s: 2320.19
LinDiffFP32: msecs: 871 || ms/i: 0.4355 || i/s: 2296.21
PMTEncoded: msecs: 1332 || ms/i: 1.332 || i/s: 750.751
PMStandard: msecs: 1241 || ms/i: 1.241 || i/s: 805.802
PMBuffered: msecs: 250 || ms/i: 0.5 || i/s: 2000

Testing 128x128 image:
BufferCreateINT: msecs: 1192 || ms/i: 198.667 || i/s: 5.03356
BufferCreateINT16: msecs: 1262 || ms/i: 210.333 || i/s: 4.75436
BufferCreateFP16: msecs: 1201 || ms/i: 200.167 || i/s: 4.99584
BufferCreateFP32: msecs: 1212 || ms/i: 202 || i/s: 4.9505
JustCopy: msecs: 851 || ms/i: 0.4255 || i/s: 2350.18
SimpleSmooth: msecs: 831 || ms/i: 0.4155 || i/s: 2406.74
TexNoise: msecs: 1162 || ms/i: 0.581 || i/s: 1721.17
3x3Conv: msecs: 721 || ms/i: 0.721 || i/s: 1386.96
TEncode: msecs: 691 || ms/i: 0.691 || i/s: 1447.18
TDecode: msecs: 501 || ms/i: 0.501 || i/s: 1996.01
LinDiffINT: msecs: 851 || ms/i: 0.4255 || i/s: 2350.18
LinDiffINT16: msecs: 851 || ms/i: 0.4255 || i/s: 2350.18
LinDiffFP16: msecs: 862 || ms/i: 0.431 || i/s: 2320.19
LinDiffFP32: msecs: 841 || ms/i: 0.4205 || i/s: 2378.12
PMTEncoded: msecs: 1342 || ms/i: 1.342 || i/s: 745.156
PMStandard: msecs: 1341 || ms/i: 1.341 || i/s: 745.712
PMBuffered: msecs: 731 || ms/i: 1.462 || i/s: 683.995

Testing 256x256 image:
BufferCreateINT: msecs: 1212 || ms/i: 202 || i/s: 4.9505
BufferCreateINT16: msecs: 1192 || ms/i: 198.667 || i/s: 5.03356
BufferCreateFP16: msecs: 1231 || ms/i: 205.167 || i/s: 4.87409
BufferCreateFP32: msecs: 1212 || ms/i: 202 || i/s: 4.9505
JustCopy: msecs: 912 || ms/i: 0.456 || i/s: 2192.98
SimpleSmooth: msecs: 1472 || ms/i: 0.736 || i/s: 1358.7
TexNoise: msecs: 1191 || ms/i: 0.5955 || i/s: 1679.26
3x3Conv: msecs: 2654 || ms/i: 2.654 || i/s: 376.79
TEncode: msecs: 661 || ms/i: 0.661 || i/s: 1512.86
TDecode: msecs: 1222 || ms/i: 1.222 || i/s: 818.331
LinDiffINT: msecs: 2033 || ms/i: 1.0165 || i/s: 983.768
LinDiffINT16: msecs: 2033 || ms/i: 1.0165 || i/s: 983.768
LinDiffFP16: msecs: 2023 || ms/i: 1.0115 || i/s: 988.631
LinDiffFP32: msecs: 2373 || ms/i: 1.1865 || i/s: 842.815
PMTEncoded: msecs: 1752 || ms/i: 1.752 || i/s: 570.776
PMStandard: msecs: 3796 || ms/i: 3.796 || i/s: 263.435
PMBuffered: msecs: 2624 || ms/i: 5.248 || i/s: 190.549

Testing 512x512 image:
BufferCreateINT: msecs: 1222 || ms/i: 203.667 || i/s: 4.90998
BufferCreateINT16: msecs: 1212 || ms/i: 202 || i/s: 4.9505
BufferCreateFP16: msecs: 1201 || ms/i: 200.167 || i/s: 4.99584
BufferCreateFP32: msecs: 1222 || ms/i: 203.667 || i/s: 4.90998
JustCopy: msecs: 1252 || ms/i: 1.252 || i/s: 798.722
SimpleSmooth: msecs: 2714 || ms/i: 2.714 || i/s: 368.46
TexNoise: msecs: 1692 || ms/i: 1.692 || i/s: 591.017
3x3Conv: msecs: 5108 || ms/i: 10.216 || i/s: 97.8857
TEncode: msecs: 371 || ms/i: 0.742 || i/s: 1347.71
TDecode: msecs: 2283 || ms/i: 4.566 || i/s: 219.01
LinDiffINT: msecs: 3795 || ms/i: 3.795 || i/s: 263.505
LinDiffINT16: msecs: 3786 || ms/i: 3.786 || i/s: 264.131
LinDiffFP16: msecs: 3786 || ms/i: 3.786 || i/s: 264.131
LinDiffFP32: msecs: 4777 || ms/i: 4.777 || i/s: 209.336
PMTEncoded: msecs: 3114 || ms/i: 6.228 || i/s: 160.565
PMStandard: msecs: 7170 || ms/i: 14.34 || i/s: 69.735
PMBuffered: msecs: 4907 || ms/i: 19.628 || i/s: 50.9476

Testing 1024x1024 image:
BufferCreateINT: msecs: 1202 || ms/i: 200.333 || i/s: 4.99168
BufferCreateINT16: msecs: 1162 || ms/i: 193.667 || i/s: 5.16351
BufferCreateFP16: msecs: 1222 || ms/i: 203.667 || i/s: 4.90998
BufferCreateFP32: msecs: 1201 || ms/i: 200.167 || i/s: 4.99584
JustCopy: msecs: 4776 || ms/i: 4.776 || i/s: 209.38
SimpleSmooth: msecs: 10616 || ms/i: 10.616 || i/s: 94.1974
TexNoise: msecs: 5027 || ms/i: 5.027 || i/s: 198.926
3x3Conv: msecs: 20239 || ms/i: 40.478 || i/s: 24.7048
TEncode: msecs: 1112 || ms/i: 2.224 || i/s: 449.64
TDecode: msecs: 9063 || ms/i: 18.126 || i/s: 55.1694
LinDiffINT: msecs: 14901 || ms/i: 14.901 || i/s: 67.1096
LinDiffINT16: msecs: 14901 || ms/i: 14.901 || i/s: 67.1096
LinDiffFP16: msecs: 14891 || ms/i: 14.891 || i/s: 67.1547
LinDiffFP32: msecs: 17575 || ms/i: 17.575 || i/s: 56.899
PMTEncoded: msecs: 12258 || ms/i: 24.516 || i/s: 40.7897
PMStandard: msecs: 28290 || ms/i: 56.58 || i/s: 17.6741
PMBuffered: msecs: 21461 || ms/i: 85.844 || i/s: 11.649

Finished. Press return key to close...
                Don't forget to copy the results!
 
More weirdness:

I went ahead and installed ForceWare 61.77 and, like magic, the last test went from i/s = ~0.9 with ForceWare 71.84 to i/s = ~11.9. But there is more. Scores fluctuated strangely between otherwise identical runs. At first I thought this might be due to having changed the desktop resolution, so I set it back to 800 x 600, which I had used for my first 61.77 run (score 11.9), to see whether resolution was affecting performance. I then observed scores alternating between ~11.9 and ~7.7. It suddenly seemed to stabilize at 11.7, and changing desktop resolution had no impact.

I realize this was developed using ATI hardware and using an ATI extension, but I must confess to being somewhat puzzled and concerned about the 6800's lacking performance. This is OpenGL after all.

Oh, and because I am a complete prat I forgot to make a full run using ForceWare 61.77 before reverting to 71.84 (which is back to scoring ~0.9 as expected).

Some results (the notes I took while testing; not very clear. These were taken when I was still suspecting desktop resolution of affecting the scores):
Code:
ForceWare 61.77, FastWrites Enabled, AGP x8 @ 256MB aperture

Desktop resolution 800x600
Testing 1024x1024 image:
PMBuffered: msecs: 20968 || ms/i: 83.872 || i/s: 11.9229

Desktop resolution 1600x1200
Testing 1024x1024 image:
PMBuffered: msecs: 32469 || ms/i: 129.876 || i/s: 7.69965

Back to 800x600
Testing 1024x1024 image:
PMBuffered: msecs: 24610 || ms/i: 98.44 || i/s: 10.1585

Testing 1024x1024 image:
PMBuffered: msecs: 32219 || ms/i: 128.876 || i/s: 7.7594

stabilized @ 11.7
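
To put a number on the run-to-run variation in the notes above, here is a small Python sketch that summarizes the four PMBuffered i/s values listed for ForceWare 61.77. The `summarize_runs` helper is made up for illustration; four runs are far too few for real statistics, so this only shows the spread.

```python
from statistics import mean


def summarize_runs(ips_values):
    """Return min, max, mean and max/min ratio for a list of i/s scores."""
    lo, hi = min(ips_values), max(ips_values)
    return {
        "min": lo,
        "max": hi,
        "mean": mean(ips_values),
        "max_over_min": hi / lo,
    }


# PMBuffered i/s at 1024x1024, copied from the notes above, in order.
runs = [11.9229, 7.69965, 10.1585, 7.7594]
summary = summarize_runs(runs)
print(summary)
```

The fastest of these runs is roughly one and a half times the slowest, which is large enough that a single run should not be used to compare driver versions.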
 
Hmmm. I see Bigz is getting the same score in the final test with 71.84 as I was getting using 61.77. Coincidence?
 