GPUBench - a CPU benchmark for GPUs

tb

http://gpubench.sourceforge.net/

6800 GT : 66.81 driver, all optimizations disabled
http://www.tommti-systems.de/main-Dateien/reviews/gpubench/6800gt_hq/index.html

X800 Pro : 4.11 beta driver, all optimizations disabled
http://www.tommti-systems.de/main-Dateien/reviews/gpubench/x800pro_hq/index.html

This is more like a CPU benchmark than a typical graphics test: texture filtering and mipmaps play no role here, while number crunching and bus throughput matter far more.

Results from other GPUs (Volari, DeltaChrome, Radeon 9800, GeForce FX, ...) would be very interesting.

Thomas
 
tb said:
http://gpubench.sourceforge.net/

6800 GT : 66.81 driver, all optimizations disabled
http://www.tommti-systems.de/main-D...t_hq/index.html

X800 Pro : 4.11 beta driver, all optimizations disabled
http://www.tommti-systems.de/main-D...o_hq/index.html

This is more like a CPU benchmark than a typical graphics test: texture filtering and mipmaps play no role here, while number crunching and bus throughput matter far more.

Results from other GPUs (Volari, DeltaChrome, Radeon 9800, GeForce FX, ...) would be very interesting.
Links don't work.

-FUDie
 
Can someone please explain the cache hit fetch costs? Specifically, why does the GT (if I understand the test description) become computation limited so darned quickly? Is this because it doesn't have a separate texturing unit, but rather has to substitute a texture fetch for a shader calculation (thus being essentially "compute bound" from the get-go)?

The GT is practically 2x faster in the instruction co-/issue tests, and 0.5-2x faster in the bandwidth tests. Its MRT results (2x ATi) also seem a little ironic (FX didn't support MRT, right?).

Thomas, does enabling driver optimizations (AI/tri/AF) affect the scores significantly?
 
Driver optimizations like Catalyst AI or NVIDIA's stuff don't have an effect on this benchmark, because this benchmark doesn't use linear / aniso. filtering.

Thomas
 
What about those completely out-of-line ADD results? Why is the ADD result on NV40 double what you'd expect (with SU1 not being ADD capable)? And moreover, why isn't the SUB result exactly the same?
Why does R420 perform ADD faster than any other op, but not twice as fast?
 
tb said:
Driver optimizations like Catalyst AI or NVIDIA's stuff don't have an effect on this benchmark, because this benchmark doesn't use linear / aniso. filtering.
You think those are the only driver optimizations that can be made?
 
Chalnoth said:
tb said:
Driver optimizations like Catalyst AI or NVIDIA's stuff don't have an effect on this benchmark, because this benchmark doesn't use linear / aniso. filtering.
You think those are the only driver optimizations that can be made?

I meant the optimizations that the user can control.

Thomas
 
It would probably be useful to show what the shader code is in these simple cases.

It's possible that some of the more peculiar results are a function of the optimisers in the drivers (which is fine), but artificial tests like this need to find a way to avoid them.
 
Here is the shader code for "instrissue -n -a -l 64 -m"
Code:
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
TEMP T2;
TEMP T3;
MOV T0, C0;
MOV T1, C1;
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;
ADD T0.xyzw, T2, T1;
ADD T1.xyzw, T2, T3;
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;
ADD T0.xyzw, T2, T1;
ADD T1.xyzw, T2, T3;
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;
ADD T0.xyzw, T2, T1;
ADD T1.xyzw, T2, T3;
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;
ADD T0.xyzw, T2, T1;
ADD T1.xyzw, T2, T3;
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;
ADD T0.xyzw, T2, T1;
ADD T1.xyzw, T2, T3;
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;
ADD T0.xyzw, T2, T1;
ADD T1.xyzw, T2, T3;
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;
ADD T0.xyzw, T2, T1;
ADD T1.xyzw, T2, T3;
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;
ADD T0.xyzw, T2, T1;
ADD T1.xyzw, T2, T3;
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;
ADD T0.xyzw, T2, T1;
ADD T1.xyzw, T2, T3;
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;
ADD T0.xyzw, T2, T1;
ADD T1.xyzw, T2, T3;
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;
ADD T0.xyzw, T2, T1;
ADD T1.xyzw, T2, T3;
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;
ADD T0.xyzw, T2, T1;
ADD T1.xyzw, T2, T3;
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;
ADD T0.xyzw, T2, T1;
ADD T1.xyzw, T2, T3;
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;
ADD T0.xyzw, T2, T1;
ADD T1.xyzw, T2, T3;
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;
ADD T0.xyzw, T2, T1;
ADD T1.xyzw, T2, T3;
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;
ADD T0.xyzw, T2, T1;
ADD result.color.xyzw, T0, T3;
END

512      10.3172       ADD          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
TEMP T2;
TEMP T3;
MOV T0, C0;
MOV T1, C1;
SUB T2.xyzw, T0, T0;
SUB T3.xyzw, T1, T1;
SUB T0.xyzw, T2, T1;
SUB T1.xyzw, T2, T3;
SUB T2.xyzw, T0, T0;
SUB T3.xyzw, T1, T1;
SUB T0.xyzw, T2, T1;
SUB T1.xyzw, T2, T3;
SUB T2.xyzw, T0, T0;
SUB T3.xyzw, T1, T1;
SUB T0.xyzw, T2, T1;
SUB T1.xyzw, T2, T3;
SUB T2.xyzw, T0, T0;
SUB T3.xyzw, T1, T1;
SUB T0.xyzw, T2, T1;
SUB T1.xyzw, T2, T3;
SUB T2.xyzw, T0, T0;
SUB T3.xyzw, T1, T1;
SUB T0.xyzw, T2, T1;
SUB T1.xyzw, T2, T3;
SUB T2.xyzw, T0, T0;
SUB T3.xyzw, T1, T1;
SUB T0.xyzw, T2, T1;
SUB T1.xyzw, T2, T3;
SUB T2.xyzw, T0, T0;
SUB T3.xyzw, T1, T1;
SUB T0.xyzw, T2, T1;
SUB T1.xyzw, T2, T3;
SUB T2.xyzw, T0, T0;
SUB T3.xyzw, T1, T1;
SUB T0.xyzw, T2, T1;
SUB T1.xyzw, T2, T3;
SUB T2.xyzw, T0, T0;
SUB T3.xyzw, T1, T1;
SUB T0.xyzw, T2, T1;
SUB T1.xyzw, T2, T3;
SUB T2.xyzw, T0, T0;
SUB T3.xyzw, T1, T1;
SUB T0.xyzw, T2, T1;
SUB T1.xyzw, T2, T3;
SUB T2.xyzw, T0, T0;
SUB T3.xyzw, T1, T1;
SUB T0.xyzw, T2, T1;
SUB T1.xyzw, T2, T3;
SUB T2.xyzw, T0, T0;
SUB T3.xyzw, T1, T1;
SUB T0.xyzw, T2, T1;
SUB T1.xyzw, T2, T3;
SUB T2.xyzw, T0, T0;
SUB T3.xyzw, T1, T1;
SUB T0.xyzw, T2, T1;
SUB T1.xyzw, T2, T3;
SUB T2.xyzw, T0, T0;
SUB T3.xyzw, T1, T1;
SUB T0.xyzw, T2, T1;
SUB T1.xyzw, T2, T3;
SUB T2.xyzw, T0, T0;
SUB T3.xyzw, T1, T1;
SUB T0.xyzw, T2, T1;
SUB T1.xyzw, T2, T3;
SUB T2.xyzw, T0, T0;
SUB T3.xyzw, T1, T1;
SUB T0.xyzw, T2, T1;
SUB result.color.xyzw, T0, T3;
END

512      5.4158       SUB          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
TEMP T2;
TEMP T3;
MOV T0, C0;
MOV T1, C1;
MUL T2.xyzw, T0, T0;
MUL T3.xyzw, T1, T1;
MUL T0.xyzw, T2, T1;
MUL T1.xyzw, T2, T3;
MUL T2.xyzw, T0, T0;
MUL T3.xyzw, T1, T1;
MUL T0.xyzw, T2, T1;
MUL T1.xyzw, T2, T3;
MUL T2.xyzw, T0, T0;
MUL T3.xyzw, T1, T1;
MUL T0.xyzw, T2, T1;
MUL T1.xyzw, T2, T3;
MUL T2.xyzw, T0, T0;
MUL T3.xyzw, T1, T1;
MUL T0.xyzw, T2, T1;
MUL T1.xyzw, T2, T3;
MUL T2.xyzw, T0, T0;
MUL T3.xyzw, T1, T1;
MUL T0.xyzw, T2, T1;
MUL T1.xyzw, T2, T3;
MUL T2.xyzw, T0, T0;
MUL T3.xyzw, T1, T1;
MUL T0.xyzw, T2, T1;
MUL T1.xyzw, T2, T3;
MUL T2.xyzw, T0, T0;
MUL T3.xyzw, T1, T1;
MUL T0.xyzw, T2, T1;
MUL T1.xyzw, T2, T3;
MUL T2.xyzw, T0, T0;
MUL T3.xyzw, T1, T1;
MUL T0.xyzw, T2, T1;
MUL T1.xyzw, T2, T3;
MUL T2.xyzw, T0, T0;
MUL T3.xyzw, T1, T1;
MUL T0.xyzw, T2, T1;
MUL T1.xyzw, T2, T3;
MUL T2.xyzw, T0, T0;
MUL T3.xyzw, T1, T1;
MUL T0.xyzw, T2, T1;
MUL T1.xyzw, T2, T3;
MUL T2.xyzw, T0, T0;
MUL T3.xyzw, T1, T1;
MUL T0.xyzw, T2, T1;
MUL T1.xyzw, T2, T3;
MUL T2.xyzw, T0, T0;
MUL T3.xyzw, T1, T1;
MUL T0.xyzw, T2, T1;
MUL T1.xyzw, T2, T3;
MUL T2.xyzw, T0, T0;
MUL T3.xyzw, T1, T1;
MUL T0.xyzw, T2, T1;
MUL T1.xyzw, T2, T3;
MUL T2.xyzw, T0, T0;
MUL T3.xyzw, T1, T1;
MUL T0.xyzw, T2, T1;
MUL T1.xyzw, T2, T3;
MUL T2.xyzw, T0, T0;
MUL T3.xyzw, T1, T1;
MUL T0.xyzw, T2, T1;
MUL T1.xyzw, T2, T3;
MUL T2.xyzw, T0, T0;
MUL T3.xyzw, T1, T1;
MUL T0.xyzw, T2, T1;
MUL result.color.xyzw, T0, T3;
END

512      10.4717       MUL          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
MOV T0, C0;
MOV T1, C1;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD result.color.xyzw, T0, T1, T1;
END

512      5.2442       MAD          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
MOV T0, C0;
MOV T1, C1;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
EX2 T0, T0.x;
EX2 T1, T1.x;
ADD result.color, T0, T1;
END

512      4.5594       EX2          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
MOV T0, C0;
MOV T1, C1;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
LG2 T0, T0.x;
LG2 T1, T1.x;
ADD result.color, T0, T1;
END

512      5.0349       LG2          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
MOV T0, C0;
MOV T1, C1;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW T1, T1.x, T1.x;
POW T0, T0.x, T0.x;
POW result.color, T0.x, T1.x;
END

512      2.4475       POW          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
MOV T0, C0;
MOV T1, C1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
FLR T1, T1;
FLR T0, T0;
ADD result.color, T0, T1;
END

512      5.2862       FLR          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
MOV T0, C0;
MOV T1, C1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
FRC T1, T1;
FRC T0, T0;
ADD result.color, T0, T1;
END

512      5.4045       FRC          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
MOV T0, C0;
MOV T1, C1;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
RSQ T0, T0.x;
RSQ T1, T1.x;
ADD result.color, T0, T1;
END

512      2.7068       RSQ          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
MOV T0, C0;
MOV T1, C1;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
RCP T0, T0.x;
RCP T1, T1.x;
ADD result.color, T0, T1;
END

512      5.4758       RCP          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
MOV T0, C0;
MOV T1, C1;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
SIN T0, T0.x;
SIN T1, T1.x;
ADD result.color, T0, T1;
END

512      5.3360       SIN          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
MOV T0, C0;
MOV T1, C1;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
COS T0, T0.x;
COS T1, T1.x;
ADD result.color, T0, T1;
END

512      5.3356       COS          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
MOV T0, C0;
MOV T1, C1;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
SCS T0, T0.x;
SCS T1, T1.x;
ADD result.color, T0, T1;
END

512      5.2454       SCS          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
MOV T0, C0;
MOV T1, C1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
DP3 T1, T1, T1;
DP3 T0, T0, T0;
ADD result.color, T0, T1;
END

512      5.4167       DP3          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
MOV T0, C0;
MOV T1, C1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
DP4 T1, T1, T1;
DP4 T0, T0, T0;
ADD result.color, T0, T1;
END

512      5.3765       DP4          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
MOV T0, C0;
MOV T1, C1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
XPD T1, T1, T1;
XPD T0, T0, T0;
ADD result.color, T0, T1;
END

512      4.9102       XPD          4         64

Thomas
 
ADD T2.xyzw, T0, T0;
ADD T3.xyzw, T1, T1;

I'd look at these lines (and those like them) as possible culprits. Does anyone know if the 6800 can still do the free multiply by 2? Obviously this would explain the disparity in the ADD/SUB tests.

You'd probably be better off with code that always accumulates to the previous result, but that gives the scheduler no leeway and it might give artificially low results.
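
To make the "always accumulate to the previous result" idea concrete, here is a minimal Python sketch that emits such a test shader as ARBfp1.0 text. It is not GPUBench code: `make_chain_shader` and its `chains` parameter are hypothetical names. With `chains=1` every instruction depends on the one before it; larger values give the scheduler that many independent accumulators to interleave.

```python
TWO_OPERAND_OPS = {"ADD", "SUB", "MUL"}


def make_chain_shader(op, length=64, chains=1):
    """Return ARBfp1.0 text with `length` `op` instructions.

    Hypothetical helper: each of the `chains` temporaries accumulates onto
    its own previous value, and the chains are merged at the end so that
    no chain is dead code.
    """
    if op not in TWO_OPERAND_OPS:
        raise ValueError("only two-operand ops are handled in this sketch")
    if chains < 1 or length < chains:
        raise ValueError("need 1 <= chains <= length")

    lines = [
        "!!ARBfp1.0",
        "PARAM C0=program.env[0];",
        "PARAM C1=program.env[1];",
    ]
    lines += [f"TEMP T{t};" for t in range(chains)]
    # Seed every accumulator so that no temporary is read uninitialised.
    lines += [f"MOV T{t}, C{t % 2};" for t in range(chains)]

    # Round-robin accumulation: Tt = Tt op C1.
    for n in range(length - chains):
        t = n % chains
        lines.append(f"{op} T{t}.xyzw, T{t}, C1;")
    # Fold the other chains into T0, then write the output.
    for t in range(1, chains):
        lines.append(f"{op} T0.xyzw, T0, T{t};")
    lines.append(f"{op} result.color.xyzw, T0, C0;")
    lines.append("END")
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_chain_shader("ADD", length=8, chains=2))
```

Comparing the score for `chains=1` against `chains=2` or `chains=4` would show how much of a result comes from instruction latency rather than raw issue rate, assuming the driver does not fold the constant accumulation away.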
 
Actually, looking at more of the code, it looks like the drivers are doing at most some simple peephole optimisation; both the ADD and the SUB shader could be significantly simplified by a smart optimiser. They probably concentrate on register allocation and functional unit optimisation rather than expression simplification. Makes me wonder how much expression work they do in the HLSL compilers, where people are a lot less careful with their source code.

Of course from a driver standpoint optimizing for this sort of test is somewhat pointless, because in real programs you have significant data dependencies, and you can usually rely on the programmer for something in the ballpark of optimal.

It's a pity there is no way to tell how many final ALU ops are actually being executed, and no way to control how the assembler will optimize.
 
A clever optimizer would reduce the ADD shader to
result = x * c0 + y * c1
and the SUB shader to
result = 0

The MUL shader might be reduced to two POW and a MUL.
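
These reductions can be checked mechanically. The sketch below is Python, not anything a driver does. It assumes the 64-instruction listings posted earlier in the thread: fifteen repeats of the four-instruction group, then a final group that writes result.color. It tracks each register as a pair of integer coefficients of C0 and C1. The function name and the representation are illustrative only.

```python
def trace_original_shader(sign):
    """Symbolically run the original ADD (sign=+1) or SUB (sign=-1) shader.

    A register is a pair (a, b) meaning a*C0 + b*C1. For the MUL shader the
    same pairs from sign=+1 can be read as exponents: C0**a * C1**b.
    """
    def op(p, q):
        return (p[0] + sign * q[0], p[1] + sign * q[1])

    t0, t1 = (1, 0), (0, 1)          # MOV T0, C0; MOV T1, C1
    for _ in range(15):              # 15 full four-instruction groups
        t2 = op(t0, t0)              # op T2, T0, T0
        t3 = op(t1, t1)              # op T3, T1, T1
        t0 = op(t2, t1)              # op T0, T2, T1
        t1 = op(t2, t3)              # op T1, T2, T3
    # Final group: the fourth instruction writes result.color instead of T1.
    t2 = op(t0, t0)
    t3 = op(t1, t1)
    t0 = op(t2, t1)
    return op(t0, t3)                # op result.color, T0, T3


if __name__ == "__main__":
    x, y = trace_original_shader(+1)
    print("ADD: result = x*C0 + y*C1 with x, y =", x, y)
    print("SUB: coefficients =", trace_original_shader(-1))
    print("MUL: result = C0**x * C1**y with the same x, y")
```

The SUB trace collapses to zero after the second group, because the first two instructions of every group compute T0 - T0 and T1 - T1. That is why a constant result is possible there.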
 
I've changed the code for ADD, SUB and MUL a little bit:

Code:
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
TEMP T2;
TEMP T3;
MOV T0, C0;
MOV T1, C1;
ADD T0.xyzw, T0, T1;
ADD T1.xyzw, T1, T0;
ADD T2.xyzw, T2, T1;
ADD T3.xyzw, T3, T2;
ADD T0.xyzw, T0, T1;
ADD T1.xyzw, T1, T0;
ADD T2.xyzw, T2, T1;
ADD T3.xyzw, T3, T2;
ADD T0.xyzw, T0, T1;
ADD T1.xyzw, T1, T0;
ADD T2.xyzw, T2, T1;
ADD T3.xyzw, T3, T2;
ADD T0.xyzw, T0, T1;
ADD T1.xyzw, T1, T0;
ADD T2.xyzw, T2, T1;
ADD T3.xyzw, T3, T2;
ADD T0.xyzw, T0, T1;
ADD T1.xyzw, T1, T0;
ADD T2.xyzw, T2, T1;
ADD T3.xyzw, T3, T2;
ADD T0.xyzw, T0, T1;
ADD T1.xyzw, T1, T0;
ADD T2.xyzw, T2, T1;
ADD T3.xyzw, T3, T2;
ADD T0.xyzw, T0, T1;
ADD T1.xyzw, T1, T0;
ADD T2.xyzw, T2, T1;
ADD T3.xyzw, T3, T2;
ADD T0.xyzw, T0, T1;
ADD T1.xyzw, T1, T0;
ADD T2.xyzw, T2, T1;
ADD T3.xyzw, T3, T2;
ADD T0.xyzw, T0, T1;
ADD T1.xyzw, T1, T0;
ADD T2.xyzw, T2, T1;
ADD T3.xyzw, T3, T2;
ADD T0.xyzw, T0, T1;
ADD T1.xyzw, T1, T0;
ADD T2.xyzw, T2, T1;
ADD T3.xyzw, T3, T2;
ADD T0.xyzw, T0, T1;
ADD T1.xyzw, T1, T0;
ADD T2.xyzw, T2, T1;
ADD T3.xyzw, T3, T2;
ADD T0.xyzw, T0, T1;
ADD T1.xyzw, T1, T0;
ADD T2.xyzw, T2, T1;
ADD T3.xyzw, T3, T2;
ADD T0.xyzw, T0, T1;
ADD T1.xyzw, T1, T0;
ADD T2.xyzw, T2, T1;
ADD T3.xyzw, T3, T2;
ADD T0.xyzw, T0, T1;
ADD T1.xyzw, T1, T0;
ADD T2.xyzw, T2, T1;
ADD T3.xyzw, T3, T2;
ADD T0.xyzw, T0, T1;
ADD T1.xyzw, T1, T0;
ADD T2.xyzw, T2, T1;
ADD T3.xyzw, T3, T2;
ADD T0.xyzw, T0, T1;
ADD T1.xyzw, T1, T0;
ADD T2.xyzw, T2, T1;
ADD result.color.xyzw, T0, T3;
END

512      5.8586       ADD          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
TEMP T2;
TEMP T3;
MOV T0, C0;
MOV T1, C1;
SUB T0.xyzw, T0, T1;
SUB T1.xyzw, T1, T0;
SUB T2.xyzw, T2, T1;
SUB T3.xyzw, T3, T2;
SUB T0.xyzw, T0, T1;
SUB T1.xyzw, T1, T0;
SUB T2.xyzw, T2, T1;
SUB T3.xyzw, T3, T2;
SUB T0.xyzw, T0, T1;
SUB T1.xyzw, T1, T0;
SUB T2.xyzw, T2, T1;
SUB T3.xyzw, T3, T2;
SUB T0.xyzw, T0, T1;
SUB T1.xyzw, T1, T0;
SUB T2.xyzw, T2, T1;
SUB T3.xyzw, T3, T2;
SUB T0.xyzw, T0, T1;
SUB T1.xyzw, T1, T0;
SUB T2.xyzw, T2, T1;
SUB T3.xyzw, T3, T2;
SUB T0.xyzw, T0, T1;
SUB T1.xyzw, T1, T0;
SUB T2.xyzw, T2, T1;
SUB T3.xyzw, T3, T2;
SUB T0.xyzw, T0, T1;
SUB T1.xyzw, T1, T0;
SUB T2.xyzw, T2, T1;
SUB T3.xyzw, T3, T2;
SUB T0.xyzw, T0, T1;
SUB T1.xyzw, T1, T0;
SUB T2.xyzw, T2, T1;
SUB T3.xyzw, T3, T2;
SUB T0.xyzw, T0, T1;
SUB T1.xyzw, T1, T0;
SUB T2.xyzw, T2, T1;
SUB T3.xyzw, T3, T2;
SUB T0.xyzw, T0, T1;
SUB T1.xyzw, T1, T0;
SUB T2.xyzw, T2, T1;
SUB T3.xyzw, T3, T2;
SUB T0.xyzw, T0, T1;
SUB T1.xyzw, T1, T0;
SUB T2.xyzw, T2, T1;
SUB T3.xyzw, T3, T2;
SUB T0.xyzw, T0, T1;
SUB T1.xyzw, T1, T0;
SUB T2.xyzw, T2, T1;
SUB T3.xyzw, T3, T2;
SUB T0.xyzw, T0, T1;
SUB T1.xyzw, T1, T0;
SUB T2.xyzw, T2, T1;
SUB T3.xyzw, T3, T2;
SUB T0.xyzw, T0, T1;
SUB T1.xyzw, T1, T0;
SUB T2.xyzw, T2, T1;
SUB T3.xyzw, T3, T2;
SUB T0.xyzw, T0, T1;
SUB T1.xyzw, T1, T0;
SUB T2.xyzw, T2, T1;
SUB T3.xyzw, T3, T2;
SUB T0.xyzw, T0, T1;
SUB T1.xyzw, T1, T0;
SUB T2.xyzw, T2, T1;
SUB result.color.xyzw, T0, T3;
END

512      5.7866       SUB          4         64
shader: !!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
TEMP T2;
TEMP T3;
MOV T0, C0;
MOV T1, C1;
MUL T0.xyzw, T0, T1;
MUL T1.xyzw, T1, T0;
MUL T2.xyzw, T2, T1;
MUL T3.xyzw, T3, T2;
MUL T0.xyzw, T0, T1;
MUL T1.xyzw, T1, T0;
MUL T2.xyzw, T2, T1;
MUL T3.xyzw, T3, T2;
MUL T0.xyzw, T0, T1;
MUL T1.xyzw, T1, T0;
MUL T2.xyzw, T2, T1;
MUL T3.xyzw, T3, T2;
MUL T0.xyzw, T0, T1;
MUL T1.xyzw, T1, T0;
MUL T2.xyzw, T2, T1;
MUL T3.xyzw, T3, T2;
MUL T0.xyzw, T0, T1;
MUL T1.xyzw, T1, T0;
MUL T2.xyzw, T2, T1;
MUL T3.xyzw, T3, T2;
MUL T0.xyzw, T0, T1;
MUL T1.xyzw, T1, T0;
MUL T2.xyzw, T2, T1;
MUL T3.xyzw, T3, T2;
MUL T0.xyzw, T0, T1;
MUL T1.xyzw, T1, T0;
MUL T2.xyzw, T2, T1;
MUL T3.xyzw, T3, T2;
MUL T0.xyzw, T0, T1;
MUL T1.xyzw, T1, T0;
MUL T2.xyzw, T2, T1;
MUL T3.xyzw, T3, T2;
MUL T0.xyzw, T0, T1;
MUL T1.xyzw, T1, T0;
MUL T2.xyzw, T2, T1;
MUL T3.xyzw, T3, T2;
MUL T0.xyzw, T0, T1;
MUL T1.xyzw, T1, T0;
MUL T2.xyzw, T2, T1;
MUL T3.xyzw, T3, T2;
MUL T0.xyzw, T0, T1;
MUL T1.xyzw, T1, T0;
MUL T2.xyzw, T2, T1;
MUL T3.xyzw, T3, T2;
MUL T0.xyzw, T0, T1;
MUL T1.xyzw, T1, T0;
MUL T2.xyzw, T2, T1;
MUL T3.xyzw, T3, T2;
MUL T0.xyzw, T0, T1;
MUL T1.xyzw, T1, T0;
MUL T2.xyzw, T2, T1;
MUL T3.xyzw, T3, T2;
MUL T0.xyzw, T0, T1;
MUL T1.xyzw, T1, T0;
MUL T2.xyzw, T2, T1;
MUL T3.xyzw, T3, T2;
MUL T0.xyzw, T0, T1;
MUL T1.xyzw, T1, T0;
MUL T2.xyzw, T2, T1;
MUL T3.xyzw, T3, T2;
MUL T0.xyzw, T0, T1;
MUL T1.xyzw, T1, T0;
MUL T2.xyzw, T2, T1;
MUL result.color.xyzw, T0, T3;
END

http://www.tommti-systems.de/main-Dateien/reviews/gpubench/6800gt_hq_v2/index.html

Thomas
 
Well, that makes the ADD result look more reasonable. I don't see anything in the MUL code that would be obvious for the optimiser, but it might look different after it does instruction reordering and register reassignment.

Part of the problem is that I'd put money on the heuristics the "assembler" uses being heavily weighted towards instruction sequences that have appeared in games. You might want to try a couple of other sequences and see if the results vary.

To be honest I don't have enough detailed information on the platforms to know what the "right" answers are.
 
Code:
C:\temp\docs\gpubench\bin>instrissue -n -a -l 64 -m
512      0.5845       ADD          4         64
512      0.5757       SUB          4         64
512      0.5845       MUL          4         64
512      0.5701       MAD          4         64
512      0.5785       EX2          4         64
512      0.5785       LG2          4         64
512      0.2998       POW          4         64
512      0.5872       FLR          4         64
512      0.5872       FRC          4         64
512      0.2976       RSQ          4         64
512      0.5785       RCP          4         64
512      0.5785       SIN          4         64
512      0.5785       COS          4         64
512      0.5463       SCS          4         64
512      0.5872       DP3          4         64
512      0.5872       DP4          4         64
512      0.2962       XPD          4         64

FX5600 325MHz/550MHz 66.81

The GeForce FX cannot run some of the benchmarks because it does not support MRT/glDrawBuffersATI.

How do I output a results HTML file like the ones tb linked?
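
For a quick comparison without the HTML report, the console lines above can be parsed directly. This is a minimal Python sketch, not part of GPUBench. It assumes each result line has exactly five whitespace-separated fields, as in the listing. Only the op name and the floating-point score are interpreted. The meaning of the other columns is not stated in the thread, so they keep neutral names (`col1`, `col4`, `col5`).

```python
from typing import NamedTuple


class Result(NamedTuple):
    col1: int      # first column; meaning not stated in the thread
    rate: float    # the measured score; units not stated in the thread
    op: str        # instruction name, e.g. "ADD"
    col4: int      # fourth column; meaning not stated in the thread
    col5: int      # fifth column; meaning not stated in the thread


def parse_results(text):
    """Pick out the five-field result lines and ignore everything else."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue
        try:
            rows.append(Result(int(parts[0]), float(parts[1]), parts[2],
                               int(parts[3]), int(parts[4])))
        except ValueError:
            continue  # not a result line (e.g. shader text)
    return rows


def relative_to(rows, baseline="ADD"):
    """Express every score as a fraction of the baseline op's score."""
    base = next(r.rate for r in rows if r.op == baseline)
    return {r.op: r.rate / base for r in rows}


SAMPLE = r"""
C:\temp\docs\gpubench\bin>instrissue -n -a -l 64 -m
512      0.5845       ADD          4         64
512      0.2998       POW          4         64
512      0.2962       XPD          4         64
"""

if __name__ == "__main__":
    for op, ratio in relative_to(parse_results(SAMPLE)).items():
        print(f"{op}: {ratio:.2f} x ADD")
```

Normalising this way makes it easier to see which ops run at about half the rate of the rest on a given card.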
 
I don't see many shaders in the /shader directory. After reading the source code, it looks like most of the shaders are hard-coded into the .exe. Is there a better way to modify the shaders without a recompile?
 
cho said:
Code:
C:\temp\docs\gpubench\bin>instrissue -n -a -l 64 -m
512      0.5845       ADD          4         64
512      0.5757       SUB          4         64
512      0.5845       MUL          4         64
512      0.5701       MAD          4         64
512      0.5785       EX2          4         64
512      0.5785       LG2          4         64
512      0.2998       POW          4         64
512      0.5872       FLR          4         64
512      0.5872       FRC          4         64
512      0.2976       RSQ          4         64
512      0.5785       RCP          4         64
512      0.5785       SIN          4         64
512      0.5785       COS          4         64
512      0.5463       SCS          4         64
512      0.5872       DP3          4         64
512      0.5872       DP4          4         64
512      0.2962       XPD          4         64

FX5600 325MHz/550MHz 66.81

The GeForce FX cannot run some of the benchmarks because it does not support MRT/glDrawBuffersATI.

How do I output a results HTML file like the ones tb linked?

For the HTML results, you have to run the gpubench.pl Perl script. I recommend you install Cygwin (with perl, jgraph, ...) and run it from there.

Thomas
 
This seems like a rather daft benchmark. Unless I've misread the code, it looks like every instruction is dependent on the previous one. That is highly unlikely to give a realistic estimate of system performance. At least try coding something where there is a decent percentage of independent instructions. (Think throughput vs. latency.)
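
One way to check how dependent a posted listing really is: compare its instruction count with the length of its longest dependency chain. The Python sketch below is a hypothetical helper, not part of GPUBench. It assumes the simple `OP dest, src, ...;` form used in the listings in this thread and ignores write masks and swizzles, so it treats every read of a register as depending on that register's last write.

```python
def critical_path(shader):
    """Return (instruction_count, longest_dependency_chain) for ARBfp text."""
    depth = {}     # register -> chain length of the instruction that last wrote it
    count = 0
    longest = 0
    for raw in shader.splitlines():
        line = raw.strip().rstrip(";")
        if not line or line.startswith(("shader:", "!!", "PARAM", "TEMP", "END")):
            continue  # headers and declarations are not instructions
        opcode, operands = line.split(None, 1)
        # Drop negation, write masks and swizzles: "T0.xyzw" -> "T0".
        regs = [o.strip().lstrip("-").split(".")[0] for o in operands.split(",")]
        dest, sources = regs[0], regs[1:]
        # Constants (C0, C1) were never written, so they count as depth 0.
        d = 1 + max((depth.get(s, 0) for s in sources), default=0)
        depth[dest] = d
        count += 1
        longest = max(longest, d)
    return count, longest


SAMPLE = """!!ARBfp1.0
PARAM C0=program.env[0];
PARAM C1=program.env[1];
TEMP T0;
TEMP T1;
MOV T0, C0;
MOV T1, C1;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD T1.xyzw, T1, T1, T1;
MAD T0.xyzw, T0, T0, T0;
MAD result.color.xyzw, T0, T1, T1;
END"""

if __name__ == "__main__":
    count, longest = critical_path(SAMPLE)
    print(f"{count} instructions, longest chain {longest}")
```

If the chain length equals the instruction count, the shader is fully serial. In the two-register listings posted in this thread, the T0 and T1 chains alternate, so the longest chain comes out at roughly half the instruction count.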
 