pocketmoon_
Newcomer
I have 5 shaders, each in a full precision (FP) and a partial precision (PP) version:
1) A fake noise shader - no samplers, lots of nasty maths, requires full precision.
2) Simple 2x sampler (same texture), add and output
3) 5x sampler - cascaded dependent reads
4) A median filter - 5 samples, LOTS of 'conditional branching'
5) A bilinear filter - 4 samples and some maths.
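For anyone unsure what "4 samples and some maths" means for shader 5, here is a rough sketch of bilinear filtering in plain Python rather than Cg. It is not the actual shader source; the names `lerp`, `bilinear`, `t00`..`t11`, `fx` and `fy` are made up for illustration, and the four texel values are assumed to have been fetched already.

```python
# Illustrative stand-in only: scalar texels instead of RGBA, no real texture fetches.

def lerp(a, b, t):
    """Linear interpolation between a and b, t in [0, 1]."""
    return a + (b - a) * t

def bilinear(t00, t10, t01, t11, fx, fy):
    """Blend four neighbouring texels.

    t00/t10 are the left/right texels on the first row, t01/t11 on the
    second row. fx and fy are the fractional position inside the texel
    cell (hypothetical parameter names).
    """
    row0 = lerp(t00, t10, fx)    # blend along x on the first row
    row1 = lerp(t01, t11, fx)    # blend along x on the second row
    return lerp(row0, row1, fy)  # blend the two rows along y

print(bilinear(0.0, 1.0, 2.0, 3.0, 0.5, 0.5))
```

So the "maths" is three lerps on top of the four texture reads.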
I have used the NV30 and ARBFP1 profiles for OpenGL and PS_2_x for DirectX 9, all tested on a Quadro FX 2000.
Results.
1) PS2X runs fastest, but all are close. Only the NV30 profile with FP displays correct results (FP is needed for this one).
2) NV30 FP is about 10% faster than PS2X and 20% faster than ARBFP1. PP was slowest of them all!
3) Similar results to 2.
4) NV30 wins. Both FP and PP are >6x faster than PS2X and >3x faster than ARBFP1.
5) NV30 is 30% faster than PS2X.
What does this tell us? God knows.
If your shader is mostly texture samples, then using PP appears to slow things down?!
With lots of conditionals the NV30 profile is very very quick.
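To give an idea of why a 5-sample median ends up with so many conditionals, here is a rough sketch in plain Python rather than Cg. It is only a guess at the general shape of such a filter, not the shader that was benchmarked; `median5` is a made-up name and the input stands in for the five texture samples.

```python
# Illustrative stand-in only: one channel, samples passed in as plain numbers.

def median5(samples):
    """Median of five values using explicit compare-and-swap steps."""
    v = list(samples)
    # Three bubble passes move the three largest values into v[2:],
    # which leaves the median at v[2]. That is 4 + 3 + 2 = 9 comparisons.
    for done in range(3):
        for i in range(4 - done):
            if v[i] > v[i + 1]:  # one conditional per comparison
                v[i], v[i + 1] = v[i + 1], v[i]
    return v[2]

print(median5((0.7, 0.1, 0.4, 0.9, 0.3)))
```

Nine compare-and-swaps per channel, so how well a profile handles conditionals matters a lot for this one.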
Some optimisations work better on floats, e.g. collapsing
a = (b.x + b.y + b.z) * 0.20
into
a = dot(b.xyz, float3(.2,.2,.2))
gives a speed-up for floats but not halfs?!
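For what it's worth, the two forms are algebraically the same, so any speed difference comes from how the compiler maps them to instructions and not from the maths. A quick check in plain Python rather than Cg (the helper names and the sample value for `b` are made up):

```python
# Plain-Python stand-in for the two Cg expressions; b is a made-up RGB sample.

def dot3(u, v):
    """Three-component dot product, the scalar equivalent of dot() on xyz."""
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]

def scaled_sum(b):
    """Original form: add the three components, then scale once."""
    return (b[0] + b[1] + b[2]) * 0.2

def collapsed(b):
    """Collapsed form: a single dot product against a constant vector."""
    return dot3(b, (0.2, 0.2, 0.2))

b = (0.25, 0.5, 0.75)
print(scaled_sum(b), collapsed(b))
```

The two results agree up to floating-point rounding.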
Of course a LOT depends on the compiled shader code produced by each Cg profile. E.g. shader 4 (the median filter) compiles to 104 instructions with the NV30 profile, but to 149 instructions with PS2X.
It will be interesting to see the impact the DirectX PP fix will have in the future.