Cg compiler optimizer and commutative addition

stereosoba

Hello,

First off, I'm quite new to GPU technology, so I may be overlooking a lot, using the wrong vocabulary, or misunderstanding the purpose of this forum. Sorry in advance, and don't hesitate to correct me :)

I'm currently writing vertex programs in Cg for an RSX (PS3) profile, although I believe my question is not strictly PS3-related.

My VP has:
out float4 out_pos : POSITION0

In my context I have three float2 values named sr2, sg2 and sb2. I was surprised to find that, when I ran the shader through NvShaderPerf, these two blocks produced different output:

// <some code>
#if 1
// Total shader is 36 cycles
out_pos.xy += sr2;
out_pos.xy += sg2;
out_pos.xy += sb2;
#else
// Total shader is 33 cycles
out_pos.xy += sr2 + sg2 + sb2;
#endif
// <some code>

Now I know the shader is optimized globally, but regardless of the rest of the code, I cannot tell why these two blocks would differ in the first place. I thought that shader compilers optimized quite aggressively and would naturally produce the same output in this case.

Can you explain it, or direct me to information that would help me understand this phenomenon and perhaps how shader units work in more detail?

Thank you.
 
The difference comes down to floating-point math: the two statements are not actually equivalent. Floating-point addition is not associative, so if out_pos.xy is a large number and the other values are small, the final result can differ depending on which version is used.
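
To make that concrete, here is a minimal sketch in Python rather than Cg, so it can be run anywhere. It assumes the hardware behaves like IEEE 754 single precision with round-to-nearest-even, which may not match the RSX exactly. The helper names f32, add_sequential and add_grouped, and the input values, are made up for illustration.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def add_sequential(pos: float, a: float, b: float, c: float) -> float:
    # Mirrors: out_pos.xy += sr2; out_pos.xy += sg2; out_pos.xy += sb2;
    # Every intermediate result is rounded to single precision.
    r = f32(pos + a)
    r = f32(r + b)
    r = f32(r + c)
    return r


def add_grouped(pos: float, a: float, b: float, c: float) -> float:
    # Mirrors: out_pos.xy += sr2 + sg2 + sb2;
    # The small values are summed first, then added to pos once.
    s = f32(f32(a + b) + c)
    return f32(pos + s)


if __name__ == "__main__":
    big = 16777216.0  # 2**24: above this, single precision cannot represent odd integers
    small = 1.0
    print(add_sequential(big, small, small, small))  # 16777216.0
    print(add_grouped(big, small, small, small))     # 16777220.0
```

In the sequential version each small add is rounded away on its own, while in the grouped version the sum of 3 is large enough to move the result. A compiler that rewrote one form into the other would therefore change the value that reaches the output.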
 
I think I had the misguided impression that Cg compilers were more lax about floating-point accuracy artifacts than, say, C++ compilers, but now that I think about it there is no reason for that to be the case.

Thanks for your answer!

(Note: I tried Cg compiler parameters such as --fastmath and --fastprecision, but they didn't change anything in this specific case.)
 
I think you can generally expect any shader compiler to be very conservative about floating-point optimizations applied to calculations that directly contribute to vertex position.

Vertex position is bound to be very twitchy: small differences in a calculation can produce large differences in the output image. Surfaces that are coplanar, or close to coplanar, will very quickly show visible problems if the math operations generating their positions are not identical. I think that in these circumstances compilers generally try to guarantee that such errors are, as far as possible, solely the fault of the developer rather than the compiler, regardless of which optimization mode you select... :)
 