stereosoba
Newcomer
Hello,
Before anything else, I should say that I'm quite new to GPU technology, so I may be overlooking things, using the wrong vocabulary, or misunderstanding the purpose of this forum. Apologies in advance, and please don't hesitate to correct me! :)
I'm currently writing vertex programs in Cg for the RSX (PS3) profile, although I believe my question is not strictly PS3-related.
My vertex program declares:
out float4 out_pos : POSITION0
In my context I have three float2 values named sr2, sg2 and sb2. I was surprised to find that, when I run the shader through NvShaderPerf, the following two blocks give different cycle counts:
// <some code>
#if 1
// Total shader is 36 cycles
out_pos.xy += sr2;
out_pos.xy += sg2;
out_pos.xy += sb2;
#else
// Total shader is 33 cycles
out_pos.xy += sr2 + sg2 + sb2;
#endif
// <some code>
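For anyone who wants to try this, here is a stripped-down sketch of the kind of program I mean. It is not my actual shader: the inputs (in_pos, mvp), the use of uniforms for sr2, sg2 and sb2, and the SEPARATE_ADDS macro are all made up for illustration, and I have not checked whether this reduced version shows the same 36 vs. 33 cycle difference.

```cg
// Hypothetical minimal vertex program for comparing the two variants.
// Assumptions: in_pos, mvp and the SEPARATE_ADDS macro are illustrative only;
// in the real shader sr2, sg2 and sb2 are computed earlier, not passed as uniforms.
void main(float4 in_pos : POSITION,
          uniform float4x4 mvp,
          uniform float2 sr2,
          uniform float2 sg2,
          uniform float2 sb2,
          out float4 out_pos : POSITION0)
{
    // Stand-in for "<some code>": a plain model-view-projection transform.
    out_pos = mul(mvp, in_pos);

#ifdef SEPARATE_ADDS
    // Variant 1: three separate accumulations into out_pos.xy.
    out_pos.xy += sr2;
    out_pos.xy += sg2;
    out_pos.xy += sb2;
#else
    // Variant 2: sum the three offsets first, then add once.
    out_pos.xy += sr2 + sg2 + sb2;
#endif
}
```

Compiling it once with SEPARATE_ADDS defined and once without should make it possible to compare the generated instructions for the two variants side by side.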
Now, I know the shader is optimized globally, but regardless of what else is in the code, I cannot see why these two blocks would be treated differently in the first place. I thought shader compilers optimized quite aggressively and would naturally produce the same output in this case.
Can you explain it, or point me to information that would help me understand this phenomenon and, perhaps, how shader units work in more detail?
Thank you.