Well, this might not be as interesting as the NV40/R420 discussion, but I want an answer.
I did some quick tests on R300; here are the results:
Code:
mov r1, c0
texld r0, t0, s0
mul r0, r0, c2
add r0, r0, r1
takes 1 clock.
Code:
mov r1, c0
texld r0, t0, s0
add r0, r0, r1
mul r0, r0, c2
takes 2 clocks.
These results suggest R300's mini ALU can only do add (I know it can also handle register modifiers). And I have more:
Code:
mov r1, c0
texld r0, t0, s0
add r0, r0, r1
add r0, r0, c2
takes 2 clocks, which leads me to think that maybe the full shader core cannot do add, a little similar to NV40. So I changed the shader to:
Code:
def c5, 2.0f, 4.0f, 8.0f, 1.0f
mov r1, c0
texld r0, t0, s0
add r0, r0, r1
mul r0, r0, c5.r
takes 1 clock. As you can see, the last instruction is obviously handled by the mini ALU, so the full ALU can do add for sure.
Now the question is: what prevents the full and mini ALUs from doing an add simultaneously?
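To make the puzzle concrete, here is a toy Python model of the pairing rules that the four timings above seem to imply. Everything in it is an assumption inferred from those timings, not documented R300 behavior: the names `MINI_OPS`, `clocks`, and `allow_dual_add` are hypothetical, only the two arithmetic instructions after the `texld` are counted, and the multiply by `c5.r` (2.0) is assumed to be folded into a register modifier, written here as `"scale"`.

```python
# Toy model only: every rule below is an assumption inferred from the four
# measured timings, not documented R300 behavior.

# Ops the mini ALU is assumed to handle: add, and "scale" (a multiply by
# 2.0 that is assumed to be folded into a register modifier).
MINI_OPS = {"add", "scale"}


def clocks(ops, allow_dual_add=False):
    """Count clocks for a chain of dependent arithmetic ops.

    Each clock the full ALU issues one op. The mini ALU may issue the next
    op in the same clock if it supports that op. Unless allow_dual_add is
    set, an add in the full ALU blocks an add in the mini ALU.
    """
    count = 0
    i = 0
    while i < len(ops):
        full_op = ops[i]
        i += 1
        if i < len(ops):
            next_op = ops[i]
            both_add = full_op == "add" and next_op == "add"
            if next_op in MINI_OPS and (allow_dual_add or not both_add):
                i += 1  # the mini ALU takes the next op in this clock
        count += 1
    return count


if __name__ == "__main__":
    # The four shaders from the post, reduced to their arithmetic ops.
    for ops in (["mul", "add"], ["add", "mul"], ["add", "add"], ["add", "scale"]):
        print(ops, clocks(ops))
```

With the "no two adds in one clock" rule switched on, the model gives the same counts as the four measurements; calling `clocks(["add", "add"], allow_dual_add=True)` gives 1 instead of the measured 2, which is exactly the restriction the question is asking about. The model only restates the observation and says nothing about why the hardware behaves this way.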