I don't see how I could have been any more explicit. Anyway...

Dio said:
Can you paste the tested loop in?
Code:
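#include <xmmintrin.h>

// MSVC-style inline assembly; this only builds for 32-bit x86.
int main()
{
    // Zero-initialize so the loop doesn't run on garbage (NaNs or denormals).
    __m128 x = _mm_setzero_ps();
    __m128 y = _mm_setzero_ps();

    for(int i = 0; i < 1000000000; i++)
    {
        __asm
        {
            movaps xmm0, x        // load both operands
            movaps xmm1, y
            addps xmm0, xmm1      // four dependent packed adds
            addps xmm1, xmm0
            addps xmm0, xmm1
            addps xmm1, xmm0
            movaps x, xmm0        // store the results back
            movaps y, xmm1
        }
    }

    return 0;
}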
I scanned through swShader/Shader/ps20Assembler.cpp and found 272 *ps instructions and 93 *ss instructions. So I think it's reasonable to say that on average I use 80% of the ALU. OK, that could have been 100% with SoA, but that 20% isn't going to make up for the data reordering and register spilling.

You are right that swizzles are rare in pixel shaders. But how many operations actually need all the components? Typically, there are some calculations which are scalar, and many which don't require the alpha component. That's 3/4 and 1/4 of the SSE ALU (respectively) that's doing nothing useful...
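For anyone who wants to check the 80% figure, here is a rough sketch of the arithmetic. It assumes every *ps instruction keeps all 4 lanes busy and every *ss instruction keeps 1 lane busy. The function name and the `useful_lanes_per_packed` parameter are made up for illustration and are not part of swShader.

```cpp
#include <cassert>

// Hypothetical helper: fraction of SSE lanes doing useful work.
// Assumption: a scalar (*ss) instruction uses 1 of 4 lanes, and a packed
// (*ps) instruction uses useful_lanes_per_packed of 4 lanes (4 by default).
double lane_utilization(int packed_count, int scalar_count,
                        int useful_lanes_per_packed = 4)
{
    const int lanes = 4;
    assert(packed_count + scalar_count > 0);
    assert(useful_lanes_per_packed >= 1 && useful_lanes_per_packed <= lanes);

    // Lanes that carry useful data, summed over all instructions.
    const double used = static_cast<double>(packed_count) * useful_lanes_per_packed
                      + static_cast<double>(scalar_count);
    // Lanes that are available in total.
    const double available = static_cast<double>(packed_count + scalar_count) * lanes;
    return used / available;
}
```

Calling `lane_utilization(272, 93)` gives 1181/1460, or about 0.81, which is where the roughly 80% comes from. If you instead suppose that packed operations only need three components (no alpha), `lane_utilization(272, 93, 3)` gives 909/1460, or about 0.62. That second number is only a what-if and not something measured from the source.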