If I understand correctly, DirectX's software vertex shader implementation was written by Intel. For the best performance with SSE instructions, they would have to lay the data out in SoA (structure-of-arrays) format, right?
But how can they do this with vs_2_x, which allows dynamic branching? In SoA format, an SSE register holds the same component of one shader register (say r0.x) for four different vertices. But branch control is per vertex, so when those four vertices take different paths, they can't simply keep the data in this format.
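To make the problem concrete, here is a minimal sketch of what SoA processing with a per-vertex branch could look like. The shader (`if (dot(N, L) > 0) out = dot(N, L); else out = ambient;`), the `Vec3SoA` struct and `light_soa` are all hypothetical, and this is not a claim about how the DirectX software path actually works. The usual trick is to compute the branch condition as a per-lane mask, skip a side entirely when all four lanes agree (checked with `_mm_movemask_ps`), and otherwise evaluate both sides and blend:

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE1 intrinsics */

/* Hypothetical SoA layout: one array per component, element i is vertex i. */
typedef struct {
    float *x, *y, *z;
} Vec3SoA;

/* Hypothetical shader, four vertices per iteration:
   if (dot(N, L) > 0) out = dot(N, L); else out = ambient;
   n must be a multiple of 4. */
void light_soa(const Vec3SoA *normals, const float light[3],
               float ambient, float *out, int n)
{
    assert(n % 4 == 0);
    const __m128 lx = _mm_set1_ps(light[0]);
    const __m128 ly = _mm_set1_ps(light[1]);
    const __m128 lz = _mm_set1_ps(light[2]);
    const __m128 amb = _mm_set1_ps(ambient);
    const __m128 zero = _mm_setzero_ps();

    for (int i = 0; i < n; i += 4) {
        __m128 nx = _mm_loadu_ps(normals->x + i);
        __m128 ny = _mm_loadu_ps(normals->y + i);
        __m128 nz = _mm_loadu_ps(normals->z + i);

        /* dot(N, L) for four vertices at once, no horizontal adds needed */
        __m128 d = _mm_add_ps(_mm_add_ps(_mm_mul_ps(nx, lx),
                                         _mm_mul_ps(ny, ly)),
                              _mm_mul_ps(nz, lz));

        /* Per-lane branch condition: all bits set where d > 0 */
        __m128 mask = _mm_cmpgt_ps(d, zero);
        int bits = _mm_movemask_ps(mask);

        __m128 result;
        if (bits == 0xF) {
            result = d;   /* all four vertices take the "then" side */
        } else if (bits == 0) {
            result = amb; /* all four vertices take the "else" side */
        } else {
            /* Divergent lanes: both sides computed, selected per lane */
            result = _mm_or_ps(_mm_and_ps(mask, d),
                               _mm_andnot_ps(mask, amb));
        }
        _mm_storeu_ps(out + i, result);
    }
}
```

The cost of this approach is that divergent groups of four execute both sides of the branch, so deeply nested or data-dependent loops get expensive.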
So do they really use SoA, or is it plain AoS, where every SSE register corresponds to one shader register? If there is a big performance difference between fixed-function vertex processing and the equivalent shader, they must be using two implementations...
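For comparison, a minimal AoS sketch of the same hypothetical shader, where one `__m128` holds x, y, z, w of one shader register for a single vertex. The names `dp4_aos` and `light_aos` are made up for illustration. Branching becomes an ordinary C `if` per vertex, but every `dp4` now needs a horizontal add built from shuffles, which is where AoS usually loses throughput:

```c
#include <assert.h>
#include <xmmintrin.h> /* SSE1 intrinsics */

/* AoS dp4: a and b each hold (x, y, z, w) of one register for one vertex. */
static float dp4_aos(__m128 a, __m128 b)
{
    __m128 p = _mm_mul_ps(a, b); /* (ax*bx, ay*by, az*bz, aw*bw) */
    /* Swap neighbours and add: (p0+p1, p1+p0, p2+p3, p3+p2) */
    __m128 s = _mm_add_ps(p, _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1)));
    /* Bring the upper pair down and add it into lane 0 */
    s = _mm_add_ss(s, _mm_movehl_ps(s, s));
    return _mm_cvtss_f32(s);
}

/* Same hypothetical shader, one vertex at a time: the dynamic branch
   is just a scalar if, since each vertex runs independently. */
void light_aos(const __m128 *normals, __m128 light,
               float ambient, float *out, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        float d = dp4_aos(normals[i], light);
        out[i] = (d > 0.0f) ? d : ambient;
    }
}
```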
Any ideas?