3 vs 4 component vectors

My question is simple, though more subtle than it may first seem: should I use 3- or 4-component vectors for my vertex data (positions, normals...)?

With 4-component vectors, I can use SSE instructions to speed up computations on the CPU side, but I'm obviously going to waste some GPU memory if I send my data "as is".
I could convert my vectors to 3-component vectors before sending them, but that seems like a waste of time for streamed data sent to the GPU each frame.

On the other hand, 3-component vectors aren't properly aligned, so I can't use SSE on them. However, I can send them to the GPU "as is" without wasting GPU memory.
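To illustrate the alignment issue: a minimal sketch of what the 4-component layout buys you on the CPU side (the `Vec4` type and `add` function here are hypothetical, not from any particular engine). With a 16-byte-aligned struct, a whole vector moves through one SSE load/store; a tightly packed 12-byte vector can't do this.

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// 16-byte aligned 4-component vector: the w component is padding that
// lets the whole struct be moved with a single aligned SSE load/store.
struct alignas(16) Vec4 {
    float x, y, z, w;
};

// One SSE add replaces three (or four) scalar adds.
Vec4 add(const Vec4& a, const Vec4& b) {
    Vec4 r;
    _mm_store_ps(&r.x, _mm_add_ps(_mm_load_ps(&a.x), _mm_load_ps(&b.x)));
    return r;
}
```

A packed 3-float vector has no such guarantee: consecutive elements of an array of them start at 12-byte offsets, so `_mm_load_ps` (which requires 16-byte alignment) can't be used on them directly.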

Neither solution feels good to me. I know I could do some benchmarking (and I probably will), but I suspect the results may vary widely depending on the situation, and I'd like to know how this scales to full-scale applications.

That's why I'd be glad to know how you handle this in your projects.

PS: if it wasn't clear, I'm targeting PC architectures, though hearing how this applies to consoles would be interesting too.
 
You should look at the alignment of your *whole vertex structure*, not just one element. If using either 3- or 4-component vectors puts you at a multiple of 16 bytes, definitely use that! I wouldn't be too concerned about the extra memory usage, but keeping your vertex structure at a regular size will help the GPU quite a bit. In many cases it's even worth padding it up to the next multiple of 16, particularly if you're already "close".
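A quick sketch of what "whole vertex structure" means in practice (the attribute choices here are made up for illustration): a float3 position + float3 normal + float2 UV already lands on 32 bytes, while an odd-sized vertex can be padded up to the next multiple of 16.

```cpp
// 12 + 12 + 8 = 32 bytes: already a multiple of 16, no padding needed,
// even though both vectors are 3-component.
struct Vertex {
    float position[3];
    float normal[3];
    float uv[2];
};

// A hypothetical vertex that lands on 36 bytes; padding it up to 48
// keeps the stride a multiple of 16, as suggested above.
struct PaddedVertex {
    float position[3];
    float normal[3];
    float uv[2];
    float boneWeight;          // 36 bytes so far
    unsigned char padding[12]; // pad up to 48 bytes
};
```

The point being: whether an individual attribute is 3 or 4 components matters less than where the total stride ends up.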
 
I've seen (very recently) a game bottlenecked by vertex transfer, so I'm inclined to say you should pack your vertices optimally. What are you doing to them on the CPU every frame that makes CPU performance and SSE2 alignment etc. so important? Ideally you shouldn't touch (most of) your vertex buffers on the CPU.

Also, what Andrew said: you should prefer 16/32/48/64-byte vertices.
 
And don't forget to utilize the fourth byte in the DWORD ;-)

Better yet, store only the xy components of the normal (in the first two bytes of the DWORD), and calculate z = sqrt(1 - x*x - y*y) in the shader. This way the tangent xy fits nicely into the last two bytes (calculate the tangent z using the same formula). The bitangent can then be calculated as the cross product of these two 3D vectors. However, if you calculate all three vectors like this, you have to store three sign bits somewhere (an extra byte in some other vertex input).

Alternatively, to save a lot of ALU instructions, you could just store the normal in the first 3 bytes of the DWORD (as suggested above) and the tangent in another DWORD. Use the fourth byte as the sign for the bitangent (cross product).
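The packing above can be sketched on the CPU side like this (function names are hypothetical; the z reconstruction mirrors what the shader would do, and assumes the normal is unit length so z = sqrt(1 - x*x - y*y) holds):

```cpp
#include <algorithm>
#include <cmath>

// Map a component in [-1, 1] to one unsigned byte of the DWORD, and back.
unsigned char packUnorm(float v) {
    return static_cast<unsigned char>((v * 0.5f + 0.5f) * 255.0f + 0.5f);
}
float unpackUnorm(unsigned char b) {
    return b / 255.0f * 2.0f - 1.0f;
}

// Reconstruct z from xy as the shader would. The sign bit has to come
// from elsewhere (the extra byte mentioned above), hence the parameter.
float reconstructZ(float x, float y, bool negative) {
    float z = std::sqrt(std::max(0.0f, 1.0f - x * x - y * y));
    return negative ? -z : z;
}
```

The `std::max` clamp guards against quantization error pushing 1 - x*x - y*y slightly negative.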
 
Thank you all for your answers!

What are you doing to them on the CPU every frame, so CPU performance and SSE2 alignment etc is so important?

I was planning to do skinned animation on the CPU, but the more I think about it, the more I think I should do it on the GPU. I just hope it won't be a problem on low-end video cards.

Zengar said:
BTW, consider storing your normals in byte format (one DWORD for normal).

Indeed, but can you send signed bytes to the video card with Direct3D? I thought you could only send unsigned bytes (though you can send signed 16-bit words). It's not a problem since I can convert the values in the vertex shader, but I may have missed a way to send them directly as signed values.
 