Matrix storage for vector instructions - row major vs column major

FoxMcCloud · May 8, 2013

Are there any articles discussing row major vs column major matrix storage from a performance perspective on modern game architectures? I'm familiar with the memory layout but none of the articles I found discuss performance considerations on SSE, AVX, PPU / SPU etc.

Ethatron · May 8, 2013

Arrange the matrix so that the dot products can be done fast. A horizontal dot() (summing across the lanes of a single register) is slower, at least on the CPU, and maybe on the GPU as well because of bank conflicts.
The fastest form should look like this:

result.xyzw =
vector.xxxx * matrix0.xyzw +
vector.yyyy * matrix1.xyzw +
vector.zzzz * matrix2.xyzw +
vector.wwww * matrix3.xyzw;

That's 3 mad (multiply-add) + 1 mul.
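A minimal sketch of that form in plain C, assuming a 4-wide SIMD unit: each vec4 models one register and each 4-iteration loop models one vector instruction. The names (vec4, splat, vmul, vmad, transform_broadcast, check_identity) are hypothetical. On SSE the swizzle would typically be _mm_shuffle_ps, the multiply _mm_mul_ps and the add _mm_add_ps. Plain SSE has no fused multiply-add, so each mad there becomes a mul plus an add.

```c
#include <assert.h>

/* Models one 4-wide SIMD register (e.g. an SSE __m128) as plain floats. */
typedef struct { float v[4]; } vec4;

/* .xxxx / .yyyy / ...: copy lane k into all four lanes (a swizzle/broadcast). */
static vec4 splat(vec4 a, int k) {
    vec4 r = { { a.v[k], a.v[k], a.v[k], a.v[k] } };
    return r;
}

/* One vector multiply: lane-wise a * b. */
static vec4 vmul(vec4 a, vec4 b) {
    vec4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i];
    return r;
}

/* One vector multiply-add ("mad"): lane-wise a * b + c. */
static vec4 vmad(vec4 a, vec4 b, vec4 c) {
    vec4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * b.v[i] + c.v[i];
    return r;
}

/* result = x.xxxx*row0 + x.yyyy*row1 + x.zzzz*row2 + x.wwww*row3.
   row[0..3] play the role of matrix0..matrix3, each stored as one vec4.
   Cost: 1 mul + 3 mad, all lane-wise, with no horizontal sums. */
static vec4 transform_broadcast(vec4 x, const vec4 row[4]) {
    vec4 r = vmul(splat(x, 0), row[0]);
    r = vmad(splat(x, 1), row[1], r);
    r = vmad(splat(x, 2), row[2], r);
    r = vmad(splat(x, 3), row[3], r);
    return r;
}

/* Sanity check: the identity matrix must leave the vector unchanged. */
static void check_identity(void) {
    const vec4 id[4] = { {{1, 0, 0, 0}}, {{0, 1, 0, 0}},
                         {{0, 0, 1, 0}}, {{0, 0, 0, 1}} };
    vec4 x = {{1, 2, 3, 4}};
    vec4 y = transform_broadcast(x, id);
    for (int i = 0; i < 4; ++i) assert(y.v[i] == x.v[i]);
}
```

Every step is lane-wise; the only cross-lane work is the swizzle, which is why this form sidesteps the horizontal dot().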

FoxMcCloud · May 9, 2013

Yeah, I know that's what you ultimately want for linear transformations - I'm curious which layout tends to be used to make dot products fast on various current and near-future architectures.

Ethatron · May 10, 2013

That example is row-major. Row-major is not the default layout in shading languages. Its performance benefit can be negated if you need a lot of transposes. In general, count how many M and M^T multiplications you have to do, and use row-major if M multiplications are the majority and column-major if M^T multiplications are.
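A sketch of that trade-off, assuming the row-vector convention of the pseudocode above (y = v * M) and a hypothetical flat row-major float[16] layout; the function names are made up for illustration. With one fixed storage, v * M walks the stored rows with broadcast + mad, while v * M^T needs a dot product (and so, in SIMD, a horizontal sum) per output element.

```c
#include <assert.h>

/* Hypothetical row-major 4x4 storage: element (r, c) is m[r * 4 + c],
   so each row is contiguous. Row-vector convention: y = v * M. */

/* y = v * M: scale stored row r by v[r] and accumulate into all four
   outputs. Lane-wise over a contiguous row: the broadcast + mad form. */
static void mul_vM(const float v[4], const float m[16], float y[4]) {
    for (int c = 0; c < 4; ++c) y[c] = 0.0f;
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            y[c] += v[r] * m[r * 4 + c];
}

/* y = v * M^T from the SAME storage: output r is dot(v, stored row r),
   so a SIMD version needs one horizontal sum per output element. */
static void mul_vMT(const float v[4], const float m[16], float y[4]) {
    for (int r = 0; r < 4; ++r) {
        float dot = 0.0f;
        for (int c = 0; c < 4; ++c) dot += v[c] * m[r * 4 + c];
        y[r] = dot;
    }
}

/* Row-major storage of M^T is the same bytes as column-major storage of M. */
static void transpose(const float m[16], float t[16]) {
    for (int r = 0; r < 4; ++r)
        for (int c = 0; c < 4; ++c)
            t[c * 4 + r] = m[r * 4 + c];
}

/* Sanity check: the dot form on m should equal the broadcast form on
   transpose(m); both sum the same products in the same order. */
static void check_layouts_agree(const float v[4], const float m[16]) {
    float t[16], a[4], b[4];
    transpose(m, t);
    mul_vMT(v, m, a);
    mul_vM(v, t, b);
    for (int i = 0; i < 4; ++i) assert(a[i] == b[i]);
}
```

If v * M^T calls dominate, keeping the transposed copy (column-major M) turns them into the cheap broadcast form, which is the counting rule above.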
