Shader swizzles and write-mask confusion

Ethatron

I found the shader-assembly reference unclear about where exactly the difference between lane masking and assignment applies. I have some code like this:

add r7.yz, r7.x, c2.xxyw

// lanes:
add r7.x, r7.x, c2.x -> masked
add r7.y, r7.x, c2.x
add r7.z, r7.x, c2.y
add r7.w, r7.x, c2.w -> masked

I assume r7.x is recognized as a scalar and is internally expanded into this form:

add r7.yz, r7.xxxx, c2.xxyw

But what about this?:

add r7.yz, r7.xy, c2.xxyw // illegal?

Will it be:

add r7.yz, r7.xyyy, c2.xxyw

or is it even illegal syntax?

Thanks for any clarification.
 
With swizzles, the last given component is replicated across the remaining channels. Thus "xy" is equivalent to "xyyy", "wz" is equivalent to "wzzz", etc.

Write masks are just that, masks, so you can't change the order of their components.
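
To make the two rules concrete, here is a minimal Python sketch. It only models the behavior described above and is not the D3D assembler; the names `expand_swizzle` and `is_valid_write_mask` are hypothetical helpers invented for this illustration.

```python
def expand_swizzle(swizzle: str) -> str:
    """Pad a source swizzle to four components by repeating the last one."""
    if not 1 <= len(swizzle) <= 4 or any(c not in "xyzw" for c in swizzle):
        raise ValueError(f"invalid swizzle: {swizzle!r}")
    return swizzle + swizzle[-1] * (4 - len(swizzle))


def is_valid_write_mask(mask: str) -> bool:
    """A write mask is a non-empty subset of 'xyzw', kept in that fixed order."""
    order = ["xyzw".find(c) for c in mask]
    # -1 means an unknown letter; strictly increasing means no reorder, no repeat.
    return bool(mask) and -1 not in order and all(a < b for a, b in zip(order, order[1:]))


for s in ("x", "xy", "wz", "xxy"):
    print(s, "->", expand_swizzle(s))
```

In this model "xy" expands to "xyyy" and a lone "x" expands to "xxxx", while a destination such as "zy" is rejected because a mask can only switch lanes on or off.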
 
The opcodes only have scalar or 4-vector forms; even dot3 is given 4-vector parameters. Do you know that .xxy is extended with .y by deduction, or do you have assembler documentation that states it explicitly?

The vagueness, or rather incompleteness, of MS's documentation shows up once you try to turn assembly into expressions:

add r0.xw, r1.xyyy, r2.xywz

which could have been:

r0.xw = r1.xy[yz] + r2.xy[wz] // naive swizzle, r1&r2 are truncated float2
r0.xw = r1.xy + r2.xy // then demultiplexed

which is wrong, it must be:

r0.x??w = r1.x[yy]z + r2.x[yw]z // lane masked swizzle
r0.xw = r1.xz + r2.xz // then demultiplexed

This distinction between the ways asm and HLSL treat basically the same expression (asm isn't that different an expression) is explained nowhere that I can find; or maybe it's just hard to come up with a sensible search term for it, if there are explanations on people's blogs.
Maybe it only crops up when you go from asm to HLSL; maybe humans never really leverage the swizzle in the first place. I can't even tell how I got the idea that the naive form is wrong without ever executing the translated code (this is my first hard contact with shader assembly, HLSL, and DX as a whole, BTW).
 
The opcodes only have scalar or 4-vector forms; even dot3 is given 4-vector parameters. Do you know that .xxy is extended with .y by deduction, or do you have assembler documentation that states it explicitly?
I know this because I've worked on this stuff for 9 years ;) I can't recall where the documentation is, but I assume the D3D9 docs have it somewhere.

There is no ambiguity here. DX asm code is inherently a vector-based language. Thus, write masks are *always* masks and not swizzles.

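Your corrected version is still wrong. How can r1.xyyy be interpreted as r1.xz?! The mask selects lanes x and w, and the fourth component of r1.xyyy is y while the fourth component of r2.xywz is z. So
add r0.xw, r1.xyyy, r2.xywz
is equivalent to
r0.x__w = r1.xy + r2.xz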
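
The same result can be checked with a small Python model of the instruction. This is only a sketch of the semantics described in this thread (sources are swizzled to full 4-vectors first, then the write mask picks which lanes are stored); the `swizzle` and `add` functions and the register values are made up for illustration.

```python
LANES = "xyzw"


def swizzle(reg, pattern):
    """Return the 4-vector read through a swizzle; short patterns repeat their last component."""
    pattern = pattern + pattern[-1] * (4 - len(pattern))
    return [reg[LANES.index(c)] for c in pattern]


def add(dest, mask, src0, swz0, src1, swz1):
    """Model of 'add dest.mask, src0.swz0, src1.swz1' on 4-component registers."""
    a = swizzle(src0, swz0)
    b = swizzle(src1, swz1)
    out = list(dest)
    for c in mask:
        i = LANES.index(c)
        out[i] = a[i] + b[i]  # lane i of the result comes from lane i of both swizzled sources
    return out


r0 = [0.0, 0.0, 0.0, 0.0]
r1 = [1.0, 2.0, 3.0, 4.0]
r2 = [10.0, 20.0, 30.0, 40.0]

# add r0.xw, r1.xyyy, r2.xywz
result = add(r0, "xw", r1, "xyyy", r2, "xywz")
print(result)  # [11.0, 0.0, 0.0, 32.0]
```

Lane x holds r1.x + r2.x and lane w holds r1.y + r2.z, while y and z keep their old values. The original example, add r7.yz, r7.x, c2.xxyw, works the same way: only lanes y and z are written.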
 