ATi, Cg and swizzling

Randell

Is it true that one of the issues with Cg and the R300 range is that the cards do not fully follow the ARB specs in relation to swizzling, which is one reason why Cg doesn't compile well to the R300 in OGL? If that is the case, why didn't ATI do this? Is there an alternative route?

Remember, I'm a layman :)
 
As far as I know there is no problem with the spec and swizzling on R300, so I would have thought that any problem lies elsewhere. I can't say I've looked at this specific case though.
 
Ok, it was something someone posted on another board - here is what they say.

'Now, going back to what I said about the output produced by Cg not being 100% compatible with the Radeon series of cards (I speak from an OpenGL pov btw, I don't know about D3D stuff) when compiled with the ARB profile..
This ISN'T because they are not following the Nvidia Cg specs, this is because they [ATI] haven't completely covered the ARB shader specs, which say that a card should be able to do swizzling in one operation (swizzling being an instruction such as this: mov reg.xyzw reg.xywz so the z and w components are swapped around during the move instruction), which the 9700 and below can't do.

In conclusion, the Cg output correctly follows the ARB specs, it's the card which doesn't follow them 100%, so this isn't a case of Nvidia dictating things, it's one of the (few) cases where ATI don't quite do things properly.'

I thought I'd check with you guys, before I responded :)
 
My understanding of the ARB fragment program spec (based on reading the language syntax) is that swizzling any register requires a special swizzle instruction - you can't arbitrarily specify a swizzle on each source register as you can in vertex programs. I can't claim to be an expert on this though, so don't quote me. ;)
 
Ok, now you are losing me. After nVidia claimed the R300 couldn't perform swizzling, I understood that the R300 can 'swizzle' but cannot perform 'arbitrary swizzling'. So are you saying the ARB specs don't allow for arbitrary swizzling anyway?

Either way, do his comments have the ring of truth?
 
The way I read the ARB extension, complete rgba/xyzw swizzles are available for all source arguments to all instructions that take vector arguments. There is also a special instruction "SWZ" that can include the numeric constants 0.0 and 1.0 into the swizzle as well.

Dunno what limitations the ATI architecture has, though.
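
For readers new to the term, here is a minimal Python sketch of what a source swizzle and the SWZ instruction do to a four-component register. It is only an illustrative model, not driver or compiler code: the function names `swizzle` and `swz` are made up for this example, it does no strict validation, and it leaves out the per-component negation that SWZ also permits.

```python
# Illustrative model of swizzle semantics; the names here are hypothetical.
COMPONENTS = {"x": 0, "y": 1, "z": 2, "w": 3,
              "r": 0, "g": 1, "b": 2, "a": 3}


def swizzle(vec, pattern):
    """Return vec with components reordered or replicated per pattern."""
    if len(pattern) == 1:
        # A scalar suffix such as ".x" replicates one component to all four.
        pattern = pattern * 4
    if len(pattern) != 4:
        raise ValueError("expected a 1- or 4-component swizzle")
    return tuple(vec[COMPONENTS[ch]] for ch in pattern)


def swz(vec, selectors):
    """Model of SWZ: each selector is a component name, '0' or '1'."""
    constants = {"0": 0.0, "1": 1.0}
    return tuple(constants[s] if s in constants else vec[COMPONENTS[s]]
                 for s in selectors)


reg = (1.0, 2.0, 3.0, 4.0)
print(swizzle(reg, "xywz"))  # z and w swapped: (1.0, 2.0, 4.0, 3.0)
print(swz(reg, "xy01"))      # constants mixed in: (1.0, 2.0, 0.0, 1.0)
```

The `xywz` case is the same swap as the `mov reg.xyzw reg.xywz` example quoted earlier in the thread.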
 
Randell said:
Ok, now you are losing me. After nVidia claimed the R300 couldn't perform swizzling, I understood that the R300 can 'swizzle' but cannot perform 'arbitrary swizzling'. So are you saying the ARB specs don't allow for arbitrary swizzling anyway?

Either way, do his comments have the ring of truth?

I'm not sure - I've just read it again and it looks like arbitrary swizzling is in the ARB spec.

I think I'll have to plead ignorance - I haven't really looked at the ARB fragment shader spec before.
 
Randell said:
Np andypski, thanks for answering.

shame to see Man U lose wasn't it :devilish:

I'm kind of torn - I always feel that I should want English teams to do well in European competitions, but it's always hardest when it's ManU. :)

I had no problem supporting the barcodes at all (while they were still in it).
 
I'm always torn, until I see them losing, then a sort of joy overtakes me :)
 
arjan de lumens said:
The way I read the ARB extension, complete rgba/xyzw swizzles are available for all source arguments to all instructions that take vector arguments. There is also a special instruction "SWZ" that can include the numeric constants 0.0 and 1.0 into the swizzle as well.

Dunno what limitations the ATI architecture has, though.

Yes, full swizzling is available in the ARB_fragment_program. However, the r300 does not support all possible swizzles in hardware, so sometimes more than one instruction is needed. I'm not entirely sure about what swizzles it can't do, but I think it can propagate or reorder components, that is, combinations such as x/y/z/xyz/yzx/xzy etc, but I don't think it can do xxy/xyy/yzz etc, so these will require additional instructions. This should work fine though as long as the shader keeps within the hardware instruction count.
 
So Cg not compiling R300 code properly using the ARB path is easily fixed, if ATI wanted to? Does Cg assume it can do a kind of swizzling which the R300 can't do, then?
 
Humus said:
Yes, full swizzling is available in the ARB_fragment_program. However, the r300 does not support all possible swizzles in hardware, so sometimes more than one instruction is needed. I'm not entirely sure about what swizzles it can't do, but I think it can propagate or reorder components, that is, combinations such as x/y/z/xyz/yzx/xzy etc, but I don't think it can do xxy/xyy/yzz etc, so these will require additional instructions. This should work fine though as long as the shader keeps within the hardware instruction count.

Well if PS2.0 matches the R300 abilities then the following swizzles are available:

Code:
xyzw  yzxw  zxyw  wzyx
xxxx  yyyy  zzzz  wwww
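
As a rough illustration of how that list could be used, the Python sketch below checks whether a four-component swizzle is one of the eight patterns listed. Whether the R300 hardware matches this set exactly is the assumption stated in the post, and `NATIVE_SWIZZLES` and `is_native` are hypothetical names for this example only.

```python
# The eight patterns from the list above. Treating them as the R300's
# native set is the post's assumption, not a confirmed hardware fact.
NATIVE_SWIZZLES = {"xyzw", "yzxw", "zxyw", "wzyx",
                   "xxxx", "yyyy", "zzzz", "wwww"}


def is_native(pattern):
    """True if the 4-component swizzle is one of the eight listed patterns."""
    return pattern in NATIVE_SWIZZLES


for p in ("yzxw", "xywz", "xxyy"):
    print(p, "native" if is_native(p) else "needs extra instructions")
```

Under that assumption, the `xywz` swap from the quoted complaint is not in the native set, which fits the claim that it costs more than one instruction.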
 
Randell said:
So Cg not compiling R300 code properly using the ARB path is easily fixed, if ATI wanted to? Does Cg assume it can do a kind of swizzling which the R300 can't do, then?

To my knowledge the driver already supports arbitrary swizzling, at least it should, since the ARB_fragment_program declares it should, it will just take more instructions. For instance,

MUL a, b, c.xxy;

would be expanded to something like

MOV temp, c.x;
MOV temp.z, c.y;
MUL a, b.xyz, temp;

Unless the instruction count overflows the 64 ALU instruction limit this should work fine.
 
Yes, it is possible to recreate every possible swizzle by combining those eight patterns. Unfortunately this might be costly in terms of temp register usage and instruction slots.

Another weak point is that an "optimized" algorithm with clever swizzling where appropriate may become significantly slower than a non-optimized version on limited swizzle hardware. But that's another reason why you usually should use scalar ops where appropriate and leave the optimization to the compiler.
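
To give a feel for that cost, here is a naive Python sketch that expands an arbitrary four-component swizzle into masked MOVs using only the replicate swizzles. It assumes the eight-pattern native set listed earlier in the thread and emits one MOV per distinct source component; a real driver or compiler may well do better, and `expand_swizzle` is a hypothetical helper, not anyone's actual implementation.

```python
# Assumed native set (the eight PS2.0 patterns quoted in the thread).
NATIVE_SWIZZLES = {"xyzw", "yzxw", "zxyw", "wzyx",
                   "xxxx", "yyyy", "zzzz", "wwww"}


def expand_swizzle(src, pattern, temp="temp"):
    """Return the MOVs needed to build src.pattern in temp (none if native)."""
    if pattern in NATIVE_SWIZZLES:
        return []
    moves = []
    for comp in "xyzw":
        # Collect every destination channel that reads this source component.
        mask = "".join(dst for dst, s in zip("xyzw", pattern) if s == comp)
        if mask:
            moves.append(f"MOV {temp}.{mask}, {src}.{comp};")
    return moves


for pattern in ("yzxw", "xxyy", "xywz"):
    extra = expand_swizzle("c", pattern)
    print(pattern, len(extra), "extra instruction(s)")
    for line in extra:
        print("   ", line)
```

In this naive scheme every non-native swizzle costs one temporary register plus between two and four instruction slots, which is the overhead the post is warning about.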

Btw, that reminds me of the Mandelbrot shader. Humus, are you going to update your article/demo with the faster versions? ;)
 
Gah, have forgot all about that. Will have to check that out, the deadline is coming close.
 
Humus said:
For instance,

MUL a, b, c.xxy;

would be expanded to something like

MOV temp, c.x;
MOV temp.z, c.y;
MUL a, b.xyz, temp;

Are you sure about this?
The ARB_fragment_program spec does not describe 3-component swizzles.
Assuming the behaviour is the same as in DX9 then it should be expanded as:

MOV temp.xy, c.x;
MOV temp.zw, c.y;
MUL a, b, temp;
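
A quick way to sanity-check that expansion is to simulate it. The Python sketch below models write masks and replicate swizzles on plain lists and compares the three-instruction sequence with a direct multiply. It assumes, as the expansion above implies, that `c.xxy` is read as `c.xxyy`; the helpers `read`, `mov` and `mul` are hypothetical and only model the arithmetic, not real hardware.

```python
IDX = {"x": 0, "y": 1, "z": 2, "w": 3}


def read(reg, swz="xyzw"):
    """Read a register through a 1- or 4-component source swizzle."""
    if len(swz) == 1:
        swz = swz * 4  # scalar suffix replicates to all four channels
    return [reg[IDX[ch]] for ch in swz]


def mov(dst, mask, src):
    """Copy src into dst, but only for the channels named in the write mask."""
    for ch in mask:
        dst[IDX[ch]] = src[IDX[ch]]


def mul(a, b):
    """Component-wise multiply of two 4-vectors."""
    return [p * q for p, q in zip(a, b)]


b = [2.0, 3.0, 4.0, 5.0]
c = [10.0, 20.0, 30.0, 40.0]

# Direct form, assuming c.xxy is shorthand for c.xxyy.
direct = mul(b, read(c, "xxyy"))

# Expanded form: MOV temp.xy, c.x; MOV temp.zw, c.y; MUL a, b, temp;
temp = [0.0, 0.0, 0.0, 0.0]
mov(temp, "xy", read(c, "x"))
mov(temp, "zw", read(c, "y"))
expanded = mul(b, temp)

print(direct == expanded)  # True
```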
 
Well ... seems like I've worked too much in HLSLs. :) Haven't written a single line of assembly shader code in the last two months.
 