HLSL doesn't need big extensions to support scalar unit registers and operations. The simplest implementation would be to add a "wave_invariant" keyword. This keyword could be used in variable declarations ( wave_invariant int4 myInteger; ) to make these variables scalar registers. Data loads to variables of this type could be optimized to use scalar loads. Assignments from vector registers would take the first lane, and assignments from a scalar to a vector register would broadcast the value to all lanes. And the HLSL sampling and load instructions would of course need another overload that takes an int4 scalar (or two of them) as a parameter (to support generated resource descriptors). Simple as that.

The shader would obviously only work on a certain wave width (all AMD GCN hardware is 64 wide). A similar thing has been the case with CUDA since the beginning (32 wide warps) and all optimized code assumes that. There have been no compatibility problems so far.

You'd need non-trivial modifications to the shading language to support shader generating descriptors (unless you want to go through memory, which defeats the purpose). AFAIK Mantle currently only supports ~HLSL with minor extensions.
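To make the proposed "wave_invariant" assignment rules concrete, here is a rough CPU-side model in C++. It is only a sketch of the semantics described in this post (vector to scalar takes the first lane, scalar to vector broadcasts to all lanes). The names ToyVectorReg, ToyWaveInvariant and kWaveWidth are made up for illustration; nothing here is real HLSL, Mantle or GCN API.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Assumed wave width: 64 lanes, as stated for GCN in the post.
constexpr std::size_t kWaveWidth = 64;

// Hypothetical model of a vector register: one value per lane.
template <typename T>
struct ToyVectorReg {
    std::array<T, kWaveWidth> lanes{};
};

// Hypothetical model of a "wave_invariant" (scalar) register:
// a single value shared by the whole wave.
template <typename T>
struct ToyWaveInvariant {
    T value{};

    // Vector -> scalar assignment: take the first lane.
    ToyWaveInvariant& operator=(const ToyVectorReg<T>& v) {
        value = v.lanes[0];
        return *this;
    }

    // Scalar -> vector assignment: broadcast the value to all lanes.
    ToyVectorReg<T> broadcast() const {
        ToyVectorReg<T> r;
        r.lanes.fill(value);
        return r;
    }
};

// Small usage example: lane i holds 100 + i, the scalar sees only lane 0.
inline void toy_wave_invariant_demo() {
    ToyVectorReg<int> v;
    for (std::size_t i = 0; i < kWaveWidth; ++i) {
        v.lanes[i] = 100 + static_cast<int>(i);
    }
    ToyWaveInvariant<int> s;
    s = v;                                   // first lane wins
    ToyVectorReg<int> b = s.broadcast();     // every lane gets the same value
    assert(s.value == 100);
    assert(b.lanes[kWaveWidth - 1] == 100);
}
```

A real compiler would of course map these to scalar registers and scalar loads instead of arrays; the model only shows what a shader author would observe.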
The GCN scalar unit has a full integer instruction set for 32 bit scalars, and a reduced instruction set for 64 bit scalars (no multiply, but all the necessary bitwise operations are present to construct a resource descriptor solely with scalar ALU code). I haven't programmed Mantle, so I don't know whether it allows developers to write shaders directly in GCN microcode. I haven't followed the PC AMD GPU low level microcode details closely enough to say whether this would break compatibility between the different GCN versions.
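As a rough illustration of the point about 64 bit scalars, the C++ sketch below packs a 128 bit descriptor (the size of an int4) using only 64 bit shifts, ANDs and ORs, with no multiply. The field layout is entirely hypothetical and is NOT the real GCN descriptor format; ToyDescriptor128 and toy_make_descriptor are invented names for this example.

```cpp
#include <cstdint>

// Hypothetical 128 bit descriptor, stored as two 64 bit words
// (same size as an int4). Assumed layout, for illustration only:
//   lo[47:0]  base address
//   lo[61:48] stride in bytes
//   hi[31:0]  number of records
//   hi[63:32] format / flag bits
struct ToyDescriptor128 {
    std::uint64_t lo;
    std::uint64_t hi;
};

// Builds the descriptor with bitwise operations only (AND, OR, shift),
// i.e. the kind of 64 bit scalar operations the post says are available.
inline ToyDescriptor128 toy_make_descriptor(std::uint64_t base_address,
                                            std::uint64_t stride_bytes,
                                            std::uint64_t num_records,
                                            std::uint64_t flags) {
    const std::uint64_t kBaseMask   = (std::uint64_t{1} << 48) - 1; // 48 bits
    const std::uint64_t kStrideMask = (std::uint64_t{1} << 14) - 1; // 14 bits
    const std::uint64_t kWordMask   = 0xFFFFFFFFull;                // 32 bits

    ToyDescriptor128 d;
    d.lo = (base_address & kBaseMask) | ((stride_bytes & kStrideMask) << 48);
    d.hi = (num_records & kWordMask) | ((flags & kWordMask) << 32);
    return d;
}
```

With the assumed layout, toy_make_descriptor(0x123456789ABC, 16, 1024, 0x7) gives lo = 0x0010123456789ABC and hi = 0x0000000700000400.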