DiGuru said:
EDIT: another interesting question:
How and where do they store if 16 or 32 bit precision is used?
If it's anything like how CPU instruction sets are done, whether a register holds a single 32-bit value or two packed 16-bit values is "stored" in the instructions that are used. So, for example, there will be an ADD32 instruction that treats the two source registers as 32-bit values, and an ADD16 instruction that treats each of them as a pair of 16-bit values.
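To make that concrete, here is a rough Python sketch of the idea. The names add32 and add16 are made up for illustration (they just mirror the hypothetical ADD32/ADD16 above), and nothing here is meant to describe the actual NV3x instruction set. The point is only that the same 32 bits give different results depending on which instruction interprets them.

```python
import struct

def add32(a_bits, b_bits):
    """Hypothetical ADD32: treat each 32-bit register as one FP32 value."""
    a = struct.unpack("<f", struct.pack("<I", a_bits))[0]
    b = struct.unpack("<f", struct.pack("<I", b_bits))[0]
    return struct.unpack("<I", struct.pack("<f", a + b))[0]

def add16(a_bits, b_bits):
    """Hypothetical ADD16: treat each 32-bit register as two packed FP16 values."""
    a_lo, a_hi = struct.unpack("<2e", struct.pack("<I", a_bits))
    b_lo, b_hi = struct.unpack("<2e", struct.pack("<I", b_bits))
    # Add lane by lane, then repack the two halves into one 32-bit word.
    return struct.unpack("<I", struct.pack("<2e", a_lo + b_lo, a_hi + b_hi))[0]

# Low half holds FP16 1.0 (0x3C00), high half holds FP16 2.0 (0x4000).
reg = 0x40003C00
print(hex(add16(reg, reg)))  # lanes become 2.0 and 4.0
print(hex(add32(reg, reg)))  # same bits read as a single FP32, then doubled
```

The register itself carries no tag saying which format it holds; only the opcode decides.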
Presumably the extra 16-bit registers are architected into the native machine instructions. If we take the NV_fragment_program spec as a clue, NV3x seems to have 32 architected 32-bit registers, and 64 16-bit registers. So we might assume that 32-bit instructions can only address 32 registers (called R0 - R31 in NV_fragment_program), while with the 16-bit instructions you can address 64 (H0 - H63). The important point is, if that scheme is correct, a 16-bit instruction writing to H32 would also write to the upper half of R0 (if you read it with a 32-bit instruction).
Of course it's not clear this is exactly what's going on--for one thing, it doesn't seem NV3x actually has 32/64 physical registers (even though, again, that's the number architected in NV_fragment_program), so who knows how many are architected into the internal machine instruction set. But it seems pretty likely that something like this is what's going on. Of course the aliasing issue I mentioned above would be very confusing if the internal machine language were programmer visible, but as it isn't, it shouldn't be a problem.
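Here is a small Python sketch of the aliasing I'm guessing at. Everything in it is an assumption: the RegisterFile class, the method names, and especially the mapping (H0 - H31 as the lower halves of R0 - R31, H32 - H63 as the upper halves) are just one layout consistent with the scheme above, not anything documented for NV3x.

```python
import struct

class RegisterFile:
    """Hypothetical register file: 32 words of 32 bits, viewed as R0-R31 or H0-H63."""

    def __init__(self):
        self.words = [0] * 32  # raw 32-bit storage

    def write_r(self, n, value):
        self.words[n] = struct.unpack("<I", struct.pack("<f", value))[0]

    def read_r(self, n):
        return struct.unpack("<f", struct.pack("<I", self.words[n]))[0]

    def write_h(self, n, value):
        half = struct.unpack("<H", struct.pack("<e", value))[0]
        # Assumed mapping: H(n) for n >= 32 is the upper half of R(n - 32).
        shift = 16 if n >= 32 else 0
        keep = ~(0xFFFF << shift) & 0xFFFFFFFF
        self.words[n % 32] = (self.words[n % 32] & keep) | (half << shift)

    def read_h(self, n):
        shift = 16 if n >= 32 else 0
        half = (self.words[n % 32] >> shift) & 0xFFFF
        return struct.unpack("<e", struct.pack("<H", half))[0]

rf = RegisterFile()
rf.write_r(0, 1.0)   # R0 now holds FP32 1.0
rf.write_h(32, 2.0)  # 16-bit write to H32 clobbers the upper half of R0
print(rf.read_r(0))  # no longer 1.0
```

A compiler or driver that owns the register allocation can simply avoid using R0 and H32 for live values at the same time, which is why this never has to leak out to the programmer.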
And do they combine the mantissa and offset or split the register in half?
Well, the exponent and mantissa fields in FP32 are not just double the width of the corresponding fields in FP16: FP16 is s10e5, while FP32 is s23e8, so it would seem that something like the latter must be what's going on. In any case, this shouldn't make any programmer-visible difference, unless the aliasing issue I mentioned above can be triggered through a programmer-visible API (i.e. PS 2.0, ARB_f_p, NV_f_p); but I don't think it can, so the question is sort of irrelevant.
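For anyone who wants to see the field widths, here is a quick Python sketch that pulls the sign, exponent and mantissa out of the same number in both formats. The helper names are made up for illustration; the layouts themselves (1+5+10 bits for FP16, 1+8+23 bits for FP32) are the standard ones.

```python
import struct

def fields(value, float_fmt, int_fmt, exp_bits, man_bits):
    """Split a float's bit pattern into (sign, exponent, mantissa) fields."""
    bits = struct.unpack(int_fmt, struct.pack(float_fmt, value))[0]
    sign = bits >> (exp_bits + man_bits)
    exponent = (bits >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = bits & ((1 << man_bits) - 1)
    return sign, exponent, mantissa

def fp16_fields(value):
    return fields(value, "<e", "<H", 5, 10)   # s10e5

def fp32_fields(value):
    return fields(value, "<f", "<I", 8, 23)   # s23e8

print(fp16_fields(1.5))  # biased exponent 15, 10-bit mantissa
print(fp32_fields(1.5))  # biased exponent 127, 23-bit mantissa
```

Since 8 is not 2 x 5 and 23 is not 2 x 10, you can't build an FP32 by gluing together the exponent and mantissa fields of two FP16 values; the two halves of the register just hold two independent numbers.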