I have read conflicting statements about the usefulness of Xenos's FP10 mode. My understanding is that in the NV3x cores there were two different levels of floating point precision available within the ALUs themselves (FP16 or FP32), and with long shaders the accumulation errors of FP16 were at times noticeable. All ATI pixel shader ALUs operate internally at FP24, while the newer NV4x+ ALUs always work at FP32, although intermediate results can be rounded back to FP16 to save register space.
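To show the kind of FP16 accumulation drift I mean, here is a quick CPU-side sketch using numpy's float16 as a stand-in for FP16 ALU/register precision (the number of terms and the step size are just made-up values for illustration):

```python
import numpy as np

# Accumulate many small contributions, as a long shader might when
# summing lighting terms or filter taps. float16 stands in for FP16
# precision, float32 for FP32.
terms = np.full(4096, 0.01, dtype=np.float64)

acc16 = np.float16(0.0)
acc32 = np.float32(0.0)
for t in terms:
    acc16 = np.float16(acc16 + np.float16(t))  # round to FP16 each step
    acc32 = np.float32(acc32 + np.float32(t))  # round to FP32 each step

print("FP16 accumulator:", acc16)  # stalls around 32, well short of the true sum 40.96
print("FP32 accumulator:", acc32)  # stays very close to 40.96
```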
With Xenos, I assume all ALUs always work at FP32 internally and that intermediate results can be saved to registers at that same level of precision. FP16 is an available input and output format, along with other non-float formats.
Is it correct to say that FP10 is simply another available output format, and not a precision level used within the ALUs at all? That it is a coarser float format meant to save bandwidth between the parent and daughter dies and, more importantly, space in eDRAM? And that any errors incurred through its use would come from multipass techniques or alpha blends, both of which "should" be minimal?
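For reference, this is roughly how I understand the "coarser float" idea: a minimal Python decode sketch assuming a 7e3 per-channel layout (3 exponent bits, 7 mantissa bits, no sign), which is how Xenos's FP10 components are usually described. The exponent bias, denormal handling, and channel order here are my assumptions for illustration, not confirmed details:

```python
def decode_fp10(bits):
    """Decode one 10-bit float colour channel (assumed 7e3 layout)."""
    e = (bits >> 7) & 0x7      # 3-bit exponent
    m = bits & 0x7F            # 7-bit mantissa
    if e == 0:
        return (m / 128.0) * 2.0 ** (1 - 3)      # denormal range (assumed bias 3)
    return (1.0 + m / 128.0) * 2.0 ** (e - 3)    # normalised value

def unpack_fp10_pixel(word):
    """Unpack a 32-bit pixel as three 10-bit floats plus 2-bit alpha.
    The [A:2 | B:10 | G:10 | R:10] channel order is an assumption."""
    r = decode_fp10(word & 0x3FF)
    g = decode_fp10((word >> 10) & 0x3FF)
    b = decode_fp10((word >> 20) & 0x3FF)
    a = ((word >> 30) & 0x3) / 3.0
    return r, g, b, a

# With only 7 mantissa bits the spacing between adjacent values is coarse
# compared with FP16's 10 mantissa bits, which is where blending/multipass
# error would creep in.
print(decode_fp10(0b1111111111))  # largest value under these assumptions (~31.875)
print(decode_fp10(0b0110000000))  # 1.0
```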
For HDR, an FP16 framebuffer could/would still be used for render-to-texture operations, but developers would just target sizes that best fit the available eDRAM. Final rendering passes would then be performed and output with FP10 to reduce the tile requirement. Yes / No?
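As a rough back-of-the-envelope for the tile count at 1280x720, using the published 10 MB eDRAM figure and treating depth/stencil as 32 bits per pixel (MSAA expansion ignored, so this is only a sketch):

```python
EDRAM_BYTES = 10 * 1024 * 1024
width, height = 1280, 720
pixels = width * height

def tiles_needed(colour_bytes_per_pixel, depth_bytes_per_pixel=4):
    total = pixels * (colour_bytes_per_pixel + depth_bytes_per_pixel)
    # ceiling division: number of eDRAM-sized tiles the frame must split into
    return -(-total // EDRAM_BYTES)

print("FP16 colour (8 B/px):", tiles_needed(8), "tile(s)")  # ~10.5 MB -> 2 tiles
print("FP10 colour (4 B/px):", tiles_needed(4), "tile(s)")  # ~7.0 MB  -> 1 tile
```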