Uttar said:
However, a NV3X path is still required, because there's no way for the compiler to legally decide which precision is required - deciding if an instruction should be FP32 or FP16 (or even FX12 in NVIDIA's proprietary extensions) remains in the developer's hands.

RussSchultz said:
That would not be reason enough. The partial precision hints can be used to indicate FP32 or FP16, and the same code can be used on both ATI and NVIDIA paths.

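To make that concrete, a minimal sketch of the kind of single-source shader being described - the HLSL text, the sampler names and the compile_ps wrapper are illustrative assumptions, not anything from the thread. 'half' carries the partial precision hint; hardware with a single internal precision simply ignores it and runs at full precision, so one source covers both paths:

    // One HLSL source, precision expressed as hints rather than separate paths.
    // compile_ps() and the samplers are hypothetical placeholders.
    const char* kShaderSource = R"(
        sampler2D baseMap;
        sampler2D lightMap;

        half4 main(float2 uv : TEXCOORD0) : COLOR
        {
            half4 base  = tex2D(baseMap, uv);   // colour data - FP16 is plenty
            half4 light = tex2D(lightMap, uv);
            return base * light;                // a hint, not a requirement
        }
    )";

    // Hypothetical usage - the same call on every board, no vendor-specific source:
    //   PixelShader ps = compile_ps(kShaderSource, "ps_2_0");
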
MfA said:
If the compiler knows that the error from using lower precision is small enough, <1 LSB of the final colour, then letting it use lower precision, even if the developer didn't indicate it, isn't an issue IMO.

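As a rough back-of-envelope version of that rule - assuming an 8-bit unorm output channel and IEEE-style FP16 with 10 mantissa bits (so a worst-case rounding error of about 2^-11 per operation on values in [0,1]), and a first-order error model that ignores cancellation - the numbers work out like this:

    #include <cstdio>

    int main()
    {
        // Assumed formats: one LSB of an 8-bit colour channel, and the
        // worst-case rounding error of a single FP16 operation on [0,1].
        const double lsb8     = 1.0 / 255.0;   // ~0.00392
        const double fp16_err = 1.0 / 2048.0;  // 2^-11, ~0.00049

        // To first order, n demoted operations contribute at most about
        // n * 2^-11 of error; a compiler applying the "< 1 LSB" criterion
        // could demote a chain of instructions only while this holds.
        for (int n = 1; n <= 10; ++n)
            std::printf("%2d FP16 ops: worst case %.5f (1 LSB = %.5f) %s\n",
                        n, n * fp16_err, lsb8,
                        n * fp16_err < lsb8 ? "under" : "over");
        return 0;
    }
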
andypski said:
Ergo those optimisations cannot be made, unless you want to carry around multiple compiled versions of the program for every possible configuration of input/output. The maximum allowable error when rendering into a 16-bit float buffer is different from an 8-bit fixed buffer.

This isn't really a big problem. It's not like you're switching texture/output buffer formats for every render call. And, for example, adding two 32-bit RGBA texture values doesn't introduce any error even in FP12 mode.

DemoCoder said:
That said, it's difficult for the compiler to automagically determine precision intent, which is why the HLSL should support it as a hint.

Yes - the compiler shouldn't automagically determine precision - as this is a method that guarantees that it can generate incorrect results depending on inputs/outputs. In essence it would guarantee bugs.

Xmas said:
This isn't really a big problem. It's not like you're switching texture/output buffer formats for every render call. And, for example, adding two 32-bit RGBA texture values doesn't introduce any error even in FP12 mode.

At compile time I need to make a decision as to how to compile the code. If I predict that an input will be an 8-bit texture and then find later that a float texture is bound to it, then it completely changes my precision and hence my code. A similar situation occurs with the output buffer. I might render using the same shader into a fixed point buffer in the general case, but into a float buffer for some special effects.

andypski said:
At compile time I need to make a decision as to how to compile the code. If I predict that an input will be an 8-bit texture and then find later that a float texture is bound to it, then it completely changes my precision and hence my code. A similar situation occurs with the output buffer. I might render using the same shader into a fixed point buffer in the general case, but into a float buffer for some special effects.

You can either defer "compile time" until the input/output formats are known (rendering is an asynchronous process anyway), or do as DemoCoder suggested and compile for the best case and a safe fallback, and then choose between the two.

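A sketch of that second option in driver-neutral C++ - the types, the format enum and the selection policy here are all hypothetical placeholders, not any real interface: compile two variants of each shader up front, then pick one once the actual surfaces are bound.

    #include <cstdint>

    // Hypothetical handles - stand-ins for whatever a real driver would use.
    struct CompiledShader { /* compiled program for one precision strategy */ };

    enum class BufferFormat : std::uint8_t { Fixed8, Float16, Float32 };

    struct ShaderVariants
    {
        CompiledShader fast;  // aggressively demoted to FP16 where hints allow
        CompiledShader safe;  // full precision throughout
    };

    // Chosen once the real texture and render-target formats are known, e.g.
    // at validation time before a batch of draw calls. The 8-bit-in/8-bit-out
    // case is what the demoted variant was built for; anything involving float
    // surfaces falls back to the full-precision version.
    const CompiledShader& select(const ShaderVariants& v,
                                 BufferFormat input,
                                 BufferFormat output)
    {
        const bool narrow = input == BufferFormat::Fixed8 &&
                            output == BufferFormat::Fixed8;
        return narrow ? v.fast : v.safe;
    }
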
The example you give is also not always safe in FX12 mode - the act of filtering the textures could easily introduce extra bits between the 8-bit values of the components that would be inaccurately represented in an FX12 space. And this is with a shader so simple that it is not really a particularly interesting optimisation task for the compiler. Once I get to 20 or so instructions with a variety of source data precisions, then the task of choosing appropriate levels without help becomes more complex.

Sure it becomes very complex. I didn't argue against this.

Texture filtering could introduce extra bits, but there's no requirement to that. Texture filtering in general is rather ill-defined when it comes to precision.

True - I was just using your example to point out how something that people might regard as 'safe' might actually not be safe at all depending on the underlying architecture. Of course an IHV writing the compiler would be aware of these issues - if the architecture does produce extra bits in filtering then running at FX12 internally would then be 'robbing' you of these extra bits again - the low-level compiler writers might feel justified in this approach if they are looking for ways to give you more speed.

Xmas said:
You can either defer "compile time" until the input/output formats are known (rendering is an asynchronous process anyway), or do as DemoCoder suggested and compile for the best case and a safe fallback, and then choose between the two.

There are problems with JIT compilation. For example, how will the application react to having less CPU time available to AI, physics, etc.?

OpenGL guy said:
There are problems with JIT compilation. For example, how will the application react to having less CPU time available to AI, physics, etc.?

The first couple of frames would be slow and jumpy.

Uttar said:
Agreed, but I'd personally call that a "NVIDIA path" because besides NVIDIA, no IHV benefits from the partial precision hint.

Why? It's still the same source that runs on every other board out there.

RussSchultz said:
Why? It's still the same source that runs on every other board out there.

I gotta agree with ya. As long as the developer isn't making concessions for register limits (still using a minimum number of registers, but not sacrificing a low instruction count to achieve that, thus leaving the optimization to the compiler) or dumping everything down to partial precision, it's quite a standard render path.

andypski said:
If I have a float input texture and an 8-bit input texture and add them, then what precision should my compiler be using? Should it make a different precision choice if the 8-bit texture is point-sampled as opposed to if it is bilinear filtered?

I think it'd be more of a case of working backwards from the intended result. If the target is only an 8 or 10-bit fixed-point colour channel, then a 16-bit precision operation is probably OK. (Of course, you'd have to assume that you weren't subtracting a huge number from another huge value - but, then again, that sort of practice can get you into trouble with any floating-point precision.)

zeckensack said:
The first couple of frames would be slow and jumpy.

I would highly discourage compiling any shaders inside the game loop. ATI have always been very clear about this in all the developer documentation that we've produced, and there are extremely good reasons for it.

That's a function of shader complexity and compile effort/efficiency, of course. I've found 'jitting' tons of small, low-level shaders to be completely unnoticeable on your current drivers (ARB_fp, <=20 instructions).

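Both points can be true at once: whether compilation is noticeable depends on where it happens and how big the shaders are. A minimal sketch of the up-front approach that andypski's advice points toward - the Shader type, the compile_shader stub and the ShaderCache class are all hypothetical, not any particular driver or engine API:

    #include <string>
    #include <unordered_map>
    #include <vector>

    // Hypothetical stand-ins for a real shader object and compiler entry point.
    struct Shader { /* compiled program */ };

    Shader compile_shader(const std::string& /*source*/)
    {
        return Shader{};  // stub standing in for the real, potentially slow compile
    }

    class ShaderCache
    {
    public:
        // Called once at load / level-load time: every shader the level can use
        // is compiled here, so the per-frame path never stalls on the compiler.
        void warm(const std::vector<std::string>& sources)
        {
            for (const std::string& src : sources)
                cache_.emplace(src, compile_shader(src));
        }

        // Called inside the game loop: a lookup only, never a compile.
        const Shader* find(const std::string& source) const
        {
            auto it = cache_.find(source);
            return it != cache_.end() ? &it->second : nullptr;
        }

    private:
        std::unordered_map<std::string, Shader> cache_;
    };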