Nvidia's unified compiler technology

rwolf

Does Nvidia's unified compiler technology mean the end of writing an NV3X-specific path for software? Can game developers now create one DX9 path for all hardware?
 
From my understanding, this can lighten the developer's workload, because he doesn't have to worry about register usage or instruction order anymore.

However, a NV3X path is still required, because there's no way for the compiler to legally decide which precision is required - deciding if an instruction should be FP32 or FP16 ( or even FX12 in NVIDIA's proprietary extensions ) remains in the developer's hands.
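
To make that concrete, here's a tiny made-up HLSL fragment (embedded as a C string purely for illustration): the compiler is free to reschedule and re-pack the instructions, but the half-versus-float choice, which is what becomes the _pp hint in the generated assembly, has to come from the developer.

Code:
// Hypothetical HLSL fragment, embedded as a C string purely for illustration.
// The unified compiler may reschedule and re-pack these instructions, but the
// half (FP16/_pp) versus float (FP32) choice below is the developer's.
const char* g_tintShader =
    "sampler baseMap;                                          \n"
    "float4 main(float2 uv : TEXCOORD0) : COLOR                \n"
    "{                                                         \n"
    "    half4 base   = tex2D(baseMap, uv); // FP16 is enough  \n"
    "    half3 tinted = base.rgb * 0.5 + 0.25;                 \n"
    "    return half4(tinted, base.a);                         \n"
    "}                                                         \n";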

Of course, it should still be possible to gain performance by worrying about register usage, but I doubt it'd be worth it - most of the time, I suspect you'd spend hours of your time for just 2-3% higher performance.


Uttar

EDIT: But of course, even with this, ATI still beats the hell out of NVIDIA :p
 
Uttar said:
However, a NV3X path is still required, because there's no way for the compiler to legally decide which precision is required - deciding if an instruction should be FP32 or FP16 ( or even FX12 in NVIDIA's proprietary extensions ) remains in the developer's hands.
That would not be reason enough. The partial precision hints can be used to indicate FP32 or FP16, and the same code can be used on both ATI and NVIDIA paths.
 
Well, that's the hope, anyway. Given that this compiler is so new, I don't expect a dev will be able to completely ignore the FX's limitations in coding games. While they may be able to stop worrying about texture and shader instruction ordering/packing, they'll still have to worry about shader precision (not just adding a _pp, but also possibly using PS1.x rather than PS2.x whenever possible). But it would appear to be a step toward simplifying a dev's work.
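
Roughly what I mean, as app-side pseudologic (every name here is made up for illustration, not a real API): the driver's compiler optimises whichever variant it is handed, but authoring the PS1.x and _pp variants is still the developer's job.

Code:
#include <string>

// Illustrative only - these names are hypothetical, not a real API.
struct GpuCaps {
    bool supportsPs2;
    bool slowFullPrecisionPs2;   // e.g. an NV3X-class part
};

// Pick which authored variant of an effect to hand to the driver's compiler.
std::string ChooseVariant(const GpuCaps& caps, bool effectFitsInPs1x)
{
    if (!caps.supportsPs2)
        return "effect_ps11.psh";          // hypothetical asset names
    if (caps.slowFullPrecisionPs2 && effectFitsInPs1x)
        return "effect_ps11.psh";          // prefer PS1.x where it's enough
    if (caps.slowFullPrecisionPs2)
        return "effect_ps20_pp.psh";       // PS2.0 written with _pp hints
    return "effect_ps20.psh";              // full-precision PS2.0
}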
 
RussSchultz said:
Uttar said:
However, a NV3X path is still required, because there's no way for the compiler to legally decide which precision is required - deciding if an instruction should be FP32 or FP16 ( or even FX12 in NVIDIA's proprietary extensions ) remains in the developer's hands.
That would not be reason enough. The partial precision hints can be used to indicate FP32 or FP16, and the same code can be used on both ATI and NVIDIA paths.

Agreed, but I'd personally call that a "NVIDIA path" because besides NVIDIA, no IHV benefits from the partial precision hint.
Of course, it's much less of a path than what developers needed before, but still... :)

Yes, of course, developers can still worry about instruction ordering and stuff and might squeeze a bit more performance - but they'd need such a good understanding of the hardware to do a better job than the compiler that I'd say it's not worth it, and it most likely won't happen either.


Uttar
 
If the compiler knows that the error from using lower precision is small enough, <1 LSB of the final colour, then letting it use lower precision even if the developer didn't indicate it isn't an issue IMO.
 
MfA said:
If the compiler knows that the error from using lower precision is small enough, <1 LSB of the final colour, then letting it use lower precision even if the developer didn't indicate it isn't an issue IMO.

The compiler cannot know that without knowing the state of the colour buffer or any other external buffers bound to the program. Information about that state is not available at compile time.

Ergo those optimisations cannot be made, unless you want to carry around multiple compiled versions of the program for every possible configuration of input/output. The maximum allowable error when rendering into a 16-bit float buffer is different from an 8-bit fixed buffer.
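
To put rough numbers on 'maximum allowable error', something like this sketch (my own illustrative constants, not any spec's): the budget for a fixed-point target is an absolute step, while for a float target it scales with the value being written - which is exactly the information the compiler doesn't have at compile time.

Code:
#include <cmath>

// Hypothetical render-target formats, for illustration only.
enum TargetFormat { RGBA8_FIXED, RGB10A2_FIXED, RGBA16_FLOAT };

// Largest rounding error that stays under one LSB of the final colour.
// Fixed-point targets give an absolute budget (values assumed in [0,1]);
// a float target's budget depends on the magnitude being written.
double MaxAllowableError(TargetFormat fmt, double valueMagnitude)
{
    switch (fmt) {
    case RGBA8_FIXED:   return 1.0 / 255.0;             // one 8-bit step
    case RGB10A2_FIXED: return 1.0 / 1023.0;            // one 10-bit step
    case RGBA16_FLOAT:
        // FP16 has a 10-bit mantissa, so one ULP is roughly value * 2^-10.
        return std::ldexp(valueMagnitude, -10);
    }
    return 0.0;
}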
 
You don't need to carry around all different compilations for every possible combination of input/output; you just need to profile for the most common (e.g. the driver detects that for shader X, input set Y represents 90% of the time), and carry one version optimized for that, plus one conservative fallback.

(see the HLSL thread where I describe driver profiles and "speculative" compilation à la Smalltalk/Java HotSpot)

That said, it's difficult for the compiler to automagically determine precision intent, which is why the HLSL should support it as a hint.
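
As a sketch of that scheme (all of the types and names below are invented for illustration, not a driver's real data layout):

Code:
#include <cstdint>
#include <map>

// Hypothetical driver-side bookkeeping for one shader program.
using ConfigHash = std::uint64_t;         // hash of bound input/output formats

struct CompiledShader { /* device-specific microcode */ };

struct ShaderEntry {
    std::map<ConfigHash, int> usage;      // how often each configuration is seen
    ConfigHash      hotConfig    = 0;     // the configuration we specialised for
    CompiledShader* specialised  = nullptr;
    CompiledShader* conservative = nullptr;  // always correct, never fastest
};

// At draw time: count the configuration, and only use the specialised code
// when the current bindings match the one it was compiled against.
const CompiledShader* Select(ShaderEntry& s, ConfigHash current)
{
    ++s.usage[current];
    if (s.specialised && current == s.hotConfig)
        return s.specialised;
    return s.conservative;
}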
 
andypski said:
Ergo those optimisations cannot be made, unless you want to carry around multiple compiled versions of the program for every possible configuration of input/output. The maximum allowable error when rendering into a 16-bit float buffer is different from an 8-bit fixed buffer.
This isn't really a big problem. It's not like you're switching texture/output buffer formats for every render call. And, for example, adding two 32-bit RGBA texture values doesn't introduce any error even in FX12 mode.
 
DemoCoder said:
That said, it's difficult for the compiler to automagically determine precision intent, which is why the HLSL should support it as a hint.
Yes - the compiler shouldn't automagically determine precision, as that approach can generate incorrect results depending on inputs/outputs. In essence it would guarantee bugs.

I also agree that having one conservative fallback is enough, but if the fallback is really conservative (generic, and in some way significantly slower than the optimal case) there will always be temptation to allow your output precision standards to slip in order to accommodate higher performance by allowing more instances of the faster shader to be used.

Xmas said:
This isn't really a big problem. It's not like you're switching texture/output buffer formats for every render call. And, for example, adding two 32-bit RGBA texture values doesn't introduce any error even in FX12 mode.
At compile time I need to make a decision as to how to compile the code. If I predict that an input will be an 8-bit texture and then find later that a float texture is bound to it then it completely changes my precision and hence my code. A similar situation occurs with the output buffer. I might render using the same shader into a fixed point buffer in the general case, but into a float buffer for some special effects.

The example you give is also not always safe in FX12 mode - the act of filtering the textures could easily introduce extra bits between the 8-bit values of the components that would be inaccurately represented in an FX12 space. And this is with a shader so simple that it is not really a particularly interesting optimisation task for the compiler. Once I get to 20 or so instructions with a variety of source data precisions then the task of choosing appropriate levels without help becomes more complex.

If I have a float input texture and an 8-bit input texture and add them then what precision should my compiler be using? Should it make a different precision choice if the 8-bit texture is point-sampled as opposed to if it is bilinear filtered?
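
Written out as code, the kind of bookkeeping those questions imply might look like this (hypothetical enums, and the choices are only illustrative, not any IHV's actual rules):

Code:
// Hypothetical enums; the choices below are illustrative, not an IHV's rules.
enum TexFormat  { TEX_RGBA8, TEX_FLOAT16, TEX_FLOAT32 };
enum FilterMode { POINT, BILINEAR };
enum Precision  { FX12, FP16, FP32 };     // ordered narrowest to widest

// What the inputs alone would force on the compiler.
Precision MinimumInputPrecision(TexFormat fmt, FilterMode filter)
{
    switch (fmt) {
    case TEX_RGBA8:
        // Point sampling returns exact 8-bit values; bilinear filtering can
        // produce in-between values that FX12 would round away.
        return (filter == POINT) ? FX12 : FP16;
    case TEX_FLOAT16: return FP16;
    case TEX_FLOAT32: return FP32;
    }
    return FP32;
}

// Adding a float texture to an 8-bit one means taking the wider requirement,
// and the output format (see earlier posts) can still change the answer.
Precision Combine(Precision a, Precision b) { return a > b ? a : b; }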
 
andypski said:
At compile time I need to make a decision as to how to compile the code. If I predict that an input will be an 8-bit texture and then find later that a float texture is bound to it then it completely changes my precision and hence my code. A similar situation occurs with the output buffer. I might render using the same shader into a fixed point buffer in the general case, but into a float buffer for some special effects.
You can either defer "compile time" until the input/output formats are known (rendering is an asynchronous process anyway). Or do as DemoCoder suggested and compile for the best case and a safe fallback, and then choose between the two.

The example you give is also not always safe in FX12 mode - the act of filtering the textures could easily introduce extra bits between the 8-bit values of the components that would be inaccurately represented in an FX12 space. And this is with a shader so simple that it is not really a particularly interesting optimisation task for the compiler. Once I get to 20 or so instructions with a variety of source data precisions then the task of choosing appropriate levels without help becomes more complex.
Sure it becomes very complex. I didn't argue against this.

Texture filtering could introduce extra bits, but there's no requirement that it does. Texture filtering in general is rather ill-defined when it comes to precision.
 
Xmas said:
You can either defer "compile time" until the input/output formats are known (rendering is an asynchronous process anyway). Or do as DemoCoder suggested and compile for the best case and a safe fallback, and then choose between the two.

Whether deferring until use is practical depends on the length of the compile process, but for short shaders we can probably make the assumption that this is ok. The fallback path might be more attractive for real-time use in general to avoid possible hitches in rendering caused by situations where many recompiles turn out to be required within a single frame.
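
One way to defer compilation without those hitches, purely as a sketch under my own assumptions rather than anything a driver actually does: draw with the conservative variant immediately and let the specialised compile finish in the background.

Code:
#include <chrono>
#include <future>
#include <map>

// All names hypothetical; a sketch, not any driver's actual behaviour.
struct CompiledShader {};
using ConfigKey = unsigned long long;            // hash of the bound formats

// Stand-in for the real (potentially slow) specialising compile.
CompiledShader* CompileSpecialised(ConfigKey) { return new CompiledShader; }

CompiledShader* g_conservative = new CompiledShader;   // built once, up front
std::map<ConfigKey, std::future<CompiledShader*>> g_pending;
std::map<ConfigKey, CompiledShader*> g_ready;

// Called per draw: never blocks the frame on a compile.
CompiledShader* GetShader(ConfigKey key)
{
    if (CompiledShader* done = g_ready[key])
        return done;
    auto it = g_pending.find(key);
    if (it == g_pending.end())
        g_pending[key] = std::async(std::launch::async, CompileSpecialised, key);
    else if (it->second.wait_for(std::chrono::seconds(0)) == std::future_status::ready)
        return g_ready[key] = it->second.get();
    return g_conservative;   // correct results, just not specialised yet
}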

Texture filtering could introduce extra bits, but there's no requirement that it does. Texture filtering in general is rather ill-defined when it comes to precision.
True - I was just using your example to point out how something that people might regard as 'safe' might actually not be safe at all depending on the underlying architecture. Of course an IHV writing the compiler would be aware of these issues - if the architecture does produce extra bits in filtering, then running at FX12 internally would be 'robbing' you of those extra bits again, and the low-level compiler writers might feel justified in that approach if they are looking for ways to give you more speed.

A higher-level compiler going to an intermediate representation should obviously not be making these decisions in cases where the precision is 'ill-defined'.
 
Maybe it could be done on a selective basis: shaders that the compiler thinks would be especially slow (a lot of instructions, all of them PS2.0) could use "forced" partial precision. Of course, this would be an illegal optimization...
 
Xmas said:
andypski said:
At compile time I need to make a decision as to how to compile the code. If I predict that an input will be an 8-bit texture and then find later that a float texture is bound to it then it completely changes my precision and hence my code. A similar situation occurs with the output buffer. I might render using the same shader into a fixed point buffer in the general case, but into a float buffer for some special effects.
You can either defer "compile time" until the input/output formats are known (rendering is an asynchronous process anyway). Or do as DemoCoder suggested and compile for the best case and a safe fallback, and then choose between the two.
There are problems with JIT compilation. For example, how will the application react to having less CPU time available to AI, physics, etc.?
 
OpenGL guy said:
Xmas said:
andypski said:
At compile time I need to make a decision as to how to compile the code. If I predict that an input will be an 8-bit texture and then find later that a float texture is bound to it then it completely changes my precision and hence my code. A similar situation occurs with the output buffer. I might render using the same shader into a fixed point buffer in the general case, but into a float buffer for some special effects.
You can either defer "compile time" until the input/output formats are known (rendering is an asynchronous process anyway). Or do as DemoCoder suggested and compile for the best case and a safe fallback, and then choose between the two.
There are problems with JIT compilation. For example, how will the application react to having less CPU time available to AI, physics, etc.?
The first couple of frames would be slow and jumpy ;)
That's a function of shader complexity and compile effort/efficiency, of course. I've found 'jitting' tons of small, low-level shaders to be completely unnoticeable on your current drivers. (ARB_fp, <=20 instructions)
 
Uttar said:
Agreed, but I'd personally call that a "NVIDIA path" because besides NVIDIA, no IHV benefits from the partial precision hint.
Why? It's still the same source that runs on every other board out there.
 
RussSchultz said:
Uttar said:
Agreed, but I'd personally call that a "NVIDIA path" because besides NVIDIA, no IHV benefits from the partial precision hint.
Why? It's still the same source that runs on every other board out there.
I gotta agree with ya. As long as the developer isn't making concessions for register limits (still using a minimum number of registers, but not sacrificing a low instruction count to achieve that, thus leaving the optimization to the compiler) or dumping everything down to partial precision, it's quite a standard render path.

On the topic of partial precision, though, it's interesting to consider that the difference in performance appears to come solely from being able to use more registers. That makes me wonder if partial precision could be of any use (in terms of performance) on a video card that didn't suffer from register limitations like the GeForce FX does.
 
andypski said:
If I have a float input texture and an 8-bit input texture and add them then what precision should my compiler be using? Should it make a different precision choice if the 8-bit texture is point-sampled as opposed to if it is bilinear filtered?
I think it'd be more of a case of working backwards from the intended result. If the target is only an 8- or 10-bit fixed-point colour channel, then a 16-bit precision operation is probably ok. (Of course you'd have to assume that you weren't subtracting one huge value from another but, then again, that sort of practice can get you into trouble at any floating-point precision.)
 
zeckensack said:
The first couple of frames would be slow and jumpy ;)
That's a function of shader complexity and compile effort/efficiency, of course. I've found 'jitting' tons of small, low-level shaders to be completely unnoticeable on your current drivers. (ARB_fp, <=20 instructions)
I would highly discourage compiling any shaders inside the game loop. ATI have always been very clear about this in all the developer documentation that we've produced, and there are extremely good reasons for it.

Compile times are never going to get faster than they are now, and I would expect them to get substantially slower.
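
In other words, keep it to something like this at load time (names are hypothetical, and the compile call is a trivial stand-in for the driver's real entry point):

Code:
#include <string>
#include <vector>

// Hypothetical engine-side types; CompileNow stands in for the driver call.
struct CompiledShader {};
CompiledShader* CompileNow(const std::string& /*source*/) { return new CompiledShader; }

std::vector<CompiledShader*> g_levelShaders;

// All compilation happens here, behind the loading screen, so the render
// loop itself never has to call into the driver's compiler.
void LoadLevelShaders(const std::vector<std::string>& sources)
{
    g_levelShaders.reserve(sources.size());
    for (const std::string& src : sources)
        g_levelShaders.push_back(CompileNow(src));
}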
 
JIT profiling can be done during test runs (e.g. on install: "we are now checking/benchmarking your video card, please stand by"). Obviously this requires developers to know that the driver has this feature. Of course, OGL2.0 has the compiler built into the driver, so they will be aware of this sooner or later.

Ideally, there would be an extension or API for feedback profiling à la modern C compilers. Of course, this only matters if you have a non-trivial architecture (e.g. not a direct map to DX9 assembly), where you might have a significant disconnect between what people are targeting and what your hardware actually does.
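
As a sketch of what such an install-time profiling pass could look like (every name and the output format here are invented for illustration, with a trivial stand-in for the actual GPU timing):

Code:
#include <fstream>
#include <map>
#include <string>
#include <vector>

// Hypothetical shader variant description.
struct Variant { std::string name; };

// Stand-in for a real GPU timing harness (render N frames, read a timer).
double TimeShader(const Variant&) { return 1.0; }

// Install-time pass: bench each authored variant of each shader and remember
// the winner, so the game (or driver) can pick it without guessing at runtime.
void ProfilePass(const std::map<std::string, std::vector<Variant>>& shaders)
{
    std::ofstream profile("shader_profile.txt");   // read back at startup
    for (const auto& entry : shaders) {
        const Variant* best = nullptr;
        double bestTime = 0.0;
        for (const Variant& v : entry.second) {
            double t = TimeShader(v);
            if (!best || t < bestTime) { bestTime = t; best = &v; }
        }
        if (best)
            profile << entry.first << ' ' << best->name << '\n';
    }
}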
 