Nvidia's unified compiler technology

These techniques you're talking about aren't zero cost at all. They're just moving the cost elsewhere.

Dynamic branching with branch prediction has a hardware space cost, a gate cost (the IHVs have to implement it) and usually a high penalty when prediction fails.
Static branches without branch prediction have runtime execution cost.
Runtime static branch removal has a runtime CPU cost, upload cost and memory cost.
Offline branch removal has an offline CPU cost and a runtime memory and upload cost.

There are no free lunches.

Currently on console shader hardware, we have:
a) no branch predictors
b) branches cost a certain amount to execute
c) CPU is a very scarce resource (compared to shaders)
d) Upload cost can be lost and the memory cost is 'reasonable'

So offline branch removal (lots of little shaders) makes sense. I personally predict that the same costs will mean PC shaders balance the same way for a few years.
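
To make "lots of little shaders" concrete, here is a minimal sketch of offline branch removal. Everything in it is made up for illustration: the shader is modelled as a list of lines, each optionally guarded by a boolean flag, and the flag names and shader lines are hypothetical. A real offline tool would work on actual shader source or bytecode and re-optimise each variant afterwards.

```python
from itertools import product

# Hypothetical "ultra-shader": each line is guarded by an optional boolean flag.
# A guard of None means the line is always emitted.
ULTRA_SHADER = [
    (None, "color = tex(diffuse, uv)"),
    ("USE_FOG", "color = mix(color, fog_color, fog_factor)"),
    ("USE_SPECULAR", "color += specular(normal, light_dir)"),
]

def specialise(shader, flags):
    """Resolve every constant-boolean branch offline, keeping only live lines."""
    return [line for guard, line in shader if guard is None or flags[guard]]

def build_all_variants(shader):
    """Emit one branch-free shader per flag combination (2**n variants)."""
    names = sorted({guard for guard, _ in shader if guard is not None})
    variants = {}
    for values in product([False, True], repeat=len(names)):
        flags = dict(zip(names, values))
        variants[values] = specialise(shader, flags)
    return names, variants

names, variants = build_all_variants(ULTRA_SHADER)
```

The CPU work happens once, offline; what you pay at runtime is storing and uploading every entry of `variants`, which doubles with each extra flag.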
 
DeanoC said:
These techniques you're talking about aren't zero cost at all. They're just moving the cost elsewhere.
Isn't that what performance optimization is all about?

Yes, support for optimized static branching will cost transistors (I don't expect dynamic branching to ever be very important for 3D graphics). But, it can save memory.

So, with support for optimized static branching and smart compilers, overall performance can be increased in the long term (there shouldn't be much difference for today's shaders, but the memory savings in the future could be very significant).
 
Simon F said:
If you are taking the "ultra-shader" code, removing all the "constant Boolean based" branching, and re-optimising the code, then how is this in any way different to just storing each of the optimised shaders in the first place? <shrug>
Programmers shouldn't have to bother with low-level optimization of shaders. That should be the domain of the compiler anyway. This is why I like OpenGL's HLSL so much. But you're right, it might be nice to be able to save compiled shader object files, to prevent long compile times from becoming an issue.
 
Simon F said:
If you are taking the "ultra-shader" code, removing all the "constant Boolean based" branching, and re-optimising the code, then how is this in any way different to just storing each of the optimised shaders in the first place? <shrug>

Simon
One allows you to find a balance by breaking it down as much as you like, the other does not.
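
One way to picture that balance, as a rough sketch only: bake some flags offline and leave the rest as runtime static branches. The shader model, flag names and helper functions below are hypothetical and not taken from any real compiler or API.

```python
# Hypothetical "ultra-shader": (guard flag or None, shader line).
ULTRA_SHADER = [
    (None, "color = tex(diffuse, uv)"),
    ("USE_FOG", "color = mix(color, fog_color, fog_factor)"),
    ("USE_SPECULAR", "color += specular(normal, light_dir)"),
    ("USE_SHADOW", "color *= shadow(shadow_map, pos)"),
]

def partially_specialise(shader, baked):
    """Resolve the flags in `baked` offline; keep the others as static branches."""
    out = []
    for guard, line in shader:
        if guard is None:
            out.append(line)                      # unconditional line
        elif guard in baked:
            if baked[guard]:
                out.append(line)                  # branch resolved: taken
        else:
            out.append(f"if ({guard}) {line}")    # branch left for runtime
    return out

def tradeoff(shader, baked_names):
    """Return (shaders to store, static branches left in each shader)."""
    stored = 2 ** len(baked_names)
    branches = sum(1 for guard, _ in shader
                   if guard is not None and guard not in baked_names)
    return stored, branches

no_baking = tradeoff(ULTRA_SHADER, [])
fog_only = tradeoff(ULTRA_SHADER, ["USE_FOG"])
everything = tradeoff(ULTRA_SHADER, ["USE_FOG", "USE_SPECULAR", "USE_SHADOW"])
```

Baking nothing gives one shader with three branches, baking everything gives eight branch-free shaders, and anything in between is where the trade of memory against per-branch execution cost gets tuned.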
 