Building shaders 'on the fly' - how?

As I understand it, shaders are descriptors of material properties - how light and surface interact. Normally they are written in HLSL, which at run time is compiled into the assembly code for whichever graphics card is installed in the system. This is known as compiling shaders 'dynamically'. But there's also building shaders dynamically (on the fly), which I'm failing to understand.
Here's a quote from a developer who I can't name:

"Our solution is to procedurally make only the shaders that are needed for rendering the next frame. If a new material appears in the next frame, we make a shader for it, but no sooner than that. This way the code and content is easier to manage, and the rendering engine is more flexible."

The context of the above is that in normal game development, shaders are created by hand, and with thousands of materials to code for, this becomes impractical. The answer is to automate shader generation.

What I don't understand is - how is this possible? One still has to sit down and write the light-to-surface interaction in one way or another. How can this just be generated at 'run time'? How is the automated process described above possible?
 
OK, if you look at most shaders (vertex shaders in particular), they are really very modular - stuff like:

Transform Position
Compute Light1
Compute Light2
Project Texture Coordinate
GetEnvMap Tex coord

So what you do is collect together a set of these "shader fragments" and assemble them for the particular task at hand. The simplest way is with a set of flags based on some internal engine render state. Call compile on the assembled shader and use it. This is usually coupled with a cache of the various flag combinations, so you only do it the first time a given combination is needed.
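Something like this, in rough C++ - everything below is invented for illustration; the fragment functions and CompileHLSL just stand in for your hand-written snippets and whatever runtime compiler you use (e.g. D3DXCompileShader):

#include <map>
#include <string>

enum ShaderFlags          // each bit selects one shader fragment
{
    SF_LIGHT_DIR = 1 << 0,
    SF_LIGHT_PT  = 1 << 1,
    SF_PROJ_TEX  = 1 << 2,
    SF_ENVMAP    = 1 << 3,
};
typedef unsigned int ShaderKey;   // a combination of ShaderFlags

struct CompiledShader { std::string bytecodeStandIn; };   // placeholder type

// Stand-ins for the hand-written fragments and the runtime compiler.
std::string Fragment_TransformPosition() { return "/* transform position */\n"; }
std::string Fragment_DirectionalLight()  { return "/* compute light 1 */\n"; }
std::string Fragment_PointLight()        { return "/* compute light 2 */\n"; }
std::string Fragment_ProjTexCoord()      { return "/* project tex coord */\n"; }
std::string Fragment_EnvMapTexCoord()    { return "/* env map tex coord */\n"; }
CompiledShader* CompileHLSL(const std::string& src)       // the expensive step
{ CompiledShader* s = new CompiledShader(); s->bytecodeStandIn = src; return s; }

std::map<ShaderKey, CompiledShader*> g_shaderCache;

CompiledShader* GetShader(ShaderKey key)
{
    // Only assemble and compile the first time this combination is seen.
    std::map<ShaderKey, CompiledShader*>::iterator it = g_shaderCache.find(key);
    if (it != g_shaderCache.end())
        return it->second;

    std::string src = Fragment_TransformPosition();        // always present
    if (key & SF_LIGHT_DIR) src += Fragment_DirectionalLight();
    if (key & SF_LIGHT_PT)  src += Fragment_PointLight();
    if (key & SF_PROJ_TEX)  src += Fragment_ProjTexCoord();
    if (key & SF_ENVMAP)    src += Fragment_EnvMapTexCoord();

    CompiledShader* shader = CompileHLSL(src);
    g_shaderCache[key] = shader;
    return shader;
}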

This is not a new concept; it's in use right now in both PC and Xbox games. There are some gotchas with it, the primary one being that the compile takes significant CPU time, and in some cases you can't know the exact shader until you try to render the object, which can cause unpredictable occasional framerate hiccups.
 
What I don't understand is - how is this possible? One still has to sit down and write the light-to-surface interaction in one way or another. How can this just be generated at 'run time'? How is the automated process described above possible?

A shader in terms of HLSL is a complete program that handles the lighting based on a particular fixed input. E.g. if you want to handle material M with a spotlight, that is one shader; if you want to handle material M with a point light, that is another shader. And if you want to handle material N with a point light, that is another shader again.
But all these shaders share some of the code; only the code related to the light is different. So basically you can create a macro for each type of light, and a macro for each material, and make combinations of these at runtime.
This can be done at source-code level, but also with DirectX 9.0c's new fragment linker functionality.
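For the source-code-level route, a rough sketch might look like the following - the shader source and the define names are invented here, while D3DXCompileShader and D3DXMACRO are the real D3DX9 pieces (check the docs for exact usage):

#include <d3dx9.h>
#include <cstring>

// One hand-written source containing all the variants, guarded by #ifdefs:
//   #ifdef LIGHT_SPOT   ... spotlight code ...
//   #ifdef LIGHT_POINT  ... point light code ...
//   #ifdef MATERIAL_N   ... material N's surface code ...
static const char* g_pixelShaderSource = "...";   // stands in for the real source

LPD3DXBUFFER CompileVariant(bool pointLight, bool materialN)
{
    D3DXMACRO defines[3];
    int n = 0;
    if (pointLight) { defines[n].Name = "LIGHT_POINT"; defines[n].Definition = "1"; ++n; }
    else            { defines[n].Name = "LIGHT_SPOT";  defines[n].Definition = "1"; ++n; }
    if (materialN)  { defines[n].Name = "MATERIAL_N";  defines[n].Definition = "1"; ++n; }
    defines[n].Name = NULL; defines[n].Definition = NULL;   // terminator

    LPD3DXBUFFER code = NULL, errors = NULL;
    HRESULT hr = D3DXCompileShader(g_pixelShaderSource,
                                   (UINT)std::strlen(g_pixelShaderSource),
                                   defines, NULL, "main", "ps_2_0", 0,
                                   &code, &errors, NULL);
    if (errors) errors->Release();       // compiler messages would live here
    if (FAILED(hr)) return NULL;
    return code;   // feed this to IDirect3DDevice9::CreatePixelShader
}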
 
For over 20 years, C coders have sometimes used the power of a cast operation to turn a data block into executable code - loading that data block with code and then executing it - so why can't that be done here?

Sometimes a device driver needs to be coded on the fly (on UNIX, say), so you declare a data block (say char run_me[2000]), cast that variable to a function pointer ((void (*)())run_me), and execute it: ((void (*)())run_me)();

static char run_me[2000];
typedef int (*PFRI)();   /* pointer to a function returning integer */

int driver(void)
{
    return ((PFRI)run_me)();   /* cast run_me to a function and execute it */
}

Of course, before you do this you need to load the data element (run_me) with executable instructions by one of many mechanisms (e.g. copying the machine code for something like asm("movl _sp, sp") into run_me), then re-cast run_me and execute it.

So this gives a program the power to extensibly create and execute its own code within defined parameters - and it's a capability that has been used since the early eighties (but very carefully, by some very hard-core programmers - like the team that re-wrote the optimising C compiler; Bruce Ellis at Bell Labs taught me this one). Dennis Ritchie said one day, "I wouldn't do this unless you are Brucee".

Now that is 20-year-old technology; I am sure in today's higher-level, more abstract programming languages there is a much simpler and cleaner way of doing this.
 
As Scali said, DirectX's fragment linker provides that functionality. You basically write snippets of shader code, each snippet consisting of shading or lighting calculations and link the appropriate snippets together at run-time.

This would still require the programmers to write shader code. A better method would be to generate the shaders from scratch at run-time. To do that, the shader library would require some rules based on which it would generate the shaders. For example, say a material requires a tangent space light vector as an input to its pixel shader. The shader library would need to construct a vertex shader that would provide the tangent space light vector to the pixel shader. Similarly, say the material needs to sample a bump map. The shader library would "know" how to add code to fetch a bump map.

The latter would completely eliminate the need for programmers to write shader code. But this is too advanced (technology being the limitation) for current or next generation games, so it most likely won't be used anytime soon.
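To illustrate what those "rules" might look like in practice, here's a crude, invented sketch of a generator that emits vertex-shader source from a material's declared needs (the HLSL strings and all the names are made up):

#include <string>

struct MaterialNeeds
{
    bool tangentSpaceLightVector;   // pixel shader wants the light vector in tangent space
    bool bumpMap;                   // pixel shader samples a normal map
};

std::string GenerateVertexShader(const MaterialNeeds& m)
{
    std::string src;
    src += "VS_OUTPUT main(VS_INPUT In) {\n";
    src += "  VS_OUTPUT Out;\n";
    src += "  Out.pos = mul(In.pos, WorldViewProj);\n";
    if (m.tangentSpaceLightVector)
    {
        // The rule: a tangent-space quantity needs the TBN basis built here.
        src += "  float3x3 tbn = float3x3(In.tangent, In.binormal, In.normal);\n";
        src += "  Out.lightTS = mul(tbn, LightPos - In.pos.xyz);\n";
    }
    if (m.bumpMap)
        src += "  Out.bumpUV = In.uv;\n";
    src += "  return Out;\n}\n";
    return src;   // hand the string to the runtime HLSL compiler
}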
 
poly-gone said:
As Scali said, DirectX's fragment linker provides that functionality. You basically write snippets of shader code, each snippet consisting of shading or lighting calculations and link the appropriate snippets together at run-time.

This would still require the programmers to write shader code. A better method would be to generate the shaders from scratch at run-time. To do that, the shader library would require some rules based on which it would generate the shaders. For example, say a material requires a tangent space light vector as an input to its pixel shader. The shader library would need to construct a vertex shader that would provide the tangent space light vector to the pixel shader. Similarly, say the material needs to sample a bump map. The shader library would "know" how to add code to fetch a bump map.

The latter would completely eliminate the need for programmers to write shader code. But this is too advanced (technology being the limitation) for current or next generation games, so it most likely won't be used anytime soon.

I know of at least one PC game that already does this (I'd assume that there are others) and a few Xbox games that use a similar system.

It looks at the render state and material inputs and when the geometry is submitted it dynamically constructs the shader. Obviously someone had to write the fragments at some point, so maybe I'm missing your point.
 
ERP said:
There are some gotchas with it, the primary one being that the compile takes significant CPU time, and in some cases you can't know the exact shader until you try to render the object, which can cause unpredictable occasional framerate hiccups.

I'm thinking this could be solved with some advanced support from the DX runtime, or the driver in the OpenGL case. It could be done so that when you need a new shader, you tell the API that you're in the middle of something, so do it as fast as possible. The shader can then be compiled with no or very few optimisations. But instead of just accepting that the shader is unoptimised and continuing with that from then on, you could perhaps tell the API that you want it to keep optimising the shader, spreading the cost over many frames and thus removing any hiccups. The DX runtime / GL driver could then continue the optimisation during free timeslots, such as when waiting for vsync or when it's otherwise stalled waiting for the GPU. It would complicate the compiler though, I imagine. :)
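For what it's worth, you could fake part of that in the application today rather than in the driver: compile with optimisation skipped when a shader is first needed mid-frame, and queue a fully optimised recompile for idle time. A rough sketch - the cache and queue types are invented, D3DXCompileShader and D3DXSHADER_SKIPOPTIMIZATION are the D3DX9 bits, and in real code you'd also have to recreate the device shader object when the better bytecode arrives:

#include <d3dx9.h>
#include <map>
#include <queue>
#include <string>

struct CacheEntry { LPD3DXBUFFER bytecode; bool optimised; };
static std::map<std::string, CacheEntry> g_cache;         // keyed by shader source
static std::queue<std::string>           g_recompileQueue;

static LPD3DXBUFFER Compile(const std::string& src, DWORD flags)
{
    LPD3DXBUFFER code = NULL;
    D3DXCompileShader(src.c_str(), (UINT)src.size(), NULL, NULL,
                      "main", "ps_2_0", flags, &code, NULL, NULL);
    return code;
}

LPD3DXBUFFER GetShaderQuick(const std::string& src)
{
    std::map<std::string, CacheEntry>::iterator it = g_cache.find(src);
    if (it != g_cache.end())
        return it->second.bytecode;

    // First use, mid-frame: compile as cheaply as possible.
    CacheEntry e;
    e.bytecode  = Compile(src, D3DXSHADER_SKIPOPTIMIZATION);
    e.optimised = false;
    g_cache[src] = e;
    g_recompileQueue.push(src);
    return e.bytecode;
}

void OptimiseOneShaderWhenIdle()     // call while waiting for vsync / the GPU
{
    if (g_recompileQueue.empty())
        return;
    std::string src = g_recompileQueue.front();
    g_recompileQueue.pop();
    LPD3DXBUFFER full = Compile(src, 0);               // full optimisation this time
    if (full)
    {
        g_cache[src].bytecode->Release();              // drop the quick version
        g_cache[src].bytecode  = full;
        g_cache[src].optimised = true;
    }
}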
 
If shader compile times are too long for realtime, then it'd be better to just eat the storage space and cache pre-compiled shaders.
 
Humus said:
ERP said:
There are some gotchas with it, the primary one being that the compile takes significant CPU time, and in some cases you can't know the exact shader until you try to render the object, which can cause unpredictable occasional framerate hiccups.

I'm thinking this could be solved with some advanced support from the DX runtime, or the driver in the OpenGL case. It could be done so that when you need a new shader, you tell the API that you're in the middle of something, so do it as fast as possible. The shader can then be compiled with no or very few optimisations. But instead of just accepting that the shader is unoptimised and continuing with that from then on, you could perhaps tell the API that you want it to keep optimising the shader, spreading the cost over many frames and thus removing any hiccups. The DX runtime / GL driver could then continue the optimisation during free timeslots, such as when waiting for vsync or when it's otherwise stalled waiting for the GPU. It would complicate the compiler though, I imagine. :)


That certainly sounds feasible. Being from the console side, I always cringe when people suggest adding more hidden functionality to a driver, but it seems like a reasonable solution on the PC side.

FWIW, I think a better solution is to solve the problem at the engine level: rather than hiding what the engine is doing, you allow the app to force the compile prior to rendering. It's actually non-trivial, since you need to know all of the important state that will be set prior to the compile to know which fragments are needed; what you really have to do is make a "dummy draw" call when you load it.
 
Chalnoth said:
If shader compile times are too long for realtime, then it'd be better to just eat the storage space and cache pre-compiled shaders.

I'm not suggesting that it's done per frame; of course there is a cache. However, you still pay the compile cost when a shader is first used, and that can cause a hitch in framerate. Mostly these will all be hidden in the first few frames, but there is no guarantee of that.

As an ideal I'm looking to build an application with reasonably predictable CPU usage from frame to frame. For various reasons this is pretty much impossible on PC, but on a console it's still practical.
 
Right. I meant to cache them sometime before rendering begins (level load time, game load time, at installation, or when the game ships).
 
Chalnoth said:
Right. I meant to cache them sometime before rendering begins (level load time, game load time, at installation, or when the game ships).

It turns out that even this is difficult in very dynamic games; in most traditional state-based APIs you can't know everything the shader needs until you draw. The answer is of course to change the API to solve the issue, but that means moving away from the state-machine model (a la OpenGL/D3D).

None of this is insoluble, of course, but some of these issues aren't really obvious until after you implement it and start seeing the hitches.
 
Well, it seems to me that the number of permutations should be rather limited, enough so that it shouldn't be that hard to precompile these things.

And you could even use branching to eliminate the need to have different shaders in the first place.
 
Thanks for the replies guys. :)

For the engines that generate shader code from scratch at run time, what defines the material properties of the objects in the scene?

Suppose the game loads and the first frame has a tree, some grass and a box. Once the engine goes to process the tree:
1. How does the engine know a particular polygon or fragment belongs to a tree?
2. Once it does know, and it needs to build some leaf shaders 'from scratch', how does it know the visual characteristics of foliage? Where does it get the info on ambient, diffuse and specular, for example?

I guess what I'm trying to ask is: if there's no pre-compiled shader code describing the material properties of objects in the game, then there must be some other description of how the different materials in the engine appear, which the engine can refer to at run time and compile into shader code. What's this 'other description'? Is it just a bunch of parameters the artists get to play with, and if so, how does it deal with more complex shaders?
 
1. How does the engine know a particular polygon or fragment belongs to a tree?

The engine will render the scene one object at a time (or actually one mesh at a time). Something like Draw( tree );
The tree object will contain all the polygons and some kind of description of the surface (texture, material properties etc).

2. Once it does know, and it needs to build some leaf shaders 'from scratch', how does it know the visual characteristics of foliage? Where does it get the info on ambient, diffuse and specular, for example?

Those are stored in some way with each object, as stated before.
So there is some kind of 'recipe' from which the engine can dynamically create a shader.
For example, the object could contain a set of properties like this:
ambient: 0, 1, 1
diffuse: 1, 1, 1
specular: 0.5, 0.5, 1
shininess: 16
reflection: no
cast shadow: yes
texture: tree.jpg
bump: treebump.jpg

Then the engine has a list of active lights, and if you combine the two, you should know everything you want to know for your lighting model.
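In code terms, that 'recipe' is just a data structure the artists fill in, and the engine maps it (plus the active lights) to shader fragments or defines. A toy, invented sketch:

#include <cstdio>
#include <string>

// A material 'recipe' as authored by the artists (names invented).
struct MaterialRecipe
{
    float ambient[3], diffuse[3], specular[3];   // become shader constants, not code
    float shininess;
    bool  reflection;
    bool  castShadow;
    std::string texture;    // e.g. "tree.jpg"
    std::string bumpMap;    // empty if none
};

struct Light { enum Type { POINT, SPOT } type; };

// The engine combines the recipe with the active lights to decide which code
// goes into the shader; the numeric values are uploaded as shader constants.
std::string BuildShaderDefines(const MaterialRecipe& m, const Light* lights, int numLights)
{
    std::string defines;
    if (!m.texture.empty()) defines += "#define USE_DIFFUSE_MAP\n";
    if (!m.bumpMap.empty()) defines += "#define USE_BUMP_MAP\n";
    if (m.reflection)       defines += "#define USE_ENVMAP\n";
    if (m.shininess > 0.0f) defines += "#define USE_SPECULAR\n";

    int numPoint = 0, numSpot = 0;
    for (int i = 0; i < numLights; ++i)
    {
        if (lights[i].type == Light::POINT) ++numPoint;
        else                                ++numSpot;
    }
    char buf[80];
    std::sprintf(buf, "#define NUM_POINT_LIGHTS %d\n#define NUM_SPOT_LIGHTS %d\n",
                 numPoint, numSpot);
    defines += buf;
    return defines;    // prepended to the shader source before compiling
}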
 
g__day said:
Now that is 20-year-old technology; I am sure in today's higher-level, more abstract programming languages there is a much simpler and cleaner way of doing this.
Nope. Language/tool support for so-called self-modifying code has, AFAICT, not improved at all over the last 20 years (it's harder to do in Java/C# than in C, for example), and modern processors tend to handle self-modifying code worse and worse with each new generation. The Pentium 4 will handle it correctly, but will flush its entire instruction cache, costing you many thousands of cycles per modification; Itanium and most RISC processors will often fail to handle such code correctly at all (unless you manually flush their caches, which is very slow).
 
Well, except for more exotic sorts of experimental programming (attempting to make a self-optimizing program, for example), I don't see much reason to have self-modifying code. But even if you do, you'd think you could improve how well processors handle such a thing by not calling code immediately after it's modified. I don't see any graceful way to handle self-modifying code in pipelined architectures otherwise.
 