Nvidia's unified compiler technology

DeanoC said:
Dio said:
If '<1 second to compile 100 shaders' is acceptable, then things should look OK until we get into the thousands-of-instructions range.

100 shaders per second will be fine. We know it takes time to do the compile, just making sure it's known that some of us can't rely on a trusty loading screen to hide things :)

So that brute force optimiser that takes several seconds might not be a good idea :D
Please be a little careful with shader creation. Creating duplicates should be avoided at all costs... particularly if you are going to be creating oodles of shaders. If you are planning on destroying shaders that won't be used for a while, that would help some, but there are potential resource issues involved (each shader causes the driver to use memory resources as there are associated structures).

We've already seen cases where memory fragmentation is a problem and the amount of memory a driver is allowed to allocate is not infinite...

Once we move to platforms with larger address spaces, many of these problems will go away, but if you can avoid allocating resources that you don't need then that's always preferable.
 
DeanoC said:
You are trying to convince us to lower our art quality in the name of speed.

No, I'm not. I'm saying that using 10,000 shaders for 10,000 materials is misuse of the API, since even if you have 10,000 permutations you don't have 10,000 completely different situations. You've got a couple of dozen at most, and thus should use a couple of dozen shaders. Just learn to use if-statements, constants, and all that.

Shaders are very flexible, and they should be used as such. They weren't invented as a replacement for glPushAttrib()/glPopAttrib(). Shaders aren't an excuse for writing naive, brute-force applications.

To illustrate my point, did anyone ever design a game editor in which the artist paints the texture on all walls individually? If we wanted we could just let the game generate a texture for every surface, bake the lightmap into the base texture and simply use plain texturing on everything. How would it run? It would run like ass. It would take a lot of time to load all the textures, we would overflow the texture memory, and we would get a lot of texture switches.

How would you view such a design? I would view it as flawed, badly written, and all that. I wouldn't expect the driver to handle such misuse well, and would oppose spending driver developer time on trying to get something useful out of the situation. No, you don't design your app that way; instead you design it so that the artist creates a texture, loads it into his editor and applies it to numerous surfaces. If you do lighting you perhaps do that with lightmaps. You don't create a lightmap for every surface. No, you pack many lightmaps into a larger texture. Yes, it requires more work, but we're not here to satisfy lazy developers. You just don't feed your API 10,000 textures if you want performance and short load times. It's not the driver's responsibility to handle suboptimal use of the API and hardware.

10,000 shaders is pretty much at the same level. A shader takes time to load, just like a texture. A shader takes up resources, just like a texture. Switching shaders is expensive, just like switching textures. A developer must be aware of all this, and write the application accordingly, or suffer the consequences of bad performance and long load times.

In your standard lightmapped game, every surface looks different. That's not reason enough to make a different texture for every surface; no, you just assign every surface a small portion of a lightmap texture and modulate it with your base texture. You reuse your resources, and the unique resources per surface or object are small. For a normal wall you maybe get a 16x16 lightmap area. In the shader-enabled world you store a small bunch of constants or vertex attributes that describe the properties of that wall. You don't assign a different shader because your other wall has a blue tint and this wall is green. No, you pass the color to the shader.
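The wall example above can be sketched in a few lines of Python: one shared shader object, with the per-surface tint and lightmap region supplied as parameters rather than baked into new shader objects. All class and field names here are illustrative, not a real API:

```python
# Hypothetical sketch: one shader shared across all walls, with per-surface
# appearance passed as constants instead of compiling a shader per wall.

class Shader:
    """Stands in for a compiled GPU program (illustrative only)."""
    def __init__(self, name):
        self.name = name

WALL_SHADER = Shader("lightmapped_wall")   # compiled once, reused everywhere

class Surface:
    def __init__(self, tint, lightmap_region):
        self.shader = WALL_SHADER               # shared, never duplicated
        self.constants = {"tint": tint}         # per-surface data as constants
        self.lightmap_region = lightmap_region  # sub-rect of a packed atlas

green_wall = Surface((0.2, 0.8, 0.2), (0, 0, 16, 16))
blue_wall  = Surface((0.2, 0.2, 0.8), (16, 0, 16, 16))
assert green_wall.shader is blue_wall.shader   # one shader, two appearances
```

Two different-looking walls, one shader: the colour difference lives in the constants, not in the program.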

It's not about reducing quality. It's about sensible use of the API and hardware.
 
How to easily handle a multi-material complex object?

Say we have a robot, with shiny metal parts, dull gun-metal gray parts, rubber parts, and transparent parts with bubbly liquids in them. (And a strange haze that follows it around, but let's not dwell on that)

How would a prudent developer handle that?

Break each material into a separate state? Static branching?
 
Humus, I can only assume you haven't worked on any large scale games projects. That kind of thinking is only valid for small scale projects.
I certainly don't work on any games. But mind you, that doesn't mean our thoughts aren't valuable input you might take into account. As a huge gamer, I am very disturbed by the number of badly designed engines among many games.

Shaders are related to art, and that means the more flexibility you allow, the better art you get. We give options to our artists and create shaders from their choices.
That is an awkward statement. Why on earth do you still limit the number and size of textures? Because of memory limits? Or because AGP bus transfers simply decrease performance too much? Why don't you put millions of vertices per frame on our screens? Maybe because it wouldn't perform?

The same goes for your number of shaders. Everything has a limit, and YOUR job is finding the right balance between those limits. Something that, to my regret, doesn't happen in many games. It is just awkward to see games where many parts are so demanding that they are unplayable even on high-end gaming rigs.

(May I point to Unreal II and Halo, for example, and many others whose names I can't remember.)
 
RussSchultz said:
How to easily handle a multi-material complex object?

Say we have a robot, with shiny metal parts, dull gun-metal gray parts, rubber parts, and transparent parts with bubbly liquids in them. (And a strange haze that follows it around, but let's not dwell on that)

How would a prudent developer handle that?

Break each material into a separate state? Static branching?

Those are all very different materials with very different properties, so they would require different shaders. But the other robot, next to this one, that may look very different, will still probably be made of the same or a similar set of materials. So you don't need another set of shaders for the next robot. You need different shaders to deal with metal vs. wood, or rubber vs. plastic etc., since they have significantly different properties, but you don't need different shaders for iron vs. steel, or blue plastic vs. green.
 
Well, exploring further, would it be more prudent to have a "robot uber shader" that would include all of the different materials, but with the pieces reached by jumps? Or to have a different shader per material?
 
Humus said:
Shaders are very flexible, and they should be used as such. They weren't invented as a replacement for glPushAttrib()/glPopAttrib(). Shaders aren't an excuse for writing naive, brute-force applications.

We have had high resource counts for years; how many textures do you think a modern game has? Does that mean we want them all displayed in the same frame? No, we manage it so the game runs 'fast enough'. What we're saying is that part of that management involves creation and destruction.

If the path of fast creation isn't available, our quality will suffer. It will be a programmer picking 'n' shaders and that's it, rather than the artist picking which visual attributes they want to see and us creating a shader to display them, the first time it's needed. Indeed the way most of us are handling the problem is to have a shader cache, say 500 shaders ready, but that means potentially any time a shader is needed that's not in the cache, a destruction and creation cycle.

It's got nothing to do with A) shaders per frame and everything to do with B) number of shader creations. You're assuming that an increase in B means more of A, but shader LOD increases B to reduce A!

I know that state changes are expensive (and have been for years, it's not a new issue); shaders are just another bit of state to us.

I agree 10,000 shaders per frame is stupid, but why is 10,000 shaders per GAME? (In the future, I'm not talking about now.) If you're claiming that we can have 1000-instruction shaders but only 10 of them at a time, then you will find us treating the long shader as 100 small shaders and sticking a jump table at the top (BTW, one method that's used on PS2 VU1, which is effectively a vertex shader with 1024 instructions).

As all shader hardware I've dealt with at a low level has multiple programs resident and can choose the entry point, what exactly is the issue with having 100 10-instruction shaders rather than 1 1000-instruction shader? Any change of material, even in an 'uber' shader, will involve the same constant changes, which may stall the pipe. The only difference between lots of small shaders and a few large shaders is whether they can be held resident.

Long shaders aren't a new thing; the games community has been working with long vertex shaders at a low level (on PS2 and Xbox) for years. We've tried many strategies, and multiple small shaders is the best way of getting high performance and the flexible shader look. The 'uber' shader just wastes cycles even when the hardware has branching capabilities (it reduces optimisations, and the branches usually cost some speed themselves).

We have the experience with complex branching shaders, and while it's nice to have a fallback 'uber' shader, for production quality work you need lots of little specialised shaders. If the API requires us to hide this via our own small-shader management system (jump tables), fine, we will, but why not program the driver to handle the case correctly?
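The jump-table scheme DeanoC describes (many short shaders packed into one resident program, with a table at the top selecting the entry point per material) might be sketched like this, with entirely made-up 'instruction' names:

```python
# A flat instruction stream holding several short shaders in one
# resident program. The ENTRY table is the "jump table at the top":
# each material maps to the offset of its sub-shader.

CODE = [
    "lookup_env", "mul_fresnel", "ret",    # metal  starts at offset 0
    "mul_diffuse", "ret",                  # rubber starts at offset 3
    "lookup_env", "blend_alpha", "ret",    # glass  starts at offset 5
]
ENTRY = {"metal": 0, "rubber": 3, "glass": 5}  # the jump table

def execute(material):
    """Run from the material's entry point until 'ret'. Only that
    sub-shader's instructions execute; the rest of the program is idle."""
    pc, trace = ENTRY[material], []
    while CODE[pc] != "ret":
        trace.append(CODE[pc])
        pc += 1
    return trace

assert execute("rubber") == ["mul_diffuse"]
```

Switching materials costs one table lookup per draw instead of a shader change, which is the point of keeping everything resident.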
 
Humus said:
Those are all very different materials with very different properties, so they would require different shaders. But the other robot, next to this one, that may look very different, will still probably be made of the same or a similar set of materials. So you don't need another set of shaders for the next robot. You need different shaders to deal with metal vs. wood, or rubber vs. plastic etc., since they have significantly different properties, but you don't need different shaders for iron vs. steel, or blue plastic vs. green.

You're thinking about things in a 'renderman' style, where execution time isn't as vital as the end results. Say I write a shader that has all the parameterisation for steel vs iron. If you actually look at steel vs iron with an artistic eye, they look quite different (in fact steel has a wide range of colours itself, based on firing temperatures, carbon content, etc.), even ignoring things like rust. And artists will ask for the steel on his head to be a bit 'bluer' than the steel on his legs. So my procedural texture has to have all these parameters evaluated real-time. OR I can evaluate these parameters off-line and produce a few much smaller/faster shaders to use at run-time. No reduction in flexibility for the artists or the game (unless you want to change the parameters real-time).

As long as the API handles 'n' shaders as well as 1 shader, the second method may be better for ISVs and IHVs. Faster shaders = better graphics = more sales of games and video cards (at least that's the theory :) ).

Now of course if the API/hardware ends up preferring a few long shaders rather than lots of short shaders, then this offline 'optimisation' would be useless.

I'm not saying this is the right way, but remember games are different from DCC packages or off-line CGI; we don't need to evaluate everything real-time, and that may mean a different way of doing things.
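The offline 'bake the parameters out' step described above could look roughly like this hypothetical specializer, which resolves known boolean flags and folds known constant values into a shorter shader. The instruction format is a toy one, and this sketch handles only non-nested conditionals:

```python
# Hypothetical offline shader specializer: given an uber-shader with
# per-material parameters, drop branches that can never run and bake
# the known constant values into the operands.

UBER = [
    ("if", "use_rust"), ("blend_rust_layer",), ("endif",),
    ("mul_tint", "tint"),
    ("if", "use_env"), ("add_env_reflect",), ("endif",),
]

def bake(shader, params):
    out, skipping = [], False
    for ins in shader:
        op = ins[0]
        if op == "if":
            skipping = not params[ins[1]]     # branch resolved offline
        elif op == "endif":
            skipping = False
        elif not skipping:
            # substitute known constant values for parameter names
            out.append(tuple(params.get(a, a) for a in ins))
    return out

# A bluish steel: no rust layer, environment reflection on.
steel = bake(UBER, {"use_rust": False, "use_env": True,
                    "tint": (0.7, 0.75, 0.8)})
assert steel == [("mul_tint", (0.7, 0.75, 0.8)), ("add_env_reflect",)]
```

The specialized shader is shorter than the uber-shader and contains no branches at all, which is exactly the 'many short shaders from one long one' trade being argued for.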
 
Humus said:
To illustrate my point, did anyone ever design a game editor in which the artist paints the texture on all walls individually? If we wanted we could just let the game generate a texture for every surface, bake the lightmap into the base texture and simply use plain texturing on everything. How would it run? It would run like ass. It would take a lot of time to load all the textures, we would overflow the texture memory, and we would get a lot of texture switches.

Actually, for outdoors, unique texturing works very well. I've written an outdoor rendering system that used unique textures: a 16Kx16K DXT1 texture on disk with a streaming system to bring it in real-time. It's all down to management; a fixed cache of texture space (12 MB IIRC) allowed us to have effectively unlimited worlds (disk space was the only limitation), and each texture was baked from over 20 materials + lighting and shadows. Looked lovely and ran really well on machines with low fillrate and crap multi-texturing (like a GF2MX).

Sure, dumping the full 150 MB of textures to the video card killed it, but that's just bad programming; a quad tree, just-too-late texturing and vertices, precalced VIPM etc. gave us a terrain of 2M tris and a 16Kx16K texture. You can fly high enough to see it all or stand at human height and look to the horizon. IIRC it used ~5ms of CPU time (on a 1.x GHz machine).
 
DeanoC said:
I know that state changes are expensive (and have been for years, it's not a new issue); shaders are just another bit of state to us. <snip> The only difference between lots of small shaders and a few large shaders is whether they can be held resident.
That's pretty close to the truth, although shaders are kind of halfway between state and textures in terms of their 'cost'. In particular, shaders will generally need a good deal of support overhead in a driver (and potentially on the hardware).

DeanoC said:
Any change of material, even in an 'uber' shader, will involve the same constant changes, which may stall the pipe.
Changing constants is still likely to be more lightweight than changing shaders. You can assume that constant changes are more like state changes and it is probably substantially less likely that a constant change will stall the hardware than a shader change. This is likely to be even more true in the future - in the long term one can envisage that much more 'state' will be constants, and the cost of changing 'state' will need to stay as low as possible, of course.
 
RussSchultz said:
Well, exploring further, would it be more prudent to have a "robot uber shader" that would include all of the different materials, but with the pieces reached by jumps? Or to have a different shader per material?

Probably not, but that may be hardware dependent. For significantly different materials, like metal vs. rubber, you're probably better off just writing another shader.
 
DeanoC said:
We have had high resource counts for years; how many textures do you think a modern game has? Does that mean we want them all displayed in the same frame? No, we manage it so the game runs 'fast enough'. What we're saying is that part of that management involves creation and destruction.

If the path of fast creation isn't available, our quality will suffer. It will be a programmer picking 'n' shaders and that's it, rather than the artist picking which visual attributes they want to see and us creating a shader to display them, the first time it's needed. Indeed the way most of us are handling the problem is to have a shader cache, say 500 shaders ready, but that means potentially any time a shader is needed that's not in the cache, a destruction and creation cycle.

There are no guarantees of fast creation of textures or vertex buffers either, so why should we demand this from shaders?
'n' shaders doesn't mean that you have 'n' visual appearances. It may very well be 'n' times 'a lot', since you can feed them with infinitely many different parameters. If you do it right you don't need to lose a single piece of appearance.

DeanoC said:
It's got nothing to do with A) shaders per frame and everything to do with B) number of shader creations. You're assuming that an increase in B means more of A, but shader LOD increases B to reduce A!

I did not assume any such thing.

DeanoC said:
I agree 10,000 shaders per frame is stupid, but why is 10,000 shaders per GAME? (In the future, I'm not talking about now.) If you're claiming that we can have 1000-instruction shaders but only 10 of them at a time, then you will find us treating the long shader as 100 small shaders and sticking a jump table at the top (BTW, one method that's used on PS2 VU1, which is effectively a vertex shader with 1024 instructions).

I seriously doubt that you'll ever get into a situation where 10,000 shaders are motivated, even in the future. If you decide to go along such a path, and you find that it works fine, then fine. If you find that it's slow, or that it stutters at load time, then it's not the driver's fault, since the API doesn't guarantee short compile times. I'm not against the idea of compiler hints that can speed things up, nor am I against the idea of feeding back binary precompiled hardware-specific shaders for reuse and fast loads. But I'm against the idea of first designing an engine around an inefficient usage of the API, and then, if it turns out to be slow, demanding that the driver adhere to the needs of this inefficient usage. And I'm certainly against a 10,000-shader scenario as an argument against driver-side compilers.
 
Humus,
I understand precisely what you mean and you're right, but I still can't help wondering what you're arguing against. Is it about JIT or about 10k shaders per frame? The former doesn't automatically imply the latter.

JIT is a technique that's useful for reducing the working set. No one's going to design a renderer around 10k shaders for a single frame. And everyone who even has 1k+ possible shader permutations already knows that this must be carefully managed. In my laughable pet project I use a deliberately simple yet fast shader cache with 1024 maximum entries, look up the correct shader with a hash over some state, and reuse the established shader object. If it's not in, I add it. If the cache is full, I destroy old shaders to make room. Dead simple, and it works for hours on end even though it's technically JIT and thus should provoke the nastiest of issues (memory fragmentation).
I would be flattered if it were so, but I don't believe for a second that I could make it work and all the big dogs couldn't ;)
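A minimal sketch of such a JIT shader cache, assuming a hashable render-state tuple and an expensive driver-side compile function (both stand-ins, not a real API), might look like:

```python
from collections import OrderedDict

class ShaderCache:
    """LRU cache of compiled shaders keyed by a hash of render state."""
    def __init__(self, compile_fn, capacity=1024):
        self.compile_fn = compile_fn    # stands in for the driver compile
        self.capacity = capacity
        self.entries = OrderedDict()    # state hash -> compiled shader

    def get(self, state):
        key = hash(state)
        if key in self.entries:
            self.entries.move_to_end(key)     # mark as recently used
            return self.entries[key]
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # destroy the oldest shader
        shader = self.entries[key] = self.compile_fn(state)  # JIT on miss
        return shader

compiles = []
cache = ShaderCache(lambda s: compiles.append(s) or f"prog<{s}>", capacity=2)
cache.get("metal"); cache.get("rubber"); cache.get("metal")  # repeat = hit
cache.get("glass")   # cache full: least-recently-used "rubber" is destroyed
assert compiles == ["metal", "rubber", "glass"]   # only three real compiles
```

The capacity bounds the working set, so compile cost is only paid on a genuine miss, which is the tradeoff being defended.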

You may simply not know beforehand what shaders will be required in a minute. 'Preloading' all possible permutations that may be required as a result of game scripting at once is bad API usage, agreed. Nobody's tried to deny it AFAICS.

Removing shader compilation from 'level load' and doing it JIT is of course a tradeoff, but not an inherently stupid one, if done right. Graphics programmers have dynamically managed limited resources for decades, be it texture memory in the Voodoo era or NV_var just lately. It can be done, that's for sure.

Let me draw an analogy:
If we are in agreement that there's no dedicated fixed function logic in R300 (right?) ... what happens for 'legacy' applications that use ARB_texture_env_combine?
I'd suspect that shaders are generated on the fly (checking state changes at each draw call as usual) to match the fixed function configuration. Now, first, how many combinations of ARB_tec state are possible (even if we restrict ourselves to quad texturing), and second, how many combinations will any given single application use?
Would you prefer ATI generating the thousands of resulting shaders by default, or would you prefer JIT?
I'd prefer JIT ;)

DemoCoder said:
If you leave out displacement shaders, do any RenderMan short films even use 10,000+ shaders?
They are short films, right? ;)
 
Humus said:
There are no guarantees of fast creation of textures or vertex buffers either, so why should we demand this from shaders?
Tell that to the last IHV who made texture creation or vertex creation a lot slower (it's been tried in the past). It doesn't ever last long; the API is not what the ARB or MS decide, it's what we the developers use and the way we use it. There are no 'guarantees' that a particular call will be fast, but if an IHV chooses to slow down a common call, then the benchmarks (and reduced sales) show it. That's why it's a two-way discussion: the IHVs would like to be able to do everything 'their way', we would like everything 'our way'. We meet in the middle (usually). Just look at NVIDIA recently to see where 'arrogance' about what an IHV can do will get you.

Humus said:
'n' shaders doesn't mean that you have 'n' visual appearances. It may very well be 'n' times 'a lot', since you can feed them with infinitely many different parameters. If you do it right you don't need to lose a single piece of appearance.
But long branching shaders are slow. Why execute instructions that don't contribute to the final image, or calculate temporaries that are constant throughout the entire shader? We can take a long shader, 'bake' out the variables, and make a shorter shader that gets the same results. We can do this all off-line, using neither precious CPU nor GPU time. It just means we get many short shaders rather than a few long shaders. Why do it? Because it increases visual fidelity via increased vertex or pixel throughput.

The best optimisation is not to execute something.

Also, from a production point of view, this system will also work well on platforms that we pass native shaders to. That means one rule that works on all platforms, including PC. The only real downside is if compilation is slow (for some value of slow). By the sounds of it the ATI guys get the point I was making: take some time doing the compilation, but don't think you have unlimited time, you don't.


Humus said:
I seriously doubt that you'll ever get into a situation where 10,000 shaders are motivated, even in the future. If you decide to go along such a path, and you find that it works fine, then fine. If you find that it's slow, or that it stutters at load time, then it's not the driver's fault, since the API doesn't guarantee short compile times. I'm not against the idea of compiler hints that can speed things up, nor am I against the idea of feeding back binary precompiled hardware-specific shaders for reuse and fast loads. But I'm against the idea of first designing an engine around an inefficient usage of the API, and then, if it turns out to be slow, demanding that the driver adhere to the needs of this inefficient usage. And I'm certainly against a 10,000-shader scenario as an argument against driver-side compilers.

We'll have to agree to disagree on whether 10,000 shaders sounds like a lot for the future. We will also have to agree to disagree about driver-side compilers.

I and many others are going to have to get a shader into the system at run-time. If there isn't a fast path for that, the game will stall. Never think there is a right way or wrong way of making games. IMO long 'uber' shaders are wasteful, with lots of redundant instructions; a simple dead code elimination process/constant baker can eliminate most of them. The argument that we shouldn't do this, just because the driver wants to spend a long time converting shaders into its favourite format and doesn't want to handle lots of small shaders, doesn't work for me.
 
I still don't see where these oodles of shaders come from. There are a finite number of material types and lighting equations; I simply don't see the combinatorial explosion.

Are there any games today that use 10,000 textures?
 
DeanoC said:
We will also have to agree to disagree about driver-side compilers. <snip>

The argument that we shouldn't do this, just because the driver wants to spend a long time converting shaders into its favourite format and doesn't want to handle lots of small shaders, doesn't work for me.
This is a stickier part of the discussion. I can see some legs of the trousers of time in which the ability to see very high level code inside the driver is not just a win, but a massive win, because some hardware magician has come up with something incredibly smart that is very much out of left field.

There are plenty of hardware magicians inside ATI, I can't rule out wanting high-level code to compile at some future point.

At the same time I completely understand the performance issue. We'll make sure we find an acceptable compromise.
 
DemoCoder said:
Are there any games today that use 10,000 textures?
I would expect so. Anyone can unpack the Q3 texture set and see there are about 2000 textures in there. I'm certain some game designers are well over 5x this now - game asset complexity is matching or outstripping Moore's Law, which would therefore suggest at least 20k textures.

Whether we have yet hit 10k textures per level I don't know. Probably not, but not far off. But the reasons we haven't are nothing to do with technology and everything to do with time and money.

Just as 'unique texturing' is becoming a technology checkbox, it will not be long before 'unique shading' is too. In both cases, the asset creation is probably a larger hurdle than the technology; if game asset creation could be brought down to 10% of the current cost (in money and time) then I'm sure we would be seeing efforts towards unique shading even now - but the asset creation crisis in the games industry will prevent this in the near future, I would estimate.

Once more I must reference the mighty Greg Costikyan's notes on the subject, in case anyone missed it last time.
http://www.costik.com/weblog/2003_05_01_blogchive.html
http://www.costik.com/digitalgenres.ppt
 
DemoCoder said:
I still don't see where these oodles of shaders come from. There are a finite number of material types and lighting equations; I simply don't see the combinatorial explosion.

Let's take a realistic example - you have to support legacy lighting capabilities through VS 1.1. Do that optimized, avoiding dead code.

Let's support point, directional and spot lights. FF supports 8 of them, but let's support only 4. (I think that's 35 combinations.)
You can have a diffuse texture or not (2x), a specular texture or not (2x), an environment map or not (2x), a self-illumination texture or not (2x), a bump map or not (2x).
You can have no deformation, 2-source tweening, 3-source tweening, 1-bone skinning, or 2-bone skinning, at least (5x).
You can have fog enabled or not (2x).

That's 35*2*2*2*2*2*5*2 = 11,200 shaders.
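The arithmetic checks out: counting the light setups as the number of (point, directional, spot) triples totalling at most 4 lights gives the 35, and the full product is indeed 11,200. A quick check:

```python
# Reproducing the permutation count above for the fixed-function
# feature matrix: up to 4 lights drawn from {point, directional, spot}.

# Count (point, dir, spot) triples with at most 4 lights in total:
light_combos = sum(
    1
    for p in range(5) for d in range(5) for s in range(5)
    if p + d + s <= 4
)
assert light_combos == 35

toggles = 2 ** 5   # diffuse, specular, env map, self-illumination, bump
deform  = 5        # none, 2/3-source tween, 1/2-bone skinning
fog     = 2        # fog on or off

total = light_combos * toggles * deform * fog
assert total == 11200
```

Each extra binary feature doubles the count again, which is where the combinatorial explosion comes from.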

And notice that nothing fancy is going on yet. What about shadow buffering (possibly for multiple light sources)? What about more than 4 light sources? What about alternative lighting models?
 