Does OpenGL 2.0 automatically generate multipass?

Anyway, Chalnoth, if you think my post was aimed directly at you, you're sadly mistaken!

It was meant as a general comment. Anyway...
 
Yes, the developer would handle this at the material level, i.e. each pass would go through all surfaces sharing a given shader. Why wouldn't the driver/HLSL compiler carry out the same obvious solution? Just for the sake of being IMR-politically-correct? Nah.
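A minimal sketch of the material-level approach, in Python with no real OpenGL calls: draws are just recorded as (material, pass, surface) tuples, and the names Surface, pass_major, object_major and state_changes are made up for illustration. It assumes the passes of different surfaces don't depend on each other's draw order (e.g. opaque geometry), which is exactly the kind of thing the application knows and a driver doesn't.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Surface:
    name: str
    material: str  # surfaces sharing a material share the same multipass shader


def pass_major(surfaces, passes_per_material):
    """Draw order when the application multipasses at the material level."""
    by_material = defaultdict(list)
    for s in surfaces:
        by_material[s.material].append(s.name)
    draws = []
    for material, names in by_material.items():
        for p in range(passes_per_material[material]):
            # Bind the state for (material, pass p) once, then draw every surface.
            for name in names:
                draws.append((material, p, name))
    return draws


def object_major(surfaces, passes_per_material):
    """Draw order when each surface runs all of its passes before the next one."""
    return [(s.material, p, s.name)
            for s in surfaces
            for p in range(passes_per_material[s.material])]


def state_changes(draws):
    """Count how many times the bound (material, pass) state has to change."""
    changes, bound = 0, None
    for material, p, _ in draws:
        if (material, p) != bound:
            changes += 1
            bound = (material, p)
    return changes


scene = [Surface("wall1", "brick"), Surface("wall2", "brick"), Surface("floor", "marble")]
passes = {"brick": 2, "marble": 3}
print(state_changes(pass_major(scene, passes)))    # 5
print(state_changes(object_major(scene, passes)))  # 7
```

Both orders issue the same seven draws; grouping by material just cuts the state changes from 7 to 5 here, and the gap grows with the number of surfaces per material.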

And now you're suggesting one of the following two things:

A) the driver builds a per-triangle/triangle-group scene graph based on no prior knowledge of the scene. Or,

B) the driver provides the application a long list of render state and compiled program passes corresponding to the requested compiled shader.

For A, you have basically violated the first rule of optimization -- take advantage of everything you know. The shader has absolutely no knowledge of context -- what it's doing, when/where/how it's used, how important precision is, etc. All it knows is what to do. When you compile a shader, that is all the information the driver has. Getting optimal performance out of an application with _no_ prior knowledge is a lot harder than getting optimal performance out of an application with lots of prior knowledge. And, in this case, the application knows everything about the shader and how it is being used. When you ask the driver to do a completely fire-and-forget multipass system, you are ignoring a huge knowledge base that the application already has. And what happens if the application needs to maintain data in the stencil buffer (which isn't part of any shading language currently)? Suddenly a _whole_ lot of additional work must be performed (and additional state maintained) in order to ensure that rendering with the shader yields the same results in the stencil buffer, multi-pass or not.
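To make the stencil problem concrete, here is a toy Python simulation of a single pixel (hypothetical names, no real API). The application's state is stencil test EQUAL 0 with INCR on pass, which is fine for a one-pass shader. Split the shader into three passes naively and the first pass increments the stencil value, so the remaining passes fail the test and never reach the color buffer. One possible fix, moving the INCR to the last pass, is shown; it only covers this one configuration, and other stencil setups would each need their own rewrite.

```python
from dataclasses import dataclass


@dataclass
class Pixel:
    stencil: int = 0
    passes_applied: int = 0  # shader passes that actually reached the color buffer


def draw_pass(px: Pixel, ref: int, incr_on_pass: bool) -> None:
    """One pass over one pixel: stencil test EQUAL ref, then optionally INCR."""
    if px.stencil != ref:
        return  # fragment rejected by the stencil test
    px.passes_applied += 1
    if incr_on_pass:
        px.stencil = min(px.stencil + 1, 255)  # saturating 8-bit increment


def naive_multipass(num_passes: int, initial: int = 0, ref: int = 0) -> Pixel:
    """Every pass reuses the application's stencil state unchanged."""
    px = Pixel(stencil=initial)
    for _ in range(num_passes):
        draw_pass(px, ref, incr_on_pass=True)
    return px


def deferred_incr_multipass(num_passes: int, initial: int = 0, ref: int = 0) -> Pixel:
    """Earlier passes use KEEP; only the final pass performs the INCR."""
    px = Pixel(stencil=initial)
    for p in range(num_passes):
        draw_pass(px, ref, incr_on_pass=(p == num_passes - 1))
    return px


print(naive_multipass(3))          # Pixel(stencil=1, passes_applied=1)
print(deferred_incr_multipass(3))  # Pixel(stencil=1, passes_applied=3)
```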

For B, I think you're vastly underestimating just what needs to get passed around for an auto-multipass system to work. Textures get re-mapped, vertex data may need to be passed in differently (especially on chips like NV20 and NV25), all OpenGL state can change (color/write masks, depth tests, alpha tests, blend modes, stencil tests, etc.), the order things are drawn may need to change completely (in fact, new Pbuffers may need to be created in order to handle intermediate render-to-texture passes for each object), the number of passes returned may vary (not to mention the totally pathological shaders with statements like for (i=0; i<texlookup(); i+=0.01)), etc. Managing all of the state that a compiler may return (especially a non-optimal compiler) would probably be more complicated than just doing the multipass yourself, and certainly more complicated than creating a simplified shader that may not look as good, but runs quickly and yields acceptable results.
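As a tiny illustration of how much state travels with each pass, the Python sketch below evaluates a * b + c for one color channel either in one pass or as three passes that rely on framebuffer blending. PassState, run_passes and the pass list are hypothetical; the factor names only imitate the OpenGL blend enums. Even this trivial expression needs a different blend configuration per pass, and it only works because every operation happens to map onto a blend equation. A real framebuffer would also quantize to 8 bits between passes, which this float sketch ignores.

```python
from dataclasses import dataclass

# A small subset of blend factors, named after the OpenGL enums they imitate.
ONE, ZERO, DST_COLOR = "GL_ONE", "GL_ZERO", "GL_DST_COLOR"


@dataclass(frozen=True)
class PassState:
    source: str        # which shader input this pass outputs as its fragment color
    blend: bool        # is blending enabled for this pass?
    src_factor: str = ONE
    dst_factor: str = ZERO


def blend_factor(name: str, dst: float) -> float:
    return {ONE: 1.0, ZERO: 0.0, DST_COLOR: dst}[name]


def run_passes(passes, inputs) -> float:
    """Apply each pass to a single-channel framebuffer value."""
    fb = 0.0
    for st in passes:
        src = inputs[st.source]
        if st.blend:
            fb = src * blend_factor(st.src_factor, fb) + fb * blend_factor(st.dst_factor, fb)
        else:
            fb = src
        fb = min(max(fb, 0.0), 1.0)  # fixed-point framebuffer clamps to [0, 1]
    return fb


# a * b + c, evaluated as three passes, each with its own blend state.
A_TIMES_B_PLUS_C = [
    PassState("a", blend=False),            # fb = a
    PassState("b", True, DST_COLOR, ZERO),  # fb = b * fb
    PassState("c", True, ONE, ONE),         # fb = c + fb
]


def single_pass(inputs) -> float:
    return min(max(inputs["a"] * inputs["b"] + inputs["c"], 0.0), 1.0)


x = {"a": 0.5, "b": 0.5, "c": 0.25}
print(run_passes(A_TIMES_B_PLUS_C, x), single_pass(x))  # 0.5 0.5
```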

Auto-multipassing with a simple system is sort of a holy grail for shading languages. The Stanford Shading Language is a good start at this; however, its performance leaves a lot to be desired, even though it manages to produce some fairly compact pixel shading code (or, more accurately, register combiner/texture shader state).

The goal for immediate-mode APIs is to be as transparent and efficient as possible. If you think you've seen bad drivers with the current OpenGL API, just wait until driver manufacturers even try to multipass shaders automatically. Performance will plummet, visual artifacts will appear everywhere, and you'll probably see quite a few stability problems, since drivers would then have to manage a significant and constantly changing amount of data on the heap.

This is why the compiler has to report back that multipass was enabled. As long as the software knows that multipass is required, and each compiled pass is stored in a different location, there should be no performance problems (other than less-than-perfect optimizations).

Seeing as how the compiler has no knowledge of context, and "optimally" compiling a program is NP-complete, I'd imagine you'll see a rather large number of performance problems.
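For the simplest possible case, a straight-line program whose instruction order is fixed, splitting into passes is easy, as this Python sketch shows (Instr, split_into_passes and the per-instruction texture cost are hypothetical). Cutting greedily whenever the next instruction would overflow the instruction or texture limit already gives the fewest passes in this restricted setting. The hard part the post refers to starts once the compiler may reorder instructions, decide which temporaries to keep in intermediate buffers, and weigh precision, none of which is modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    op: str
    textures: int = 0  # texture units this instruction needs (toy cost model)


def split_into_passes(program, max_instrs, max_textures):
    """Cut a straight-line program into contiguous passes that fit the limits.

    Assumes instruction order is fixed and that intermediate results are
    carried between passes in a render target, which is not modeled here.
    """
    if max_instrs < 1:
        raise ValueError("a pass must allow at least one instruction")
    passes, current, tex = [], [], 0
    for ins in program:
        if ins.textures > max_textures:
            raise ValueError(f"{ins.op} needs more texture units than one pass has")
        # Start a new pass if adding this instruction would exceed either limit.
        if current and (len(current) >= max_instrs or tex + ins.textures > max_textures):
            passes.append(current)
            current, tex = [], 0
        current.append(ins)
        tex += ins.textures
    if current:
        passes.append(current)
    return passes


prog = [Instr("tex", 1), Instr("tex", 1), Instr("mul"), Instr("tex", 1), Instr("add")]
for i, p in enumerate(split_into_passes(prog, max_instrs=3, max_textures=2)):
    print(i, [ins.op for ins in p])
# 0 ['tex', 'tex', 'mul']
# 1 ['tex', 'add']
```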

You mean exactly the way it's exposed in OpenGL 2.0?

Nope -- I mean the way Stanford Shading Language sits on top of OpenGL 1.2. The hardware limits are there and queryable (and the final path to the driver _must_ heed them); however, if a developer wants to kill performance, he has the option of using an additional API to compile shaders down into multipass. The "assembly code" will be compatible across IHVs, so there will be one compiler (rather than compilers in each driver) that observes the resource limits the driver and hardware have.
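The layered approach can be sketched in Python (all names hypothetical): the application reads the hardware limits (in real OpenGL these would come from queries such as glGetIntegerv with GL_MAX_TEXTURE_UNITS), prefers the best-looking shader variant that fits in a single pass, and only hands a shader to a separate multipass compiler when it explicitly asks for that. Nothing is multipassed behind the application's back.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_instrs: int    # stand-ins for values the application would query from the driver
    max_textures: int


@dataclass(frozen=True)
class Variant:
    name: str
    instrs: int
    textures: int


def fits(v: Variant, lim: Limits) -> bool:
    return v.instrs <= lim.max_instrs and v.textures <= lim.max_textures


def choose_shader(variants, limits, allow_multipass=False):
    """Pick the best-looking variant that runs in one pass.

    variants is ordered best-looking first. Returns (variant, single_pass).
    Only if the caller explicitly opts in does the best variant go to a
    separate multipass compiler instead.
    """
    for v in variants:
        if fits(v, limits):
            return v, True
    if allow_multipass:
        return variants[0], False
    raise RuntimeError("no variant fits in one pass and multipass was not requested")


variants = [Variant("full", 20, 6), Variant("reduced", 12, 4), Variant("basic", 6, 2)]
print(choose_shader(variants, Limits(12, 4)))
# (Variant(name='reduced', instrs=12, textures=4), True)
```

The design choice mirrors the post: falling back to a simpler variant is the default, and multipass is an explicit, opt-in cost.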
 
gking said:
And now you're suggesting one of the following two things:

A) the driver builds a per-triangle/triangle-group scene graph based on no prior knowledge of the scene. Or,

B) the driver provides the application a long list of render state and compiled program passes corresponding to the requested compiled shader.

For A, you have basically violated the first rule of optimization -- take advantage of everything you know. The shader has absolutely no knowledge of context -- what it's doing, when/where/how it's used, how important precision is, etc. All it knows is what to do. When you compile a shader, that is all the information the driver has. Getting optimal performance out of an application with _no_ prior knowledge is a lot harder than getting optimal performance out of an application with lots of prior knowledge. And, in this case, the application knows everything about the shader and how it is being used. When you ask the driver to do a completely fire-and-forget multipass system, you are ignoring a huge knowledge base that the application already has. And what happens if the application needs to maintain data in the stencil buffer (which isn't part of any shading language currently)? Suddenly a _whole_ lot of additional work must be performed (and additional state maintained) in order to ensure that rendering with the shader yields the same results in the stencil buffer, multi-pass or not.

Hmm, I should admit I completely forgot about the stencil operations. Back to the drawing board.

For B, I think you're vastly underestimating just what needs to get passed around for an auto-multipass system to work. Textures get re-mapped, vertex data may need to be passed in differently (especially on chips like NV20 and NV25), all OpenGL state can change (color/write masks, depth tests, alpha tests, blend modes, stencil tests, etc.), the order things are drawn may need to change completely (in fact, new Pbuffers may need to be created in order to handle intermediate render-to-texture passes for each object), the number of passes returned may vary (not to mention the totally pathological shaders with statements like for (i=0; i<texlookup(); i+=0.01)), etc. Managing all of the state that a compiler may return (especially a non-optimal compiler) would probably be more complicated than just doing the multipass yourself, and certainly more complicated than creating a simplified shader that may not look as good, but runs quickly and yields acceptable results.

I'd say this approach boils down to proper implementation. Passing countless amounts of state info back and forth is not necessarily the only idea at hand.

Auto-multipassing with a simple system is sort of a holy grail for shading languages. The Stanford Shading Language is a good start at this; however, its performance leaves a lot to be desired, even though it manages to produce some fairly compact pixel shading code (or, more accurately, register combiner/texture shader state).

The goal for immediate-mode APIs is to be as transparent and efficient as possible. If you think you've seen bad drivers with the current OpenGL API, just wait until driver manufacturers even try to multipass shaders automatically. Performance will plummet, visual artifacts will appear everywhere, and you'll probably see quite a few stability problems, since drivers would then have to manage a significant and constantly changing amount of data on the heap.

Auto-multipassing is bound to arrive in the future. Time passes, architectures mature; don't expect the graphics processor industry to stay the way it is today forever. The point is, we need to start thinking about and working on it today if we want to enjoy it tomorrow.

Nope -- I mean the way Stanford Shading Language sits on top of OpenGL 1.2. The hardware limits are there and queryable (and the final path to the driver _must_ heed them); however, if a developer wants to kill performance, he has the option of using an additional API to compile shaders down into multipass. The "assembly code" will be compatible across IHVs, so there will be one compiler (rather than compilers in each driver) that observes the resource limits the driver and hardware have.

Ah, OK.
 
Auto-multipassing is bound to arrive in the future. Time passes, architectures mature; don't expect the graphics processor industry to stay the way it is today forever. The point is, we need to start thinking about and working on it today if we want to enjoy it tomorrow.

I'm not saying that auto-multipass isn't the future; however, I don't think it should be the responsibility of drivers to handle it. Let a third party take on the task of writing a driver that can multipass arbitrarily complicated shaders, set up intermediate buffers, etc. Driver developers already have enough work laid out for them without having to worry about multipassing extremely complicated shaders, too. Auto-multipass is a potentially huge research project that Stanford has really only scratched the surface of, and I'd rather driver developers focus on making fast, stable drivers than spend tons of effort researching an unsolved problem.
 
One thing you have to consider is that beyond auto-multipass lie unlimited-length programs. Will the next-gen hardware support programs of unlimited length (at least theoretically... they'd obviously have to fit in the computer's memory, be it cache, video, system, or virtual)? I do hope so, and it does appear the P10 currently does, at least in hardware (I don't think the software architecture is there yet for the P10...).

Anyway, I guess what I'm trying to say is: will we just skip auto-multipass and go straight to not needing it at all?

Granted, there will be a need for auto-multipass until all DX8-and-below hardware (or DX10... whichever is the last generation with limited program sizes) falls below the minimum requirements for new games. If DX9 hardware is unlimited, that will still take another 2-4 years.
 
Some time ago, I asked Tim Sweeney for his opinion on automatic multipass in the driver. His reply is quite interesting:

Yes, definitely, I think that needs to happen in the future, the sooner the better.

From a practical point of view, we don't really need this in the DirectX9 generation, because there is one well-defined spec that all hardware follows so we can easily do our own multipass. But by the DirectX10 generation, there will be hardware with a very wide variety of different capabilities, and the automatic multipass fallback will be necessary so that we don't have to write N different versions of our renderer -- resolution and polygon counts become the only aspect to scalability, not features.

I think automatic multipass is in everyone's long-term plans, but it's a very hard feature to implement, so it's not surprising that this next generation of hardware/APIs/drivers doesn't support it.
 