yes, the developer would handle this at a material level, i.e. each pass would go through all surfaces sharing a given shader. why wouldn't the driver/HLSL compiler carry out the same obvious solution? just for the sake of being IMR -politically-correct? -nah.
And now you're suggesting one of the following two things:
A) the driver builds a per-triangle/triangle-group scene graph based on no prior knowledge of the scene. Or,
B) the driver provides the application a long list of render state and compiled program passes corresponding to the requested compiled shader.
For A, you have basically violated the first rule of optimization -- take advantage of everything you know. The shader has absolutely no knowledge of context -- what it's doing, when/where/how it's used, how important precision is, etc. All it knows is what to do. When you compile a shader, that is all the information the driver has. Getting optimal performance out of an application with _no_ prior knowledge is a lot harder than getting optimal performance out of an application with lots of prior knowledge. And, in this case, the application knows everything about the shader and how it is being used. When you ask the driver to do a completely fire-and-forget multipass system, you are ignoring a huge knowledge base that the application already has. And what happens if the application needs to maintain data in the stencil buffer (which isn't part of any shading language currently)? Suddenly a _whole_ lot of additional work must be performed (and additional state maintained) in order to ensure that rendering with the shader yields the same results in the stencil buffer, multi-pass or not.
For B, I think you're vastly underestimating just what needs to get passed around for an auto-multipass system to work. Textures get re-mapped, vertex data may need to be passed in differently (especially on chips like NV20 and NV25), all OpenGL state can change (color/write masks, depth tests, alpha tests, blend modes, stencil tests, etc.), the order things are drawn may need to change completely (in fact, new Pbuffers may need to be created in order to handle intermediate render-to-texture passes for each object), the number of passes returned may vary (not to mention the totally pathological shaders with statements like for (i=0; i<texlookup(); i+=0.01)), etc. Managing all of the state that a compiler may return (especially a non-optimal compiler) would probably be more complicated than just doing the multipass yourself, and certainly more complicated than creating a simplified shader that may not look as good, but runs quickly and yields acceptable results.
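To give a feel for how much would have to come back per pass under option B, here is a rough sketch in Python. Every name in it (PassState, state_changes, the field names, the texture names) is made up for illustration; no driver or shading language returns a structure like this. It only lists the categories of state mentioned above and shows which of them the application would have to re-set between two passes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class PassState:
    """Hypothetical record of what one compiled pass hands back to the app."""
    program: str                                    # opaque per-pass program handle
    texture_bindings: Dict[int, str] = field(default_factory=dict)  # unit -> texture
    vertex_inputs: Tuple[str, ...] = ()             # vertex data this pass expects
    color_mask: Tuple[bool, bool, bool, bool] = (True, True, True, True)
    depth_write: bool = True
    depth_func: str = "LESS"
    alpha_test: Optional[str] = None                # None means the test is off
    blend: Optional[Tuple[str, str]] = None         # (src, dst) factors, None = off
    stencil: Optional[str] = None                   # None means the test is off
    render_target: str = "framebuffer"              # or an intermediate pbuffer


def state_changes(prev: PassState, nxt: PassState) -> List[str]:
    """Names of the fields the application must re-set going from prev to nxt."""
    return [name for name in vars(nxt) if getattr(prev, name) != getattr(nxt, name)]


# Two made-up passes: render to an intermediate buffer, then blend into the frame.
base = PassState(program="pass0",
                 texture_bindings={0: "diffuse", 1: "normal"},
                 vertex_inputs=("position", "texcoord0"),
                 render_target="pbuffer0")
accum = PassState(program="pass1",
                  texture_bindings={0: "pbuffer0", 1: "specular"},
                  vertex_inputs=("position", "texcoord0"),
                  depth_write=False, depth_func="EQUAL",
                  blend=("ONE", "ONE"))
changed = state_changes(base, accum)
```

Even in this toy case six of the ten fields differ between the two passes, and that is before the pass count itself starts depending on the shader's inputs.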
Auto multi-passing with a simple system is sort of a holy grail for shading languages. The Stanford Shading Language is a good start at this; however, its performance leaves a lot to be desired, even though it manages to produce some fairly compact pixel shading code (or, more accurately, register combiner/texture shader state).
The goal for immediate-mode APIs is to be as transparent and efficient as possible. If you think you've seen bad drivers with the current OpenGL API, just wait until driver manufacturers even try to multipass shaders automatically. Performance will plummet, visual artifacts will appear everywhere, and you'll probably see quite a few stability problems, since drivers would then have to manage a significant and constantly changing amount of data on the heap.
This is why the compiler has to report back that multipass was enabled. As long as the software knows that multipass is required, and each compiled pass is stored in a different location, there should be no performance problems (other than less-than-perfect optimizations).
Seeing as how the compiler has no knowledge of context, and "optimally" compiling a program is NP-complete, I'd imagine you'll see a rather large number of performance problems.
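For reference, the scheme being proposed (the compiler reports that multipass is needed, each pass is stored separately, and the application runs each pass over all surfaces sharing the shader) could be sketched roughly like this. CompileResult and draw_material are hypothetical names, not part of any real API, and the "commands" are just tuples standing in for actual GL calls.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CompileResult:
    """Hypothetical result reported back by the shader compiler."""
    passes: List[str]  # one compiled program per pass, each stored separately

    @property
    def multipass(self) -> bool:
        # The compiler "reports back" multipass simply by returning >1 pass.
        return len(self.passes) > 1


def draw_material(result: CompileResult, surfaces: List[str]) -> List[Tuple[str, str]]:
    """Pass-major order: bind each pass once, then draw every surface using it."""
    commands = []
    for program in result.passes:
        commands.append(("bind", program))
        for surface in surfaces:
            commands.append(("draw", surface))
    return commands


# Made-up material: a two-pass shader shared by three surfaces.
result = CompileResult(passes=["pass0", "pass1"])
commands = draw_material(result, ["wall", "floor", "door"])
binds = sum(1 for op, _ in commands if op == "bind")
draws = sum(1 for op, _ in commands if op == "draw")
```

The loop itself is trivial. What the sketch leaves out is exactly the part I'm worried about: the blend modes, intermediate render targets, and stencil contents that each pass may need changed, none of which the compiler can choose well without knowing the context.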
you mean exactly the way it's exposed in ogl2 ?
Nope -- I mean the way Stanford Shading Language sits on top of OpenGL 1.2. The hardware limits are there and queryable (and the final path to the driver _must_ heed them); however, if a developer wants to kill performance, he has the option of using an additional API to compile shaders down into multipass. The "assembly code" will be compatible across IHVs, so there will be one compiler (rather than compilers in each driver) that observes the resource limits the driver and hardware have.
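As a toy illustration of a compiler that sits above the driver and observes a queried limit, here is a sketch in Python. It is not how the Stanford compiler works. The function name and the greedy strategy are my own assumptions, and it assumes the passes are combined through framebuffer blending, so no texture unit is reserved for an intermediate result. In real code the limit would come from the driver (for example a query like glGetIntegerv with GL_MAX_TEXTURE_UNITS_ARB) rather than being passed in by hand.

```python
from typing import List


def split_into_passes(texture_lookups: List[str],
                      max_texture_units: int) -> List[List[str]]:
    """Greedy split: fill each pass with up to max_texture_units lookups, in order.

    Hypothetical helper. Assumes passes are combined by framebuffer blending,
    so later passes do not need a unit to read back an intermediate texture.
    """
    if max_texture_units < 1:
        raise ValueError("hardware must expose at least one texture unit")
    passes = []
    for start in range(0, len(texture_lookups), max_texture_units):
        passes.append(texture_lookups[start:start + max_texture_units])
    return passes


# Made-up shader that samples five textures.
lookups = ["diffuse", "normal", "specular", "gloss", "detail"]
two_unit_passes = split_into_passes(lookups, 2)   # 3 passes
four_unit_passes = split_into_passes(lookups, 4)  # 2 passes
```

The same shader source yields three passes on a two-unit part and two passes on a four-unit part, which is the point: one compiler, with the limits supplied by whichever driver it is running on.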