gking said:Resorting to multi-pass in the driver is a very bad idea on many levels. It may make things easier for developers, but performance (and potentially quality) would suffer dramatically as a result, potentially even hurting performance when multi-pass isn't needed.
Further, the user can compensate for performance problems by lowering the resolution, geometry detail, rendering quality and so on. Dealing with the limits of the hardware is just a pain for developers when coding in a high-level language.
Sure, performance will go down, but today's high-end DX8 cards and the coming DX9 cards (which these languages are targeted at) certainly do not have any performance problems. Just have a look at Anand's latest UT2K3 scores: a Ti 4600 is already blazing fast in the latest 3D engine.
gking said:When you multipass, you typically want to render the whole scene with the first pass, and then render the subsequent passes with an equal depth test in order to take advantage of early Z rejection hardware to minimize the number of cycles spent on shading unseen pixels. Unfortunately, the hardware doesn't know when it renders a triangle if a later triangle will obscure it, so if the driver multipasses per triangle, all passes will be rendered on every triangle that is visible at the time it is drawn. This is *much, much* less efficient than letting the application developer handle the multipassing.
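To put rough numbers on the point above, here is a minimal sketch that counts shaded fragments on a toy 1D framebuffer. Everything in it is an assumption made for illustration: `shade_counts` is a hypothetical helper, triangles are modeled as `(start, end, depth)` spans with distinct depths, and the first pass uses a less-than depth test while later passes use an equal test.

```python
def shade_counts(spans, width, num_passes):
    """Return (per_triangle, per_scene) shaded-fragment counts.

    spans: list of (start, end, depth) in submission order (hypothetical model).
    Assumes every span has a distinct depth value.
    """
    far = float("inf")

    # Driver-style: all passes of one triangle run before the next triangle,
    # so a triangle that gets covered later is still shaded num_passes times.
    depth = [far] * width
    per_triangle = 0
    for start, end, z in spans:
        for p in range(num_passes):
            for x in range(start, end):
                if p == 0:
                    if z < depth[x]:       # first pass: LESS test, writes depth
                        depth[x] = z
                        per_triangle += 1
                elif z == depth[x]:        # later passes: EQUAL test
                    per_triangle += 1

    # Application-style: first pass over the whole scene, then the remaining
    # passes over the whole scene with an EQUAL test against final depth.
    depth = [far] * width
    per_scene = 0
    for start, end, z in spans:
        for x in range(start, end):
            if z < depth[x]:
                depth[x] = z
                per_scene += 1
    for _ in range(1, num_passes):
        for start, end, z in spans:
            for x in range(start, end):
                if z == depth[x]:
                    per_scene += 1
    return per_triangle, per_scene


# Three full-width layers submitted back to front, four passes each.
layers = [(0, 8, 3.0), (0, 8, 2.0), (0, 8, 1.0)]
print(shade_counts(layers, width=8, num_passes=4))
```

In this back-to-front worst case the per-triangle ordering shades every layer in every pass, while the per-scene ordering pays for overdraw only in the first pass.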
Additionally, some problems that can be solved in hardware on some GPUs (e.g., power functions per-pixel) may require textures on other GPUs. How does the driver decide what resolution texture to create for this lookup function (assuming a lookup can be created at all)? If the game has detected a card with 64M free memory, what happens when the driver creates a 1M lookup texture?
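The resolution question can be made concrete with a small sketch. It assumes the power function is baked into a 1D table over [0, 1] and fetched with linear interpolation, roughly like a filtered 1D texture; `build_pow_table`, `lookup` and `max_error` are hypothetical names, not anything a real driver exposes.

```python
def build_pow_table(exponent, size):
    """Sample x**exponent at `size` evenly spaced points on [0, 1] (size >= 2)."""
    return [(i / (size - 1)) ** exponent for i in range(size)]


def lookup(table, x):
    """Linearly interpolated fetch for x in [0, 1]."""
    pos = x * (len(table) - 1)
    i = min(int(pos), len(table) - 2)   # clamp so i + 1 stays in range
    frac = pos - i
    return table[i] * (1.0 - frac) + table[i + 1] * frac


def max_error(exponent, size, samples=10001):
    """Largest absolute error of the table against the exact power function."""
    table = build_pow_table(exponent, size)
    worst = 0.0
    for k in range(samples):
        x = k / (samples - 1)
        worst = max(worst, abs(lookup(table, x) - x ** exponent))
    return worst


for size in (16, 64, 256, 1024):
    print(size, max_error(32, size))
```

The error drops as the table grows, but so does the memory the driver would have to take behind the application's back, and the acceptable trade-off depends on the exponent and on what the shader does with the result.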
Also, should the driver look for all differentiable functions in order to collapse extra math passes into textures? If it finds such a function, can it collapse it on hardware that doesn't require multi-pass?
Most people assume that turning shaders into multiple passes is just some simple flag that should be done automatically.
Sure, it's nice for application developers; however, it has *many* negative consequences at the driver level, and you really don't want unnecessary baggage weighing down your drivers.
The best solution for allowing graceful multipass would be to make the driver-level interface hardware-limited (like it is currently), and then introduce a GLU-like layer that can gracefully create multipass when requested. That way, it's the application developer's fault if their program runs like crap.
Taking performance away from them (and from users) in order to gracefully multipass shaders on older hardware, when the better solution would be to just run a simpler shader, is a bad idea on many levels.
And just because hardware is fast doesn't mean you should throw away performance. That's just ridiculous.
DemoCoder said:Auto-multipass is a pipedream until the APIs operate at the scenegraph level. Yes, it can be done at the triangle level, but it will be woefully inefficient. You could get pathological cases out of relatively simple shaders that require dozens or more passes! And each pass is inefficient because of all the state changes, no ability to be intelligent about Z, transparency, stencil, etc.
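As a rough illustration of the state-change cost, the sketch below counts state switches for a list of draws in submission order versus grouped by state. The `(state, mesh)` tuples and both helper names are assumptions made for this example; grouping like this is only valid for draws whose order does not matter (opaque geometry), and it needs the whole list up front, which a driver seeing one triangle at a time does not have.

```python
def count_state_changes(draws):
    """Count how often the bound state differs from the previous draw's state."""
    changes = 0
    current = None
    for state, _mesh in draws:
        if state != current:
            changes += 1
            current = state
    return changes


def group_by_state(draws):
    """Reorder draws so that those sharing a state are adjacent (stable sort)."""
    return sorted(draws, key=lambda draw: draw[0])


submitted = [("A", "m1"), ("B", "m2"), ("A", "m3"), ("B", "m4")]
print(count_state_changes(submitted))                  # submission order
print(count_state_changes(group_by_state(submitted)))  # grouped by state
```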
Chalnoth said:However, any decent HLSL that compiles to multipass should certainly allow developer control over multipass, allowing something like this (pseudocode):
compile program
for x = 1 to numpasses
    render pass x
next x
With something of this form, there shouldn't be any problem with state change stalls.
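The loop above might look something like the following sketch. It is not any real API: `compile_program` is a hypothetical stand-in that simply chops a list of shader operations into passes of at most `max_ops_per_pass`, and `render` records what the application would do rather than drawing anything. A real compiler would also have to carry intermediate results between passes.

```python
def compile_program(ops, max_ops_per_pass):
    """Hypothetical compile step: split the operation list into passes."""
    return [ops[i:i + max_ops_per_pass]
            for i in range(0, len(ops), max_ops_per_pass)]


def render(ops, objects, max_ops_per_pass):
    """Application-controlled multipass: bind each pass once, draw everything."""
    log = []
    passes = compile_program(ops, max_ops_per_pass)
    for index, _pass_ops in enumerate(passes, start=1):
        log.append(("bind", index))            # one state change per pass
        for obj in objects:
            log.append(("draw", index, obj))   # whole scene under this pass
    return log


for entry in render(["a", "b", "c", "d", "e"], ["floor", "wall"], 2):
    print(entry)
```

Because the application owns the outer loop, the number of shader binds scales with the number of passes, not with the number of objects.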
pcchen said:Actually the DX8 "Effects and Techniques" tries to do the same thing. It does not automatically compile into multiple passes but it allows similar operation.
Unfortunately, this does not always work. For example, if you want to draw transparent triangles, a multi-pass algorithm may fail. You'll need to render the triangles into a temporary off-screen buffer and blend them into the frame buffer when they are all processed.
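A single-pixel sketch of why this happens, under assumed numbers: the shader computes `base * light`, split so that pass 1 writes `base` and pass 2 multiplies by `light`, and the result should be alpha-blended over the background. All function names here are hypothetical.

```python
def blend(src, dst, alpha):
    """Standard alpha blend of src over dst."""
    return alpha * src + (1.0 - alpha) * dst


def single_pass(base, light, alpha, background):
    """Reference result: the whole shader runs, then blends once."""
    return blend(base * light, background, alpha)


def naive_multipass(base, light, alpha, background):
    """Both passes go straight to the frame buffer."""
    fb = blend(base, background, alpha)   # pass 1 blends base over background
    fb = fb * light                       # pass 2 also darkens the background
    return fb


def offscreen_multipass(base, light, alpha, background):
    """Passes accumulate in a temporary buffer, blended once at the end."""
    tmp = base           # pass 1 into the off-screen buffer
    tmp = tmp * light    # pass 2 modulates only the triangle's own color
    return blend(tmp, background, alpha)


args = (0.8, 0.5, 0.5, 1.0)   # base, light, alpha, background
print(single_pass(*args), naive_multipass(*args), offscreen_multipass(*args))
```

The naive version lets the second pass modulate the background that was already blended in, so it disagrees with the single-pass result; the off-screen version matches it.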
darkblu said:DemoCoder said:Auto-multipass is a pipedream until the APIs operate at the scenegraph level. Yes, it can be done at the triangle level, but it will be woefully inefficient. You could get pathological cases out of relatively simple shaders that require dozens or more passes! And each pass is inefficient because of all the state changes, no ability to be intelligent about Z, transparency, stencil, etc.
Care to pinpoint the arguments you have re auto-multipass inefficiency so we could go through them more thoroughly?
DemoCoder said:I posted them awhile ago on these forums, I'm not about to rewrite such a huge post.
Essentially, #1 you want to group primitives together which share like state. Unless the application takes control of pass rendering (polling the driver for each pass, getting the next set of compiled shaders, and doing the rendering itself), you can't do this without scene capture.
Chalnoth keeps talking about this API, but I don't see it in the OpenGL2.0 specs online. Maybe someone can point me to it. When I talk about auto-multipass, I am talking about the driver multipassing the scene without ANY control from the application.
State changes are one of the biggest performance killers today, and are detailed in every DX/OGL programming FAQ (that, and mismanaging vertex buffers/stalling the GPU). However, immediate mode renderers are, for the most part, procedural/stream-based APIs, not object/stateful APIs, so any driver wanting to do the right thing has to do scene capture.
#2 transparency kills you.
#3 There are very simple operations, like a user-implemented version of pow(), noise(), image shading, or procedural textures, that require a very high number of passes and/or render-to-texture. Take computing Perlin noise/turbulence. How will a fragment shader that has a loop with 128 iterations, unrolled onto the GF4 into god knows how many passes, perform if this procedural texture is used all over the place?
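A back-of-the-envelope sketch of the pass count for a case like this. The numbers are placeholders, not GF4 specifications: `ops_per_iteration` and `ops_per_pass` are assumed parameters, and the estimate ignores the extra work of writing and re-reading intermediate results between passes, so a real split would likely be worse.

```python
import math


def estimate_passes(iterations, ops_per_iteration, ops_per_pass):
    """Lower bound on passes if the unrolled loop is packed perfectly."""
    total_ops = iterations * ops_per_iteration
    return math.ceil(total_ops / ops_per_pass)


def fragments_shaded(passes, width, height, coverage=1.0):
    """Fragments processed if the shader covers `coverage` of the screen."""
    return int(passes * width * height * coverage)


# Assumed figures: 128 iterations, 6 operations each, 8 operations per pass.
passes = estimate_passes(128, 6, 8)
print(passes, fragments_shaded(passes, 1024, 768))
```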
But we're talking about hypothetical application assists to the API, which I haven't seen yet.
pikkachu said:bah, nvidia just wanna hold back technology, for them to say multi-pass is bad just because their GPUs can't handle it, is loads of poop!