John Carmack said: So the solution that I'm looking at for outdoor lighting on there is a multi-level sort of mipmap... cropped mipmaps of shadow buffers, where you have your 1k by 1k shadow buffer which renders only, say, the two thousand units nearest you, and it's cropped exactly to cover that area dynamically. Then you start scaling by powers of two out from there until you've covered the entire world, which may require five or six shadow buffers depending on how big your outdoor area is...
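A rough sketch of the cascade arithmetic he is describing, assuming a 1k x 1k buffer per level and the 2000-unit base range from the quote (the names and world size here are invented for illustration, not id's code):

```cpp
#include <cstdio>

// Each level is a 1k x 1k shadow buffer cropped to a square region around
// the viewer; every level doubles the range covered, so a handful of
// buffers spans the whole world.
struct Cascade {
    float range;      // world units covered by this level
    float texelSize;  // world units per shadow-map texel
};

int buildCascades(Cascade out[], float baseRange, float worldSize, int maxLevels) {
    int n = 0;
    float range = baseRange;                // e.g. the 2000 nearest units
    while (n < maxLevels) {
        out[n].range = range;
        out[n].texelSize = range / 1024.0f; // 1k x 1k buffer at every level
        ++n;
        if (range >= worldSize) break;      // entire world covered
        range *= 2.0f;                      // power-of-two step outward
    }
    return n;
}

int main() {
    Cascade c[8];
    int n = buildCascades(c, 2000.0f, 50000.0f, 8); // six levels for this world
    for (int i = 0; i < n; ++i)
        std::printf("level %d: %.0f units, %.2f units/texel\n",
                    i, c[i].range, c[i].texelSize);
}
```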
***
...I actually dynamically scale the resolution of every single light that's drawn based upon how big it is on screen, and you can throw other parameters into the heuristic that you use for deciding that. Because of the way I select out the areas of the screen that are going to receive shadow calculations (I use stencil buffer tests for that, so all the work with stencil buffers and the algorithms for that is still having some pay-off in the new engine, even though we're not directly using them for shadowing), I don't require clamping or even power-of-two sizes on the shadow buffers...
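As a toy illustration of that kind of screen-size heuristic (every constant here is invented; the quote doesn't give id's actual formula), note that nothing in it forces the result to a power of two:

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical heuristic: roughly one shadow texel per covered screen
// pixel, clamped to sane bounds. Because the receiving pixels are masked
// per-pixel (he selects them with stencil tests), the buffer needs no
// clamping and no power-of-two padding; any size that covers the light works.
int shadowBufferResolution(float lightScreenRadiusPixels) {
    const int kMin = 64, kMax = 2048;  // invented bounds
    int res = static_cast<int>(std::ceil(2.0f * lightScreenRadiusPixels));
    return std::clamp(res, kMin, kMax);  // e.g. radius 300 -> a 600x600 buffer
}
```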
Reverend said: I was discussing with Tim about the importance of vertex shaders and then about shadows. Tim had told me quite some time back that vertex shaders should be fairly simple ones that set up vectors for more complex pixel shaders (and that is his opinion, given the current industry -- software and hardware -- progress).

John Carmack said: That has been my position as well.

Reverend said: However, with the current choice of "favourite" shadowing being a toss-up between shadow buffers and shadow volumes (hacked, when necessary, maybe for soft-shadowing), shouldn't more emphasis be placed (mostly by the IHVs) on vertex shader performance and features? It, of course, depends on how shadowing shakes out in time to come: stencil volumes are fill-costly due to the shadow geometry, while buffers need sufficiently high resolution to avoid aliasing. There's also the influence of multisampling on developers. How do you see vertex shading features and performance shaping up when it comes to the consideration of shadowing?

John Carmack said: I have both methods running in the new codebase, because the driver paths for shadow buffers aren't optimized enough to make a determination about final performance yet. The biggest hit in vertex shader performance right now is the fact that you may need to evaluate them multiple times for multiple passes. Allowing shaders to write back to vertex buffers would be a bigger improvement than doubling vertex shader performance.
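To make that cost argument concrete, here is a minimal model (toy numbers, not a benchmark or engine code) of why write-back can beat a twice-as-fast vertex unit once you have several lighting passes:

```cpp
#include <cstdio>

// With N lighting passes the full vertex shader (skinning + transform)
// re-runs every pass. Doubling vertex-unit speed halves that cost; writing
// the skinned result back to a vertex buffer removes the redundancy entirely.
int main() {
    const long verts = 100000, passes = 5;
    long skinToday     = verts * passes;  // re-skinned in every pass
    long skinDoubledHW = skinToday / 2;   // "twice as fast" vertex hardware
    long skinWriteBack = verts;           // skinned once, reused per pass
    std::printf("skinning work today:        %ld vertex-skins\n", skinToday);
    std::printf("with 2x vertex hardware:    %ld (effective)\n", skinDoubledHW);
    std::printf("with write-back to buffers: %ld\n", skinWriteBack);
}
```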
Reverend said: Can anyone give me an example of situations where we would need to "evaluate vertex shaders multiple times for multiple passes"?
V3 said:
John Carmack said: Allowing shaders to write back to vertex buffers would be a bigger improvement than doubling vertex shader performance.
I'd like to see something like that. The Playstation 2 could do it; the PC should be able to do it as well, if they're going the multiple-passes route.
V3 said: Really? That's cool. Which OGL extension can be used for this?
BTW, you didn't mean Render to Vertex Array, did you?
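For what it's worth, one way "render to vertex array" was exposed on PC OpenGL is via pixel buffer objects: render the processed vertices into a float color buffer, copy them into a buffer object, and re-bind that same buffer as a vertex array. A minimal sketch, assuming ARB_pixel_buffer_object and ARB_vertex_buffer_object are available and the extension entry points are already resolved:

```cpp
#include <GL/gl.h>
#include <GL/glext.h>  // GL_PIXEL_PACK_BUFFER_ARB, GL_ARRAY_BUFFER_ARB

// Assumes 'buf' was created with glGenBuffersARB and the framebuffer
// already holds one float xyz position per pixel; error handling omitted.
void framebufferToVertexArray(GLuint buf, int w, int h) {
    // 1. Pack the rendered positions straight into the buffer object
    //    (the last argument is an offset into the PBO, not a CPU pointer).
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, buf);
    glReadPixels(0, 0, w, h, GL_RGB, GL_FLOAT, (void*)0);
    glBindBufferARB(GL_PIXEL_PACK_BUFFER_ARB, 0);

    // 2. Re-bind the very same buffer as a vertex array source, so the
    //    data never makes a round trip through system memory.
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, buf);
    glVertexPointer(3, GL_FLOAT, 0, (void*)0);
    glEnableClientState(GL_VERTEX_ARRAY);
    // ... draw with the processed vertices, then unbind ...
}
```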
Reverend said: More from John:

John Carmack said: I have both methods running in the new codebase, because the driver paths for shadow buffers aren't optimized enough to make a determination about final performance yet.

Reverend said: It's not currently a hardware (NV4x, R4xx) issue and you think driver improvements can be made? I was also talking to a <snipped game developer> about shadow buffers and how drivers *plus* hardware limitations appear to be maxed out already with regards to the latest from both NVIDIA and ATI... what sort of things in the drivers could be improved?

John Carmack said: Drivers might be maxed out on D3D, but they certainly aren't on OpenGL.

Okay... so what are the differences between OGL and D3D with regard to shadow buffers? Is this simply a matter of IHV priorities (driver development for OGL versus MS/D3D), or are there some important differences between the two APIs as far as shadow buffers are concerned? I'm no driver developer/engineer... any input from the IHV personnel / driver developers here?
Reverend said: Can anyone give me an example of situations where we would need to "evaluate vertex shaders multiple times for multiple passes"?
Wouldn't Doom 3 or 3DMark03's Proxycon be examples? For each pass, the vertices have to be transformed again, since the results are not stored in memory. In the case of Proxycon, skinning must also occur. Doom 3 performs skinning in software, leaving only transformation to the vertex shader -- an extremely light operation.
Ostsol said:What Carmack is suggesting is actually quite attractive. One could set up a target vertex buffer where the vertex shader would write skinned and/or transformed vertices to, reducing the load on vertex processing for subsequent passes and therefore making higher polygon counts and/or more lights much more practical.
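A sketch of the frame structure that idea implies: pay for skinning once, then let every pass read the pre-skinned buffer, so the per-pass vertex shader does transformation only. Every name here is a hypothetical stand-in, not anything from the engine:

```cpp
#include <vector>

struct Vec3  { float x, y, z; };
struct Light { Vec3 pos; };

std::vector<Vec3> skinMesh(const std::vector<Vec3>& bindPose) {
    return bindPose;  // placeholder for the real bone-weight blend
}

void drawPass(const Light&, const std::vector<Vec3>&) {
    // submit the pre-skinned buffer; per-pass vertex work is transform only
}

void renderFrame(const std::vector<Vec3>& bindPose,
                 const std::vector<Light>& lights) {
    std::vector<Vec3> skinned = skinMesh(bindPose);  // paid once per frame
    for (const Light& l : lights) {
        drawPass(l, skinned);  // shadow pass reuses the skinned vertices
        drawPass(l, skinned);  // lighting pass reuses them again
    }
}
```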
John Carmack said: With shadow buffers, in the new versions that I've been working with, a few things have changed since the time of the original Doom 3 specifications. One is that we have fragment programs now, so we can do pretty sophisticated filtering, and that turns out to be the critical thing. Even if you take the built-in hardware percentage-closer filtering [PCF] and you render obscenely high resolution shadow maps (2000x2000 or more), it still doesn't look good. In general, it's easy to make them look much worse than the stencil shadow volumes when you're at that basic, hardware-only level of filtering. You end up with all the problems you have with biases and pixel-grain issues, and it's just not all that great. However, when you start to add a bit of randomized jitter to the samples (you have to take quite a few samples to make it look decent), it changes the picture completely. Four randomized samples is probably going to be our baseline spec for normal shipping quality on the next game. That looks pretty good. If you look at broader soft shadows, there's a little bit of fizzly pixel jitter as things jump around, but the randomized stuff does look a lot better than any kind of static allocation. It should be a good enough level, and the nice thing is that because the shadow sampling calculation is completely separated from the other aspects of the rendering engine, you can toss in basically as many samples as you want. In my current research I've got a zero-sample version, which is the hardware PCF, for comparison; a single sample that's randomly jittered; four samples as kind of the baseline; and also a sixteen-sample version which can give you very nice, high quality soft shadows. I'll probably toss in even higher ones, like a 25- or 64-sample version, which will mostly be used for offline rendering work: if people want to have something render and they don't mind it running at a few frames a second, you can get literally film-quality shadowing effects out of this, just by changing the number of samples. This winds up being very close to the algorithm that Pixar has used for a great many of the RenderMan-based movies, and it's now just running on the GPUs in real time at the lower sample levels.
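A CPU-side reference of the sampling scheme described above, with a plain depth comparison per tap. This is an illustration of the idea, not id's shader; the jitter source and the edge clamping are placeholder choices:

```cpp
#include <cstdlib>
#include <vector>

// Take a few randomly jittered taps around the projected shadow-map
// coordinate, depth-compare each one, and average the pass/fail results.
// Raising kSamples (4 -> 16 -> 64) trades speed for softer, cleaner shadows.
struct DepthMap {
    int w, h;
    std::vector<float> depth;                  // light-space depth per texel
    float at(int x, int y) const {
        x = x < 0 ? 0 : (x >= w ? w - 1 : x);  // clamp to edge
        y = y < 0 ? 0 : (y >= h ? h - 1 : y);
        return depth[y * w + x];
    }
};

float shadowFactor(const DepthMap& map, float u, float v,
                   float receiverDepth, int kSamples, float radiusTexels) {
    float lit = 0.0f;
    for (int i = 0; i < kSamples; ++i) {
        // Random jitter inside a square; a real shader would read a
        // precomputed jitter texture instead of calling rand().
        float ju = (std::rand() / (float)RAND_MAX - 0.5f) * 2.0f * radiusTexels;
        float jv = (std::rand() / (float)RAND_MAX - 0.5f) * 2.0f * radiusTexels;
        int x = (int)(u * map.w + ju);
        int y = (int)(v * map.h + jv);
        if (receiverDepth <= map.at(x, y))     // bias assumed applied already
            lit += 1.0f;
    }
    return lit / kSamples;  // 0 = fully shadowed, 1 = fully lit
}
```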
Zeross said: Mordenkainen> are you sure he is not using PCF? I remember him saying that PCF was not enough to solve the aliasing problem by itself, but that combining PCF with multiple jittered samples offered pretty good results: 4 PCF samples would be close to 16 non-filtered samples, and so on. I don't see a good reason for not using PCF, but it has been a long time since I listened to his keynote, so maybe I'm wrong.
John Carmack said: However, at this point right now, the shadow buffer solution is quite a bit slower than the existing stencil shadow solution.

Some of that is due to hardware API issues. Right now I'm using the OpenGL p-buffer and render-to-texture interface, which is a GOD AWFUL interface; it has far too much inheritance from bad design decisions back in the SGI days, and I've had some days where it's the closest I'd ever been to switching over to D3D, because the APIs were just that appallingly bad. Both ATI and Nvidia have their preferred direction for doing efficient render-to-texture, because the problem with the existing APIs is that not only are they crummy, bad APIs, they also have a pretty high performance overhead: they require you to switch OpenGL rendering contexts, and for shadow buffers that's something that has to happen hundreds of times per frame, which is a pretty big performance hit right now. So both ATI and Nvidia have their preferred solutions to this, and as usual they're not agreeing on exactly what should be done, over stupid, petty little things. I've read both the specs, and I could work with either one; they both do the job, and the differences are just silly syntactic things. I have a hard time understanding why they can't just get together and agree on one of them. I am doing my current work on Nvidia-based hardware, so it's likely I will be using their extension.
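To see where that overhead comes from, here is roughly what the p-buffer path forces per shadow buffer under WGL_ARB_pbuffer / WGL_ARB_render_texture (a sketch with all setup omitted; the two context switches per light are the cost being complained about, repeated hundreds of times a frame):

```cpp
#include <windows.h>
#include <GL/gl.h>
#include "wglext.h"  // HPBUFFERARB, WGL_FRONT_LEFT_ARB, entry point typedefs

// Extension entry points assumed resolved at startup via wglGetProcAddress.
extern PFNWGLBINDTEXIMAGEARBPROC    wglBindTexImageARB;
extern PFNWGLRELEASETEXIMAGEARBPROC wglReleaseTexImageARB;

void renderShadowBuffer(HDC pbufferDC, HGLRC pbufferRC,
                        HDC mainDC, HGLRC mainRC,
                        HPBUFFERARB pbuffer, GLuint shadowTex) {
    wglReleaseTexImageARB(pbuffer, WGL_FRONT_LEFT_ARB); // unbind before drawing
    wglMakeCurrent(pbufferDC, pbufferRC);  // context switch #1 (expensive)
    // ... render this light's depth into the pbuffer here ...
    wglMakeCurrent(mainDC, mainRC);        // context switch #2 (expensive)
    glBindTexture(GL_TEXTURE_2D, shadowTex);
    wglBindTexImageARB(pbuffer, WGL_FRONT_LEFT_ARB);    // sample it as a texture
}
```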
Reverend said: Can anyone give me an example of situations where we would need to "evaluate vertex shaders multiple times for multiple passes"?
Well, if you're doing the lighting in more than one pass, much of the vertex shader's work will be the same between passes (if not all of it).