Ultra Shadow and Shadow volume rendering acceleration

Nick said:
Now that I think of it... how can you do per-pixel lighting with stencil shadowing if you don't have the surface properties? If I'm correct then all you have is the base texture color?
Stencil shadowing does nothing more than label each pixel as in shadow or not in shadow, for a particular light. You can apply all of the textures or bump maps or whatever you want to it.
 
Nick said:
<SNIPPED shadow vol/shadow map description>

Still one more argument I'm going to annoy you with. ;) If I understood correctly, this requires multiple passes over the whole framebuffer, which requires quite a lot of memory bandwidth I suppose. For shadow mapping, lighting can be completely computed per pixel in one pass.

Both methods require "a pass per light". They are just giving you a 'flag' which indicates if a particular pixel is inside or outside of a shadow. Whether the multiple passes can be absorbed into the pixel shading operations then just depends on the capabilities of the HW. <shrug>
 
Chalnoth said:
Stencil shadowing does nothing more than label each pixel as in shadow or not in shadow, for a particular light. You can apply all of the textures or bump maps or whatever you want to it.
Damn, I knew I was on the wrong track there. Sorry. My defense is that I'm studying for my computer architecture exam at the same time and only disturbing this forum as a relaxation that keeps my brain alive. ;) If that annoys anyone, just don't answer...

I hope I got it right this time: The whole scene is rendered to the z-buffer. Then the shadow volume is rendered to the stencil buffer. Now a simple lookup in the stencil buffer tells us if the pixel is lit or not. Then we render the scene again using textures and bump maps and whatnot, using only ambient lighting for the parts in shadow. For the next light we clear the stencil buffer again and render the shadow volumes for the next light. Now we render a new additive pass similar to the previous one with texturing and bumpmapping again, but without ambient lighting. Repeat this process for all lights.
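The multi-pass loop just described can be sketched as a toy software model (the 4-entry "framebuffer", the dict fields, and the per-light shadow masks are all invented for illustration; a real stencil buffer would be filled by rendering the shadow volumes):

```python
def render_stencil_shadowed(pixels, lights):
    """pixels: list of dicts with an 'ambient' term and a per-light 'diffuse' term.
    lights: one per-pixel shadow mask per light (True = pixel is in shadow)."""
    # First pass: render the whole scene with ambient lighting only.
    framebuffer = [p["ambient"] for p in pixels]
    for shadow_mask in lights:
        # "Clear the stencil buffer, render this light's shadow volumes"
        # is modeled here as simply being handed a per-pixel in-shadow flag.
        for i, p in enumerate(pixels):
            if not shadow_mask[i]:
                # Additive pass: only unshadowed pixels receive this light's term.
                framebuffer[i] += p["diffuse"]
    return framebuffer
```

The structure makes the cost visible: one geometry/shading pass per light, plus the shadow-volume rendering to set each mask.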

Still, I can't help thinking this is far less efficient. The textures and bump maps have to be read for every pass, and additive passes over the whole framebuffer aren't exactly fast either. With shadow mapping it appears to me that all this can be computed at once. The only workaround I see is to have as many stencil buffers as lights, like eight or so. That way it could also be done in one pass?

Sorry if I'm again making a fool of myself, I'm just trying to learn from my mistakes. Somehow, the shadow mapping algorithm seems so much more intuitive to me...
 
Nick,
You're asking perfectly reasonable questions - certainly nothing to be ashamed of!

As I said, the efficiency really depends on what the HW supports. For example, you're thinking that the shadow map method is much more efficient because you can do multi-texturing and don't need to send the geometry multiple times. If the HW only supported single texturing, as once upon a time was the case, you'd be no better off.

Similarly, if the HW could send more info from the visibility calcs to the pixel shader, then volumes would not need multiple geometry passes either. For example, Dreamcast had this sort of optimisation.
 
Nick said:
Still, I can't help thinking this is far less efficient. The textures and bump maps have to be read for every pass, and additive passes over the whole framebuffer aren't exactly fast either. With shadow mapping it appears to me that all this can be computed at once. The only workaround I see is to have as many stencil buffers as lights, like eight or so. That way it could also be done in one pass?

Well, with shadow mapping you can draw two or more lights in a single pass, true, but you'll need a shader about twice as long for two lights, so it's not much of a gain. The only gain is that you don't need to sample the same textures over and over, but for slightly more complex lighting the texture lookups aren't going to set the limit anyway. You'd also, in the general case, need different shaders for one light, two lights, etc. With loops in fragment shaders it can be useful though, since you wouldn't have to have different shaders for different numbers of lights.
It should also be noted that there's nothing saying you will have to redraw the whole framebuffer as you put it for either shadow mapping or shadow volumes.
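The point about shader loops can be illustrated with a toy per-fragment function (a Python stand-in with an invented Lambertian-style model, not real shader code): one loop handles any light count, but the per-fragment cost still grows linearly with the number of lights, which is why folding lights into one pass is "not much of a gain".

```python
def shade_fragment(albedo, n_dot_l_per_light, shadow_map_tests):
    """Accumulate N lights in a single 'pass'. shadow_map_tests[i] is the
    result of the i-th shadow-map comparison (1 = lit, 0 = in shadow)."""
    color = 0.0
    for n_dot_l, lit in zip(n_dot_l_per_light, shadow_map_tests):
        # One light's diffuse term; the loop body runs once per light,
        # so the work per fragment scales with the light count.
        color += albedo * max(n_dot_l, 0.0) * lit
    return color
```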
 
CI said:
Since we are on shadow volumes and UltraShadow:

(The NV35 Doom3 Advantage)
http://www.hothardware.com/hh_files/S&V/ushadow_doom3.shtml

Any comments?

Yep. I comment that I will hereby stop making any reference to "online journalism" and use "online sensationalism" instead. Those people built an entire article on Anand's benchmark numbers. That Anand's figures have several times been proven grossly inaccurate over his last few reviews, and that at the moment no one can duplicate those numbers (even Anand can't run the benchmark again, seeing that this whole D3 "benchmarketing" operation was a one-time shot), does not seem to bother the author in the slightest. Allow me to throw up.

This has nothing to do with "brand loyalty", "fanboyism", or whatever. This has everything to do with "intellectual integrity", "consistency", "verification", and other concepts that seem to have been forgotten by the journalists^H^H^H^H^H^H^H^H^H^H^Hguys with webpages.

I don't know a damn thing about "Ultra Shadow". For all I know, it might very well be the best thing since sliced bread, or turn into just another marketing feature. What I'm sure of is that it is actually being touted as the best thing ever for Doom3 and the reason why the GFFX performed better than the 9800 in the benchmarketing operation, whereas nobody can verify it...

On second thought, I think I'll throw up just a little more.
 
They discussed this at Dusk To Dawn a while back (before it had a marketing name). We were all fairly certain the post-NV30 chip would have it, and we were right.

To be fair to NVIDIA, they actually support both shadow maps and shadow volumes fairly equally in hardware. They try optimizing both cases and let the developers decide which to use.
 
Doomtrooper said:
Overdraw reduction using Stencil Operations gets a brand new PR name
Can you tell me how you can reduce overdraw by using stencil operations? This is news to me. I'm intrigued.
 
Reverend said:
Doomtrooper said:
Overdraw reduction using Stencil Operations gets a brand new PR name
Can you tell me how you can reduce overdraw by using stencil operations? This is news to me. I'm intrigued.

LOL! After reading that twice (it's 3:30am here :p) it made me laugh. :LOL:
 
Reverend said:
Doomtrooper said:
Overdraw reduction using Stencil Operations gets a brand new PR name
Can you tell me how you can reduce overdraw by using stencil operations? This is news to me. I'm intrigued.

Discussions on this board months ago revealed that neither the R300 nor the NV30 supported z-rejection when using stencil shadows... As shown here, that problem was fixed in the R350, and I assume in the NV35.

http://www.beyond3d.com/reviews/ati/r350/index.php?p=21

Overdraw reduction with Stencil Operations:

FableMark
9700 PRO: 145.3
9800 PRO: 174.2

3DMark03 Battle of Proxycon
9700 PRO: 52.6
9800 PRO: 63.3

HyperZ-II for ATI
UltraShadow for Nvidia
 
Simon F said:
As I said, the efficiency really depends on what the HW supports. For example, you're thinking that the shadow map method is much more efficient because you can do multi-texturing and don't need to send the geometry multiple times. If the HW only supported single texturing, as once upon a time was the case, you'd be no better off.
Yeah, I'm sorry, I'm actually thinking totally theoretically here. I interpreted the title of this thread very generally, looking at what technology -could- be used in the future. So I automatically assumed total freedom with shaders and such. My way of thinking is also largely biased because I'm much more into software rendering (see my sig.) than hardware rendering, and in software anything is possible. So when I compared performance I was speaking of the theoretical number of operations required, not looking at any practical or physical limitations.

Actually I am particularly interested in the most efficient shadowing algorithm for a Turing machine. Although currently only software renderers are Turing machines, I think this is the way forward for graphics cards too...

Thanks a lot for the interesting discussion!
 
I thought the performance problem comes from Carmack's reversal of the traditional shadow volume algorithm.

HZ implementations efficiently reject pixels guaranteed to fail the z-test (by storing some sort of per "tile" z-max), but they don't efficiently reject pixels guaranteed to pass the z-test (because they don't store per "tile" z-min).
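The asymmetry described above can be modeled with a toy tile classifier (depth convention, function name, and the `None` sentinel for "no z-min stored" are all invented for illustration): with only a per-tile z-max you can cheaply reject, but a guaranteed-pass ("accept-all") decision needs a stored z-min.

```python
def classify_tile(tile_zmax, tile_zmin, incoming_zmin, incoming_zmax):
    """Depth increases away from the camera; the z-test is 'less than'.
    tile_zmin is None when the hardware stores no per-tile minimum."""
    if incoming_zmin > tile_zmax:
        # Every incoming fragment is behind everything in the tile:
        # guaranteed z-fail, cheap to detect with z-max alone.
        return "reject-all"
    if tile_zmin is not None and incoming_zmax < tile_zmin:
        # Every incoming fragment is in front of everything in the tile:
        # guaranteed z-pass, but only detectable if a z-min is stored.
        return "accept-all"
    return "per-pixel test"
```

Carmack's-reverse stencil volumes want exactly the "accept-all" fast path, which is why the missing z-min hurts.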

I have yet to see any sort of official confirmation that this has changed in r350/nv35.
 
DT, I see what you mean now. You meant when stencil operations are in use, when I thought you meant using stencil operations to reduce overdraw.
 
The NVidia technique is a different issue from avoiding problems with stencil usage and the hierarchical z-buffer. It allows culling of zixels before they even reach the z-reject stage, and presumably without using extra bandwidth, since it is, in effect, a clipping technique.
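A rough sketch of a depth-bounds style test, which is the idea behind the clipping technique described above (the function and names are illustrative, not NVIDIA's actual API): the application declares a [zmin, zmax] range that a light's shadow volumes can affect, and fragments whose stored scene depth falls outside that range are discarded before any stencil or z work is done.

```python
def depth_bounds_cull(scene_depths, zmin, zmax):
    """Return the indices of pixels whose already-stored scene depth
    lies inside the app-supplied bounds; the rest are culled early."""
    return [i for i, z in enumerate(scene_depths)
            if zmin <= z <= zmax]
```

Everything outside the range is rejected without touching the stencil buffer, which is where the fillrate saving comes from.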
 
DemoCoder said:
It allows culling of zixels before they even reach the Z-reject stage and presumably without using extra bandwidth since it is in effect, a clipping technique.
I'm still wondering how effective it is. How much of the shadow rendering can you prevent from being (unnecessarily) calculated? Of course this will depend on the scene, and probably also on how much time a developer wants to spend implementing the hints. JC stated somewhere that DoomIII uses half the fillrate just for shadows, so if it were possible to avoid calculating shadows in just 1/3 of the area where they would otherwise be calculated, that should give a quite substantial ~17% performance improvement overall. In that interview, "up to 30%" is quoted, but that's from nvidia PR (not implying they are lying, but it could be an impractical case never really achieved in real life). I'd like to hear JC's comment about the performance benefits of this extension.
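The arithmetic above is easy to check (the two input fractions are the assumptions quoted in the post, not measured figures):

```python
shadow_share = 0.5           # "half the fillrate just for shadows" (JC)
culled_fraction = 1.0 / 3.0  # assumed share of shadow area avoided

# Total fillrate saved: half the workload, a third of which is culled.
saved = shadow_share * culled_fraction          # = 1/6, roughly 17%

# If rendering were purely fillrate-bound, frame rate would rise by
# saved / (1 - saved), i.e. about 20%; the ~17% figure is the fillrate
# reduction itself.
speedup = saved / (1.0 - saved)
```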
 
I'm thinking a 30% boost is not completely unlikely. You could potentially save a lot of fillrate on stencil volumes with this technique if used properly, probably cutting the fillrate needed almost in half.
 