VSM Technical Discussion (was part of R600 thread)

Andrew Lauritzen

Moderator
Veteran
Presumably, the reason why AndyTX's demo runs faster on G80 is that the pass rendering *to* the shadowmap must be running much faster than on the R600. If you compare G80's Z-write rate to R600's, that shouldn't be very surprising either.
That's actually not true as my demo renders to a standard 1-component fp32 texture as a shadow map, not a depth buffer. That way we can use a linear depth metric instead of the much-less-useful post-projection z value. Thus no "z-only" rendering rates apply. My guess is that the G80 wins because of much better sampling performance (the VSM results seem to indicate the same thing), but I'm open to suggestions, considering R600's massive theoretical bandwidth advantage...
 
That's actually not true as my demo [...]
Are you rendering depth and colour at the same time, or depth and then colour? 32-bit colour-only rendering is double-speed on G80 too, just like on G71 iirc. Besides that, G80's ROPs have free MSAA loops, but unless AA is being forced in the control panel and the driver is doing something weird, that shouldn't affect your app's PCF score.

Now I'm really confused. I'm looking back at my texture bench scores, and the bug (or 'feature' I guess) that affects single-channel FP32 sampling on G80 also affects *all* depth formats, both in PCF mode and nearest, apparently. So G80 shouldn't have a real advantage there, especially so if R600 indeed can do 16 PCF/clock. So hmmm - any chance of letting out some other details to try to figure this out? :) (hopefully without hijacking this thread too much, could create another small one I guess)
 
Are you rendering depth and colour at the same time, or depth and then colour?
I'm rendering both at the same time... I highly doubt that rendering one and then the other would be faster overall, as the whole scene would need to be re-transformed in that case.

Besides that, G80's ROPs have free MSAA loops, but unless AA is being forced in the control panel and the driver is doing something weird that shouldn't affect your app's PCF score.
Right: the PCF implementation is straightforward, although it does do "proper/correct" PCF filtering which requires derivatives and dynamic branching. To avoid this extra factor, position the camera parallel to the ground plane and make sure nothing else is onscreen (which will result in constant-sized filter kernels and branching).
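
For reference, the core of PCF is just an averaged set of binary depth comparisons. Here's a minimal CPU-side sketch of that idea (a simple fixed box kernel for illustration only: it omits the derivative-based kernel sizing, dynamic branching and sub-texel weighting a proper implementation needs, and all of the names are made up):

```cpp
#include <algorithm>

// Basic PCF over a size x size shadow map storing one linear depth per texel:
// average a set of binary comparisons around (u, v). shadowMap, halfKernel and
// the edge-clamping scheme here are placeholders, not the demo's actual code.
float pcf(const float* shadowMap, int size, int u, int v,
          float receiverDepth, int halfKernel)
{
    float lit = 0.0f;
    int taps = 0;
    for (int dy = -halfKernel; dy <= halfKernel; ++dy) {
        for (int dx = -halfKernel; dx <= halfKernel; ++dx) {
            int x = std::min(std::max(u + dx, 0), size - 1);
            int y = std::min(std::max(v + dy, 0), size - 1);
            lit += (receiverDepth <= shadowMap[y * size + x]) ? 1.0f : 0.0f;
            ++taps;
        }
    }
    return lit / (float)taps;   // fraction of the kernel that is unoccluded
}
```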

I'm totally happy to explain anything about the demo that you want, and we can break this out into another thread too. I'm personally really interested in exploring R600's shadowing performance and trade-offs with different methods, but unfortunately I don't have access to a card right now :( I can certainly make changes to the demo though if you want to run different benchmarks. The source code will also be available with GPU Gems 3 in August so anyone can mess with it then.
 
That's actually not true as my demo renders to a standard 1-component fp32 texture as a shadow map, not a depth buffer. That way we can use a linear depth metric instead of the much-less-useful post-projection z value.
Does the cost of transforming your linear depth compare value to post-projection space per pixel really outweigh the cost of having to write two "depth buffers" and not being able to use double-Z?
 
I think he ran into some issues with getting double-z to actually work...or I may be remembering something wrong.
 
Does the cost of transforming your linear depth compare value to post-projection space per pixel really outweigh the cost of having to write two "depth buffers" and not being able to use double-Z?
No - the point is that to write linear depth you need to write a color buffer as you can't compute it per-vertex and rasterize (and I'm pretty sure writing depth from the fragment shader would make the whole thing hella-slower). Like I said you can probably get away with it for directional lights where the obvious linear metric (distance from light plane) is compatible with the depth buffer metric, but for spot and point lights the "distance to light" metric cannot be done with linear attribute interpolation because it requires a non-linear sqrt in the fragment shader ("length").

Thus you *have* to write a color buffer, and I don't think it makes sense to do it in a different pass even if you have a ton of overdraw, since the fragment shading cost of simply computing the depth (and depth^2 for VSM) is negligible.
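
To make that concrete, here's a small sketch of the math involved (CPU-side C++ purely for illustration, not the demo's actual shader; the near/far rescale to [0,1] is an assumption):

```cpp
#include <cmath>

struct float3 { float x, y, z; };

static float length3(const float3& v) {
    return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
}

// "Distance to light" depth metric for a point/spot light. The length() needs a
// per-fragment sqrt, which is why it can't be produced by linear attribute
// interpolation and has to go into a colour buffer rather than the depth buffer.
static float linearDepth(const float3& posInLightSpace, float nearPlane, float farPlane) {
    float dist = length3(posInLightSpace);
    return (dist - nearPlane) / (farPlane - nearPlane);   // rescale to [0, 1]
}

// For VSM the render target stores the first two moments of that depth.
static void vsmMoments(float depth, float& m1, float& m2) {
    m1 = depth;            // E[x]
    m2 = depth * depth;    // E[x^2]
}
```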

The point is that "post-projection z" is non-linear, and while it may be "enough" for standard shadow mapping with biasing (ugh), it's really not suitable for VSM, or even PCF with suitable large filter kernels.

PS: We should really split this into a separate thread if we want to continue. It is certainly related to R600's design and performance, but not as much to the original article.
 
No - the point is that to write linear depth you need to write a color buffer as you can't compute it per-vertex and rasterize (and I'm pretty sure writing depth from the fragment shader would make the whole thing hella-slower). Like I said you can probably get away with it for directional lights where the obvious linear metric (distance from light plane) is compatible with the depth buffer metric, but for spot and point lights the "distance to light" metric cannot be done with linear attribute interpolation because it requires a non-linear sqrt in the fragment shader ("length").
What I'm saying is that transforming between linear and post-projection Z is trivial (it can be as simple as a rcp). And it may be cheaper to do that per pixel than having the overhead of writing linear depth to a "color" buffer, depending on the scene and resolutions.
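
For example, with a standard D3D-style projection the two directions look something like this (a sketch with assumed near/far parameters, not any particular engine's code):

```cpp
// Forward mapping a D3D-style projection applies to view-space depth:
//   d = f * (z - n) / (z * (f - n)),  so d(n) = 0 and d(f) = 1.
float toPostProjectionDepth(float zView, float n, float f) {
    return (f * (zView - n)) / (zView * (f - n));
}

// Inverting it is one reciprocal away:
//   z = n * f / (f - d * (f - n))
float toViewDepth(float d, float n, float f) {
    return (n * f) / (f - d * (f - n));
}
```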

The point is that "post-projection z" is non-linear, and while it may be "enough" for standard shadow mapping with biasing (ugh), it's really not suitable for VSM, or even PCF with suitable large filter kernels.
I'm not sure about the exact requirements for VSM. If you do some post-processing on the depth buffer, you could linearize the depth values there. Why would it not be suitable for PCF?

PS: We should really split this into a separate thread if we want to continue. It is certainly related to R600's design and performance, but not as much to the original article.
Yes, maybe a nice moderator could take the last few posts and put them into a new thread, to clean out this one? :)
 
What I'm saying is that transforming between linear and post-projection Z is trivial (it can be as simple as a rcp). And it may be cheaper to do that per pixel than having the overhead of writing linear depth to a "color" buffer, depending on the scene and resolutions.
But reconstructed linear Z might be less accurate than a 'real' linear Z
 
But reconstructed linear Z might be less accurate than a 'real' linear Z
Possibly, though I'm not even sure of that. Anyway, with FP32 depth you're already limited by the vertex transformation (and general shader) precision. I doubt you'd get massive precision problems from using post-projection Z.
 
Reconstructing linear z would defeat the purpose entirely since, as long as the reconstruction is monotonic (which it is), it wouldn't change the comparative relationship whatsoever. In my experience (I did try both ways), using post-projection z even with PCF gave significantly worse depth precision than linear z, and there are several papers and applications that recommend the same thing.

fp32 is arguably overkill for PCF, but it's certainly not for VSM in which the second moment is somewhat unstable. Still, fp16 is certainly *not* enough for PCF so you're stuck with depth formats or fp32. There are a number of reasons why I dislike the depth formats (including the aforementioned inflexibility with different metrics, biasing and a few more), but if I was doing a straight PCF implementation I'd certainly look into them.

That said I think I've given ample justification (and read Gems 3 for more!) for why VSM is often a better option than PCF, and with VSMs you certainly need to render to a color buffer. The only alternative is to render to the depth buffer, then read that back and write to a color buffer depth and depth^2. I'm fairly certain this would be slower on most architectures (excepting maybe the 360) and more importantly it eliminates the possibility of using derivatives to represent the variance of a certain pixel, which is often desirable albeit not crucial.
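
For context, the VSM lookup itself is just Chebyshev's inequality applied to the two stored moments; a minimal CPU-side sketch of that test (the minVariance clamp and its value are assumptions, chosen per scene):

```cpp
#include <algorithm>

// Standard VSM visibility test (Chebyshev's inequality) over the two stored
// moments m1 = E[x] and m2 = E[x^2]; minVariance is a small clamp against
// numerical problems in the second moment.
float vsmVisibility(float m1, float m2, float receiverDepth, float minVariance) {
    if (receiverDepth <= m1)
        return 1.0f;   // receiver is in front of the occluder mean: fully lit
    float variance = std::max(m2 - m1 * m1, minVariance);
    float d = receiverDepth - m1;
    return variance / (variance + d * d);   // upper bound on P(occluder depth >= receiver depth)
}
```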

Anyways I'm sure PCF implementations using the depth buffer are quite usable (there wasn't even an option until recently), but in my experience they suffer from quite a few problems and the depth formats are far too limiting, even in DX10.
 
Reconstructing linear z would defeat the purpose entirely since, as long as the reconstruction is monotonic (which it is), it wouldn't change the comparative relationship whatsoever.
What I mean is that you need both values in the same space, so if you get the depth comparison value from linear interpolation, you need to transform it to post-projection depth/shadow map space, or transform the shadow map to linear depth at some point. Assuming you're using a real Z buffer.

In my experience (I did try both ways), using post-projection z even with PCF gave significantly worse depth precision than linear z, and there are several papers and applications that recommend the same thing.

fp32 is arguably overkill for PCF, but it's certainly not for VSM in which the second moment is somewhat unstable. Still, fp16 is certainly *not* enough for PCF so you're stuck with depth formats or fp32. There are a number of reasons why I dislike the depth formats (including the aforementioned inflexibility with different metrics, biasing and a few more), but if I was doing a straight PCF implementation I'd certainly look into them.

That said I think I've given ample justification (and read Gems 3 for more!) for why VSM is often a better option than PCF, and with VSMs you certainly need to render to a color buffer. The only alternative is to render to the depth buffer, then read that back and write to a color buffer depth and depth^2. I'm fairly certain this would be slower on most architectures (excepting maybe the 360) and more importantly it eliminates the possibility of using derivatives to represent the variance of a certain pixel, which is often desirable albeit not crucial.

Anyways I'm sure PCF implementations using the depth buffer are quite usable (there wasn't even an option until recently), but in my experience they suffer from quite a few problems and the depth formats are far too limiting, even in DX10.
I'm not sure we're talking about the same thing. I do have trouble believing an FP32 depth RHW (1 / Zlight) buffer would have significantly worse precision than an FP32 linear (Zlight) shadow map.

In the initial post above you stated you're rendering to a 1-component fp32 texture, so I was assuming you're doing depth² and some blurring in a post-process step.
 
I'm not sure we're talking about the same thing. I do have trouble believing an FP32 depth RHW (1 / Zlight) buffer would have significantly worse precision than an FP32 linear (Zlight) shadow map.
It's not exactly the same thing, but I've tried more than once to reconstruct linear Z from a normal 24-bit Z buffer and the results were not good at all; in fact, I had to render linear Z in a Z pre-pass into an FP32 texture. Surprisingly enough, I only lost around 2%-5% of performance on average, so at some point I didn't bother trying to make the pure Z pre-pass work anymore.

Marco
 
It's not exactly the same thing, but I've tried more than once to reconstruct linear Z from a normal 24-bit Z buffer and the results were not good at all; in fact, I had to render linear Z in a Z pre-pass into an FP32 texture. Surprisingly enough, I only lost around 2%-5% of performance on average, so at some point I didn't bother trying to make the pure Z pre-pass work anymore.
There should be a huge difference between 24-bit fixed-point and 32-bit floating-point depth, if used correctly, so it isn't all that surprising. :)
I'm not expecting a huge performance difference either.
 
I'm not sure we're talking about the same thing. I do have trouble believing an FP32 depth RHW (1 / Zlight) buffer would have significantly worse precision than an FP32 linear (Zlight) shadow map.
I tend to agree with you theoretically, but for some reason every time I try to use post-projection Z for any shadowing work the results are very bad, even if it's just PCF. I should probably work through the FP error math, etc. but I haven't had the time or motivation to really look at it in detail, especially considering the current solution works quite well.

On the other hand, fundamentally the precision distribution of the depth buffer is pretty terrible for shadow mapping, where you really want uniform precision over the entire depth range (the camera could be anywhere). For that reason alone I am concerned that after you've rendered to the depth buffer, you've already lost too much information.

In the initial post above you stated you're rendering to a 1-component fp32 texture, so I was assuming you're doing depth² and some blurring in a post-process step.
For PCF I certainly render a 1-component fp32 texture, but not for VSM. Like I said it's nice to have the depth derivatives available, and although it could certainly be combined with the first blurring pass, there hasn't been a need for that sort of optimization to this point, since rendering to the shadow map is comparatively inexpensive with VSM in my experience. That said it would certainly be something to look at in a production environment if it was found to be a bottleneck.
 
On the other hand, fundamentally the precision distribution of the depth buffer is pretty terrible for shadow mapping, where you really want uniform precision over the entire depth range (the camera could be anywhere). For that reason alone I am concerned that after you've rendered to the depth buffer, you've already lost too much information.
The precision distribution of Z and 1/Z in FP is pretty similar. If the rcp precision isn't too bad, you probably lose 1-2 bits on the mantissa (and the exponent just gets inverted).
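
A quick way to sanity-check that claim is to round-trip a depth through an fp32 1/z and measure the error (a throwaway test under stated assumptions: it uses an exact reciprocal, whereas a lower-precision hardware rcp would add another bit or two on top):

```cpp
#include <cstdio>
#include <cmath>

int main() {
    double maxRelErr = 0.0;
    // Sweep a range of view-space depths and round-trip them through fp32 1/z.
    for (double z = 1.0; z <= 1000.0; z *= 1.0001) {
        float rz = (float)(1.0 / z);        // what an fp32 1/z buffer would store
        double zBack = 1.0 / (double)rz;    // reconstruct z with an exact reciprocal
        maxRelErr = std::fmax(maxRelErr, std::fabs(zBack - z) / z);
    }
    // Comes out on the order of a single fp32 ulp (~6e-8), i.e. roughly one
    // mantissa bit of loss from the round trip itself.
    std::printf("max relative error: %g\n", maxRelErr);
    return 0;
}
```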
 
The precision distribution of Z and 1/Z in FP is pretty similar. If the rcp precision isn't too bad, you probably lose 1-2 bits on the mantissa (and the exponent just gets inverted).
I don't think we're talking about the same "z" here... what I use for point lights as the depth metric is "distance to light". i.e. length(PositionInViewSpace), potentially scaled and biased to fall into [-0.5, 0.5] for fp formats. That has a uniform distribution over the "world" (spatially), unlike Z which has a lot of precision near the light, and very little as you move away from it. The difference between these two metrics in my experience is quite pronounced.
 
Andy, have you tried to reconstruct linear Z from FP32 1/Z with directional lights?
 
The precision distribution of Z and 1/Z in FP is pretty similar. If the rcp precision isn't too bad, you probably lose 1-2 bits on the mantissa (and the exponent just gets inverted).

As a small footnote to this, you need to store 1 - z/w. I've found that the precision is worse than 24-bit fixed-point otherwise, which itself is not accurately reversible in practical situations.

Fortunately, as is well known, the 360 GPU has float zbuffer support and additionally has excellent rcp precision (which I suppose you could NR around otherwise). I'd be interested to know what the limitations of the RSX are in those respects :).
 
Andy, have you tried to reconstruct linear Z from FP32 1/Z with directional lights?
No I haven't actually - I've played mostly with spot/point lights. Have you tried? It seems to me that it may be possible to do such a thing with directional lights (where you can linearly interpolate the distance metric properly), but not with spot lights. Is this your experience as well?
 