Layered Variance Shadow Maps

You are right Andy ;)

I thought I could simply replace the stored depth with exp() and expect filtering to work, but it seems that's not the case. Am I right? Sorry for such stupid questions; actually I needed to "close" the shadow subject (my favorite one :)) in our software in order to move on (physics right now), so I don't have much time to read all the papers anymore...

This is however likely to add a lot of instructions to the shader (some exp and log), and I am already pixel shader bound :( Am I right?

I am not using layered VSM because of the storage/fillrate issue. I already have 4 slices, so having several layers per split is not an option...
 
I thought I could simply replace the stored depth with exp() and expect filtering to work, but it seems that's not the case.
Actually that's pretty much all you have to do, except use exp(c*x) rather than just exp(x). Also be sure to warp your reference depth before evaluating Chebyshev's inequality. For fairly low values of "c" this is quite sufficient. For large values of "c" you may also want to use the negative exp(-c*x) warp as discussed earlier in this thread.
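A minimal sketch of that change, assuming depth is in [0, 1] (EXP_C, ComputeWarpedMoments and the variable names are illustrative, not from any particular codebase):

Code:
// Shadow pass: store the warped depth and its square.
float2 ComputeWarpedMoments(float Depth)
{
    float Warped = exp(EXP_C * Depth);
    return float2(Warped, Warped * Warped);
}

// Lookup pass: warp the receiver depth the same way before the test.
float WarpedReceiver = exp(EXP_C * ReceiverDepth);
float Shadow = ChebyshevUpperBound(Moments, WarpedReceiver, MinVariance);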

This is however likely to add a lot of instructions to the shader (some exp and log), and I am already pixel shader bound :( Am I right?
Depends on your hardware I suppose, but it's almost free on G80 for instance. Transcendentals like exp() are extremely fast on modern hardware, and you're almost certainly going to be fetch bound with shadow maps nowadays.

I am not using layered VSM because of the storage/fillrate issue. I already have 4 slices, so having several layers per split is not an option...
Yep, fair enough. That said, EVSMs are a really excellent choice IMHO as they cost almost the same as VSMs yet have much less light bleeding and even better precision usage - for instance an fp16 EVSM (with c = 10 or so IIRC) has very few precision problems compared to a standard fp16 VSM.
 
Basically you end up filtering the argument of your exponential functions, which are just linear depth values, so you enjoy the same range you have with less exotic techniques. Just don't forget to go back to exp space when you use the final filtered value.
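To put rough numbers on the range claim (assuming depth z in [0, 1] and warp constant c):

exp-space filtering: values are exp(c*z), range [1, e^c], which blows up quickly for large c
log-space filtering: values are c*z, range [0, c], the same well-behaved linear range as a plain depth map; only the final filtered result is mapped back through exp()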

You can also find the same info here (slide 24)

Marco
Pre-filtering I understand, but what do you actually store in the ESM after that, and what do you sample when rendering the shadow?

If you want to use hardware texture filtering, I don't see how log-space prefiltering helps you, because the values in the texture are still limited by the same C-value that you're going to use in the shader.
 
Pre-filtering I understand, but what do you actually store in the ESM after that, and what do you sample when rendering the shadow?
Different implementations are possible. Personally I just leave the shadow map as it is (in log space).
If you want to use hardware texture filtering, I don't see how log-space prefiltering helps you, because the values in the texture are still limited by the same C-value that you're going to use in the shader.
Though hardware texture filtering is not mathematically correct in log space, it just causes some overdarkening, nothing major.

If your shadow map is filtered in log space, occlusion is computed like this (let's assume we use bilinear filtering):

float occluder  = tex2D( esm_sampler, esm_uv );  // filtered log-space value, i.e. c * occluder_depth
float occlusion = exp( occluder - receiver );    // receiver is warped the same way (c * receiver_depth)

while with filtering in exp space you have:

float exp_occluder = tex2D( esm_sampler, esm_uv );   // filtered exp-space value, i.e. exp(c * occluder_depth)
float occlusion    = exp_occluder / exp( receiver ); // receiver is again c * receiver_depth

EDIT: if more complex filters are used (trilinear, aniso, with mip maps) you need to generate the mip maps using log filtering as well.
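For the mip generation, a 2x2 log-space reduction might look like this (a sketch; the max is factored out purely to keep the exponentials from overflowing):

Code:
// log(mean(exp(x_i))) over a 2x2 footprint, evaluated stably.
float LogFilter4(float x0, float x1, float x2, float x3)
{
    float m = max(max(x0, x1), max(x2, x3));
    return m + log(0.25 * (exp(x0 - m) + exp(x1 - m) +
                           exp(x2 - m) + exp(x3 - m)));
}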
 
Hmmm, strange. I had a quick try but it seems something is wrong. There is no shadow anymore.

This is what I use.

In the shadow map generation shader:

Code:
OUT.Color = ComputeMoments(IN.fDepth) - GetFPBias();

with:

Code:
float2 ComputeMoments(float Depth)
{
    // Compute first few moments of depth
    float2 Moments;
    Moments.x = exp(EXP_C * Depth);
    Moments.y = Moments.x * Moments.x;

    return Moments;
}

float2 GetFPBias()
{
    return float2(0.5f, 0.0f);
}

In the main shader:

Code:
float fDepth   = vProjCoords[iSplit].w;    // this is the depth
fShadowContrib = ChebyshevUpperBound(Moments, fDepthAdj, g_VSMMinVariance);

fShadowContrib = LBR(fShadowContrib, fLBRAmount);

with:

Code:
float ChebyshevUpperBound(float2 Moments, float Mean, float MinVariance)
{
    float fMean = exp(EXP_C * Mean);
	
    // Standard shadow map comparison
    float p = (fMean <= Moments.x);
    
    // Compute variance
    float Variance = Moments.y - (Moments.x * Moments.x);
    Variance = max(Variance, MinVariance);
    
    // Compute probabilistic upper bound
    float d     = fMean - Moments.x;
    float p_max = Variance / (Variance + d*d);
    
    return max(p, p_max);
}

I use the min variance you suggested, in my case:

Code:
// Derivative of the exp warp near zero depth, squared.
const float fMinVariance     = (EXP_C * exp(EXP_C * 0.00001));
const float g_VSMMinVariance = fMinVariance * fMinVariance;
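(For reference, that scaling follows from a first-order Taylor argument: for Y = g(X) with g(x) = exp(c*x), Var[Y] ≈ g'(x)^2 * Var[X] = (c * exp(c*x))^2 * Var[X], so a minimum variance in depth units becomes the squared derivative of the warp times that minimum after warping.)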

If I replace the exp(EXP_C * x) with x, everything works fine... Any idea what could be wrong?
 
I have commented out the following line in ChebyshevUpperBound()
Code:
//Variance = max(Variance, MinVariance);

and changed the fpbias:

Code:
float2 GetFPBias()
{
    return float2(0.0f, 0.0f); // was float2(0.5f, 0.0f)
}

but it doesn't help. Very strange. Any idea? I generate mipmaps because I use trilinear filtering; not sure if that could explain the issue?
 
Odd... out of curiosity, what's "fDepthAdj" in the code above? As a general rule, just make sure you're doing the exact same thing to compute and warp the depth in the lookup shader as in the shadow map rendering pass... that's best formalized by calling the same function in both shaders.

Generating mipmaps is of course fine, as is using trilinear filtering.

As another guess, what is your value of EXP_C and what range are the depth values in before you warp them by the exponential? What storage format are you using for your (E)VSM?
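To make the "same function in both shaders" point concrete, a sketch (shared via a common include; WarpDepth is an illustrative name):

Code:
// Used by both the shadow-map pass and the lookup pass,
// so the two warps can never silently diverge.
float WarpDepth(float Depth)
{
    return exp(EXP_C * Depth);
}
// Shadow pass:  OUT.Color = ComputeMoments(WarpDepth(IN.fDepth));
// Lookup pass:  fShadow   = ChebyshevUpperBound(Moments, WarpDepth(fDepth), MinVar);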
 
I am using exactly the same depth value; don't pay attention to fDepthAdj. That is because I normally use my reprojection algo, but it is completely disabled now. Depth is between 0 and 1 I think, I need to check. It might be that I have negative depth, I'm not sure anymore.

EXP_C is 10.0. I tried different values (from 0.01 to 50.0) without any success.

The storage format is R32G32, so a precision problem shouldn't be possible. I will try to get the values used for some pixels using nvPerfHUD if I manage to.

Did you manage to have it working by simply replacing depth with exp(c*depth) on your side?

Thanks for your help BTW, I appreciate it.
 
Did you manage to have it working by simply replacing depth with exp(c*depth) on your side?
Yeah that's pretty much all I did... I certainly didn't have to change any C++ code (except potentially using a 4-component texture for pos/neg EVSM).

Alright here's the code that I have, maybe it'll help... but first, a BIG DISCLAIMER: this is pure dev code, not optimized, not well commented and definitely not ready for public release... still hopefully it'll help you track down the problem with your stuff. [Regardless, I know as soon as I post this it's gonna be haunting me for the next few years just like the original OpenGL VSM code ;)].

Code:
static const float g_ExpWarp_C = 42;

// ... clip some irrelevant stuff ...

float4 Depth_PS(centroid Depth_PSIn Input) : SV_Target
{
    float Depth = 2 * BasicDepth_PS(Input) - 1;
    float2 PosMoments = ComputeMoments( exp( g_ExpWarp_C * Depth));
    float2 NegMoments = ComputeMoments(-exp(-g_ExpWarp_C * Depth));
    return float4(PosMoments.xy, NegMoments.xy);
}

float3 SpotLightShaderVSM(float3 SurfacePosition,
                          float3 SurfaceNormal,
                          float2 LightTexCoord,
                          out float DistToLight,
                          out float3 DirToLight)
{
    // Call parent
    float3 LightContrib = SpotLightShader(SurfacePosition, SurfaceNormal,
                                          DistToLight, DirToLight);
    float Depth = 2 * RescaleDistToLight(DistToLight) - 1;
    float PosDepth =  exp( g_ExpWarp_C * Depth);
    float NegDepth = -exp(-g_ExpWarp_C * Depth);

    float4 Data = texShadow.Sample(sampAnisotropicClamp, LightTexCoord);
    float2 PosMoments = Data.xy;
    float2 NegMoments = Data.zw;
    
    // derivative of warping at Depth
    // TODO: Make this faster and less awkward/redundant...
    float PosDepthScale = g_ExpWarp_C * PosDepth;
    float PosMinVariance = g_VSMMinVariance * (PosDepthScale * PosDepthScale);
    float PosShadowContrib = ChebyshevUpperBound(PosMoments, PosDepth, PosMinVariance);
    
    float NegDepthScale = g_ExpWarp_C * NegDepth;
    float NegMinVariance = g_VSMMinVariance * (NegDepthScale * NegDepthScale);
    float NegShadowContrib = ChebyshevUpperBound(NegMoments, NegDepth, NegMinVariance);
    
    //float ShadowContrib = PosShadowContrib;
    //float ShadowContrib = NegShadowContrib;
    float ShadowContrib = min(PosShadowContrib, NegShadowContrib);
    //float ShadowContrib = sqrt(PosShadowContrib * NegShadowContrib);
    
    [flatten] if (g_LBR) {
        ShadowContrib = LBR(ShadowContrib);
    }
    
    return LightContrib * ShadowContrib;
}
 
Though hardware texture filtering is not mathematically correct in log space, it just causes some overdarkening, nothing major.
I guess it just depends on how much pre-filtering you do.

If the kernel is only a few pixels wide (say, 3x3), you definitely will notice artifacts, as exp(C * (occluder - receiver)) will give you a very non-linear shadow gradient.
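To spell that out for a single bilinear step between stored depths d0 and d1 (t the filter weight, r the receiver depth):

exp-space filtering: occlusion = lerp(exp(C*d0), exp(C*d1), t) * exp(-C*r), which is linear in t
log-space filtering: occlusion = exp(C*(lerp(d0, d1, t) - r)), which ramps exponentially in t; hence the non-linear gradient across a narrow edge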
 
Thanks a lot for posting your code Andy.

I found my issue; I prefer not to tell you what it was, in order to keep some remaining credibility ;)

Two questions remain:

- if I don't use the second (negative) bound, what artifacts might appear? (To see whether I actually need it.)

- concerning the log filtering: from what Mintmaster says, the shadow gradient might look less smooth (or less linear) when large kernels are used. In my case I generate mipmaps, which means, if I understand correctly, that very large kernels might be used. Would it make sense to try to implement the log filtering?

Sorry if my questions sound stupid... I am not a mathematician, but I'm trying to get a working implementation :)
 
I found my issue; I prefer not to tell you what it was, in order to keep some remaining credibility ;)
Hehe, no worries - I suspect people can find many bugs and stupid oversights in my code on the Gems 3 CD, for instance ;)

- if I don't use the second (negative) bound, what artifacts might appear? (To see whether I actually need it.)
You can get similar artifacts to those of Exponential Shadow Maps. In particular, you may get blockiness (extra light) on the edges of objects onto which shadows fall. In the general case, complicated geometry over big filter sizes may cause significant aliasing.

That probably doesn't make a lot of sense, so here's an example:

Using just the positive warp/estimator (with c around 40) yields the following... notice the artifacts where the shadow gets near the edge of the building roof:
edge_artifact_evsm_pos.png


Now using both warps/estimators (with c around 40 for each), the artifacts can be fully removed:
edge_artifact_evsm.png


Basically, the bigger "c" gets on the positive warp, and the larger the filter regions you have, the worse the artifact can be. The negative warp can help a lot with this (as seen above)... it's not perfect of course, but it eliminates many of the artifacts of the positive exponential warp (or ESM) and costs very little other than additional storage (4-component vs 2). Still, if you're using say 16-bit float textures and a small c (around 10), the artifacts may not be bad enough to warrant using the negative warp.

For reference, here's what ESM degrades to in the case of very complicated occluder distributions (compared to PCF and VSM). Note that EVSM (not shown) does a fair bit better but naturally still has artifacts. In these situations, opacity shadow maps or convolution shadow maps actually do the best job, as they model the visibility function directly (rather than the occluder distribution) and interpolate it smoothly rather than necessarily trying to reconstruct all of the discontinuities. Hence I say that CSM is more interesting as a shadow method for volumes, hair, foliage or other complicated distributions than for standard opaque occluders.

- concerning the log filtering: from what Mintmaster says, the shadow gradient might look less smooth (or less linear) when large kernels are used. In my case I generate mipmaps, which means, if I understand correctly, that very large kernels might be used. Would it make sense to try to implement the log filtering?
Not quite true... what Mintmaster is saying is that using a *linear* (hardware) filter to interpolate values in a *logarithmically-stored* (i.e. storing c*x rather than exp(c*x)) shadow map will produce non-linear shadow edges. However, he notes that if most of your filter support is computed in log space (correctly), the error of simply using hardware bilinear filtering isn't that large. For *small* filters (3x3 blurs, say), though, the error will be more noticeable.

This is actually even more true for hardware anisotropic filtering, which I totally love using with shadow maps (it makes them look so good!). However, if this is used with a log-space shadow map the error can be quite noticeable, producing shadows that are much darker than they should be and indeed reintroducing aliasing at highly-anisotropic angles. I don't have any handy screenshots/vids of this in action right now so you'll have to take my word for it :)

In simpler terms, with log-space shadow maps you are doing the filtering yourself... you generate your mipmaps in log space (manually), you blur in log space and you do your trilinear interpolation in log space. What Marco (nAo) is saying is that if you use the hardware to incorrectly do the trilinear lookup in linear space, the error isn't that bad, particularly for large blurs. Whether or not that's true is up to you to decide... I personally don't like how much it messed up aniso filtering, but honestly if you're going for a really soft shadow look, it's probably not an issue. In any case it's easy enough to implement a manual bilinear/trilinear filter in log space and compare it to just flicking the hardware switch to see whether the latter is acceptable quality for you.
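For comparison purposes, a manual bilinear fetch in log space might look like this (a sketch; it assumes the map stores c*depth in one channel, a point-filtered sampler, and a known ShadowMapSize in texels; the function and parameter names are illustrative):

Code:
float SampleLogBilinear(sampler2D esm, float2 uv, float2 ShadowMapSize)
{
    float2 TexelPos = uv * ShadowMapSize - 0.5;
    float2 f     = frac(TexelPos);
    float2 Base  = (floor(TexelPos) + 0.5) / ShadowMapSize;
    float2 Texel = 1.0 / ShadowMapSize;

    // Fetch the four neighbours (sampler set to point filtering).
    float s00 = tex2D(esm, Base).x;
    float s10 = tex2D(esm, Base + float2(Texel.x, 0)).x;
    float s01 = tex2D(esm, Base + float2(0, Texel.y)).x;
    float s11 = tex2D(esm, Base + Texel).x;

    // Correct log-space blend: log of the weighted mean of exponentials,
    // with the max factored out for numerical safety.
    float m = max(max(s00, s10), max(s01, s11));
    float b00 = (1 - f.x) * (1 - f.y), b10 = f.x * (1 - f.y);
    float b01 = (1 - f.x) * f.y,       b11 = f.x * f.y;
    return m + log(b00 * exp(s00 - m) + b10 * exp(s10 - m) +
                   b01 * exp(s01 - m) + b11 * exp(s11 - m));
}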

Sorry if my questions sound stupid... I am not a mathematician, but I'm trying to get a working implementation :)
Not at all - I now have a master's in math and still have to think this stuff through and play with it a lot. Your questions are not stupid; you just happen to be speaking with three of the people who have worked on this stuff extensively!
 
Thanks for your answer Andy. Very useful and very friendly; I appreciate it :D

I will play with this new toy tomorrow (between two physics engine integration steps); it is already quite late here in Europe. Thanks once again!

PS: I tried to use a 16-bit floating-point target but had a lot of artifacts (surface acne mainly). What are the conditions for being able to use such a format? Simply having a small depth range?
[edit] The 16-bit format is working now; however, I can't use values greater than about 6.0 for C, or the edges get over-darkened in some areas. Any idea why?
 
- concerning the log filtering: from what Mintmaster says, the shadow gradient might look less smooth (or less linear) when large kernels are used. In my case I generate mipmaps, which means, if I understand correctly, that very large kernels might be used. Would it make sense to try to implement the log filtering?
Actually, it's the small filters that cause the funky gradients, and I was strictly talking about filters that preserve map size. It's more about the spatial frequency of the features in the final shadow map (whichever mipmap level). If you go from shadowed to unshadowed within the span of only a couple of texels, log filtering will look a bit odd.

ESM gives smooth gradients (again, only on planar receivers) when you use linear filtering on the exponential of depth. With log filtering, you're doing a non-linear transform on the latter, interpolating linearly, and then effectively doing another non-linear transformation again. It's sort of the opposite of doing a piecewise linear approximation of a non-linear function: log filtering gives you a piecewise non-linear interpolation of a linear function.

With mipmaps that aren't the top level, smooth gradients are less of an issue since they'll only span a few pixels on the screen unless you have a very angled surface. At worst, you'll just get an aliased shadow edge, but at least it won't shimmer as much as it would without mipmaps.
 
[edit] The 16-bit format is working now; however, I can't use values greater than about 6.0 for C, or the edges get over-darkened in some areas. Any idea why?
That may be near the maximum range of 16-bit floats, I don't remember (just work out log_2(e^6) and compare it to the exponent bits). As in the example code I gave, though, you can get another bit by expressing "x" in [-1, 1] before exponentiating.
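Working that out: fp16 tops out around 65504 ≈ e^11.1. The second moment stores exp(c*x)^2 = exp(2*c*x), so with x up to 1 it overflows once 2*c exceeds roughly 11.1, i.e. c above about 5.5, which lines up with the ~6.0 ceiling observed above (presumably the overflowed moment breaks the Chebyshev bound and over-darkens).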

That said, there may be more cleverly derived encodings for the moments... I seem to recall someone on GameDev coming up with one that helped quite a bit for 16-bit floats. It probably does not apply to encoding the exponentially warped moments (EVSM), but it's perhaps something to consider.

Still, one big advantage of EVSMs is that they have fewer numeric issues than a normal linear encoding. Thus with fp16 and c ~= 6 or whatever, you've got better numerics and better light-bleeding behaviour than standard VSMs (even with only the 2 moments), so it's hard to imagine why *not* to use them...
 
Yes, you are right. This is why I decided yesterday to stick with my cascaded EVSM implementation in the product I'm currently developing ;)

However, the 16-bit format produces quite a lot of issues with C=6.0, so I am currently using an R32G32 texture array with C=20 for safety (positive bound only). I don't have any visible light bleeding, even in some extreme cases (high-flying aircraft shadows overlapping tree shadows, both projected on the ground). Amazing ;)

I use 4 512x512 slices to cover a range of 2000m. 1024 would be better, but the slowdown is incredibly high at that size, even with an 8800 GTX, so it is not an option (yet). No idea, however, why the slowdown is so huge, as the PS is quite simple. Certainly I am hitting the card's fillrate limit...

Thanks once again for your work and help; both clever and nice guys are quite rare nowadays... I wish you the best.
 
Yes, you are right. This is why I decided yesterday to stick with my cascaded EVSM implementation in the product I'm currently developing ;)
Cool, I'm really glad to hear that it's working out! Definitely, 32-bit formats are the way to go with EVSM if you have 32-bit filtering. Can you hint at what your product is, just out of curiosity? It's always fun to know where stuff is getting used :)

I use 4 512x512 slices to cover a range of 2000m. 1024 would be better, but the slowdown is incredibly high at that size, even with an 8800 GTX, so it is not an option (yet). No idea, however, why the slowdown is so huge, as the PS is quite simple. Certainly I am hitting the card's fillrate limit...
Yeah not sure why that's so slow... I've even done 4x 1024's with 4x shadow MSAA and still it pulls >80fps. To be fair, that's a fairly simple scene, but I'm surprised there's such a falloff at 1024. I can see there being slowdown at 2048 or above due to card z-cull metadata limits and so forth, but I'm surprised that 1024's are causing a problem. Are you rendering to them sequentially or "all at the same time" (GPU Gems 3 PSSM chapter style)?

Thanks once again for your work and help; both clever and nice guys are quite rare nowadays... I wish you the best.
Not a problem - I'm really glad that EVSM is working for you. It's always great for academic work to actually be useful, something that's up in the air until someone does something cool with it :) If/when you can, definitely post some screenshots!

Cheers,
Andrew
 
I render the shadow map using instancing and the geometry shader, so one single draw call per object renders all slices, like in the article you mentioned.
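For readers following along, the geometry-shader half of that setup might look roughly like this (a sketch in the spirit of the GPU Gems 3 PSSM chapter; the struct and constant names are illustrative):

Code:
float4x4 g_SliceViewProj[4];   // one light view-proj per cascade slice

struct GS_IN  { float4 WorldPos : TEXCOORD0; };
struct GS_OUT
{
    float4 Pos     : SV_Position;
    float  fDepth  : TEXCOORD0;                 // whatever the moment PS expects
    uint   RTIndex : SV_RenderTargetArrayIndex; // selects the texture-array slice
};

[maxvertexcount(12)]  // 4 slices * 3 vertices
void ShadowGS(triangle GS_IN In[3], inout TriangleStream<GS_OUT> Stream)
{
    for (uint Slice = 0; Slice < 4; ++Slice)
    {
        GS_OUT Out;
        Out.RTIndex = Slice;
        [unroll] for (uint v = 0; v < 3; ++v)
        {
            Out.Pos    = mul(In[v].WorldPos, g_SliceViewProj[Slice]);
            Out.fDepth = Out.Pos.z / Out.Pos.w; // or a linear depth, matching the PS
            Stream.Append(Out);
        }
        Stream.RestartStrip();
    }
}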

I know I am already shader bound (or fillrate bound?), so maybe increasing the shadow map size increases the workload on those units...

I will try to post some screen shots for you.

Greg
 
I render the shadow map using instancing and the geometry shader, so one single draw call per object renders all slices, like in the article you mentioned.
Interesting. Yeah, that approach may be a win if you're heavily CPU/submit bound, but honestly, if you have decent spatial subdivision and view frustum culling, I'd consider benchmarking the standard approach of rendering each slice in a separate pass. There aren't necessarily going to be a lot of objects that span multiple slices anyway.
 