Variance Shadow Maps Demo (D3D10)

Andrew Lauritzen

Hello again all. I promised to post the updated D3D10 demo of variance shadow maps, so here it is!

Since the original summed-area variance shadow maps thread and demo, I've done a decent bit of work. In particular, the application has been ported to D3D10, improved in several ways, and new techniques, features, and a new scene have been added.

A summary of the available techniques:
  • Normal ugly shadow maps.
  • Hardware accelerated percentage-closer filtering.
  • Variance shadow maps with trilinear/anisotropic filtering, and blurring to clamp minimum filter width. Also supports multisampling.
  • Summed-area variance shadow maps as described in the previous thread, except now with support for multisampling, as well as both an fp32 and int32 implementation.
  • Parallel-split variance shadow maps, which help magnification as well.

Shadow MSAA makes a *huge* difference in motion (use "animate light" checkbox). It really has to be seen to be believed, but even for really large minimum filter widths, swimming is still somewhat visible without MSAA. With even 4x MSAA swimming is drastically reduced or eliminated.

int32 is also really awesome for summed-area tables, and is the preferred implementation. Two things make this the case: the extra bits of precision over fp32, and the overflow behavior in D3D10. The latter works because integer overflow wraps in D3D10, which means we only need to sacrifice log2(W*H) bits of SAT precision for accumulation, where WxH is the maximum filter size. This maximum filter width can be bounded fairly conservatively (ex. 64x64, which costs only 12 bits, is plenty and probably overkill for most implementations). The result is that int32 makes numeric precision a non-issue again, and saves a ton of memory bandwidth since there's no need to distribute precision across 4 components.
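To see why wrap-around overflow is harmless, here's a small host-side sketch (my own illustration in Python, not the demo's code), with explicit 32-bit masking standing in for D3D10's integer wrapping: the SAT itself overflows, but any filter-region sum that fits in 32 bits is still recovered exactly by the four-corner lookup.

```python
import random

MASK = 0xFFFFFFFF  # emulate 32-bit wrap-around, like D3D10 int overflow

def sat_region_sum(sat, x0, y0, x1, y1):
    """Standard four-corner SAT lookup over the inclusive rectangle
    (x0, y0)..(x1, y1), with all arithmetic wrapped to 32 bits."""
    def at(x, y):
        return sat[y][x] if x >= 0 and y >= 0 else 0
    return (at(x1, y1) - at(x0 - 1, y1)
            - at(x1, y0 - 1) + at(x0 - 1, y0 - 1)) & MASK

# Build a wrapped SAT over random 24-bit "depth" values.
random.seed(0)
w, h = 64, 64
data = [[random.randrange(1 << 24) for _ in range(w)] for _ in range(h)]
sat = [[0] * w for _ in range(h)]
for y in range(h):
    row_sum = 0
    for x in range(w):
        row_sum = (row_sum + data[y][x]) & MASK
        above = sat[y - 1][x] if y > 0 else 0
        sat[y][x] = (row_sum + above) & MASK  # may wrap; that's fine

# An 8x8 filter region: its true sum fits in 32 bits, so the wrapped
# lookup recovers it exactly even though the full SAT overflowed.
true_sum = sum(data[y][x] for y in range(8, 16) for x in range(8, 16))
assert sat_region_sum(sat, 8, 8, 15, 15) == true_sum
```

This is exactly why only about log2(W*H) bits of per-texel precision need to be given up for a W x H maximum filter.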

Parallel-split variance shadow maps are also really cool, especially with the new, larger "convoy" scene. Three 512x512 variance shadow map splits with 4x MSAA and a bit of blurring looks fantastic and has excellent performance, and the quality can go up from there if necessary. Note that this implementation is relatively unoptimized; it's more of a "proof of concept". In particular, the shadow split locations could be chosen a lot more sensibly, and even some basic frustum culling would greatly improve the performance of rendering the different shadow map splits.

There's a few details that come up with PSVSM that I thought you guys may be interested in as well. Feel free to skip over the next few paragraphs if not.

First to get consistent blurring over the different shadow splits one needs to scale the blur kernel size by the ratio of texel sizes between the current split and the full shadow frustum. This is fairly easy to do and very nicely hides the split locations as well, even in motion (see screenshots below). Note that in this demo I round the scaled filter widths to the nearest integer for simplicity; quality could certainly be improved further by allowing non-integer blur kernels.
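As a rough sketch of that texel-ratio scaling (a Python illustration of my own; the extents and rounding here are illustrative assumptions, not the demo's code):

```python
def split_blur_width(base_width_texels, full_extent, split_extent):
    """Scale a blur kernel (in texels) so every split blurs roughly the
    same world-space width.  With equal-resolution splits, a split that
    covers a smaller world extent has smaller texels and therefore needs
    a proportionally wider kernel.  Rounded to the nearest integer, as
    in the demo's simplification."""
    scaled = base_width_texels * (full_extent / split_extent)
    return max(1, int(scaled + 0.5))
```

For example, a split covering a quarter of the full frustum extent gets a 4x wider kernel in texel units, so the blur stays the same width in the world and the split boundaries stay hidden.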

The second detail is the "fun" one: the splits are rendered into a texture array and the applicable split is computed in the fragment shader when shading the scene. This split index is used to choose the appropriate projection matrix and texture array element. This poses a problem, however, for computing texture coordinate derivatives and LOD, since different pixels in the same quad may choose different slices, resulting in unrelated texture coordinates and nonsensical derivatives/LOD. Other implementations have not noticed this problem because without variance shadow maps (and proper shadow filtering), there *is* no LOD computation as things are at best bilinearly sampled.

To solve this problem we really want to arbitrarily choose one of the two split indices and make sure they are consistent across the quad (note that I'm assuming only two split indices in a quad, but this turns out to be completely reasonable). After some head scratching I came up with the following code:

Code:
// GLOBAL
const int SplitPowLookup[8] = {0, 1, 1, 2, 2, 2, 2, 3};


// IN FRAGMENT SHADER:
...
// Compute which split we're in
int Split = dot(1, Input.SliceDepth > g_Splits);
    
// Ensure that every fragment in the quad chooses the same split so that
// derivatives will be meaningful for proper texture filtering and LOD
// selection.
int SplitPow = 1 << Split;
int SplitX = abs(ddx(SplitPow));
int SplitY = abs(ddy(SplitPow));
int SplitXY = abs(ddx(SplitY));
int SplitMax = max(SplitXY, max(SplitX, SplitY));
Split = SplitMax > 0 ? SplitPowLookup[SplitMax-1] : Split;
...
The idea is that while differencing doesn't give us an idea about where we are in an arithmetic sequence (ex. it will always return 1 or 0 for a sequence like 0, 0, 1, 2, 2, 2, 3, ...), it *does* tell us where we are in a geometric sequence. In particular, we will recover 2^(x+1)-2^(x) = 2^x, so taking the log2, we can recover x! (SplitPowLookup is just a small log2 lookup table). This allows us to make a choice about which split to use (x or x+1) and guarantee that the choice will be the same for the other pixels in the quad. Thus, the derivatives will be meaningful.

Ugly trick, I know, but it works like a charm! It's also less ugly than my first idea, which was to compute which pixel of the quad the current fragment is using vpos % 2 :)
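For the curious, the trick is easy to sanity-check off-GPU. Here's a small Python simulation (my own sketch, not demo code) where ddx/ddy are emulated as fine derivatives, i.e. plain differences against the horizontal/vertical neighbour within the 2x2 quad:

```python
SPLIT_POW_LOOKUP = [0, 1, 1, 2, 2, 2, 2, 3]  # log2 table, as in the shader

def quad_consistent_split(quad):
    """quad = [TL, TR, BL, BR] raw split indices, with at most two
    distinct consecutive values x and x+1 (the post's assumption).
    i ^ 1 is the horizontal neighbour, i ^ 2 the vertical one."""
    p = [1 << s for s in quad]                      # geometric: 2^split
    sy = [abs(p[j ^ 2] - p[j]) for j in range(4)]   # |ddy| per pixel
    out = []
    for i in range(4):
        split_x = abs(p[i ^ 1] - p[i])              # |ddx|
        split_y = sy[i]
        split_xy = abs(sy[i ^ 1] - sy[i])           # |ddx of ddy|
        m = max(split_x, split_y, split_xy)
        out.append(SPLIT_POW_LOOKUP[m - 1] if m > 0 else quad[i])
    return out
```

Whenever the quad straddles a boundary, every non-zero derivative comes out as 2^x, so all four pixels agree (and, in this simulation, always land on the lower of the two split indices).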

Anyways grab the demo here: Variance Shadow Maps Demo (April 26, 2007)
Source will be released with GPU Gems 3, and the accompanying chapter covers pretty much everything you ever wanted to know about variance shadow maps and shadow map filtering in general.

Please note the requirements (as detailed in the included Readme):

  • Any reasonably modern CPU/RAM
  • Windows Vista (for D3D10)
  • A D3D10 capable video card
  • DirectX Redist April 2007
    Available free from http://www.microsoft.com (search for the above)
  • Visual C++ 2005 Redistributable Package
    Available free from http://www.microsoft.com (search for the above)

Some screenshots:

SAVSM:


PSVSM:
Note that this quality level is greatly superior to stock PSSM/CSM, even when the latter have much larger shadow maps and more splits. Check out the PSSM demos if you don't believe me.


PSVSM Splits:
Zoom in and note the pixel quad trick at work.


VSM for the same scene:
Note how much poorer it looks compared to the above - the effect is even more pronounced in motion, and for lower resolution shadow maps.


In any case, go pick up a copy of Gems 3 when it comes out and enjoy the chapter. I'm pretty happy with it so far, and I think anyone interested in real-time shadowing algorithms will find that it contains a lot of useful information. Plus I'm sure that the rest of Gems 3 will be at least as good, if not better than my chapter, so it's worth it in any case ;)

To summarize though:
- For constant filter-widths, variance shadow maps + all the filtering and MSAA the hardware can give you + blurring is amazing and super-fast.
- For variance filter-widths (plausible soft shadows algorithms), summed-area variance shadow maps with int32 + MSAA also looks amazing and is quite fast as well, especially compared to the alternatives.

Also, if anyone happens to be going to I3D 2007 next week, come check out my poster; I'd love to chat!

Enjoy,
Andrew Lauritzen
University of Waterloo / RapidMind Inc.

PS: Some people have asked about a 360/PS3 demo of variance shadow maps. I'd love to do it, but I don't have access to the necessary hardware and dev kits right now. Of course if anyone can help on that front... :p

[EDIT] Updated demo to include Readme.
 
My eye-sockets are clearly not big enough right now. :oops: Well done. :)

Btw, there's no readme in the rar file. :p

PS: Some people have asked about a 360/PS3 demo of variance shadow maps. I'd love to do it, but I don't have access to the necessary hardware and dev kits right now. Of course if anyone can help on that front... :p
How about using the XNA tools for the 360? "All" you need is a 360 + HDD and the XNA Creator's Club membership.


Q: How much does XNA Game Studio Express cost? Is there a difference between Windows and Xbox 360 development?
A: Visual C# Express, the XNA Game Studio Express tools and runtime environment for Windows are all FREE. To develop, debug and/or play games on the Xbox 360, however, you must have an XNA Creators Club subscription purchased directly from the Xbox Live Marketplace. Two subscription options are available: $99 per year or $49 per four months.
http://msdn2.microsoft.com/en-us/xna/aa937795.aspx
 
Ok this is a random thought, you mention problems on the quad level with splits differing between quads - on the 360 shader implementation there are special attributes for if branches that make them operate at the quad level, an 'ifall' and 'ifany' style. Could these possibly be used for a more elegant solution? (these are exposed in XNA)

Looks very good :)
 
Btw, there's no readme in the rar file. :p
Whoops, sorry... I went to all the trouble of updating it and then forgot to include it :) I'll add it to the zip when I get home.

How about using the XNA tools for the 360? "All" you need is a 360 + HDD and the XNA Creator's Club membership.
I don't have a 360 :( Also I was under the impression that the XNA tools don't give you a whole lot of access to the specific hardware. In particular, I'd need control over the EDRAM and a few other hardware-specific features to make an efficient implementation. Plus I'd have to recode it all in C#, although I'd be willing to do that if necessary...

Graham said:
Ok this is a random thought, you mention problems on the quad level with splits differing between quads - on the 360 shader implementation there are special attributes for if branches that make them operate at the quad level, an 'ifall' and 'ifany' style. Could these possibly be used for a more elegant solution? (these are exposed in XNA)
That's interesting, I hadn't heard of these instructions. They could probably be used to similar effect, but I'm unsure... if they don't tell you which pixel in the quad you are, then they are no better than checking if derivatives == 0. I'll look into that though as it sounds interesting.
 
Actually I think I might be wrong.
The official documentation is a bit less clear...

http://msdn2.microsoft.com/en-us/library/bb313968.aspx

Code:
ifAll   Executes the conditional part of an if statement when the condition is true for all threads on which the current shader is running.
ifAny   Executes the conditional part of an if statement when the condition is true for any thread on which the current shader is running.

Sorry bout that

[edit]
*HOOT HOOT*

this may be even better:

http://msdn2.microsoft.com/en-us/library/bb313942.aspx

Code:
getCompTexLOD2D   For 2D textures, gets the LOD for all of the pixels in the quad at the specified coordinates.
 
this may be even better:
getCompTexLOD2D
No dice - it takes the texture coordinates as a parameter and thus will have the same problem for different splits. I'm pretty sure that the only way one can solve this (other than my current solution) is if there were some instruction that forced the four pixels in a quad, arbitrarily, to go down the same control flow path, or to choose the same value.
 
First, thanks for the front-page link!

Second, I'd just like to mention that there's nothing preventing one from doing summed-area variance shadow maps with parallel-split (or cascaded) shadow maps (PSSAVSM? ;)). Indeed that would probably be a really good combination for two reasons:

1) PSSM generally only needs smaller shadow maps (3 512x512's is usually plenty) which are significantly cheaper to generate SATs with and have less numeric problems to boot (although with int32, numerics are pretty much fine even for large shadow maps).

2) The texel ratio scaling can cause some large filter widths in some cases which can get expensive with standard blurred VSMs. This wouldn't be a problem at all with SAVSM.

The filtering techniques and projection splitting/warping techniques are entirely orthogonal, and I would have implemented them this way in the demo if not for the absolute inadequacy of HLSL/Fx for doing that sort of abstraction and encapsulation. Thus the only reasons why I only implemented PSVSM are time and laziness :)

PSSAVSM would indeed be very cool. Why not even throw in some soft shadows using PCSS and get parallel-split soft summed-area variance shadow maps (PSSSAVSM? :D). One could even throw in some trapezoidal shadow mapping on each of the splits, but may be getting excessive - particularly on the acronym front ;)
 
Hi there. First, thanks to AndyTX for the VSM source code. VSM is a great way to handle shadow mapping, and no doubt variance shadow maps will keep improving as hardware does. The new techniques in your VSM demo for D3D10 are awesome. It would be even more meaningful and effective if you brought those new techniques back to D3D9. Keep up the good work :)

Attaching some screenshots for my VSM test running on Virtools 3.5 with
D3D9/ShaderModel2.0.

Shadow based on VSM
1

http://img243.imageshack.us/img243/8515/devr20070427b1ak0.jpg

2

http://img132.imageshack.us/img132/1811/devr20070427b2wm1.jpg

3

http://img227.imageshack.us/img227/1662/devr20070427b3bd5.jpg

4

http://img221.imageshack.us/img221/3548/devr20070427b4hh0.jpg
 
512x512
img224.imageshack.us/img224/5228/devr200704275121qv2.jpg

1024x1024
img228.imageshack.us/img228/8459/devr2007042710241pe5.jpg

Parallel-split researching...
img172.imageshack.us/img172/6964/devr2007042714530623tx5.jpg

All of the above are 4xAA screenshots.
 
It would be even more meaningful and effective if you brought those new techniques back to D3D9.
Yeah I know... I'm just lazy and back-porting to older APIs/hardware just isn't fun, which is a primary motivating factor when you're working for no pay ;)

Thanks for sharing the screenshots! Shadows are nice, and I'm looking forward to seeing the fully-shaded scenes. One minor note about the code in the second window, regarding the VSM epsilon: while I originally modified the variance as you have done there, recently I've just been clamping the minimum variance to some value, which seems to work pretty well and is a bit simpler. Either way should be fine though, just thought I'd mention it.
 
Hey Andy,
great looking screenshot.
The idea of using a distribution of shadow samples based on probability is great. The main problem with Variance shadow maps are the light bleeding artefacts. It seems like some people found ways to get around those by choosing a different probability equation (covered in ShaderX6 :) ... couldn't resist).
Did you use a trick to get rid of light bleeding in your plain variance shadow map implementation?

After having solved depth aliasing / light bleeding artefacts and after we have found ways to blur the penumbra with hardware, the next frontier is probably how to construct the light view frustum in the most efficient way to save more cycles by using smaller maps. Additionally low-res shadow maps start flickering when you move around the camera. This is also an interesting area of research.
Did you look into light view frustum construction a bit ... any advice?

- Wolfgang
 
The idea of using a distribution of shadow samples based on probability is great. The main problem with Variance shadow maps are the light bleeding artefacts. It seems like some people found ways to get around those by choosing a different probability equation (covered in ShaderX6 :) ... couldn't resist).
Yeah certainly that's the main limitation, and an area for future research. What I'd really love to see is a way to extend a Chebyshev-like inequality to a larger (or arbitrary) number of moments, and still have it be a good approximation to the case of N delta functions (occluders). I've done a bit of work on that front but from my initial research it is a very difficult problem to solve. I'd certainly be interested in what is being done by the people in ShaderX6. (Did you still want me to review anything?)

Did you use a trick to get rid of light bleeding in your plain variance shadow map implementation?
I use a simple "light bleeding reduction" function as I discussed with you earlier and in the previous SAVSM thread. It works fairly well in my experience although with large blurs you can still construct situations with stubborn light bleeding.
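The exact function isn't spelled out in this thread, but one common form of light bleeding reduction (and my assumption of what's meant here) simply cuts off the tail of the Chebyshev bound below some threshold and rescales the remainder back to [0, 1]:

```python
def linstep(lo, hi, v):
    """Like HLSL's smoothstep but linear: clamp then rescale to [0, 1]."""
    return min(max((v - lo) / (hi - lo), 0.0), 1.0)

def reduce_light_bleeding(p_max, amount):
    """Map [amount, 1] onto [0, 1]: values of the Chebyshev upper bound
    below 'amount' (the bleeding-prone tail) become fully shadowed, at
    the cost of slightly over-darkening penumbrae."""
    return linstep(amount, 1.0, p_max)
```

The single 'amount' knob is the trade-off mentioned above: larger values kill more bleeding but visibly darken soft edges.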

I actually briefly sketch a proof in the Gems 3 chapter that one can't perfectly reconstruct the visibility function without being able to fall back on brute-force PCF in the worst case. An adaptive algorithm based on VSM is straightforward and may be one way of doing that in the future.

Unfortunately this implies that there is no silver bullet to visibility, however what we'd like to find is an efficient approximation. VSM actually works rather well for only two pieces of information, but it would be nice to have a way to extend it if one is willing to sacrifice more storage/performance for better quality. More moments is an obvious extension since they are trivial to compute, but as I mentioned above a suitable approximating function that uses them would have to be found.
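For reference, the two-moment machinery being discussed here is just the one-tailed Chebyshev (Cantelli) inequality from the original VSM paper. A minimal sketch in Python, with `min_variance` standing in for the clamping mentioned earlier:

```python
def chebyshev_upper_bound(m1, m2, t, min_variance=1e-6):
    """VSM visibility estimate: given the first two moments (E[z], E[z^2])
    of the depth distribution over a filter region, bound the fraction of
    the region with depth >= t (i.e. how lit a receiver at depth t is)."""
    if t <= m1:
        return 1.0  # receiver at or in front of the mean occluder: lit
    variance = max(m2 - m1 * m1, min_variance)
    d = t - m1
    return variance / (variance + d * d)
```

Note the bound is exact in the common two-occluder case: half the region at depth 0.2 and half at 0.6 gives moments (0.4, 0.2), and a receiver at depth 0.6 gets exactly 0.5 visibility.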

After having solved depth aliasing / light bleeding artefacts and after we have found ways to blur the penumbra with hardware
I'd argue that SAVSM's let you efficiently blur the penumbra to an arbitrary degree in hardware, although I'm not convinced that PCSS-like "blocker search" is the best approach for soft shadows. I am convinced however that SAVSMs will be a useful tool for a shadow-map based plausible soft shadows algorithm.

the next frontier is probably how to construct the light view frustum in the most efficient way to save more cycles by using smaller maps.
Definitely this is an area for future research, and it is largely orthogonal to shadow filtering (VSM and PCF being the only two algorithms that address this right now). I'm personally a fan of the parallel-split shadow map/CSM-like approaches right now, although they can certainly be improved as well. A notable paper can be found here: Warping and Partitioning for Low Error Shadow Maps. It discusses many methods of warping and partitioning shadow maps and how well they perform. While the analysis is necessarily limited to a few simple cases, the results are still noteworthy: z-partitioning does a pretty good job by itself, and warping each of the partitions (TSM, LiSPSM, PSM, whatever) can help as well. Face partitioning does well in a few cases but IMHO isn't worth the effort for average scenes.

Additionally low-res shadow maps start flickering when you move around the camera. This is also an interesting area of research.
This is actually one of the recent things that I've discovered and love about VSM: you can use multisampling when rendering the shadow map! Using 4x MSAA has a fairly small performance hit and eliminates flickering very effectively, much more so than even a really large blur (10x10 for instance). The combination of PSSM, 4x MSAA and a ~5x5 blur looks absolutely stunning with no flickering or swimming with animated lights/cameras, even with fairly small shadow maps (3x 512x512 for instance).

To this end, I encourage you to check out the demo. Use the "Convoy" scene, PSVSM with 3 or 4 slices, 4x shadow MSAA and set the softness just high enough to eliminate any flickering or swimming (shouldn't take much).

Did you look into light view frustum construction a bit ... any advice?
I haven't looked into frustum partitioning a lot myself - I just implemented the "practical split scheme" from the original PSSM paper. So far I've chosen to concentrate on the filtering front and leave that problem to others, and then I can just implement the good solutions :) According to here there is going to be a Gems 3 chapter that covers PSSM, so I suspect it will go into these sorts of problems in more detail.
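For completeness, the "practical split scheme" referenced above just blends the logarithmic and uniform split positions (sketched here in Python; `lam` is the blend weight from the PSSM paper):

```python
def practical_split_positions(near, far, num_splits, lam=0.5):
    """Split distances for PSSM's practical split scheme: a lerp between
    the theoretically optimal logarithmic distribution and a uniform one,
    which avoids the log scheme's over-allocation of resolution right at
    the near plane."""
    splits = []
    for i in range(1, num_splits):
        f = i / num_splits
        log_split = near * (far / near) ** f    # logarithmic term
        uni_split = near + (far - near) * f     # uniform term
        splits.append(lam * log_split + (1.0 - lam) * uni_split)
    return splits
```

lam = 0 gives purely uniform splits, lam = 1 purely logarithmic; the paper suggests values around 0.5 as a sensible default.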

Anyways thanks for your input and interest. I'd definitely be interested in discussing/reviewing any pertinent chapters for ShaderX6, so please PM or e-mail me when you get the chance.

Cheers!
Andrew
 
What I'd really love to see is a way to extend a Chebyshev-like inequality to a larger (or arbitrary) number of moments, and still have it be a good approximation to the case of N delta functions (occluders).
There are 3- and 4-moment Chebyshev-like inequalities in the literature, but sadly they don't fix light bleeding.
Adding a third moment basically gets you the same results as VSM without using any bias (just because skewness gives you information about asymmetries in the Z distribution); adding a fourth moment ameliorates things a bit, but the results are too unstable, and the fair amount of data and computation needed in such a model makes it a no-go, imho.
If you postulate a distribution a priori (a sum of 2 or more Dirac deltas, or a mixture of 2 Gaussians...) you can then try to fit your moments to the related cumulative distribution function (and get rid of Chebyshev's inequality) and get some nice results where your Z distribution fits these models, and some supercrappy stuff in any other case :(
 
There are 3- and 4-moment Chebyshev-like inequalities in the literature, but sadly they don't fix light bleeding.
Yeah, I messed with a 4-moment version but as you mention it didn't seem to help with light bleeding at all... I wasn't sure whether it was just me implementing it wrong though, as the math in that paper was highly non-trivial ;)

Adding a third moment basically gets you the same results as VSM without using any bias (just because skewness gives you information about asymmetries in the Z distribution); adding a fourth moment ameliorates things a bit, but the results are too unstable, and the fair amount of data and computation needed in such a model makes it a no-go, imho.
That's really good information; thanks a lot!

If you postulate a distribution a priori (a sum of 2 or more Dirac deltas, or a mixture of 2 Gaussians...) you can then try to fit your moments to the related cumulative distribution function (and get rid of Chebyshev's inequality) and get some nice results where your Z distribution fits these models, and some supercrappy stuff in any other case :(
Yeah I played with some similar ideas myself. In particular you can actually use the 4 moments and solve a non-linear system of equations to recover the location of several delta functions but the problem is that it does not degenerate well in the "no solution" case. Chebyshev's inequality is actually fairly reasonable in that it obtains exact results in arguably the most common case, and does something "relatively reasonable" in more complicated situations.

The other idea that I played with was using several depth layers, similar to depth peeling, so that each distribution would contain only two delta functions. However in the straightforward implementations one has to sacrifice either inner or outer penumbrae, and in the more complicated case it degenerates into something that is arguably just as much work as biting the bullet and doing full deep shadow maps.

It doesn't seem like an easy problem to solve efficiently, which is why I'm particularly interested in the work that Wolfgang mentioned.

PS: It sounds like you've experimented with VSMs quite a bit nAo. Is that true or is this all theoretical?
 
PS: It sounds like you've experimented with VSMs quite a bit nAo. Is that true or is this all theoretical?
It's true, I've implemented all the things I was talking about, that's why I do believe (it's not the first time I write this :) ) there's a lot more work to do in this area.
 
It's true, I've implemented all the things I was talking about, that's why I do believe (it's not the first time I write this :) ) there's a lot more work to do in this area.
Awesome, yeah me too. I've been hoping that the initial paper will spawn research as well. I'm glad to hear that people are messing around with it and coming up with improvements and new ideas! :D
 
To andy
No pay doesn't mean no value. It is unfair when people create huge value without reasonable reward, and somebody needs to do meaningful work to correct that. That is also my driving force to keep going.

Actually there are lots of ways to make shadows. I've tried some clumsy ways to achieve soft shadows before, like using the shadow system in Virtools plus some post-process effects.
1

http://img223.imageshack.us/img223/7958/hls1df5.jpg

2
http://img213.imageshack.us/img213/8619/hls2th9.jpg

But those work in Virtools only. Shadows based on VSM are a different story: I can incorporate them into other engines when necessary. VSM is great!

"I'm looking forward to seeing the fully-shaded scenes"

I think you'll have to wait until my fully-shaded scene is done :) You can view some of my previous incomplete shader tests, but with no real-time shadows.

1

http://img135.imageshack.us/img135/2194/devr2007042715044175ng8.jpg

2
http://img155.imageshack.us/img155/5492/devr2007042715052667wi2.jpg

3
http://img155.imageshack.us/img155/4930/devr2007042715075120tg0.jpg

4
http://img91.imageshack.us/img91/8617/devr2007042716294512sh2.jpg

5
http://img179.imageshack.us/img179/1453/devr2007042716301935qs3.jpg

6 compatible with ATI cards
http://img224.imageshack.us/img224/9317/devr2007042812371109et7.jpg

7 compatible with ATI cards
http://img153.imageshack.us/img153/7536/devr2007042812371626fa4.jpg

8
http://img153.imageshack.us/img153/7466/devr2007042812432170ft3.jpg

9
http://img151.imageshack.us/img151/7911/devr2007042812453890vh6.jpg

10
http://img245.imageshack.us/img245/3350/nova03us3.jpg

11
http://img82.imageshack.us/img82/4973/nova04fo2.jpg
 
Andy gets a shout out from the Inq - http://www.theinquirer.net/default.aspx?article=39224

 