Hello again all. I promised to post the updated D3D10 demo of variance shadow maps, so here it is!
Since the original summed-area variance shadow maps thread and demo, I've done a decent bit of work. In particular, the application has been ported to D3D10, improved in several ways, and new techniques, features, and a new scene have been added.
A summary of the available techniques:
- Normal ugly shadow maps.
- Hardware accelerated percentage-closer filtering.
- Variance shadow maps with trilinear/anisotropic filtering, and blurring to clamp minimum filter width. Also supports multisampling.
- Summed-area variance shadow maps as described in the previous thread, except now with support for multisampling, as well as both an fp32 and int32 implementation.
- Parallel-split variance shadow maps, which help magnification as well.
Shadow MSAA makes a *huge* difference in motion (use the "animate light" checkbox). It really has to be seen to be believed: even with really large minimum filter widths, swimming is still somewhat visible without MSAA. With even 4x MSAA, swimming is drastically reduced or eliminated.
int32 is also really awesome for summed-area tables, and is the preferred implementation. Two things make this the case: the extra bits of precision over fp32, and the overflow behavior in D3D10. The latter works because integer overflow wraps in D3D10, which means that we only need to reserve log2(W*H) bits of the SAT for accumulation, where WxH are the dimensions of the maximum filter region. This maximum filter width can be bounded fairly conservatively (e.g. 64x64, which costs 12 bits, is plenty and probably overkill for most implementations). With int32, numeric precision becomes a non-issue again, and a ton of memory bandwidth is saved since there's no need to distribute precision into 4 components.
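To make the wrapping argument concrete, here is a small CPU-side sketch in C++ (not the demo's shader code). It builds a summed-area table with unsigned 32-bit sums that overflow many times over, then shows that a 64x64 rectangle sum still comes out exact, because the true sum of that one region fits in 32 bits. The names BuildSat, RectSum and WrappedSatMatchesDirectSum, and the 20-bit test data, are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: inclusive summed-area table whose sums wrap mod 2^32.
std::vector<uint32_t> BuildSat(const std::vector<uint32_t>& src, int w, int h) {
    std::vector<uint32_t> sat(src.size());
    for (int y = 0; y < h; ++y) {
        uint32_t row = 0;  // running sum of the current row (may wrap)
        for (int x = 0; x < w; ++x) {
            row += src[y * w + x];
            sat[y * w + x] = row + (y > 0 ? sat[(y - 1) * w + x] : 0u);
        }
    }
    return sat;
}

// Sum over the inclusive rectangle [x0,x1] x [y0,y1] from four SAT lookups.
// All arithmetic is mod 2^32, so wrapped table entries cancel out.
uint32_t RectSum(const std::vector<uint32_t>& sat, int w,
                 int x0, int y0, int x1, int y1) {
    auto at = [&](int x, int y) -> uint32_t {
        return (x < 0 || y < 0) ? 0u : sat[y * w + x];
    };
    return at(x1, y1) - at(x0 - 1, y1) - at(x1, y0 - 1) + at(x0 - 1, y0 - 1);
}

// Illustrative check: 20-bit texel values leave 12 bits of headroom,
// which is exactly log2(64 * 64).
bool WrappedSatMatchesDirectSum() {
    const int w = 256, h = 256;
    std::vector<uint32_t> src(w * h);
    uint64_t total = 0;
    for (int i = 0; i < w * h; ++i) {
        src[i] = (1u << 20) - 1u - static_cast<uint32_t>(i % 7);
        total += src[i];
    }
    const std::vector<uint32_t> sat = BuildSat(src, w, h);

    // Direct 64-bit sum over a 64x64 window for comparison.
    uint64_t direct = 0;
    for (int y = 100; y <= 163; ++y)
        for (int x = 100; x <= 163; ++x)
            direct += src[y * w + x];

    const uint32_t filtered = RectSum(sat, w, 100, 100, 163, 163);
    const bool tableOverflowed = total > 0xFFFFFFFFull;
    return tableOverflowed && direct <= 0xFFFFFFFFull && filtered == direct;
}
```

Calling WrappedSatMatchesDirectSum() returns true: the full table sum is far beyond 32 bits, yet the window sum is recovered exactly. Widen the window or the texel values past the 32-bit budget and the result will be off by a multiple of 2^32.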
Parallel-split variance shadow maps are also really cool, especially with the new, larger "convoy" scene. Three 512x512 variance shadow map splits with 4x MSAA and a bit of blurring look fantastic and have excellent performance, and the quality can go up from there if necessary. Note that this implementation is relatively unoptimized; it's more of a "proof of concept". In particular, the shadow split locations could be chosen a lot more sensibly, and even some basic frustum culling would greatly improve the performance of rendering the different shadow map splits.
There are a few details that come up with PSVSM that I thought you guys may be interested in as well. Feel free to skip over the next few paragraphs if not.
First, to get consistent blurring over the different shadow splits, one needs to scale the blur kernel size by the ratio of texel sizes between the current split and the full shadow frustum. This is fairly easy to do and very nicely hides the split locations as well, even in motion (see screenshots below). Note that in this demo I round the scaled filter widths to the nearest integer for simplicity; quality could certainly be improved further by allowing non-integer blur kernels.
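As a rough sketch of that scaling in C++ (not the demo's actual code): assume the blur width is specified in texels of a shadow map covering the full shadow frustum, and that every split uses the same resolution. A split covering a smaller extent has smaller texels, so it needs proportionally more of them to blur the same world-space footprint. The function name, the parameters, and the clamp to a minimum of one texel are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: blur width (in texels) for one split.
//   baseWidthTexels  blur width in texels of the full-frustum shadow map
//   fullExtent       light-space width covered by the full shadow frustum
//   splitExtent      light-space width covered by this split
//   resolution       shadow map resolution, assumed equal for all splits
int ScaledBlurWidth(float baseWidthTexels, float fullExtent,
                    float splitExtent, int resolution) {
    const float fullTexel  = fullExtent  / static_cast<float>(resolution);
    const float splitTexel = splitExtent / static_cast<float>(resolution);
    const float scaled = baseWidthTexels * (fullTexel / splitTexel);
    // Round to the nearest integer, as the demo does; the clamp is my addition.
    return std::max(1, static_cast<int>(std::lround(scaled)));
}
```

For example, a 3-texel base blur on a split that covers a quarter of the full extent becomes a 12-texel blur, so the blurred penumbra keeps roughly the same size on screen on both sides of a split boundary.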
The second detail is the "fun" one: the splits are rendered into a texture array and the applicable split is computed in the fragment shader when shading the scene. This split index is used to choose the appropriate projection matrix and texture array element. This poses a problem, however, for computing texture coordinate derivatives and LOD, since different pixels in the same quad may choose different slices, resulting in unrelated texture coordinates and nonsensical derivatives/LOD. Other implementations have not noticed this problem because without variance shadow maps (and proper shadow filtering), there *is* no LOD computation as things are at best bilinearly sampled.
To solve this problem we really want to arbitrarily choose one of the two split indices and make sure the choice is consistent across the quad (note that I'm assuming at most two different split indices in a quad, but this turns out to be completely reasonable). After some head scratching I came up with the following code:
Code:
// GLOBAL
const int SplitPowLookup[8] = {0, 1, 1, 2, 2, 2, 2, 3};
// IN FRAGMENT SHADER:
...
// Compute which split we're in
int Split = dot(1, Input.SliceDepth > g_Splits);
// Ensure that every fragment in the quad chooses the same split so that
// derivatives will be meaningful for proper texture filtering and LOD
// selection.
int SplitPow = 1 << Split;
int SplitX = abs(ddx(SplitPow));
int SplitY = abs(ddy(SplitPow));
int SplitXY = abs(ddx(SplitY));
int SplitMax = max(SplitXY, max(SplitX, SplitY));
Split = SplitMax > 0 ? SplitPowLookup[SplitMax-1] : Split;
...
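The idea is that while differencing doesn't tell us where we are in an arithmetic sequence (e.g. it will always return 1 or 0 for a sequence like 0, 0, 1, 2, 2, 2, 3, ...), it *does* tell us where we are in a geometric sequence. In particular, we will recover 2^(x+1)-2^(x) = 2^x, so taking the log2, we can recover x! (SplitPowLookup is just a small log2 lookup table.) This allows us to make a choice about which split to use (x or x+1) and guarantee that the choice will be the same in the other pixels in the quad. Thus, the derivatives will be meaningful.
Ugly trick, I know, but it works like a charm! It's also less ugly than my first idea, which was to compute which pixel the current fragment is in the quad using vpos % 2.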
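If you want to convince yourself without a GPU, here is a CPU-side sketch in C++ that replays the same arithmetic on a single 2x2 quad. It assumes ddx is evaluated per row and ddy per column of the quad, which is my assumption about the hardware and not something the demo guarantees. The Quad struct and the ResolveQuadSplits and AllEqual helpers are made up for illustration; only the lookup table is taken from the shader.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Split index per pixel of one 2x2 quad, stored as v[row][col].
struct Quad { int v[2][2]; };

// Same table as SplitPowLookup in the shader: entry (2^x - 1) holds x.
const int kSplitPowLookup[8] = {0, 1, 1, 2, 2, 2, 2, 3};

// Replays the shader's trick, assuming per-row ddx and per-column ddy.
Quad ResolveQuadSplits(const Quad& split) {
    int splitPow[2][2];
    int splitY[2][2];
    for (int r = 0; r < 2; ++r)
        for (int c = 0; c < 2; ++c)
            splitPow[r][c] = 1 << split.v[r][c];

    // ddy: vertical difference, shared by both pixels of a column.
    for (int c = 0; c < 2; ++c) {
        const int dy = std::abs(splitPow[1][c] - splitPow[0][c]);
        splitY[0][c] = dy;
        splitY[1][c] = dy;
    }

    Quad out{};
    for (int r = 0; r < 2; ++r) {
        // ddx: horizontal difference, shared by both pixels of a row.
        const int splitX  = std::abs(splitPow[r][1] - splitPow[r][0]);
        const int splitXY = std::abs(splitY[r][1] - splitY[r][0]);
        for (int c = 0; c < 2; ++c) {
            const int splitMax = std::max(splitXY, std::max(splitX, splitY[r][c]));
            out.v[r][c] = splitMax > 0 ? kSplitPowLookup[splitMax - 1]
                                       : split.v[r][c];
        }
    }
    return out;
}

// True if all four pixels of the quad ended up with the given split.
bool AllEqual(const Quad& q, int value) {
    return q.v[0][0] == value && q.v[0][1] == value &&
           q.v[1][0] == value && q.v[1][1] == value;
}
```

Feeding in a quad where only one corner is in split 2 and the rest are in split 1 yields split 1 for all four pixels. That corner case is also where the SplitXY term earns its keep, since the opposite corner sees zero for both SplitX and SplitY.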
Anyways grab the demo here: Variance Shadow Maps Demo (April 26, 2007)
Source will be released with GPU Gems 3, and the accompanying chapter covers pretty much everything you ever wanted to know about variance shadow maps and shadow map filtering in general.
Please note the requirements (as detailed in the included Readme):
- Any reasonably modern CPU/RAM
- Windows Vista (for D3D10)
- A D3D10 capable video card
- DirectX Redist April 2007
Available free from http://www.microsoft.com (search for the above)
- Visual C++ 2005 Redistributable Package
Available free from http://www.microsoft.com (search for the above)
Some screenshots:
SAVSM:
PSVSM:
Note that this quality level is greatly superior to stock PSSM/CSM, even when the latter have much larger shadow maps and more splits. Check out the PSSM demos if you don't believe me.
PSVSM Splits:
Zoom in and note the pixel quad trick at work.
VSM for the same scene:
Note how much poorer it looks compared to the above - the effect is even more pronounced in motion, and for lower resolution shadow maps.
In any case, go pick up a copy of Gems 3 when it comes out and enjoy the chapter. I'm pretty happy with it so far, and I think anyone interested in real-time shadowing algorithms will find that it contains a lot of useful information. Plus I'm sure that the rest of Gems 3 will be at least as good as, if not better than, my chapter, so it's worth it in any case.
To summarize though:
- For constant filter-widths, variance shadow maps + all the filtering and MSAA the hardware can give you + blurring is amazing and super-fast.
- For variable filter-widths (plausible soft shadows algorithms), summed-area variance shadow maps with int32 + MSAA also look amazing and are quite fast as well, especially compared to the alternatives.
Also, if anyone happens to be going to I3D 2007 next week, come check out my poster; I'd love to chat!
Enjoy,
Andrew Lauritzen
University of Waterloo / RapidMind Inc.
PS: Some people have asked about a 360/PS3 demo of variance shadow maps. I'd love to do it, but I don't have access to the necessary hardware and dev kits right now. Of course if anyone can help on that front...
[EDIT] Updated demo to include Readme.