Three new DirectX 11 demos

Optimal depth bounds are very important because of the EVSM filtering, and that's why I want to have the tight SDSM z-bounds as well.
Yeah agreed it really does cut down on the light bleeding significantly, and in the places where it matters the most (near the viewer). It's funny to see someone be like "is that light bleeding?" and as they walk up to it it fades away ;) Pretty happy with how the two techniques help one another here.

But with tighter SDSM bounds and some fine tuning, the 16 bits are likely enough (at least I hope so).
Yeah I'm crossing my fingers :) Let me know if it works out.

I plan to use the z-buffer we render for the virtual texture page fetching to feed the SDSM system. It's 4x4 smaller than the game resolution, so the downsampling performance should be better.
Makes sense. Indeed it will be possible to "miss" shadow samples then but doing some sort of reasonable default (in shadow, out of shadow, or just clamp to edge) should work ok.

But I could modify it to render the next frame almost perfectly: a perfect next-frame camera matrix and a near-perfect object movement approximation (position & rotation advanced by velocity & angular velocity). This would allow me to lock the result next frame (preventing lock stalls) and use the information for SDSM (it should provide a more correct result than just using the last frame's min/max xyz).
Ah yes very clever - should be able to get around the frustum culling stall that way!
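A minimal sketch of that extrapolation step, with hypothetical names; a real engine would advance matrices/quaternions rather than plain component lists, but the idea is the same:

```python
# Hypothetical sketch of predicting next-frame object state from the
# current state plus velocities. Real engines would use quaternions and
# matrices; plain 3-component lists keep the idea visible.

def predict_next_frame(position, velocity, rotation, angular_velocity, dt):
    """Linearly extrapolate position and rotation one frame ahead."""
    predicted_position = [p + v * dt for p, v in zip(position, velocity)]
    # For small per-frame rotations, advancing the angles by
    # angular_velocity * dt is a near-perfect approximation.
    predicted_rotation = [r + w * dt for r, w in zip(rotation, angular_velocity)]
    return predicted_position, predicted_rotation

# One 60 fps frame ahead: object moving 2 m/s along x, yawing 0.5 rad/s.
pos, rot = predict_next_frame([0.0, 1.0, 5.0], [2.0, 0.0, 0.0],
                              [0.0, 0.0, 0.0], [0.0, 0.5, 0.0],
                              dt=1.0 / 60.0)
```

Rendering the prediction-pass depth buffer with these extrapolated transforms is what lets the min/max results be read back a frame later without a lock stall.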

And the downsampled depth buffer is really useful for occlusion culling as well (we plan to have a real-time occlusion culling system similar to the one in CryEngine 3)... Two weeks ago I refactored our tile-based deferred renderer away, but if we get near-perfect tile near/far depth values (without lock stalls), it might change my plans a bit :)
Yep indeed - it seems like a really useful data structure to generate regardless. I use the "first few levels" of it in the deferred shading demo as well for light volume culling :)
 
I did some analysis for SDSM for our game content, and unfortunately my analysis doesn't look that good for us.

We have around a 2 kilometer view distance, and almost always (during gameplay) you will see parts of the horizon, so the z-max reduction doesn't help. The z-near reduction helps a bit, since usually we have around 3 meters of air before our character (3rd person viewport). I guess we are lucky, since we do not have a weapon glued to the screen almost clipping the near plane (like all FPS shooters have).

The light space xy bounds of each cascade do not help that much either, since we do not have many big blockers narrowing the view. I wrote a shader that plots all our visible screen pixels into light space (it selects the cascade and the texture coordinates with exactly the same math as our lighting shader). The nearest two cascades are mostly filled to the borders. We choose the cascade by selecting the first cascade that results in texture coordinates in the [0,1] range (multiple transformations). This is most likely the biggest reason why we sample almost 100% of the cascade area (for the first cascades). Selecting the PSSM cascade by z-distance would likely result in much bigger areas of unused space in each cascade, but I am unsure if that would be a good idea.
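As a rough illustration of that selection logic (a toy version, not the actual shader; the scale/offset pairs below are made up and stand in for the full light-space matrices):

```python
# Toy version of "pick the first cascade whose texture coordinates land
# in [0,1]". Values are illustrative; the real shader applies full
# light-space transforms per cascade.

def select_cascade(world_pos, cascade_transforms):
    for index, (scale, offset) in enumerate(cascade_transforms):
        uv = [world_pos[i] * scale + offset[i] for i in range(2)]
        if all(0.0 <= c <= 1.0 for c in uv):
            return index, uv
    return None, None  # outside every cascade

# Each successive cascade covers twice the area at half the texel density.
cascades = [(0.5, [0.5, 0.5]),    # cascade 0 covers [-1, 1]
            (0.25, [0.5, 0.5]),   # cascade 1 covers [-2, 2]
            (0.125, [0.5, 0.5])]  # cascade 2 covers [-4, 4]
```

A point near the viewer falls into cascade 0, while a point at x = 3 only fits cascade 2 -- which is why nearly the full area of the early cascades gets sampled when visible points are spread across them.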

PSSM shadow cascades (left), and the sampled pixels plotted (right). As you can see, the lighting shader already seems to sample almost the whole PSSM (the rectangle cannot be tightened for any cascade other than the last):
http://img513.imageshack.us/img513/2918/shadowsampling.jpg
 
We have around a 2 kilometer view distance, and almost always (during gameplay) you will see parts of the horizon, so the z-max reduction doesn't help.
Yeah the z-max reduction is actually the least important part in any case. Near is what matters (critically!) for the logarithmic distribution.
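For reference, the standard practical split scheme (the PSSM lambda blend of logarithmic and uniform distributions) makes this concrete; a generic sketch, not code from the demos:

```python
# Generic PSSM split-distance sketch: the usual lambda blend of the
# logarithmic and uniform schemes. It shows why near matters most: the
# log term spaces splits by the *ratio* far/near.

def pssm_splits(near, far, cascades, lam=0.8):
    splits = []
    for i in range(1, cascades):
        f = i / cascades
        log_split = near * (far / near) ** f
        uni_split = near + (far - near) * f
        splits.append(lam * log_split + (1.0 - lam) * uni_split)
    return splits

loose = pssm_splits(0.5, 2000.0, 4)  # hand-tuned near plane (0.5 m)
tight = pssm_splits(5.0, 2000.0, 4)  # SDSM-tightened near plane (5 m)
# Per-cascade depth ratio drops from (2000/0.5)**0.25 ~ 7.95 to
# (2000/5)**0.25 ~ 4.47, i.e. much less depth range squeezed into each
# cascade, while shrinking far from 2000 to, say, 1500 barely moves it.
```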

I guess we are lucky, since we do not have a weapon glued to the screen almost clipping the near plane (like all FPS shooters have).
Right :) I'd handle the gun case separately in practice as it isn't really a part of "scene" per se.

The light space xy bounds of each cascade do not help that much either, since we do not have many big blockers narrowing the view.
Yeah for these to help much you need either occlusion or empty space. Both of these are pretty common in a lot of scenes, but if you have a scene where your light is pretty high in the sky (vs. on the horizon) and a sparse scene, you can definitely have a case where it won't matter much. That said, the good news about this case is that it's also pretty easy to just parameterize analytically (which is incidentally why the simple scene-independent solutions work pretty well). Judging from your image, the best additional thing you could do is apply a warping to each shadow partition (log PSMs would be ideal of course, but LiSPSM or something is more practical). Of course warping the partitions complicates consistent edge softening, assuming that you are prefiltering your shadow maps.

So out of curiosity, are there not cases where someone gets near large/semi-large objects (relative to the size of the player)? These cases are usually the ones that cause standard cascades to exhibit artifacts, but they are indeed the ones that SDSM can take good advantage of by tightening up both the z-range and the partitions.

The last nice advantage of SDSMs is of course avoiding tweaking of partition ranges/PSSM lambdas/etc. This is nice as it frees up some artist time and avoids problems late in production when cameras/scenes change after partition ranges have been set up. That said if you're a third person game with a fixed camera distance and a fairly sparse scene this might not be an issue either.

In any case sounds like it might not be a good fit for your scene, but the positive way to look at that is that your scene is already well-fit by standard CSMs :) I'm still curious what the additional cost of the SDSM analysis would be on a console though so if you do happen to implement it do let me know!

Thanks for the update.
 
Right :) I'd handle the gun case separately in practice as it isn't really a part of "scene" per se.
But if you are using a deferred renderer, rendering the gun in a separate pass is not that elegant. You basically have to forward render it, and make sure that the deferred rendering & lighting are not applied to those pixels (render the gun first to the stencil buffer, for example, or a lot of performance is wasted). You really want the gun to receive shadows too, since it would look really weird if the gun shone fully bright while the surrounding geometry is shadowed (you enter a small building in the terrain, for example -- in our case the roofs also have some cracks letting the sunlight in, but only partially).

Judging from your image, the best additional thing you could do is apply a warping to each shadow partition (log PSMs would be ideal of course, but LiSPSM or something is more practical). Of course warping the partitions complicates consistent edge softening, assuming that you are prefiltering your shadow maps.
I was thinking about some kind of trapezoid warp, but unfortunately it's not a linear transform (it cannot simply be put into the light matrix & its inverse), so it requires some extra math on both sides of the equation, and like you said it causes other issues as well.

So out of curiosity, are there not cases where someone gets near large/semi-large objects (relative to the size of the player)? These cases are usually the ones that cause standard cascades to exhibit artifacts, but they are indeed the ones that SDSM can take good advantage of by tightening up both the z-range and the partitions.
Yes, there's some blocker geometry, but unfortunately in the most common case you see pretty far, since forests and trees are the most common view blockers, and their leaves always leave some gaps that allow you to see really far. In most games, you have linear paths through the levels, allowing the developer to put lots of huge view/movement blockers along the path. In our case, we have a world that allows you to move everywhere, as we have an in-game level editor that allows you to fly over any geometry. Unfortunately for us technology guys, our artists tend to like levels that are located on top of some big (but thin) structures (a nice vertigo feel), and this kind of setting gives a huge visible view distance... but at the same time, the real gameplay happens very near the camera (so the near plane cannot be moved that far).

The terrain itself is of course a good view blocker, but it rarely cuts the z-max that much, since the horizon is often visible in the camera view. However, it cuts the light-space cascade z-max considerably, and that should improve the EVSM quality nicely.

The last nice advantage of SDSMs is of course avoiding tweaking of partition ranges/PSSM lambdas/etc. This is nice as it frees up some artist time and avoids problems late in production when cameras/scenes change after partition ranges have been set up.
I agree this is one of the key advantages in SDSM.

In any case sounds like it might not be a good fit for your scene, but the positive way to look at that is that your scene is already well-fit by standard CSMs :)
SDSM will help a lot in some scenes, but for the worst-case scenes it only helps a bit (we unfortunately have a lot of worst-case scenes -- and have no control over it, since the content is player-created). SDSM is still a very good technique. It saves some artist work, and improves the quality and performance a bit even in the worst-case scenarios. And naturally we develop our technology for the long run, so there will be future games that gain more from SDSM than our current one. I am pretty sure we will use the z-min / z-max system (it's really fast to generate on the GPU, and the tight near plane alone improves the quality nicely), but the per-cascade bounds do not seem that useful for us right now (though I'd really like to have the tighter z-bounds for the EVSM). But let's see how it goes.
 
I'm still curious what the additional cost of the SDSM analysis would be on a console though so if you do happen to implement it do let me know!

Thanks for the update.
SDSM Update (part 1):

The SDSM near-z and far-z search alone improved the shadow map quality nicely in our game view. Our PSSM near plane was set to 0.5 meters, and now during gameplay the SDSM near plane fluctuates between 3 and 5 meters. The far plane is sometimes as close as 100 meters (buildings etc. blocking the view), but usually hovers around 1.5 kilometers.

But the most impressive improvement was in the editor. When you fly around in the air and look at the scenery below, the near plane can be pushed as far as 100-500 meters away, giving everything almost pixel-perfect shadows (we have four 512x512 cascades, rendered with hardware 4xMSAA and EVSM filtering).

I'll post you the performance improvements on the console platform when the algorithm runs fully on the console as well. I implemented the recursive downsampling method on PC (DX11) first (using our virtual texture small z-buffer as my depth data source). It seems that using last frame's data is enough and no graphics glitches are visible (we render at a vsync-locked 60 fps), so there's no need to cause a lock stall.
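For anyone curious, the recursive min/max reduction itself is simple to sketch on the CPU (the real version is a GPU downsample chain over the small depth buffer; this toy assumes a square power-of-two buffer with already-linearized depths):

```python
# Toy CPU version of the recursive 2x2 min/max depth reduction (the real
# implementation runs as a GPU shader chain). Assumes a square
# power-of-two buffer with linear depth values.

def minmax_depth(depth):
    """Reduce a 2D depth buffer to a single (min, max) pair."""
    level = [[(d, d) for d in row] for row in depth]
    while len(level) > 1:
        level = [[(min(level[2*y][2*x][0], level[2*y][2*x+1][0],
                       level[2*y+1][2*x][0], level[2*y+1][2*x+1][0]),
                   max(level[2*y][2*x][1], level[2*y][2*x+1][1],
                       level[2*y+1][2*x][1], level[2*y+1][2*x+1][1]))
                  for x in range(len(level[0]) // 2)]
                 for y in range(len(level) // 2)]
    return level[0][0]

# Example 4x4 depth buffer (values in meters).
depth = [[5.0, 7.0, 3.2, 9.0],
         [4.0, 8.0, 6.0, 2.5],
         [10.0, 1.5, 7.7, 8.8],
         [9.9, 6.6, 5.5, 4.4]]
# The intermediate levels form a min/max z pyramid; only the final 1x1
# level is kept here, giving the scene's tight near/far bounds.
```

`minmax_depth(depth)` returns `(1.5, 10.0)` for this buffer, i.e. the tight near/far planes that feed the SDSM partition setup.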

Thanks for the good algorithm. Our artists are really happy already :)
 
You really want the gun to receive shadows too, since it would look really weird if the gun shone fully bright while the surrounding geometry is shadowed (you enter a small building in the terrain, for example -- in our case the roofs also have some cracks letting the sunlight in, but only partially).
Agreed although that kind of implies that it has to be at a realistic place in the "scene" (to project its coordinates back into world space for shadow/lighting work) rather than just "plastered to the near plane" :) And indeed if it's at a realistic place that is really near the viewer but you want proper shadows on it... well there's no avoiding dedicating a high-resolution cascade to that range really :S

I was thinking about some kind of trapezoid warp, but unfortunately it's not a linear transform (it cannot simply be put into the light matrix & its inverse), so it requires some extra math on both sides of the equation, and like you said it causes other issues as well.
Yeah it gets a bit ugly which is why I normally avoid warps but if you do have resolution issues (particularly near the viewer) it might be worth looking at.

In our case, we have a world that allows you to move everywhere, as we have an in-game level editor that allows you to fly over any geometry. Unfortunately for us technology guys, our artists tend to like levels that are located on top of some big (but thin) structures (a nice vertigo feel), and this kind of setting gives a huge visible view distance... but at the same time, the real gameplay happens very near the camera (so the near plane cannot be moved that far).
Makes sense - gameplay/design comes first of course :) Terrain visibility (assuming a simple height map) seems possible to exploit analytically actually if it gives much benefit (rather than in image space like SDSM) - have you played with that at all or would the delta not be worth it?

SDSM will help a lot in some scenes, but for the worst-case scenes it only helps a bit (we unfortunately have a lot of worst-case scenes -- and have no control over it, since the content is player-created).
Indeed. It of course shouldn't ever do *worse* than standard CSMs but the cost of generating the per-partition tight bounds may not be justified for scenes with low occlusion/empty space.


The SDSM near-z and far-z search alone improved the shadow map quality nicely in our game view. [...] But the most impressive improvement was in the editor. When you fly around in the air and look at the scenery below, the near plane can be pushed as far as 100-500 meters away, giving everything almost pixel-perfect shadows (we have four 512x512 cascades, rendered with hardware 4xMSAA and EVSM filtering).
Nice! Yeah another nice thing I like about using SDSMs with nice filtering is that you never completely "waste" any resolution. When using PCF you can get excessive resolution (higher than screen space) that is simply dropped by PCF and you still get aliased shadow edges. Proper filtering though uses these additional samples to effectively super-sample the shadow map edge via mipmapping - looks great in practice and adding hardware MSAA (with the associated nice sampling patterns) helps a ton too!

I'll post you the performance improvements on the console platform when the algorithm runs fully on the console as well. I implemented the recursive downsampling method on PC (DX11) first (using our virtual texture small z-buffer as my depth data source). It seems that using last frame's data is enough and no graphics glitches are visible (we render at a vsync-locked 60 fps), so there's no need to cause a lock stall.
Cool, yeah I always suspected that using a really fuzzy scene description (previous frame, downsampled, etc.) would be sufficient, since the min/max actually drops the vast majority of data from the scene and the likelihood of hitting a case with a significant number of important "missed" pixels is very low. It can probably be fully accounted for in practice by just dilating the min/max results slightly (which we already do for the partitions to support filtering near edges). Great to hear it does indeed work in practice, and I imagine it solves the stall problem for all practical cases!

Thanks for the good algorithm. Our artists are really happy already :)
No problem, I'm really happy that it's working well for you! :)
 
The SDSM recursive min/max downsample takes 0.04 milliseconds on the console hardware (from a 320x168 buffer). The frame rate boost is much higher than the downsample cost, since the tighter cascades have fewer objects to render. Also, we use the z-max to do crude occlusion culling for all our rendered objects (cutting the view cone by it). As a side effect, the recursive min/max downsample also generates a hierarchical z-buffer for higher-quality CPU occlusion culling, but we haven't implemented that yet.
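A hedged sketch of how such a max-z mip level could be used for conservative occlusion tests (single-level lookup, illustrative names; depths are linear view-space, larger = farther):

```python
# Illustrative hierarchical-z style occlusion test against one coarse
# max-z mip level. Names and the single-level lookup are
# simplifications of what a full system would do.

def occluded(rect, nearest_z, max_z_mip, texel_size):
    """rect = (x0, y0, x1, y1) in full-resolution pixels (inclusive)."""
    x0, y0 = rect[0] // texel_size, rect[1] // texel_size
    x1, y1 = rect[2] // texel_size, rect[3] // texel_size
    # Conservative: take the farthest stored depth over every covered tile.
    tile_max = max(max_z_mip[y][x]
                   for y in range(y0, y1 + 1)
                   for x in range(x0, x1 + 1))
    # If even the object's nearest point is behind that depth, nothing
    # of the object can be visible.
    return nearest_z > tile_max

# 2x2 max-z mip over an 8x8 depth buffer (texel_size = 4).
max_z = [[10.0, 50.0],
         [30.0, 20.0]]
```

For example, an object covering only the top-left tile with its nearest point at depth 15 is culled (`occluded((0, 0, 3, 3), 15.0, max_z, 4)` returns `True`), since everything visible in that tile is nearer than depth 10.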
 
The SDSM recursive min/max downsample takes 0.04 milliseconds on the console hardware (from a 320x168 buffer). The frame rate boost is much higher than the downsample cost, since the tighter cascades have fewer objects to render. Also, we use the z-max to do crude occlusion culling for all our rendered objects (cutting the view cone by it). As a side effect, the recursive min/max downsample also generates a hierarchical z-buffer for higher-quality CPU occlusion culling, but we haven't implemented that yet.
That's awesome news! It seems clearer and clearer that this sort of data structure gleaned from the visible samples in the scene is useful in a number of important ways. Neat to see you making use of it for occlusion culling and other things as well.

I'm eager to see your game in action :)
 
I hear that vocal section but TBH they're mostly software people making broad claims about hardware engineering based on generalized logic at best (of course this doesn't apply to everyone, but I've met a lot of these people :)). When I ask the hardware people though they say that it scales fine and I'm inclined to trust them more...
Would be nice if they had led by example, Fermi doesn't do full coherence and Larrabee doesn't do full speed scatters. They talk the talk, but they don't walk the walk.
 
Would be nice if they had led by example, Fermi doesn't do full coherence and Larrabee doesn't do full speed scatters. They talk the talk, but they don't walk the walk.
If you read back the original comment was *about* Larrabee's cache architecture, thus we're not talking about full speed single-cycle x86-style-cache-coherent scatters. That's not necessarily viable, but not clearly necessary either. A more relaxed coherence model and/or some performance degradation for complex scatters are probably both acceptable in the long run. Fermi basically already has the latter with its "coherent" write-combining on scatters to global memory, they just market it as an "optimization" to the base case rather than falling off the fast path. So arguably Fermi doesn't do "full speed" scatters either in the general case, but that's not really saying anything interesting...

To put it another way, you could easily make an architecture that did coherent scatters to 16 different cache lines at "full speed", but that's more of a statement about how inefficient your cache-aligned write is than anything :)
 
Banked caches are at full utilization both with ideal scatters (they hit all banks) and cache aligned linear writes (they still just hit all banks). Of course the overhead for coherency increases a bit.
 
If anyone is implementing SDSM on Xbox 360, you may want to check out the screen extent query (D3DQUERYTYPE_SCREENEXTENT). It will return the minimum Z and maximum Z for the queried rendering, with some limitations.

I'm hoping it will prove to be useful. Of course, I am somewhat biased since I pushed for including Z in the screen extent query (when I was at MS) because of how much having tight Z bounds improved the quality of regular shadow maps.
 
If anyone is implementing SDSM on Xbox 360, you may want to check out the screen extent query (D3DQUERYTYPE_SCREENEXTENT). It will return the minimum Z and maximum Z for the queried rendering, with some limitations.
Ah interesting! I have specifically asked people for this sort of feature in the past (for both x/y and z) but it has never come to PC. Awesome to hear that there's something like it on 360 and indeed, depending on the limitations, it might actually be useful for SDSM!

[Edit] It is good to take occlusion into account though which requires rendering the full depth buffer before analyzing anything and thus wouldn't be compatible with this technique. That said, you could use this for just the camera near value, which is actually the most important (far has significantly less effect).
 
Just a quick update - the SDSM paper (author preprint for I3D 2011) is now linked from the SDSM web page. Let me know if you guys have questions or comments, or if anyone is planning to attend I3D this year!
 
Great stuff. :cool: Wonder when we'll see more devs using this on current-gen consoles. They're certainly here for a while...
 