Some presentations from GDC08

I also played around with exponents of the distance term a while ago, but didn't look at pre-filtering because I think correct anisotropic filtering and mipmapping are more important. Philosophically, I guess I'm in the same boat as Andy.
I do admit to being a bit wary of using hardware linear filtering (trilinear, aniso, MSAA, bilinear) on a ln-space ESM since it isn't really "correct". I suspect that if you're doing some ln-space blurring this won't be noticeable in most cases (you've arbitrarily clamped the minimum filter width, putting most of the support on the filter that is computed correctly), but it seems like mip transitions (trilinear) and aniso cases will become more obviously over-darkened. Hard to say, though, as I haven't fully implemented it myself.

In the end, I think the most bulletproof solution for shadow maps is Andy's suggestion of using VSM to see what the variance is, and if it's high enough fall back to many-sample PCF. It's much better than other methods of skipping samples in non-penumbra regions.
Indeed the "adaptive sampling" approach is the only one that will get complex filter regions "right", but after some conversations with Marco and some further playing around I came up with something that seems promising. I was going to write it up a bit more formally in a tech report, but I'd rather get some feedback here since we're on the topic.

My recent work with "layered" variance shadow maps (to be published @ GI 2008) is a useful generalization based on the observation that warping the depth function monotonically will still produce "correct" upper bounds, but can affect how tight they are in light bleeding cases, etc. In the layered stuff I used simple linear warps to reduce or eliminate light bleeding, but unfortunately the technique requires good layer positions to be effective. Even though these can be determined fairly well automatically, the "light bleeding reduction potential" of the technique scales approximately like convolution shadow maps with storage, which is to say - badly :)

Since light bleeding is worst when the ratio of distances between objects 2-1 (d_a) and 3-2 (d_b) is large, one way to look at layered VSMs is as a method of clamping d_a, ideally to zero. An alternative approach, however, is to determine a warping function which makes d_a small relative to d_b: exp(c*x) does exactly this. Thus exponentially warping the depth function, but still using 2 moments/Chebyshev as with VSM, greatly reduces the light bleeding associated with VSM and works really well in practice.
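For concreteness, the reconstruction side is just the usual VSM Chebyshev bound fed with warped values. A minimal C++-ish sketch (helper names are made up here, and the moments are assumed to come from a filtered 2-component map storing E[exp(c*z)] and E[exp(c*z)^2]):

[code]
#include <algorithm>
#include <cmath>

// Sketch only: the one-tailed Chebyshev bound from VSM, fed with
// exponentially warped depths.  The two moments are assumed to come from a
// filtered/blurred 2-component shadow map over the filter region.
struct Moments { float m1, m2; };  // E[w] and E[w^2] of the warped depth w

float chebyshevUpperBound(Moments m, float warpedReceiver, float minVariance)
{
    if (warpedReceiver <= m.m1)
        return 1.0f;  // receiver in front of the mean occluder -> fully lit
    float variance = std::max(m.m2 - m.m1 * m.m1, minVariance);
    float d = warpedReceiver - m.m1;
    return variance / (variance + d * d);  // upper bound on the lit fraction
}

// Positive exponential warp; x is the normalized depth in [0, 1].
float warpDepthPos(float x, float c) { return std::exp(c * x); }
[/code]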

But you can go further... the exponential warp actually kind of messes up cases where the receiver is non-planar (or there are multiple receivers), as Marco notes. While these cases can't be done "correctly" in a soft-shadows sort of way without storing multiple shadow map projections, we just want to do something reasonable and "smooth". To that end, using the "dual" of the above warping, -exp(-c*x), makes d_a large relative to d_b and thus makes the shadows on any intermediate objects much more like the "smooth" shadow on d_b.

In my brief experimentation, using these two duals together (just taking the min) works really well, and fits nicely into 4 components. If C could be arbitrarily large, I suspect the results would be extremely good. As it stands, there are only really artifacts in places where both duals have artifacts (i.e. non-planar receivers AND light bleeding). I have to work through some of the math and implementation details, but it seems promising, and thus I was interested in whether or not you guys have experimented with anything similar. I can post some screenshots later if people are interested.
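Reusing the helpers from the sketch above, the combined version really is just the min of the two bounds (again only a sketch; the variance clamps are arbitrary):

[code]
// The "dual" warp is negated so it stays monotonically increasing,
// which the Chebyshev bound needs.
float warpDepthNeg(float x, float c) { return -std::exp(-c * x); }

float shadowEVSM(Moments posM, Moments negM, float receiverDepth, float c)
{
    float pPos = chebyshevUpperBound(posM, warpDepthPos(receiverDepth, c), 1e-4f);
    float pNeg = chebyshevUpperBound(negM, warpDepthNeg(receiverDepth, c), 1e-4f);
    return std::min(pPos, pNeg);  // both are upper bounds, so the min still is
}
[/code]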

In any case with ESM, Layered VSM, CSM and this latest stuff (that I'm calling Exponential VSM for now), there are a lot of options out there with various trade-offs. They're all actually pretty similar fundamentally, which is what makes all of this so interesting :) I'm glad people have really run with the VSM idea in any case... I can't wait to see what people continue to come up with!

Great presentation Marco, and I'm curious to see what the CSM people have come up with in their "Exponential Shadow Maps" paper coming up at GI. Maybe they have solved all of these problems already, which would be fantastic :)
 
Did you guys read the notes or the recent summary of some of the technical bits of Uncharted, among others its shadow handling? I assume you did, but if not, I'll try to find them again. It looked interesting to me, but then I don't know half the tech terms in the previous post, so that doesn't mean much. :D
 
I was at the Uncharted Tech presentation @ GDC which talked about how they use shadows, blending from them to static GI-generated data (per-vertex) in the distance. Was there more that I missed? I'd certainly be interested if you have a link :)
 
I was at the Uncharted Tech presentation @ GDC which talked about how they use shadows, blending from them to static GI-generated data (per-vertex) in the distance. Was there more that I missed? I'd certainly be interested if you have a link :)

Ha ha, that's very unlikely. There was this write-up, but you probably picked up more than the author did:

http://xemu.blogharbor.com/blog/_archives/2008/2/20/3536343.html

What you mentioned though, I thought they didn't like it for being too slow, but you were there so you probably know more details.

Static lighting: Global Illumination. Direct color, indirect color, direction per vertex. Blend realtime shadows with this static lighting. Moving objects used light probes & SH. Objects look around and pick the closest one. Convert SH to cubemap (via SPU). Not texture bound. Biggest problem: Too slow, and couldn’t be distributed.

I thought this shadow part was the most interesting and relevant to the current discussion:

Scene Rendering: 5 phases: Shadows, dynamic lighting, opaque geom, alpha blend geom, post processing effects.

Sunlight Shadow: Tried many solutions, but all had problems. Idea from Killzone: Reduce flickering: fixed world space sample points. Shimmering caused from slight differences in maps due to movement, so wanted to fix that. So use stable grid, just scroll around on it. SSM: orthographic shadow map. How to determine resolution? Cascaded shadow maps.

Shadow blockers > shadow pass > depth buffer

Opaque geom. pass
 
In my brief experimentation, using these two duals together (just taking the min) works really well, and fits nicely into 4 components. If C could be arbitrarily large, I suspect the results would be extremely good. As it stands, there are only really artifacts in places where both duals have artifacts (i.e. non-planar receivers AND light bleeding). I have to work through some of the math and implementation details, but it seems promising, and thus I was interested in whether or not you guys have experimented with anything similar. I can post some screenshots later if people are interested.
I was about to suggest a similar thing to nAo in my previous post, but I wanted to try it first, and then my main computer had a serious HD crash, unfortunately. Just thinking about it, it seemed like most of the time the bleeding artifacts wouldn't overlap, so the min of the two shadowing terms would be quite satisfactory.

The only problem is that the memory/BW costs are really getting high now, and I think adaptive sampling controlled by VSM could be substantially faster and maybe even look better.
 
What you mentioned though, I thought they didn't like it for being too slow, but you were there so you probably know more details.
I believe the comment about it being too slow and not distributable was related to their offline radiosity/GI solver for *generating* the indirect lighting/bent normals. I could be wrong though... last week was pretty busy and I don't have my notes with me right now.

I thought this shadow part was the most interesting and relevant to the current discussion:
Yeah, they use the standard "fix your shadow map projection grid in world space" trick of cascaded shadow maps. It works, but I've shown that you can get pretty close to sub-pixel shadow map resolution on modern GPUs (at least G80) in real time, so such hackery isn't really necessary (and doesn't work with spot/omni lights to boot). That said, RSX probably can't handle that high shadow detail, so it's a reasonable solution to reduce shadow flickering as the camera moves in the meantime. However, shadows will still flicker when the light or objects move, so it's not a fully satisfactory solution.
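For anyone unfamiliar with the trick, the core of it is just snapping each cascade's light-space origin to whole shadow-map texels so the sample grid stays fixed in world space and the map effectively scrolls. Something like this (my own sketch, not Naughty Dog's code):

[code]
#include <cmath>

struct Float2 { float x, y; };

// Sketch: snap a cascade's light-space origin to whole shadow-map texels so
// the sample grid is fixed in world space and the map just "scrolls" as the
// camera moves, instead of resampling and shimmering every frame.
Float2 snapToTexelGrid(Float2 lightSpaceOrigin, float cascadeWidth, int shadowMapRes)
{
    float worldUnitsPerTexel = cascadeWidth / float(shadowMapRes);
    Float2 snapped;
    snapped.x = std::floor(lightSpaceOrigin.x / worldUnitsPerTexel) * worldUnitsPerTexel;
    snapped.y = std::floor(lightSpaceOrigin.y / worldUnitsPerTexel) * worldUnitsPerTexel;
    return snapped;
}
[/code]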

Just thinking about it, it seemed like most of the time the bleeding artifacts wouldn't overlap, so the min of the two shadowing terms would be quite satisfactory.
Indeed I can find a few places where there are minor artifacts, but VSM/ESM both naturally have more serious artifacts at those places. With a "low C" (it can unfortunately only be ~44 with fp32 and EVSM), stuff like trees and foliage still can't be handled totally satisfactorily, but I suspect they'd start to look good if C could be increased. I'll write some manual filtering code soonish to verify.
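(For the curious, the ~44 ceiling is just the fp32 range, back-of-the-envelope: with depth normalized to [0, 1], the second moment has to hold exp(c*x)^2 = exp(2*c*x), worst case exp(2c), and exp(2c) <= 3.4e38 (the largest finite fp32) gives c <= ln(3.4e38)/2 ~= 88.7/2 ~= 44.)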

The only problem is that the memory/BW costs are really getting high now, and I think adaptive sampling controlled by VSM could be substantially faster and maybe even look better.
Definitely the memory costs are getting high, but not quite as abusive as CSM or Layered VSM. Thus it's yet another bullet point in the quality/storage scale which unfortunately ends up eventually reducing to (adaptively) sampling the entire filter region in the worst case. I think it hits a pretty good quality/performance trade-off going forward though... it's not much slower than VSM on G80 and it looks a fair bit better.

Unfortunately it doesn't play that nicely with UINT textures and SAT due to the exp() warping, but hey, I was looking for an excuse to ask for doubles in graphics anyways ;)
 
Not sure what you mean here... you never *have* to pre-filter; it's just an accelerator. In particular mipmaps are a form of pre-filtering and become a huge win when PCF would otherwise have to sample the entire shadow map per-pixel :) In any case, ESM is no different in this manner to my knowledge.
By "need", I was referring to the fact that the advantage of preblurring is not quite super-compelling, nor is there anything particular about it that demands work in a separate pass (which is something that applies to a number of other hacks out there).

The latter however will only be a win if you have many more shadow map pixels than framebuffer pixels, which implies a poor shadow map projection.
In my experience, it's not just been "many" more, but rather "any" more, or even "not much fewer". Although, I'm curious just what you mean by "poor". The way you talk, it sounds as if you're saying that short of irregular depth buffers, there will never be a "good" shadow map projection. Reality often means getting stuck thinking about "good" first and then scaling back to "passable" when you have no other choice. It's not so much that the importance is low, but that it's generally the case that something else that is infinitely important can bury something that's very important.

The other disadvantage of not pre-blurring is that you really want to generate mipmaps based on the blurred shadow map or else they won't work properly.

It's important not to downplay the importance of mipmapping/aniso in generating high-quality shadows. I consider separable blurs to be a nice "side-effect" of linear filtering, but the real win is the application of linear filtering acceleration structures like mipmaps or summed-area tables.
If you can afford it, which isn't always the case. I'd say SATs, at least, are pretty well out of the question for a good while. But I've seen many a case where even generating mipmaps for your shadow maps is something people aren't willing to do. Generically speaking, 99% of the flaws associated with not mipmapping are not difficult to hide when the content is something that's actually meant for production titles (85% of the time, it doesn't even require any extra work/tweaking). What really matters in the end is what people will notice rather than academic "correctness."

You really have to be comparing to "proper" PCF in which you project the filter area using derivatives and evaluate the whole region (no matter how big) using dynamic branching. With that implementation I think that you'll find the linear/hardware filtering methods to be a substantial win.
I get your point, but again, you're talking about comparing something that is fairly unreasonable against something that is ridiculously unreasonable. In the end, I'd probably be talking along those lines if I were still roaming the halls of academia, but otherwise, it's only practice that matters. Separable filter passes aren't totally unreasonable, but if escapable without huge issues, then so much the better.

I'd try just using non-separable blurs or similar to reduce the number of passes first. The tradeoff between memory/fill/computation is easily managed with simple convolutions like blurs.
Already been tried, and it always works out that having even 1 additional pass per shadow-casting light is enough to kill it unless you are working at a low output resolution (albeit it dies less often and to a lesser degree than with 2 passes). Of course, the simple reason why prefiltering is still preferable is the size of the filter you can get for fewer samples when using a separable filter -- especially for the 360, which is no worse on pixel fill but has a pretty lousy texel rate. 10 samples to get a 5x5 is nice compared to 9 for a 3x3. And if you try to do wider filters in one pass, you're trying to save on pixel fill only to blow out your texel rate, so you still lose either way.
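Just to be explicit about the arithmetic: separable is 2N taps per output texel over two passes versus N*N in one pass. A toy CPU sketch of the two-pass version, purely to show the shape of it (obviously not our actual console code):

[code]
#include <vector>

// Toy CPU illustration only: an NxN box blur as two 1D passes touches
// 2N texels per output texel (10 for 5x5) instead of N*N (25) in one pass.
static void boxBlur1D(const std::vector<float>& src, std::vector<float>& dst,
                      int w, int h, int radius, bool horizontal)
{
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float sum = 0.0f;
            int count = 0;
            for (int t = -radius; t <= radius; ++t) {
                int sx = horizontal ? x + t : x;
                int sy = horizontal ? y : y + t;
                if (sx < 0 || sy < 0 || sx >= w || sy >= h) continue;
                sum += src[sy * w + sx];
                ++count;
            }
            dst[y * w + x] = sum / count;
        }
}

void separableBoxBlur(std::vector<float>& img, int w, int h, int radius)
{
    std::vector<float> tmp(img.size());
    boxBlur1D(img, tmp, w, h, radius, true);   // horizontal pass: N taps
    boxBlur1D(tmp, img, w, h, radius, false);  // vertical pass: N taps
}
[/code]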

The less simple reason why prefiltering is still preferable is the level of control we can offer to lighting artists who want to tweak around with shadowing parameters, and that includes the filtering mechanism among other things.

AndyTX said:
I do admit to being a bit wary of using hardware linear filtering (trilinear, aniso, MSAA, bilinear) on a ln-space ESM since it isn't really "correct". I suspect that if you're doing some ln-space blurring this won't be noticeable in most cases (you've arbitrarily clamped the minimum filter width, putting most of the support on the filter that is computed correctly), but it seems like mip transitions (trilinear) and aniso cases will become more obviously over-darkened. Hard to say, though, as I haven't fully implemented it myself.
Mmm... that's another point of concern, indeed. And oftentimes, lighting artists will tell me that it's far preferable to have those edges be overly light than overly dark, for the simple reason that flaws are more apparent when you over-darken.
 
Already been tried, and it always works out that having even 1 additional pass per shadow-casting light is enough to kill it unless you are working at a low output resolution (albeit it dies less often and to a lesser degree than with 2 passes). Of course, the simple reason why prefiltering is still preferable is the size of the filter you can get for fewer samples when using a separable filter -- especially for the 360, which is no worse on pixel fill but has a pretty lousy texel rate. 10 samples to get a 5x5 is nice compared to 9 for a 3x3. And if you try to do wider filters in one pass, you're trying to save on pixel fill only to blow out your texel rate, so you still lose either way.

Speaking of filtering, and something I haven't had the chance to try yet: could ordering the draw calls so that non-filter-dependent calls (i.e. other work) are interleaved with the filtering help filtering performance?

This is assuming filtering is nearly always going to be TEX bound, even in the case of mostly cache hits, simply due to the latency of providing interpolated results. If the cards could interleave ALU heavy work with TEX heavy work, then perhaps you could keep the ALUs busy even when filtering.
 
In my experience, it's not just been "many" more, but rather "any" more, or even "not much fewer".
I suppose that's gonna be fairly hardware dependent. On G80 the crossover is at about 3x3 (bil) PCF vs. fp32 VSM as I showed in my Gems 3 chapter. I'd expect the 360 to have different tradeoffs of course.

Although, I'm curious just what you mean by "poor". The way you talk, it sounds as if you're saying that short of irregular depth buffers, there will never be a "good" shadow map projection.
No, not really. Indeed the results of using CSM/PSSM are very good, and if you place your splits cleverly (and have enough of them) you're not going to have a ton of sub-pixel resolution anywhere.

If you can afford it, which isn't always the case. I'd say SATs, at least, are pretty well out of the question for a good while. But I've seen many a case where even generating mipmaps for your shadow maps is something people aren't willing to do.
Perhaps, but I'm of course looking down the road. SATs are quite doable on G80 and mipmap generation is practically "free". What may or may not be appropriate for 360/PS3 isn't really that much of a concern to me to be honest. Down the road, mipmapping or even SATs are not going to be a problem, period. O(n^2) filters are.
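For those who haven't played with SATs: once the moment texture has been prefix-summed, any box filter is four fetches regardless of filter size. Sketch only, with 'sat' standing in for the prefix-summed texture fetch:

[code]
// Sketch: average of a (moment) texture over any inclusive rect [x0,x1]x[y0,y1]
// from a summed-area table, in four fetches.  'sat' stands in for the
// prefix-summed texture; sat(-1, y) and sat(x, -1) are assumed to return 0.
float boxAverageFromSAT(float (*sat)(int, int), int x0, int y0, int x1, int y1)
{
    float sum = sat(x1, y1) - sat(x0 - 1, y1) - sat(x1, y0 - 1) + sat(x0 - 1, y0 - 1);
    int area = (x1 - x0 + 1) * (y1 - y0 + 1);
    return sum / float(area);
}
[/code]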

Generically speaking, 99% of the flaws associated with not mipmapping are not difficult to hide when the content is something that's actually meant for production titles (85% of the time, it doesn't even require any extra work/tweaking). What really matters in the end is what people will notice rather than academic "correctness."
Oh certainly it's about the end result - nothing I do is purely for academic "correctness", as you would so easily like to dismiss it. However, not filtering/mipmapping properly is very visible in many scenes... check out Crysis foliage shadows on roads and enjoy the big mess of flickering, aliasing and noise ;) I don't think it's too unreasonable to say that shadows and normal maps are the primary causes of aliasing in modern AAA titles, and both are due to no/inadequate filtering considerations.

In the end, I'd probably be talking along those lines if I were still roaming the halls of academia, but otherwise, it's only practice that matters.
I don't disagree with you. I'm just looking a few years down the road rather than at the hardware *right now*. People said deferred rendering was totally impractical/unreasonable several years ago when I was arguing that it would be worth it in the end. Today many games use some amount of deferred rendering and, incidentally, VSM :)

Already been tried, and it always works out that having even 1 additional pass per shadow-casting light is enough to kill it unless you are working at a low output resolution (albeit it dies less often and to a lesser degree than with 2 passes). [...] 10 samples to get a 5x5 is nice compared to 9 for a 3x3.
Well I somewhat pity the hardware you're using from what you're telling me... G80 has no trouble with even 40x40 separable filters (hundreds of FPS w/ large shadow maps) not that those are particularly useful ;)

Anyways we're certainly targeting different time-frames/hardware, but I'm just as concerned with the practical usability of shadow filtering techniques as with "correctness". Hell, if you read my EVSM stuff on the previous page of this thread, it has little to do with "correct" and a lot to do with "looks somewhat reasonable". Even blurring the shadow map falls into the latter category!

I do however feel that we're going to want to "properly" filter our shadow maps in the coming few years... i.e. mipmapping/aniso or equivalent. It's also invaluable to be able to apply MSAA to shadow maps. All of this stuff works a lot better with pre-filtering/linear techniques.
 
I don't disagree with you. I'm just looking a few years down the road rather than at the hardware *right now*. People said deferred rendering was totally impractical/unreasonable several years ago when I was arguing that it would be worth it in the end. Today many games use some amount of deferred rendering and, incidentally, VSM :)
In various ways, though, both are still unreasonable, but not "absurdly" or "totally" unreasonable. Even for us, we're kind of sticking to indoor scenes whenever using VSMs. Deferred rendering is still unreasonable in the sense of tradeoffs potentially being unacceptable.

Well I somewhat pity the hardware you're using from what you're telling me... G80 has no trouble with even 40x40 separable filters (hundreds of FPS w/ large shadow maps) not that those are particularly useful ;)
Yeah, the hardware is one thing, but the use case is another. For instance, for many of our materials, 14 texture layers and even the occasional filtered blend (usually mip-biasing, but sometimes not) isn't that unusual. Texel fill becomes utterly precious when your texel rate is only 16 samples per cycle whether filtered or not. For pre-filtering even with small filter sizes, you end up worrying about pixel fill as it's not as though the number of other purposeful render passes is small, and just adding to that costs an arm. Granted, if it were a situation where the only shadow-casting light was the main sunlight or something, even with 4 splits, a small filter size separable filter is pretty harmless. Add a few shadow casting dynamic lights to that, or for that matter, a volume light (meaning you have to take multiple samples in the main render pass anyway), and now it's worrisome both in pixel fill and texel fill.

I do however feel that we're going to want to "properly" filter our shadow maps in the coming few years... i.e. mipmapping/aniso or equivalent. It's also invaluable to be able to apply MSAA to shadow maps. All of this stuff works a lot better with pre-filtering/linear techniques.
Yes, someday. I look at nVidia's min-max mipmap shadows and I see it as a step in that general direction. Larrabee and the notion of a return to software rendering is quite an interesting prospect to me as well, but I also have to think about a world where the hardware will not change in the slightest for the next few years.
 
One of the test cases I like for these fancy shadow techniques is two layers of objects over a ground plane. VSM shows bleeding on the ground plane, and ESM shows it on the second object.
Yes, one test to rule them all. In fact, every time I come up with a new idea, the first thing I do is to write down the equations for a 3-planar-occluder configuration and compute occlusion for some trivial cases. It's a good way to rule out algorithms that don't work.
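For anyone following along, the simplest such configuration (two planar occluders at d_1 < d_2 plus a planar receiver at d_3, with the filter region split 50/50 between the occluders) works out like this for plain VSM; the numbers are mine and purely illustrative: mu = (d_1 + d_2)/2, sigma^2 = (d_2 - d_1)^2/4, and Chebyshev gives p_max(d_3) = sigma^2 / (sigma^2 + (d_3 - mu)^2). With d_1 = 0, d_2 = 1, d_3 = 1.1 that's 0.25 / (0.25 + 0.36) ~= 0.41, i.e. a receiver that should be fully shadowed comes out ~40% lit, and the bleeding only dies off as (d_3 - d_2) grows relative to (d_2 - d_1).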

I don't see how the negative moments help you (at least from the perspective of a negative exponential). They give you information about the closest samples in your filter kernel.
I don't have a real answer for this, just three hints that I hope will lead me to find a better solution:
1) from a statistical point of view we need all the negative moments to completely reconstruct the depth distribution
2) from a convolution shadow maps point of view we need negative moments to approximate the other 'branch' of the exponential function (unfortunately the two branches put together are not separable in CSM-sense, but their reciprocal is..)
3) from a PCF point of view..the negative moments can tell me that my occluders are not always occluders..

The way I see it, the positive moments give you information about the weight of the furthest samples, and when they're all on the same plane as your receiver, then you can actually retrieve that weight and it equals visibility.
This is 100% correct.

One thing I tried is using regular distance, a positive exponential, and a negative exponential. Then I can get the average, max, and min for an arbitrary kernel, and can get a nice gradient by linearly interpolating. The results are mostly good, but when there's three or more distinct groups of distance values in the kernel, somewhere in the scene there's an artifact. It's better than VSM, but not perfect.
Ehehe, I tried every possible combination..none was robust enough.

For ESM, I think a more interesting direction for improvement is clipping the furthest values in each texel's kernel when prefiltering. It could introduce other artifacts, but maybe they're less objectionable.
This would destroy separability, unless you want to go multilayer..and when many layers enter in the equation I let Andy do the talking ;)
 
Speaking of filtering, and something I haven't had the chance to try yet: could ordering the draw calls so that non-filter-dependent calls (i.e. other work) are interleaved with the filtering help filtering performance?

This is assuming filtering is nearly always going to be TEX bound, even in the case of mostly cache hits, simply due to the latency of providing interpolated results. If the cards could interleave ALU heavy work with TEX heavy work, then perhaps you could keep the ALUs busy even when filtering.

I'm surprised none of you wanted to take a stab at commenting on this!

For non-console cases (and newer hardware), CUDA and CTM say no simultaneous shaders (the API doesn't support it), but we know pixel and vertex shaders run simultaneously to keep the fixed-function hardware busy. Tom Forsyth says in a 2008 blog entry, "for example, some hardware may be able to have 2 pixel shaders, 2 vertex shaders, ... in flight at once". This hints that at least some hardware supports overlapping, perhaps, the end of one draw call with the beginning of another. We can also at least be sure that the hardware simply cannot overlap different program execution on a given SIMD core (because the threads on a core are using a single instruction pointer?).

So I guess I've answered my own question: it's a hardware limitation, so there's no way to overlap extra ALU and TEX work from two different programs because of a shared instruction pointer. And perhaps vertex/fragment program interleaving and end-of-one/beginning-of-another overlap happen by dividing the cores among different tasks.

But then there is the real crazy but awesome idea of manually interleaving 2 programs inside one pixel program. So basically you pair two programs together in one program and split the output render targets among each sub-program. If you schedule your texture loads for both sub-programs early in the program and then do ALU work, you can effectively mix a high-ALU/low-TEX program with a high-TEX/low-ALU program and keep both the ALU and TEX units busy on one core. It should at least work well on GeForce 8 series cards, AMD/ATI HD cards, and maybe the 360. Of course you would also lower your texture cache hit ratio, but probably not that badly...

Which leaves the last case, where you are doing something like pyramid filtering operations (say, min/max shadow map generation), where one pass needs to read the results from the previous pass (so let's skip thinking about this for the 360, for obvious EDRAM issues). The driver would at least have to place a hard sync point between draw calls to ensure all pending ROP writes finished before the next pass started. Whereas if you could do non-dependent draw calls, the driver could probably interleave the beginning and end of two separate shaders at the same time (on different SIMD cores, but at least with no hard, 600+ cycle, sync point).
 
I don't have a real answer for this, just three hints that I hope will lead me to find a better solution:
1) from a statistical point of view we need all the negative moments to completely reconstruct the depth distribution
2) from a convolution shadow maps point of view we need negative moments to approximate the other 'branch' of the exponential function (unfortunately the two branches put together are not separable in CSM-sense, but their reciprocal is..)
3) from a PCF point of view..the negative moments can tell me that my occluders are not always occluders..
Looking at Andy's 4xFP32 solution above, I can now see why it's useful. Even though it's the furthest samples in the kernel that cause the trouble in ESM (when they're further than the point being shadowed), it's usually the shadow from the closest points that actually gets messed up.

This would destroy separability, unless you want to go multilayer..and when many layers enter in the equation I let Andy do the talking ;)
True, but I'm just making a suggestion based on why the exponential function works in the first place. If you're using it as an approximation of the Heaviside function along the lines of CSM, then it only works when none of your samples in the kernel are further than your receiver. I was just thinking of a way to enforce that requirement.
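To spell that out (just restating the standard ESM derivation, nothing new): the filtered visibility is V = E[H(z - d)] ~= E[exp(c*(z - d))] = exp(-c*d) * E[exp(c*z)], which is why only E[exp(c*z)] needs to be stored and prefiltered. But exp(c*(z - d)) only resembles the step function for z <= d; any sample in the kernel with z > d contributes a term much greater than 1, the (clamped) result over-brightens, and that's exactly the "no sample further than the receiver" condition.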
 
Valve has posted some: http://www.valvesoftware.com/publications.html

Kim Swift and Erik Wolpaw, "Integrating Narrative and Design: A Portal Post-Mortem," Game Developer's Conference, February 2008.

Jason Mitchell, "Stylization With a Purpose: The Illustrative World of Team Fortress 2," Game Developer's Conference, February 2008.

Elan Ruskin, "How To Go From PC to Cross Platform Development Without Killing Your Studio," Game Developer's Conference, February 2008.

Alex Vlachos, "Post Processing in The Orange Box," Game Developer's Conference, February 2008.
 
Be sure to wave your mouse pointer over the comment box in the upper left corner.... There is a lot more info discussed than the slides alone!
 
Thanks for the Valve link Asher, some interesting presentations in general, and the PC-->Console slides were timely in their arrival given the discussion topics of late. I don't think Valve has gained much love for Cell as compared to previous displays of affection ("we hate it!"), but they seem to be embracing multi-threading itself in a big way, and it looks like they've adopted a number of intelligent design directions that have come back to yield dividends in their PC products.

@Alstrong: Thanks for the text info; I went back and re-read a number of the slides with that new knowledge! :)
 
I found this picture from the Valve slides interesting.

[Image: hl2iw6.png, a screenshot from the Valve slides]


It gives a hint that large media like BRD may become very useful if the progress toward higher fidelity continues.
 