Layered Variance Shadow Maps

Hi,

Andrew, your paper is very interesting. Unfortunately I didn't have time to read it very carefully, but I already have some questions/remarks on it :).

First, what about the memory consumption/computation time for omni-directional light sources? Naïvely, it seems that everything is multiplied by six (or by two if we use dual-paraboloid maps). Next, in the introduction you state that shadow volumes can't handle alpha-textured objects, but the paper "Textured Shadow Volumes" addresses this limitation. Finally, you write:

Andrew Lauritzen said:
[...] why would we want to do shadow maps any more when we can do ray traced shadows!!! (I'm trying to get in the mood).

Note that ray tracing is not the only way to compute accurate shadows ;). Recently, the paper "Accurate Shadows by Depth Complexity Sampling" proposed a framework that either numerically solves the direct lighting or computes a physically based visibility coefficient for fully dynamic scenes.

PS: Sorry for my bad English. I'm not very good in my native language either :???:
 
Great work Andy!
Have you experimented with the log filtering so far? Maybe there's some hope to have decent quality EVSM in 8 bytes per texel...
No I haven't, but it's on the to-do list after my thesis is done! Not only am I interested in reducing memory consumption, but I'd also like to see whether bumping C up to ridiculous levels introduces significant artifacts into EVSMs the way it does with ESMs. I really don't know whether it will, so I'd like to see :)

First, what about the memory consumption/computation time for omni-directional light sources? Naïvely, it seems that everything is multiplied by six (or by two if we use dual-paraboloid maps).
Yes, there's no special way to handle these cases with VSMs/LVSMs, so you're correct in multiplying the memory usage.

Next, in the introduction you state that shadow volumes can't handle alpha-textured objects, but the paper "Textured Shadow Volumes" addresses this limitation.
Cool, good to know there's some work in the area. Still, I think my point about generally comparing the advantages and disadvantages of the two algorithms stands: shadow maps naturally handle anything that can be rasterized, while shadow volumes have a bit more trouble with things like alpha testing and (per-pixel) displacement mapping.

Note that ray tracing is not the only way to compute accurate shadows ;).
Hehe, that was a joke :) If you've read any of my responses to the Intel ray tracing stuff, you can see that I'm not exactly as excited about it as they are, and I'm unconvinced that ray traced shadows are really necessary down the road. However, as I'm going to be working at Intel, I figured I should learn to preach the goodness of ray tracing ;)

So sorry for the confusion... I was just joking :D
 
Andrew Lauritzen said:
Still, I think my point about generally comparing the advantages and disadvantages of the two algorithms stands: shadow maps naturally handle anything that can be rasterized, while shadow volumes have a bit more trouble with things like alpha testing and (per-pixel) displacement mapping.

Yes, it stands :). However, "Textured Shadow Volumes" solves the problem for alpha-textured objects (the paper presents the algorithm as an "ad hoc" method for transmittance objects, but it handles all alpha-textured geometry). So, no more trouble for such meshes :)... But it is a detail.

Andrew Lauritzen said:
Hehe, that was a joke. If you've read any of my responses to the Intel ray tracing stuff, you can see that I'm not exactly as excited about it as they are, and I'm unconvinced that ray traced shadows are really necessary down the road. However, as I'm going to be working at Intel, I figured I should learn to preach the goodness of ray tracing

I agree. Using RT for shadows is not really necessary: accurate results can be obtained in real time using rasterization, without dealing with the RT drawbacks. And if a visually plausible result is sufficient, then CSM, ESM, LVSM/VSM, etc. are efficient and give good results.
 
Yes, it stands :). However, "Textured Shadow Volumes" solves the problem for alpha-textured objects (the paper presents the algorithm as an "ad hoc" method for transmittance objects, but it handles all alpha-textured geometry). So, no more trouble for such meshes :)... But it is a detail.
Ah, cool. Well thanks for the paper references... I'll have to check them out when I have the time!

And if a visually plausible result is sufficient, then CSM, ESM, LVSM/VSM, etc. are efficient and give good results.
Yup, I agree. And if you need more physical accuracy, you can always super-sample the light rays by rendering and accumulating multiple shadow maps from jittered light positions, in the same way that you'd do it with ray traced shadows. Usually overkill IMHO, but certainly a possibility.
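Just to sketch what I mean, something like this (hypothetical C-style pseudocode, not from the paper; the renderShadowPass callback stands in for whatever your renderer does to build a shadow map from one light sample and accumulate the shaded result):

Code:
#include <cmath>
#include <functional>
#include <random>

struct Vec3 { float x, y, z; };

// Approximate an area light by averaging hard-shadow passes rendered from
// positions jittered across a disk light of the given radius.
// 'renderShadowPass' stands in for: render the shadow map from 'pos', shade the
// scene with it, and add 'weight' of the result into an accumulation buffer.
void AccumulateJitteredShadows(Vec3 center, float radius, int numSamples,
                               const std::function<void(Vec3, float)>& renderShadowPass)
{
    std::mt19937 rng(42);
    std::uniform_real_distribution<float> u01(0.0f, 1.0f);
    for (int i = 0; i < numSamples; ++i)
    {
        // Uniform sample on the light's disk (assumed to lie in the xy plane).
        float r     = radius * std::sqrt(u01(rng));
        float theta = 2.0f * 3.14159265f * u01(rng);
        Vec3 pos = { center.x + r * std::cos(theta),
                     center.y + r * std::sin(theta),
                     center.z };
        renderShadowPass(pos, 1.0f / numSamples);
    }
}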
 
If you want more accuracy, I'm not sure that rendering and accumulating several shadow maps/volumes is the right approach. With such a method you have to deal with correlated samples for each pixel (=> ugly artifacts). In addition, the performance is drastically reduced by the multiple renderings of the geometry. The "Depth Complexity Sampling" algorithm addresses all these issues by combining the accuracy of the offline "Soft Shadow Volumes" approach with the performance of the penumbra wedge framework. So, this method can produce shadows as accurate as a ray tracer at interactive/real-time rates. In addition, it handles both planar and omni-directional light sources without any specific treatment. Textured lights like TVs, fire, etc. are also supported, and semi-opaque occluders can be simulated.
 
With such a method you have to deal with correlated samples for each pixel (=> ugly artifacts).
Yes, certainly, although shadow MSAA mitigates that a little bit (it effectively multiplies your sample count without much cost). That said, if it's *enough* faster than the alternatives, it could still be a win.

In addition, the performance is drastically reduced by the multiple renderings of the geometry. The "Depth Complexity Sampling" algorithm addresses all these issues by combining the accuracy of the offline "Soft Shadow Volumes" approach with the performance of the penumbra wedge framework.
Looks reasonable, but shadow volume/wedge approaches all incur significant performance penalties from complex/large scenes (as the results in the paper demonstrate clearly!). This gets even more significant with dynamic geometry. While geometry extrusion may well be the best way to get "true soft shadows" in the long run, I'm not willing to accept that necessarily yet. Shadow volumes still come with a lot of problems, as discussed, and I'd rather avoid those problems entirely if possible.

So, this method can produce shadows as accurate as a ray tracer at interactive/real-time rates.
Well, to be fair, the results in the paper have it running on the order of seconds per frame for scenes with ~500k polygons, which is really quite a normal count for modern scenes. Of course the Doom3 example runs somewhat reasonably, as it was designed for shadow volumes in the first place.

Anyways the work certainly looks interesting, but I always like to keep a healthy amount of skepticism about techniques that involve shadow volume extrusion ;)
 
Hi everybody,

I am not sure I understand EVSM correctly.

What needs to be done to replace VSM with EVSM?

Simply storing e^(c*depth) instead of depth, and the square of that in the second channel instead of depth^2, is that right?
 
The scene, composed of 500,000 visible polygons, is rendered with 4 omni-directional lights. With a cube shadow map approach you need 24 renderings of the scene just for the shadow map acquisition. For comparison, it would be interesting to know the rendering time of LVSM in such a configuration. Even though LVSM would be faster, it would not produce the same result (correct direct illumination). In addition, DCS proposes a way to adjust the performance/quality ratio: by reducing the number of samples and using an interleaved sampling pattern, performance can be multiplied by a factor of 4 or 5 and the shadows still look good.

Andrew Lauritzen said:
Looks reasonable, but shadow volume/wedge approaches all incur significant performance penalties from complex/large scenes (as the results in the paper demonstrate clearly!). This gets even more significant with dynamic geometry

I don't agree :). Today, you don't have to process the geometry on the CPU; everything can be performed on the GPU, so the silhouette detection is done very efficiently and independently of the animation/deformation of the geometry. Of course, since it is an object-based approach you are still influenced by the geometry complexity. But with a shadow map approach you are also (less) influenced by the geometry. As an example, consider the construction of the data structure for shadow queries: with an omni-directional light source you must perform 6 renderings of the geometry, whereas with an object-based approach you can build the shadow volumes with only one transformation of the geometry, without rasterizing anything. The real bottlenecks of the object-based methods are the fill rate and the constraints on the geometry (good performance requires 2-manifold meshes). In other words, object-based algorithms have no specific drawbacks for dynamic scenes :D (as illustrated in the paper)

To conclude, I think that DCS and LVSM/CSM/ESM do not have the same goal. If you want fast and pleasant results, LVSM is a good alternative to PCF. However, if you are interested in accurate direct illumination or physically plausible soft shadows, DCS proposes an efficient alternative to ray-traced shadows. Since the goal of real-time rendering is to efficiently solve the rendering equation, LVSM/ESM seems to be a nice alternative right now, but DCS could be a solution for the future.
 
What needs to be done to replace VSM with EVSM?

Simply storing e^(c*depth) instead of depth, and the square of that in the second channel instead of depth^2, is that right?
Effectively yes, but you have to remember to warp the fragment depth using e^(c*depth) as well. Furthermore, as described in the paper, you can also use the "negative" warp -e^(-c*depth) in conjunction to avoid some more problems (i.e. store 4 components total). [Edit] I can post some code for this if you guys want... it's neither hard nor complicated, but it's probably easier to understand in code than otherwise.
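In the meantime, here's a rough sketch of the idea in plain C-style code rather than shader code (the helper names and the variance epsilon are just placeholders):

Code:
#include <algorithm>
#include <cmath>

struct float2 { float x, y; };
struct float4 { float x, y, z, w; };

// Apply both exponential warps to a depth in [0,1]; c is the warping constant.
float2 WarpDepth(float depth, float c)
{
    return { std::exp(c * depth), -std::exp(-c * depth) };
}

// Moments written to the 4-component shadow map at render time (then filtered as usual).
float4 EvsmMoments(float depth, float c)
{
    float2 w = WarpDepth(depth, c);
    return { w.x, w.x * w.x, w.y, w.y * w.y };
}

// Standard one-sided Chebyshev upper bound, exactly as in VSM.
float ChebyshevUpperBound(float2 moments, float receiver, float minVariance)
{
    if (receiver <= moments.x) return 1.0f;
    float variance = std::max(moments.y - moments.x * moments.x, minVariance);
    float d = receiver - moments.x;
    return variance / (variance + d * d);
}

// Lookup: warp the fragment depth the same way, bound both warps, take the min.
float EvsmVisibility(float4 filteredMoments, float fragmentDepth, float c)
{
    float2 warped = WarpDepth(fragmentDepth, c);
    float pPos = ChebyshevUpperBound({ filteredMoments.x, filteredMoments.y }, warped.x, 1e-4f);
    float pNeg = ChebyshevUpperBound({ filteredMoments.z, filteredMoments.w }, warped.y, 1e-4f);
    return std::min(pPos, pNeg);
}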

The scene, composed of 500,000 visible polygons, is rendered with 4 omni-directional lights. With a cube shadow map approach you need 24 renderings of the scene just for the shadow map acquisition.
But you can cut that down to 4 shadow passes with GS-cloning or instancing. If you're gonna do silhouette extraction on the GPU using GS, you have to give the same benefit to the shadow map algorithms.

Everything can be performed on the GPU, so the silhouette detection is done very efficiently and independently of the animation/deformation of the geometry.
I'm not 100% convinced of how "efficient" it is, particularly for complex geometry. GS amplification/deamplification does involve either memory allocation, a "pack" operation, or both, and that's not cheap, even when implemented in hardware.

But with a shadow map approach you are also (less) influenced by the geometry.
Much less though, which is key. Remember that even though you may need more "passes" to more render targets, the rendering itself is extremely cheap due to very few state changes (really only vertex shader and depth output). In any case it would be an interesting comparison, and I'm certainly willing to use whatever is fastest for the job! :)

To conclude, I think that DCS and LVSM/CSM/ESM do not have the same goal. If you want fast and pleasant results, LVSM is a good alternative to PCF. However, if you are interested in accurate direct illumination or physically plausible soft shadows, DCS proposes an efficient alternative to ray-traced shadows.
Certainly true that LVSM/CSM/ESM/PCF attack *filtering*, not soft shadows. The whole "edge softening by clamping the minimum filter width" is really just a side effect rather than the goal IMHO. This is a really important thing to remember, because if you start thinking of the edge softening as the *goal*, then it's both a physically incorrect approach and a potentially inefficient way to do it.

Since the goal of real-time rendering is to efficiently solve the rendering equation, LVSM/ESM seems to be a nice alternative right now, but DCS could be a solution for the future.
Well, that's *one* goal of real-time rendering. I think if you ask any game developers, though, they don't give a damn about "solving the rendering equation", and rightfully so. Hell, even movies spend more time fudging stuff than doing it physically correctly. Physical correctness is another tool IMHO, not the end goal.

That said, please do realize that VSM et al. are *filtering* algorithms. Ray traced shadows, shadow volumes and DCS do not address shadow filtering *at all* - thus you are forced to super-sample in screen space to avoid aliasing. So DCS isn't really the end goal/answer for shadows either IMHO, since I think VSM shows pretty conclusively that we can do a good job on shadow filtering and avoid inefficiently super-sampling the whole screen buffer. This is the same case as texture filtering in ray tracers... technically you can handle it via screen-space super-sampling, but in reality it's a hell of a lot more efficient to do some prefiltering.

Let me reiterate: edge softening is a "bonus" of PCF/VSM/etc., not the goal. It's even presented that way in the original PCF paper, which I highly suggest that everyone working in shadows should read.
 
Certainly true that LVSM/CSM/ESM/PCF attack *filtering*, not soft shadows. The whole "edge softening by clamping the minimum filter width" is really just a side effect rather than the goal IMHO. This is a really important thing to remember, because if you start thinking of the edge softening as the *goal*, then it's both a physically incorrect approach and a potentially inefficient way to do it.

Let me reiterate: edge softening is a "bonus" of PCF/VSM/etc., not the goal. It's even presented that way in the original PCF paper, which I highly suggest that everyone working in shadows should read.
This is a very important detail: during a Q&A session after my talk at GDC, someone was 'complaining' because I didn't cover soft shadows in my presentation, and this person was quite disappointed when I pointed out that the whole talk was about filtering shadow maps, not rendering soft shadows.
The fact that we can do some fake soft shadows with these pre-filtering techniques is just a fortunate coincidence.
 
Andrew Lauritzen said:
But you can cut that down to 4 shadow passes with GS-cloning or instancing. If you're gonna do silhouette extraction on the GPU using GS, you have to give the same benefit to the shadow map algorithms.

Yes, of course, and I don't deny the shadow map algorithms this feature :). But GS cloning and instancing do not really reduce the rendering cost (rasterization, transformation, etc.)! They only reduce the CPU overhead by limiting the number of draw calls: only one draw call is necessary, but you still multiply the transformation, the primitive generation, the rasterization, etc. by the number of instances.

I'm not 100% convinced of how "efficient" it is, particularly for complex geometry. GS amplification/deamplification does involve either memory allocation, a "pack" operation, or both, and that's not cheap, even when implemented in hardware.
I think I detected silhouette edges on 100,000,000 polygons in less than 200 ms (I need to check the results to give you the exact performance figures, but the performance surprised me). Note that for the silhouette detection, you know that one triangle generates at most 3 silhouette edges (6 vertices). The number of output primitives is fixed and not prohibitive (even on the G80).
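To illustrate, here is roughly the per-triangle test the GS performs when fed triangles with adjacency (a CPU sketch of the idea rather than the actual shader; it assumes a closed mesh with consistent winding):

Code:
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3 Sub(const Vec3& a, const Vec3& b) { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
static Vec3 Cross(const Vec3& a, const Vec3& b)
{ return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x }; }
static float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// True if the triangle (v0, v1, v2) faces the point light at lightPos.
static bool FacesLight(const Vec3& v0, const Vec3& v1, const Vec3& v2, const Vec3& lightPos)
{
    Vec3 n = Cross(Sub(v1, v0), Sub(v2, v0));
    return Dot(n, Sub(lightPos, v0)) > 0.0f;
}

// 'tri' is the current triangle; adj[e] is the vertex opposite edge e in the
// neighbouring triangle (as provided by triangle-with-adjacency input).
// Returns a 3-bit mask of the edges that are silhouette edges.
int SilhouetteEdgeMask(const Vec3 tri[3], const Vec3 adj[3], const Vec3& lightPos)
{
    if (!FacesLight(tri[0], tri[1], tri[2], lightPos))
        return 0;  // only light-facing triangles emit silhouette edges
    int mask = 0;
    for (int e = 0; e < 3; ++e)
    {
        // With consistent winding, the neighbour across edge (tri[e] -> tri[e+1])
        // traverses that edge in the opposite direction.
        if (!FacesLight(tri[(e + 1) % 3], tri[e], adj[e], lightPos))
            mask |= 1 << e;
    }
    return mask;  // each set bit becomes one emitted edge (two vertices, so at most six)
}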

Much less though, which is key. Remember that even though you may need more "passes" to more render targets, the rendering itself is extremely cheap due to very few state changes (really only vertex shader and depth output). In any case it would be an interesting comparison, and I'm certainly willing to use whatever is fastest for the job!

We agree :)... An image-based approach is of course much less influenced by the geometry complexity than an object-based algorithm. There's no need to try to convince me; I'm already convinced. ;)

Certainly true that LVSM/CSM/ESM/PCF attack *filtering*, not soft shadows. The whole "edge softening by clamping the minimum filter width" is really just a side effect rather than the goal IMHO. This is a really important thing to remember, because if you start thinking of the edge softening as the *goal*, then it's both a physically incorrect approach and a potentially inefficient way to do it.
I'm totally convinced that LVSM/ESM/PCF target the filtering of a shadow map and that "soft shadows" is a side effect :)

Well, that's *one* goal of real-time rendering. I think if you ask any game developers, though, they don't give a damn about "solving the rendering equation", and rightfully so. Hell, even movies spend more time fudging stuff than doing it physically correctly. Physical correctness is another tool IMHO, not the end goal.

I understand your point of view and I agree with you. However, today we compute "hacks" because we cannot efficiently solve the real rendering problem. When we have sufficient horsepower, it will be unnecessary (for realistic engines) to use alternatives, because they will never generate as accurate a result. Today, many applications try to produce a rendering that is "eye convincing" (Crysis, Ratatouille, etc.). So it seems reasonable to consider that the rendering equation is the target of real-time/offline realistic applications :)

That said, please do realize that VSM et al. are *filtering* algorithms. Ray traced shadows, shadow volumes and DCS do not address shadow filtering *at all* - thus you are forced to super-sample in screen space to avoid aliasing. So DCS isn't really the end goal/answer for shadows either IMHO, since I think VSM shows pretty conclusively that we can do a good job on shadow filtering and avoid inefficiently super-sampling the whole screen buffer. This is the same case as texture filtering in ray tracers... technically you can handle it via screen-space super-sampling, but in reality it's a hell of a lot more efficient to do some prefiltering.

Object-based approaches can use the quite efficient hardware anti-aliasing provided by the GPU (for example, Doom3 used anti-aliased shadow volumes). In addition, they are not subject to magnification/minification aliasing, since the computation is not based on a (surjective) discretized function. But maybe I don't understand what you are saying :(
 
But GS cloning and instancing do not really reduce the rendering cost (rasterization, transformation, etc.)!
Actually that's not true really... the only triangles that need to be multiply-transformed and rasterized are those that fall into MULTIPLE cube map faces, and those are extremely few. This can be done either with simple view frustum culling on the CPU before submitting batches, or directly in the GS, cloning and binning triangles to the appropriate face on the fly.
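The binning test itself is trivial, something along these lines (a rough CPU sketch; very large triangles that cross a face without having a vertex in it would need a more conservative test):

Code:
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Which cube face (+X,-X,+Y,-Y,+Z,-Z -> 0..5) the direction 'd' points at.
static int DominantFace(const Vec3& d)
{
    float ax = std::fabs(d.x), ay = std::fabs(d.y), az = std::fabs(d.z);
    if (ax >= ay && ax >= az) return d.x >= 0.0f ? 0 : 1;
    if (ay >= az)             return d.y >= 0.0f ? 2 : 3;
    return d.z >= 0.0f ? 4 : 5;
}

// 6-bit mask of the cube faces a triangle should be submitted to, relative to the
// light at 'lightPos'. Most triangles end up with a single bit set, so they are
// transformed and rasterized only once.
uint32_t CubeFaceMask(const Vec3 tri[3], const Vec3& lightPos)
{
    uint32_t mask = 0;
    for (int i = 0; i < 3; ++i)
    {
        Vec3 d = { tri[i].x - lightPos.x, tri[i].y - lightPos.y, tri[i].z - lightPos.z };
        mask |= 1u << DominantFace(d);
    }
    return mask;
}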

When we have sufficient horsepower, it will be unnecessary (for realistic engines) to use alternatives, because they will never generate as accurate a result.
I somewhat agree with that, but historically we've always been saying that. I'm not totally convinced that when we "can" render X amount of rays or triangles w/ GI and whatever else in real time we wouldn't rather apply the power to an approximation of something yet more complicated ;) We'll see in any case, and having many techniques at our disposal is always a good thing!

In addition, they are not subject to magnification/minification aliasing, since the computation is not based on a (surjective) discretized function. But maybe I don't understand what you are saying :(
Indeed they are not affected by magnification problems, but they *are* affected by minification problems (and anisotropic filtering, etc.)! With ray traced shadows or shadow volumes, you'll get something like the "bad" image at the top of the VSM paper (compared to the anisotropically filtered one), and the mipmapping example - there are even more obvious examples in the GPU Gems 3 chapter.

Maybe there's a clever way to "prefilter" shadow volumes or ray traced shadows, but I know of none other than super-sampling the eye rays, which is kind of undesirable unless you've got tons of incoherent secondary rays and need to do it anyways.
 
Actually that's not true really... the only triangles that need to be multiply-transformed and rasterized are those that fall into MULTIPLE cube map faces, and those are extremely few. This can be done either with simple view frustum culling on the CPU before submitting batches, or directly in the GS, cloning and binning triangles to the appropriate face on the fly.

You are right. With cube shadow maps, few triangles are shared by several frusta, so you can save a lot of rendering cost with frustum culling :p

Indeed they are not affected by magnification problems, but they *are* affected by minification problems (and anisotropic filtering, etc.)! With ray traced shadows or shadow volumes, you'll get something like the "bad" image at the top of the VSM paper (compared to the anisotropically filtered one), and the mipmapping example - there are even more obvious examples in the GPU Gems 3 chapter.

I didn't understand what you meant about the minification artifacts, since the minification term is mostly used for textures. Now I think you are describing the staircasing (aliasing) drawback (one sample at the pixel center => aliasing). Indeed, RT requires super-sampling of the eye rays. However, a multisampling approach can drastically reduce this effect for object-based shadows. Moreover, I think that conservative rasterization (for example, the algorithm described in GPU Gems 2) can be used in combination with a hand-made "anti-aliasing shader". In any case, this artifact is really problematic for high-frequency effects (i.e. hard shadows). For soft shadows (penumbra wedges, DCS, etc.), it is less (or not at all) visible (=> no specific treatment is required).
 
Moreover, I think that conservative rasterization (for example, the algorithm described in GPU Gems 2) can be used in combination with a hand-made "anti-aliasing shader".
Yeah it seems like there are definitely things you can do, although the problem is often that you don't necessarily know the distribution of receiver depths over the target filter region. You can probably approximate it pretty reasonably using the surface normal, but even then it's kind of non-obvious how to shoot additional rays (in the case of ray traced shadows). Do you sample a region in world space on the receiver plane maybe? Hard to say, but it falls naturally right out of the linear filtering techniques, which can often be used in conjunction with fancier things.

In any case, this artifact is really problematic for high-frequency effects (i.e. hard shadows). For soft shadows (penumbra wedges, DCS, etc.), it is less (or not at all) visible (=> no specific treatment is required).
Right, because soft shadows are much lower frequency and thus generally do not require as much sampling. That said, you can expect hard shadows near casters to alias in the same manner, particularly when they are projected onto a plane that is approximately parallel to the eye ray (highly anisotropic in texture terms).

Soft shadows work is definitely cool though, and something I'd love to see nicely combined with shadow filtering... it seems like it should be possible, and that would produce unparalleled shadow quality.
 
But you can cut that down to 4 shadow passes with GS-cloning or instancing. If you're gonna do silhouette extraction on the GPU using GS, you have to give the same benefit to the shadow map algorithms.

I'm not 100% convinced of how "efficient" it is, particularly for complex geometry. GS amplification/deamplification does involve either memory allocation, a "pack" operation, or both, and that's not cheap, even when implemented in hardware.

Are there any "fast" examples of doing anything using the GS pipe? Last time I tried using GS to clone (drawing geometry to 6 faces of a cubemap, with and without software culling in the GS) it was much faster to simply run six independent draw calls (and process each vertex 6 times). Of course my vertex shader was simple; GS might have been a win for a really complex VS with, say, blending and skinning. Who knows, it could have been a driver issue. But even in the blending/skinning case it would be better to simply use stream out from the VS alone (apply blending/skinning once) and then do multiple draw calls with an ultra-simple VS reading from that one stream-out vertex buffer.
 
Yes, there are :)... Using the GS for silhouette detection is very efficient, since you generate few vertices (at most 6 vertices per triangle). However, cloning geometry for direct rendering into a cube map is still inefficient, because you have to emit 18 vertices (which hurts parallelism on the G80).
 
Yes, there are :)... Using the GS for silhouette detection is very efficient, since you generate few vertices (at most 6 vertices per triangle). However, cloning geometry for direct rendering into a cube map is still inefficient, because you have to emit 18 vertices (which hurts parallelism on the G80).

Yes, where? Where is the GS being used with any number of output verts nearing what you would do with a standard VS-only graphics pipeline? Sounds like what you are saying is that if you don't use the GS much, then the GS pipeline is fast. I just haven't seen a good "working" GS example yet.

You might find this paper rather interesting (it's much faster to do O(log n) searches in a pixel shader for stream compaction than to simply do GS output with a variable number of primitives).

http://www.mpi-inf.mpg.de/~gziegler/hpmarcher/techreport_histopyramid_isosurface.pdf
 
Are there any "fast" examples of doing anything using the GS pipe? Last time I tried using GS to clone (drawing geometry to 6 faces of a cubemap, with and without software culling in the GS) it was much faster to simply run six independent draw calls (and process each vertex 6 times).
My experience is the same as yours actually... for cases like this it seems faster to just have a decent frustum culling algorithm and render each of the cube faces separately, at least on G80. Apparently R600 has a somewhat faster GS implementation, but I've not had the opportunity to try that out yet.

You might find this paper rather interesting (it's much faster to do O(log n) searches in a pixel shader for stream compaction than to simply do GS output with a variable number of primitives).
Again, this is in line with my experience. I actually benchmarked using the GS to implement "pack" and found the standard scan/scatter approach to be *faster* than the hardware GS amplification/deamplification. Of course, your data set and the percentage of elements that you keep affect the outcome here. This was also with some pretty early DX10 drivers, but still.
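For reference, the scan/scatter "pack" is nothing more than an exclusive prefix sum over per-element keep flags followed by a scatter. A CPU sketch of the idea (the GPU version just does the scan and the scatter in parallel):

Code:
#include <vector>

// Compact 'elements' down to those with keep[i] != 0.
std::vector<int> Pack(const std::vector<int>& elements, const std::vector<int>& keep)
{
    // Exclusive prefix sum of the keep flags gives each kept element its output slot.
    std::vector<int> outIndex(elements.size());
    int count = 0;
    for (size_t i = 0; i < elements.size(); ++i)
    {
        outIndex[i] = count;
        count += keep[i] ? 1 : 0;
    }

    // Scatter: every kept element writes itself to its precomputed slot.
    std::vector<int> packed(count);
    for (size_t i = 0; i < elements.size(); ++i)
        if (keep[i]) packed[outIndex[i]] = elements[i];

    return packed;
}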

TBH I'm still a bit skeptical that GS amplification/deamplification is something that can be much more efficiently implemented in hardware than otherwise. It always seemed like a bit of an unnecessary feature to me, given the presence of so-called "transform feedback" anyways. That said, maybe they'll make it super-fast next generation, although I'd be wary of anyone devoting too much hardware to that end.
 
Are there any "fast" examples of doing anything using the GS pipe? Last time I tried using GS to clone (drawing geometry to 6 faces of a cubemap, with and without software culling in the GS) it was much faster to simply run six independent draw calls (and process each vertex 6 times). Of course my vertex shader was simple; GS might have been a win for a really complex VS with, say, blending and skinning. Who knows, it could have been a driver issue. But even in the blending/skinning case it would be better to simply use stream out from the VS alone (apply blending/skinning once) and then do multiple draw calls with an ultra-simple VS reading from that one stream-out vertex buffer.
You're only going to get faster cube map performance using the GS if primitives aren't rendered to all cube faces. Otherwise the only savings is fewer draw calls. If this is what you meant by software culling then I guess you found otherwise.

Anyone who tries to determine GS performance from a single architecture is not getting the full picture. Unfortunately, G8x and R6xx have completely different performance characteristics: G8x is good with minimal amplification, and in some cases on R6xx higher amounts of amplification actually improve efficiency.

I believe Humus used the GS in the global illumination demo ATI did, so maybe he has more insight.

I doubt a complex VS will improve anything, as it runs before the GS. Or am I missing the thought process here?
 