Andrew Lauritzen said: [...] why would we want to do shadow maps any more when we can do ray traced shadows!!! (I'm trying to get in the mood.)
Great work Andy! Have you experimented with the log filtering so far? Maybe there's some hope to have decent quality EVSM in 8 bytes per texel.

No I haven't, but it's on the todo list after my thesis is done! Not only am I interested in reducing memory consumption, but I'd like to see whether bumping c up to ridiculous levels introduces significant artifacts into EVSMs the way it does with ESMs. I really don't know whether it will, so I'd like to find out.
First, what about the memory consumption/computation time for omnidirectional light sources? Naïvely, it seems that everything is multiplied by six (or by two if we use dual-paraboloid maps).

Yes, there's no special way to handle these cases with VSMs/LVSMs, so you're correct in multiplying the memory usage.
Next, in the introduction you state that shadow volumes can't handle alpha-textured objects, but the paper "Textured Shadow Volumes" addresses this limitation.

Cool, good to know there's some work in the area. Still, I think my point about generally comparing the advantages and disadvantages of the two algorithms stands: shadow maps naturally handle anything that can be rasterized. Shadow volumes have a bit more trouble with things like alpha testing and (per-pixel) displacement mapping.
Note that ray tracing is not the only way to produce accurate shadows.

Hehe, that was a joke. If you've read any of my responses to the Intel ray tracing stuff you can see that I'm not exactly as excited about it as they are, and I'm unconvinced that ray traced shadows are really necessary down the road. However, as I'm going to be working at Intel, I figured I should learn to preach the goodness of ray tracing.
Andrew Lauritzen said: Still I think my point about generally comparing the advantages and disadvantages of the two algorithms stands: shadow maps naturally handle anything that can be rasterized. Shadow volumes have a bit more trouble with things like alpha testing and (per-pixel) displacement mapping.
Andrew Lauritzen said: Hehe, that was a joke. If you've read any of my responses to the Intel ray tracing stuff you can see that I'm not exactly as excited about it as they are, and I'm unconvinced that ray traced shadows are really necessary down the road. However, as I'm going to be working at Intel, I figured I should learn to preach the goodness of ray tracing.
Yes, it stands. However, "Textured Shadow Volumes" solves the problem for alpha-textured objects (the paper presents the algorithm as an "ad hoc" method for transmittance, but it handles all alpha-textured geometry). So, no more trouble for such meshes... but it is a detail.

Ah, cool. Well thanks for the paper references... I'll have to check them out when I have the time!
And if a visually plausible result is sufficient, CSM, ESM, LVSM/VSM, etc. are efficient and give good results.

Yup, I agree. And if you need more physical accuracy, you can always super-sample the light rays by rendering and accumulating multiple shadow maps from jittered light positions, in the same way that you'd do it with ray traced shadows. Usually overkill IMHO, but certainly a possibility.
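To make the jittered-accumulation idea concrete, here's a minimal C++ sketch (renderShadowMap and shadowTerm are hypothetical stand-ins for the renderer, and a real implementation would use a stratified pattern over the light's actual shape rather than rand()):

```cpp
#include <cstdlib>

struct Vec3 { float x, y, z; };

// Stand-in renderer hooks (illustrative only, not a real API):
void renderShadowMap(Vec3 lightPos) { (void)lightPos; /* render depth from lightPos */ }
float shadowTerm(Vec3 p) { (void)p; return 1.0f; /* test p against the current map */ }

// Approximate an area light by accumulating hard-shadow results from n
// shadow maps rendered at jittered light positions, the same way
// distributed shadow rays would sample the light.
float accumulatedShadow(Vec3 shadePoint, Vec3 lightCenter, float lightRadius, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i) {
        auto jitter = [] { return 2.0f * (std::rand() / float(RAND_MAX)) - 1.0f; };
        Vec3 pos = { lightCenter.x + lightRadius * jitter(),
                     lightCenter.y + lightRadius * jitter(),
                     lightCenter.z + lightRadius * jitter() };
        renderShadowMap(pos);
        sum += shadowTerm(shadePoint);
    }
    return sum / n;  // average visibility in [0,1]
}
```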
With such a method you have to deal with correlated samples for each pixel (=> ugly artifacts).

Yes, certainly, although shadow MSAA mitigates that a little bit (it effectively multiplies your sample count without much cost). That said, if it's *enough* faster than the alternatives, it could still be a win.
In addition, performance is drastically reduced by the several renderings of the geometry. The "Depth Complexity Sampling" algorithm addresses all these issues by combining the accuracy of the offline "Soft Shadow Volumes" approach with the performance of the penumbra wedge framework.

Looks reasonable, but shadow volume/wedge approaches all incur significant performance penalties on complex/large scenes (as the results in the paper demonstrate clearly!). This gets even more significant with dynamic geometry. While geometry extrusion may well be the best way to get "true soft shadows" in the long run, I'm not willing to accept that necessarily yet. Shadow volumes still come with a lot of problems, as discussed, and I'd rather avoid those problems entirely if possible.
So, this method can produce shadows as accurate as a ray tracer at interactive/real-time rates.

Well, to be fair, the results in the paper have it running on the order of seconds per frame for scenes with ~500k polygons, which is really quite a normal count for modern scenes. Of course the Doom 3 example runs somewhat reasonably, as it was designed for shadow volumes in the first place.
Andrew Lauritzen said: Looks reasonable, but shadow volume/wedge approaches all incur significant performance penalties on complex/large scenes (as the results in the paper demonstrate clearly!). This gets even more significant with dynamic geometry.
What is to be done to replace VSM with EVSM? Simply storing e^(c*depth) instead of depth, and the square of that in the second channel instead of depth^2, is that right?

Effectively yes, but you have to remember to warp the fragment depth using e^(c*depth) at lookup time as well. Furthermore, as described in the paper, you can also use the "negative" warp -e^(-c*depth) in conjunction to avoid some more problems (i.e. store 4 components total). [Edit] I can post some code for this if you guys want... it's neither hard nor complicated, but probably easier to understand in code than otherwise.
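Concretely, here's a minimal C++ sketch of the warp and lookup just described (the exponent values and all names are illustrative, not from the paper; in practice the warp runs in the depth pass, the lookup in the shading pass, with the stored moments filtered in between):

```cpp
#include <algorithm>
#include <cmath>

// Illustrative warp exponents ("c" in the paper); larger values tighten
// the bound but risk overflow in fp32 storage.
constexpr float kPosExp = 40.0f;
constexpr float kNegExp = 5.0f;

struct Evsm4 { float pos, posSq, neg, negSq; };  // the 4 stored moments

// Depth pass: warp the depth (in [0,1]) with both exponentials and store
// each warp together with its square, instead of (depth, depth^2).
Evsm4 warpDepth(float depth) {
    float p =  std::exp( kPosExp * depth);
    float n = -std::exp(-kNegExp * depth);
    return { p, p * p, n, n * n };
}

// One-tailed Chebyshev upper bound, exactly as in plain VSM.
float chebyshev(float mean, float meanSq, float t) {
    if (t <= mean) return 1.0f;                       // fully lit
    float variance = std::max(meanSq - mean * mean, 1e-6f);
    float d = t - mean;
    return variance / (variance + d * d);
}

// Shading pass: warp the receiver depth the same way, evaluate the bound
// in both warped spaces, and keep the tighter (smaller) of the two.
float evsmVisibility(Evsm4 m /* filtered moments */, float receiverDepth) {
    float tPos =  std::exp( kPosExp * receiverDepth);
    float tNeg = -std::exp(-kNegExp * receiverDepth);
    return std::min(chebyshev(m.pos, m.posSq, tPos),
                    chebyshev(m.neg, m.negSq, tNeg));
}
```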
The scene, composed of 500,000 visible polygons, is rendered with 4 omnidirectional lights. With a cube shadow map approach you need 24 renderings of the scene just for the shadow map acquisition.

But you can cut that down to 4 shadow passes with GS-cloning or instancing. If you're gonna do silhouette extraction on the GPU using the GS, you have to give the same benefit to the shadow map algorithms.
Everything can be performed on the GPU, so the silhouette detection is done very efficiently and independently of the animation/deformation of the geometry.

I'm not 100% convinced of how "efficient" it is, particularly for complex geometry. GS amplification/deamplification involves either memory allocation, a "pack" operation, or both, and that's not cheap, even when implemented in hardware.
But with a shadow map approach you are also (if less) influenced by the geometry.

Much less though, which is key. Remember that even though you may need more "passes" to more render targets, the rendering itself is extremely cheap due to very few state changes (really only vertex shader and depth output). In any case it would be an interesting comparison, and I'm certainly willing to use whatever is fastest for the job!
To conclude, I think that DCS and LVSM/CSM/ESM do not have the same goal. If you want fast and pleasant results, LVSM is a good alternative to PCF. However, if you are interested in accurate direct illumination or physically plausible soft shadows, DCS offers an efficient alternative to ray traced shadows.

Certainly true that LVSM/CSM/ESM/PCF attack *filtering*, not soft shadows. The whole "edge softening by clamping the minimum filter width" is really just a side effect rather than the goal IMHO. This is a really important thing to remember, because if you start thinking of the edge softening as the *goal*, then it's both a physically incorrect approach and a potentially inefficient way to do it.
Since the goal of real-time rendering is to efficiently solve the rendering equation, LVSM/ESM seems to be a nice alternative right now, but DCS can be a solution for the future.

Well, that's *one* goal of real-time rendering. I think if you ask any game developers, though, they don't give a damn about "solving the rendering equation", and rightfully so. Hell, even movies spend more time fudging stuff than doing it physically correctly. Physical correctness is another tool IMHO, not the end goal.
Certainly true that LVSM/CSM/ESM/PCF attack *filtering* not soft shadows. The whole "edge softening by clamping the minimum filter width" is really just a side-effect rather than the goal IMHO. This is a really important thing to remember because if you start thinking of the edge softening as the *goal*, then it's both a physically incorrect approach, and potentially inefficient way to do it.
This is a very important detail: during a Q&A session after my talk at GDC someone was 'complaining' because I didn't cover soft shadows in my presentation, and this person was so disappointed when I made him notice that the whole talk was about filtering shadow maps, not rendering soft shadows.

Let me reiterate: edge softening is a "bonus" of PCF/VSM/etc., not the goal. It's even presented that way in the original PCF paper, which I highly suggest everyone working on shadows read.
Andrew Lauritzen said: But you can cut that down to 4 shadow passes with GS-cloning or instancing. If you're gonna do silhouette extraction on the GPU using the GS, you have to give the same benefit to the shadow map algorithms.
Andrew Lauritzen said: I'm not 100% convinced of how "efficient" it is, particularly for complex geometry. GS amplification/deamplification involves either memory allocation, a "pack" operation, or both, and that's not cheap, even when implemented in hardware.

I think that I detected silhouette edges on 100,000,000 polygons in less than 200 ms (I need to check the results to give you the exact numbers, but the performance surprised me). Note that for the silhouette detection you know that one triangle generates at most 6 silhouette edges; the number of output primitives is fixed and not prohibitive (even on the G80).
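For reference, here's the per-edge test such a GS would typically run, as a minimal C++ sketch (names are mine; it assumes a closed mesh with consistent counter-clockwise winding, fed as triangles with adjacency):

```cpp
struct Vec3 { float x, y, z; };

Vec3  sub(Vec3 a, Vec3 b)   { return { a.x - b.x, a.y - b.y, a.z - b.z }; }
float dot(Vec3 a, Vec3 b)   { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3  cross(Vec3 a, Vec3 b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

// A triangle "faces" a point light when the light lies on the side of
// its geometric normal.
bool facesLight(Vec3 v0, Vec3 v1, Vec3 v2, Vec3 lightPos) {
    return dot(cross(sub(v1, v0), sub(v2, v0)), sub(lightPos, v0)) > 0.0f;
}

// The per-edge test a GS runs on a triangle-with-adjacency primitive:
// edge (v0,v1) of the center triangle (v0,v1,v2) is a silhouette edge
// if the center triangle faces the light and the neighboring triangle
// across that edge (completed by vertex adj) does not. The GS repeats
// this for the 3 edges, so the output count per primitive is bounded.
bool isSilhouetteEdge(Vec3 v0, Vec3 v1, Vec3 v2, Vec3 adj, Vec3 lightPos) {
    return facesLight(v0, v1, v2, lightPos) &&
           !facesLight(v0, adj, v1, lightPos);  // neighbor, consistent winding
}
```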
Much less though, which is key. Remember that even though you may need more "passes" to more render targets, the rendering itself is extremely cheap due to very few state changes (really only vertex shader and depth output). In any case it would be an interesting comparison, and I'm certainly willing to use whatever is fastest for the job!
Andrew Lauritzen said: Certainly true that LVSM/CSM/ESM/PCF attack *filtering*, not soft shadows. The whole "edge softening by clamping the minimum filter width" is really just a side effect rather than the goal IMHO. This is a really important thing to remember, because if you start thinking of the edge softening as the *goal*, then it's both a physically incorrect approach and a potentially inefficient way to do it.

I'm totally convinced that LVSM/ESM/PCF target the filtering of a shadow map and that "soft shadows" are a side effect.
Well, that's *one* goal of real-time rendering. I think if you ask any game developers though they don't give a damn about "solving the rendering equation" and rightfully so. Hell, even movies spend more time fudging stuff than doing it physically correctly. Physical correctness is another tool IMHO, not the end goal.
That said, please do realize that VSM et al. are *filtering* algorithms. Ray traced shadows, shadow volumes, and DCS do not address shadow filtering *at all*, so you are forced to super-sample in screen space to avoid aliasing. Thus DCS isn't really the end goal/answer for shadows either IMHO, since I think VSM shows pretty conclusively that we can do a good job on shadow filtering and avoid inefficiently super-sampling the whole screen buffer. This is the same situation as texture filtering in ray tracers... technically you can handle it via screen-space super-sampling, but in reality it's a hell of a lot more efficient to do some prefiltering.
But the GS and instancing do not really reduce the rendering cost (rasterization, transformation, etc.)!

Actually that's not true really... the only triangles that need to be multiply-transformed and rasterized are those that fall into MULTIPLE cube map faces, and those are extremely few. This can be done either with simple view frustum culling on the CPU before submitting batches, or directly in the GS, cloning and binning triangles to the appropriate face on the fly.
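For illustration, here's a minimal C++ sketch of that binning test against the six 90-degree face frusta, done conservatively on bounding spheres (the names and the per-batch sphere granularity are my assumptions; a GS variant would classify per triangle):

```cpp
#include <cmath>
#include <cstdint>

struct Vec3 { float x, y, z; };

// Conservative test of a bounding sphere (center already translated into
// the light's local space, i.e. c = worldCenter - lightPos) against the
// 90-degree frustum of each cube-map face. Returns a bit mask over the
// faces +X,-X,+Y,-Y,+Z,-Z the sphere may touch; the batch is then drawn
// only into those faces.
uint32_t cubeFaceMask(Vec3 c, float radius) {
    // Allowance on the 45-degree side planes: a sphere of radius r is
    // within distance r of a plane with unit normal like (1,-1,0)/sqrt(2)
    // iff e.g. c.x >= c.y - r*sqrt(2).
    const float slack = radius * 1.41421356f;
    uint32_t mask = 0;
    if ( c.x >= std::fabs(c.y) - slack &&  c.x >= std::fabs(c.z) - slack) mask |= 1u << 0; // +X
    if (-c.x >= std::fabs(c.y) - slack && -c.x >= std::fabs(c.z) - slack) mask |= 1u << 1; // -X
    if ( c.y >= std::fabs(c.x) - slack &&  c.y >= std::fabs(c.z) - slack) mask |= 1u << 2; // +Y
    if (-c.y >= std::fabs(c.x) - slack && -c.y >= std::fabs(c.z) - slack) mask |= 1u << 3; // -Y
    if ( c.z >= std::fabs(c.x) - slack &&  c.z >= std::fabs(c.y) - slack) mask |= 1u << 4; // +Z
    if (-c.z >= std::fabs(c.x) - slack && -c.z >= std::fabs(c.y) - slack) mask |= 1u << 5; // -Z
    return mask;
}
```

A batch whose mask has a single bit set is submitted to exactly one face pass; only geometry straddling several faces pays for multiple transforms, which is Andrew's point above.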
When we have sufficient horsepower it will be unnecessary (for realistic engines) to use alternatives, because they will never generate as accurate a result.

I somewhat agree with that, but historically we've always been saying that. I'm not totally convinced that when we "can" render X amount of rays or triangles w/ GI and whatever else in real time, we wouldn't rather apply the power to an approximation of something yet more complicated. We'll see in any case, and having many techniques at our disposal is always a good thing!
In addition, they are not subject to magnification/minification aliasing, since the computation is not based on a (surjective) discretized function. But maybe I don't understand what you are saying.

Indeed they are not affected by magnification problems, but they *are* affected by minification problems (and anisotropic filtering, etc.)! With ray traced shadows or shadow volumes, you'll get something like the "bad" image at the top of the VSM paper (compared to the aniso filtering), and the mipmapping example - there are even more obvious examples in the GPU Gems 3 chapter.
Actually that's not true really... the only triangles that need to be multiply-transformed and rasterized are those that fall into MULTIPLE cube map faces, and those are extremely few. This can be done either with simple view frustum culling on the CPU before submitting batches, or directly in the GS, cloning and binning triangles to the appropriate face on the fly.
Indeed they are not affected by magnification problems but they *are* affected by minification problems (and anisotropic filters, etc)! With ray traced or shadow volumes, you'll get something like the "bad" image at the top of the VSM paper (compared to the aniso filtering), and the mipmapping example - there are even more obvious examples in the GPU Gems 3 chapter.
Moreover, I think that conservative rasterization (for example the algorithm described in GPU Gems 2) can be used in combination with a hand-made "anti-aliasing shader".

Yeah, it seems like there are definitely things you can do, although the problem is often that you don't necessarily know the distribution of receiver depths over the target filter region. You can probably approximate it pretty reasonably using the surface normal, but even then it's kind of non-obvious how to shoot additional rays (in the case of ray traced shadows). Do you sample a region in world space on the receiver plane, maybe? Hard to say, but it falls naturally right out of the linear filtering techniques, which can often be used in conjunction with fancier things.
In any case, this artifact is really problematic for high-frequency effects (i.e. hard shadows). For soft shadows (penumbra wedges, DCS, etc.), it is less (or not) visible (=> no specific treatment is required).

Right, because soft shadows are much lower frequency and thus generally do not require as much sampling. That said, you can expect hard shadows near casters to alias in the same manner, particularly when they are projected onto a plane that is approximately parallel to the eye ray (highly anisotropic in texture terms).
But you can cut that down to 4 shadow passes with GS-cloning or instancing. If you're gonna do silhouette extraction on the GPU using GS, you have to give the same benefit to the shadow map algorithms.
I'm not 100% convinced of how "efficient" it is, particularly for complex geometry. GS amplification/deamplification involves either memory allocation, a "pack" operation, or both, and that's not cheap, even when implemented in hardware.
Yes there are... Using the GS for silhouette detection is very efficient, since you generate few vertices (6 vertices for each triangle). However, cloning geometry for direct rendering into a cube map is still inefficient, because you have to emit 18 vertices (which reduces parallelism on the G80).
Are there any "fast" examples of doing anything using the GS pipe? Last time I tried using the GS to clone (drawing geometry to 6 faces of a cubemap, with and without software culling in the GS) it was much faster to simply run six independent draw calls (and process each vertex 6 times).

My experience is the same as yours, actually... for cases like this it seems faster to just have a decent frustum culling algorithm and render each of the cube faces separately, at least on G80. Apparently R600 has a somewhat faster GS implementation, but I've not had the opportunity to try that out yet.
You might find this paper rather interesting (it's much faster to do O(lg n) searches in a pixel shader for stream compaction than simply doing GS output with a variable number of primitives).

Again, this is in line with my experience. I actually benchmarked using the GS to implement "pack" and found the standard scan/scatter approach to be *faster* than the hardware GS amplification/deamplification. Of course, your data set and the percentage of elements that you keep affect the outcome here. This was also with some pretty early DX10 drivers, but still.
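For reference, the scan/scatter approach amounts to this, as a minimal CPU-side C++ sketch (names are illustrative; on the GPU the prefix sum and the scatter each become parallel passes):

```cpp
#include <cstddef>
#include <vector>

// Classic scan/scatter stream compaction: compute each surviving
// element's output slot with an exclusive prefix sum over the keep
// flags, then scatter the survivors to those slots.
std::vector<int> compact(const std::vector<int>& in,
                         const std::vector<int>& keep) {  // 0 or 1 per element
    std::vector<int> offset(in.size());
    int count = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {  // exclusive prefix sum
        offset[i] = count;
        count += keep[i];
    }
    std::vector<int> out(count);
    for (std::size_t i = 0; i < in.size(); ++i)    // scatter survivors
        if (keep[i]) out[offset[i]] = in[i];
    return out;
}
```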
Are there any "fast" examples of doing anything using the GS pipe? Last time I tried using the GS to clone (drawing geometry to 6 faces of a cubemap, with and without software culling in the GS) it was much faster to simply run six independent draw calls (and process each vertex 6 times). Of course, my vertex shader was simple; the GS might have been a win for a really complex VS with, say, blending and skinning. Who knows, it could have been a driver issue. But even in the blending/skinning case it would be better simply to use stream out from the VS alone (apply blending/skinning once), then do multiple draw calls with an ultra-simple VS reading from that one streamed-out vertex buffer.

You're only going to get faster cube map performance using the GS if primitives aren't rendered to all cube faces. Otherwise the only savings is fewer draw calls. If this is what you meant by software culling, then I guess you found otherwise.