Layered Variance Shadow Maps

Discussion in 'Rendering Technology and APIs' started by Andrew Lauritzen, Mar 28, 2008.

  1. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,493
    Likes Received:
    474
    The answer will probably seem obvious once I hear it, but what do you mean by "transform feedback"?
     
  2. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,632
    Likes Received:
    1,246
    Location:
    British Columbia, Canada
    Certainly GS performance is a bit weird, but even with minimal or no amplification, it's still too slow. At one point I was getting a 50% performance hit on a simple shader just by *having* a GS, even if it did nothing and just passed the triangles straight through. Now I can expect some overhead, but 50% is absurd, even for a simple shader. Perhaps drivers have improved since then, but my initial experience was not particularly positive.

    a.k.a. "Stream Out" in DX10. It's the silly name they decided to give to buffer output after the GS.
     
  3. TimothyFarrar

    Regular

    Joined:
    Nov 7, 2007
    Messages:
    427
    Likes Received:
    0
    Location:
    Santa Clara, CA
    You can do transform feedback without a GS (previous testing on NVidia cards showed this as a faster hardware path). I was doing GL_POINT to GL_TRIANGLE conversion in the VS alone (obviously requires a 2nd draw call to draw the triangles).

    Also in this GS testing, I tried two situations: (a) an "empty" VS stage which simply passed VBO data to the GS stage for full computation there, and (b) moving as many computations as possible from the GS stage to the VS stage. The operations were exactly the same, and on NVidia cards (b) was quite a bit faster than (a). This hints that there are some limitations on how well the GS stage can be parallelized (perhaps the GS stage wasn't being SIMDed well by the driver). Transform feedback in the VS only is obviously easy to make fast because you have a fixed number of inputs and outputs which are the same for all verts (this matches the design of the hardware quite well).

    GS is SIMD unfriendly. The common case of having only a fraction of GS invocations with expensive divergent branching is an obvious problem. Having a variable number of outputs is an obvious problem. I'm not sure how the hardware handles this: it could pre-allocate the maximum number of outputs per call and later coalesce the buffer in the case of stream out (which could be expensive), or simply pre-clear the buffers and later detect the degenerates and toss them (in the case without stream out), or do something really awful and use atomic operations to do proper packing (early 8-series cards didn't have atomic ops in CUDA, so I'm guessing this wasn't the case).
     
  4. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,632
    Likes Received:
    1,246
    Location:
    British Columbia, Canada
    Yes certainly, I didn't mean to imply otherwise. All I meant to say was that the presence of a stage that outputs results directly after the VS/GS, or being able to render to VBO, or similar makes the GS amplification/deamplification even less interesting to me, particularly with its current speed.

    That's the root of my biggest beef with it. It's not an operation that can "obviously" be done faster in hardware, as far as I can see, and thus I see no problem with letting the programmer choose an algorithm best suited to the data set. I would write a pack very differently if I expected to be throwing out 99% of the elements than if I expected to be throwing out 1%.

    I'm really not sure how the hardware handles it, but I've certainly heard some things that hint that at some level they have to allocate the maximum buffer size, then pack it (or simply index it I guess, in the case of resubmitted geometry) later. This may be done at a finer granularity than the entire buffer (for instance, buffers for each SM or something), but I would be surprised if it was implemented strictly in terms of atomics or similar, which would make the "min/max hints" to the API - and even amplification limits - unnecessary.
     
    #44 Andrew Lauritzen, Apr 3, 2008
    Last edited by a moderator: Apr 7, 2008
  5. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,493
    Likes Received:
    474
    Yep. There's a reason you specify the max vertex count when writing a GS. :wink:
     
  6. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,182
    Likes Received:
    1,579
    Location:
    Beyond3D HQ
    In the beginning they'd run GS code on just one cluster, and (maybe) stop running all threads on the rest of the chip too, until the GS was done for that particular stream. So performance was shit. Not sure if that's still the case, haven't written any GS code in a while (and neither has the games industry :lol: ).
     
  7. flycooler

    Newcomer

    Joined:
    Apr 25, 2008
    Messages:
    3
    Likes Received:
    0
    Error in ESM

    Hi Andrew, I'm curious about the sentence you wrote here. As I see it, ESM will unavoidably introduce error if the filter size is big and several occluders exist. When the parameter C is large (say > 20), the shadow result easily becomes over-exposed after bilinear filtering, resulting in light bleeding.

    Can you explain a little more about how the negative warp -e^(-c*depth) is used in conjunction to avoid the problem? Or can you show some simple shader code so that I can test it myself? Many thanks!
    :grin:

     
  8. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco

    Yes and no. You can have any depth complexity and zero errors if your receiver is planar within the filtering window.
    Imagine a tree casting a shadow over the ground: the shadow on the ground won't contain any error (no matter how complex the depth function is due to the thousands of leaves..), while the self-shadowing on the leaves will contain errors.

    Those are accuracy and range issues; try log filtering (btw.. c = 20 is HIGH!)


     
    #48 nAo, Apr 25, 2008
    Last edited: Apr 25, 2008
  9. flycooler

    Newcomer

    Joined:
    Apr 25, 2008
    Messages:
    3
    Likes Received:
    0
    Hi, thanks for your quick response :)

    Could you please provide a download URL for your ESM implementation? I'm a poor guy and have no money to buy the book :)

    As for the parameter C, my feeling is that since the Heaviside shadow-comparison function is very high frequency, we usually want a big value (> 30) to approximate the Heaviside curve. I really hope ESM can give good practical results, since its cost is so low. I implemented ESM before and ran into errors when the bilinear filter size was big, which confused me. It would be great to see some simple shader code that I can implement myself to test it :)

    Cheers




     
  10. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    ESM works when the furthest samples in your filter kernel are the same distance from the light as the pixel you're shadowing. In nAo's example, nothing is further away than the ground, so the shadows drawn there are fine. For the leaves, even if a tiny filter weight goes to a texel from the ground, you won't get any shadow.

    If you think about the Heaviside function, an exponential is a very close match up to x=0. After that the error is enormous. That's why anything even slightly farther away than your shadow receiver makes everything light. A higher value of C won't help the bleeding.
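    This point is easy to check numerically. The sketch below (Python; the constant, helper name, and sample depths are illustrative assumptions of mine, the thread gives no code) models ESM as storing exp(c*depth) and testing saturate(filtered_map * exp(-c*receiver_depth)); a single slightly-farther sample in the kernel saturates the test to fully lit.

```python
import math

# Illustrative ESM model (names and constants are mine, not from the thread):
# the map stores exp(c * occluder_depth), pre-filtered, and the shader computes
# visibility = saturate(filtered_value * exp(-c * receiver_depth)).
c = 30.0

def esm_visibility(occluder_depths, weights, receiver_depth):
    # The pre-blurred texel the shader would fetch.
    filtered = sum(w * math.exp(c * d) for w, d in zip(weights, occluder_depths))
    return min(1.0, filtered * math.exp(-c * receiver_depth))

# Ground at depth 0.5, half the kernel covered by an occluder at 0.3:
# nothing in the kernel is farther than the receiver, so the result is a
# sensible ~50% penumbra.
ground = esm_visibility([0.3, 0.5], [0.5, 0.5], 0.5)

# A leaf at depth 0.3 shadowed by another leaf at 0.25, but the kernel picks up
# a ground texel at 0.5 with just 1% weight: exp(c*(0.5-0.3)) is so large that
# the test saturates to fully lit, exactly the light leak described above.
leaf = esm_visibility([0.25, 0.5], [0.99, 0.01], 0.3)
```

    With c = 30 the ground case evaluates to roughly 0.5, while the leaf case clamps to fully lit despite 99% of the kernel being occluded.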

    Andy isn't really talking about ESM here. He's talking about using VSM with a positive and negative exponential warp.

    VSM bleeds when the ratio of distances between occluders gets too small, i.e. if A overlaps B and B overlaps C, you get light bleeding (B's shadow on C) when B-A approaches or exceeds C-B. The exponential warp makes it so that B-A is almost always smaller, but unfortunately you run into the same problems as with ESM (A's shadow on B bleeds, because C is farther), but at least the shadow on C is okay now.

    The negative warp is kind of neat. You do the opposite, so light always bleeds through B and you basically isolate the shadow of A only. Now you take the min of this shadow and the above, and all your shadows look pretty good now.
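    A rough sketch of that min-of-two-warps idea (Python; the function names, per-warp constants, and the tiny variance epsilon are my own assumptions, not from the paper): each warp keeps the first two moments of the warped depth, a one-tailed Chebyshev bound gives a visibility estimate per warp, and the final value is the min of the two.

```python
import math

# Hypothetical EVSM-style test (names and constants are mine): warp depths with
# exp(c*x) and -exp(-c*x), keep (mean, mean of squares) per warp, bound the
# visibility with a one-tailed Chebyshev inequality, and take the min.
C_POS, C_NEG = 40.0, 40.0
EPS = 1e-12  # tiny variance floor; a real implementation would scale this per warp

def chebyshev(mean, mean_sq, t):
    # One-tailed Chebyshev upper bound on P(x >= t); fully lit when t <= mean.
    if t <= mean:
        return 1.0
    var = max(mean_sq - mean * mean, EPS)
    return var / (var + (t - mean) ** 2)

def evsm_visibility(occluder_depths, weights, receiver_depth):
    def moments(vals):
        m1 = sum(w * v for w, v in zip(weights, vals))
        m2 = sum(w * v * v for w, v in zip(weights, vals))
        return m1, m2
    p1, p2 = moments([math.exp(C_POS * d) for d in occluder_depths])
    n1, n2 = moments([-math.exp(-C_NEG * d) for d in occluder_depths])
    v_pos = chebyshev(p1, p2, math.exp(C_POS * receiver_depth))
    v_neg = chebyshev(n1, n2, -math.exp(-C_NEG * receiver_depth))
    return min(v_pos, v_neg)
```

    For a receiver at the nearer occluder's depth this returns fully lit, and a half-covered receiver lands near 0.5; both warps agree on the simple cases and only diverge in the multi-occluder configurations described above.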
     
    #50 Mintmaster, Apr 25, 2008
    Last edited by a moderator: Apr 25, 2008
  11. flycooler

    Newcomer

    Joined:
    Apr 25, 2008
    Messages:
    3
    Likes Received:
    0
    Many thanks for your reply! That's basically my feeling about ESM too.

    In fact, several months ago I tried quite hard to build soft shadows on top of ESM. In the end I gave up; ESM seems to introduce lots of artifacts even when using a constant filter size for each screen pixel. The reason I chose it was its low memory cost.

    We can test ESM with a simple scene of several square quads, overlapping a little and distributed evenly from near to far (relative to the light position). We should see shadow errors on the middle quad. Of course, for complex things like trees in games you won't easily notice the self-shadowing problem, and the shadow on the ground plane is the most important. From the theory side, though, I somehow feel ESM is difficult to extend to fake soft-shadow effects.



     
  12. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,632
    Likes Received:
    1,246
    Location:
    British Columbia, Canada
    Wow, entire conversations have begun and ended before I even noticed... tsk tsk me... that's what I get for writing theses and moving out of my house ;)

    Anyways I would answer something... but Mintmaster and nAo already answered everything better than I could. I'll keep an eye on the thread again though if you have any further questions!

    Cheers,
    Andrew
     
  13. MK_Xue

    Newcomer

    Joined:
    Apr 28, 2008
    Messages:
    1
    Likes Received:
    0
    Hi,
    I implemented VSM. It works all right, except for light bleeding. The edges are soft.
    My question is how to implement EVSM... I simply replaced the depths with exp(depth); it looks like:
    before:

    PixelShadow()
    {
        Color.x = depth;
        Color.y = depth * depth;
        Color.z = 0;
        Color.w = 1;
    }

    PixelScene()
    {
        d = current_d;
        mm = tex2D(g_VSM, tex);
        if ( d < mm.x )
            LightAmount = 1;
        else
        {
            sigma2 = mm.y - mm.x * mm.x;
            LightAmount = sigma2 / (sigma2 + (d - mm.x) * (d - mm.x));
        }
    }

    Now:

    PixelShadow()
    {
        Color.x = exp(depth);
        Color.y = exp(2*depth);
        Color.z = 0;
        Color.w = 1;
    }

    PixelScene()
    {
        d = exp(current_d);
        mm = tex2D(g_VSM, tex);
        if ( d < mm.x )
            LightAmount = 1;
        else
        {
            sigma2 = mm.y - mm.x * mm.x;
            LightAmount = sigma2 / (sigma2 + (d - mm.x) * (d - mm.x));
        }
    }

    However, there are many artifacts: the edges are zigzag, and there is shadow acne...
    Help me please....

    btw: the shadow map is 1024 x 1024, and there is a Gaussian filtering pass after generating the shadow maps...

    Thanks..
     
  14. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,632
    Likes Received:
    1,246
    Location:
    British Columbia, Canada
    That should be exp(c*depth) where c is a parameter. Use approximately 42ish if you're using fp32 VSMs. Same change goes for the warping of your fragment depth later.

    Change this to Color.y = Color.x*Color.x

    There should not be too many artifacts after those changes. You may get a tiny bit of acne - same with VSM w/o clamping minimum variance - but it can be handled in the same way as VSM. However, note that you need to scale the minimum variance constant by the square derivative of your depth scaling... i.e. (c*exp(c*x))^2 in this case.

    Note that the above will give you exponentially warped VSMs, which will be fine for small values of "c". For larger values (like 10+ probably...) you'll want to implement the full EVSM with both warps (i.e. exp(c*x) and -exp(-c*x)) as described in the paper and answered in the above posts. The latter warp will help to avoid ESM-like multi-receiver filter region artifacts.
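    The variance-clamp scaling mentioned above can be sketched like this (Python; function names and constants are mine, chosen for illustration). A minimum-variance floor tuned in linear depth space gets multiplied by the squared derivative of the warp, d/dx exp(c*x) = c*exp(c*x), before being used to clamp the warped-space variance:

```python
import math

# Sketch of scaling a linear-space variance floor into warped (exp) space:
# variances transform by the square of the warp's derivative, c*exp(c*x).
def scaled_min_variance(base_min_variance, c, depth):
    dwarp = c * math.exp(c * depth)  # derivative of exp(c*x) at this depth
    return base_min_variance * dwarp * dwarp

def warped_chebyshev(m1, m2, t, min_variance):
    # Standard VSM upper bound, with the warped-space variance floor applied.
    if t <= m1:
        return 1.0
    var = max(m2 - m1 * m1, min_variance)
    return var / (var + (t - m1) ** 2)
```

    Without this scaling, a floor that suppresses acne near the light is far too small at the far end of the warped range (where the exponential has stretched the depth axis enormously), so the acne returns there.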
     
  15. gjaegy

    Newcomer

    Joined:
    Mar 21, 2007
    Messages:
    73
    Likes Received:
    0
    Hi, I apologize for this stupid question, my English is a bit limited...

    You recommend using something like 42.0f for c, right? (I'm not sure what "-ish" means.)

    In that case, both warps need to be implemented according to your recommendation; did I understand correctly?
     
  16. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,632
    Likes Received:
    1,246
    Location:
    British Columbia, Canada
    Yes, c = 42 approximately uses your entire 32-bit floating point range without overflow with a little "wiggle room". Feel free to fiddle with the exact value, but you certainly can't go over 44 without artifacts.

    Actually you can use different c values for each warp if you want, they have no dependence on one another (they are just two different estimators). I don't see much of a need to though.
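    The 42/44 figures follow directly from fp32 range: the second moment stores exp(c*depth)^2 = exp(2*c*depth), so with depth normalized to [0, 1] the largest stored value is exp(2c), which must stay below FLT_MAX (about 3.4e38). A quick check (Python; constants are mine):

```python
import math

FLT_MAX = 3.4028235e38  # largest finite fp32 value

# The second moment exp(2*c*depth) peaks at exp(2*c) for depth in [0, 1].
def second_moment_peak(c):
    return math.exp(2.0 * c)

# Largest safe c: 2*c < ln(FLT_MAX) ~= 88.72, i.e. c just under 44.4,
# matching "you certainly can't go over 44"; c = 42 leaves some headroom.
c_limit = math.log(FLT_MAX) / 2.0
```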
     
  17. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    If you don't want to run out of range remember that you can always use log filtering :)
    It works well even with half precision floats!
     
  18. gjaegy

    Newcomer

    Joined:
    Mar 21, 2007
    Messages:
    73
    Likes Received:
    0
    Actually I would like to try this with a parallel-split VSM implementation. I had a similar idea some time ago to reduce bleeding, which is in one way similar to storing the exponential. I used the following method to re-project the depth [0..1] in a non-uniform manner (to "flatten" some ranges). It helped reduce huge Z differences, for instance when you had two shadows overlapping on the ground, one from a tree and the other from an aircraft that was very far away (big light bleeding).

    Instead of storing the linear Z, I re-project the linear Z to:
    - from 0.0 to 0.1: objects behind the camera
    - from 0.1 to 0.9: objects whose Z is in the frustum range
    - from 0.9 to 1.0: objects behind the frustum (or actually, the shadowed part of the frustum)

    With that scheme the difference between the aircraft's Z and the tree's Z won't be that big.

    Code:
    
    // Interval boundaries in light space:
    // [behind camera | camera frustum range | beyond the frustum]
    vDepthLightSpace.Set(0.0f, fFrustumMinZ - fMinZCaster, fFrustumMaxZ - fMinZCaster, fMaxZCaster - fMinZCaster);
    
    float AdjustDepthInterval(in float fDepth, in float4 vDepthLightSpace)
    {
    	// Width of each interval (the last component is a dummy 1.0
    	// so the division below is always well defined)
    	float4 vDiv = float4(vDepthLightSpace[1] - vDepthLightSpace[0],
    			vDepthLightSpace[2] - vDepthLightSpace[1],
    			vDepthLightSpace[3] - vDepthLightSpace[2],
    			1.0f);
    
    	// How far fDepth has progressed through each interval, clamped to [0,1]
    	float4 vDif = saturate((fDepth - vDepthLightSpace) / vDiv);
    
    	// Remap: 10% of the output range before the frustum, 80% inside it,
    	// 10% beyond it
    	return dot(vDif, float4(0.1f, 0.8f, 0.1f, 0.0f));
    }
    
    I guess using exp() would have the same effect. I will try.



    I asked whether one needs to use both warps because of this:

    since 42 > 10, I thought one must use the two warps (I haven't read the article, so I don't know what the second warp is for).



    nAo, what is log filtering?
     
  19. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    It's about filtering an exponential shadow map in log space.
    You are filtering exponential values which rapidly go out of range; to avoid this issue you can filter the logarithm of these values (and then go back to exp space).

    For example, if you are averaging two exponential values such as exp(A) and exp(B) you have:

    a*exp(A) + b*exp(B) (a and b are some filter weights)

    but you can rewrite the same expression as:

    exp(A) * (a + b*exp(B-A)) ,

    exp(A) * exp( log(a + b*exp(B-A)) ),

    and:

    exp(A + log(a + b*exp(B-A)))

    Now your sum of exponentials is written as a single exponential; if you take the logarithm of it you can just work on its argument:

    A + log(a + b*exp(B-A))

    Basically you end up filtering the arguments of your exponential functions, which are just linear depth values, so you enjoy the same range you have with less exotic techniques. Just don't forget to go back to exp space when you use the final filtered value.
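    Written as code, this is the standard log-sum-exp trick (Python sketch; names are mine): factor out the largest exponent before exponentiating, so every intermediate value stays in the range of ordinary linear depths.

```python
import math

# Log-space filtering as a stable log-sum-exp:
# log(sum_i w_i * exp(A_i)) = A_max + log(sum_i w_i * exp(A_i - A_max)).
def log_filter(exponents, weights):
    a_max = max(exponents)
    total = sum(w * math.exp(a - a_max) for w, a in zip(weights, exponents))
    return a_max + math.log(total)

# Averaging exp(80) and exp(79) directly would overflow fp32 (and fp16 far
# sooner); filtered in log space, no intermediate ever exceeds ~80. The shader
# stores this value and applies exp() only when the final filtered value is
# used in the shadow test.
filtered_log = log_filter([80.0, 79.0], [0.5, 0.5])
```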

    You can also find the same info here (slide 24)

    Marco
     
  20. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,632
    Likes Received:
    1,246
    Location:
    British Columbia, Canada
    There's actually no need to maintain any depth ranges beyond those that you're going to query (i.e. in the camera frustum). You can safely just clamp those to 0 or 1 and remap the center range. This is actually done automatically if you do the Lloyd relaxation step of Layered VSMs in camera space rather than light space.

    The problem is that objects still in the frustum can still cause significant light bleeding in some instances. For these cases, layers or an exponential warp will reduce or eliminate this bleeding more uniformly.

    Still, your observation is apt and even applies to normal shadow maps: when using frustum partitioning, be sure to remap the depth range of each split to the absolute minimum required range, both to maximize precision and (in the case of probabilistic methods) to improve the approximation.

    PS: I got the log filtering stuff working with EVSM today Marco... just gotta fiddle it a bit more and then see how well it works with very high C values. Results to come...
     