Silhouette shadow mapping.

Graham · Dec 6, 2005

Ok, about 5 days ago I had another of my crazy ideas.
It was pretty simple, and unfortunately, it *had* been done before (here).
Basically the idea was to embed silhouette data into the shadow map.

Anyway, I went ahead and implemented it, and I thought I'd post my results, just for the sheer hell of it.

http://www.hungryspoon.com/ssm/ssm.htm

Personally I'm quite happy I managed to implement it.

Bunnies!!!

(128x128 shadow map)

Guess I'm a bit disappointed it's not original, but oh well.

[edit]
I'll just say I did not base this on that paper. I didn't actually know it existed until I'd finished.

Arun · Dec 6, 2005

Hmm, I didn't know about this technique - how high-performance is it, and is it combinable with PSMs, so as to use a lower resolution?

EDIT: Whoa, just read the beginning of the paper: an extra pass of 60 fragment instructions? Cool idea, but I guess it's just not very viable for actual games yet, if it ever will be, considering PSMs feel like a better solution overall.

Uttar

Graham · Dec 6, 2005

yeah I managed it in 62.
BUT, that said, it's very easy to branch it and cut out all but a couple of them.
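As a rough illustration of why branching helps (a sketch under assumptions, not the demo's actual shader): if only the texels flagged as containing a silhouette take the long path, everything else can fall back to a single depth compare. The names `shadow_lookup` and `edge_mask` are made up for this example, and the expensive path here is a plain 3x3 percentage-closer filter used as a stand-in, not the 62-instruction silhouette reconstruction.

```python
def shadow_lookup(depth_map, edge_mask, x, y, frag_depth, bias=0.005):
    """Return (lit_fraction, path_taken) for one shadow-map texel.

    depth_map and edge_mask are lists of rows. edge_mask[y][x] is True
    where a silhouette crosses the texel (hypothetical layout).
    """
    if not edge_mask[y][x]:
        # Cheap path: one depth compare, as in ordinary shadow mapping.
        lit = 1.0 if frag_depth <= depth_map[y][x] + bias else 0.0
        return lit, "cheap"

    # Expensive path (stand-in only): 3x3 percentage-closer filter,
    # with sample coordinates clamped at the map borders.
    height, width = len(depth_map), len(depth_map[0])
    lit_count = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            sx = min(max(x + dx, 0), width - 1)
            sy = min(max(y + dy, 0), height - 1)
            if frag_depth <= depth_map[sy][sx] + bias:
                lit_count += 1
    return lit_count / 9.0, "expensive"
```

On a real GPU the saving depends on branch granularity: neighbouring pixels that disagree about the branch may both pay for the long path.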

AlNom · Dec 6, 2005

so how many instructions after doing that?

Mintmaster · Dec 6, 2005

Graham said:
yeah I managed it in 62.
BUT, that said, it's very easy to branch it and cut out all but a couple of them.

Oooh, very nice. I never thought much of this algorithm, but branching suddenly makes it quite interesting.

You should make a PS3.0 shadows benchmark. Use this technique as well as a many-sample soft shadow algorithm. I always thought it would be neat to do an edge-detect on the depth texture, and then decide from that how many samples to take.
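A minimal sketch of that edge-detect idea, assuming a simple central-difference detector on the depth texture. The function name `sample_counts`, the threshold, and the 1-versus-16 sample counts are all made up for illustration.

```python
def sample_counts(depth, threshold=0.05, low=1, high=16):
    """Pick a per-texel sample count from a simple depth edge detector.

    depth is a list of rows of depth values. Texels whose depth gradient
    exceeds the threshold get `high` samples, all others get `low`.
    """
    height, width = len(depth), len(depth[0])
    counts = [[low] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Central differences, with neighbours clamped at the borders.
            left = depth[y][max(x - 1, 0)]
            right = depth[y][min(x + 1, width - 1)]
            up = depth[max(y - 1, 0)][x]
            down = depth[min(y + 1, height - 1)][x]
            gradient = max(abs(right - left), abs(down - up))
            if gradient > threshold:
                counts[y][x] = high
    return counts
```

In a shader the count would drive a dynamic loop or branch, so flat regions of the shadow map cost one sample and only the texels near depth discontinuities pay for the full kernel.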

You know we're dying for practical dynamic branching demos here at B3D.

EDIT: Actually, to tell you the truth, I never thought dynamic branching was going to be that useful, and figured the stencil buffer would be good enough when you really need to take multiple branches. When the first results of NV40 came out, my preconception was only further confirmed. I never thought we'd get the granularity shown by R520 without a performance compromise. (I'm still not convinced of the latter, actually, given the size and clock speed of R520. R580 might change my mind, though.)

Jawed · Dec 6, 2005

First, ATI needs to send him an X1800XT.

Jawed

nAo · Dec 6, 2005

Mintmaster said:
EDIT: Actually, to tell you the truth, I never thought dynamic branching was going to be that useful, and figured the stencil buffer would be good enough when you really need to take multiple branches.

Actually dynamic branching is VERY useful..I use it a lot...to debug shaders

Mintmaster · Dec 6, 2005

nAo said:
Actually dynamic branching is VERY useful..I use it a lot...to debug shaders

But do you need dynamic branching for that? Performance isn't necessary for debugging, so I would think a simple compare is usually good enough. Or are you somehow using looping for debugging?

nAo · Dec 6, 2005

Mintmaster said:
But do you need dynamic branching for that? Performance isn't necessary for debugging, so I would think a simple compare is usually good enough. Or are you somehow using looping for debugging?

Sometimes you need to debug only pixels that have some special properties, in other cases I want to debug some area of the screen while doing some kind of post processing so I branch using the VPOS register as 'reference'.

Graham · Dec 7, 2005

Jawed said:
First, ATI needs to send him an X1800XT.

Jawed

While that would be *ohh so nice* I've already bought myself an ati-original x1800xl

Don't worry I'm a major fan of ati without them needing to send me free cards...

however...

Currently I have the demo running on the PS2.0 path, as I'm getting some rather odd HLSL compile errors when I compile to PS3.0: something about lerp() not allowing the output register to be the same as the 1st or 3rd input register, which is odd. I've had a hack around to try and get it to work, but maybe I've just done something spectacularly dumb somewhere (no line number on the error either).

I've got another demo in the works anywho....

Mintmaster · Dec 7, 2005

Sweet. martrox in another thread was mentioning that the X1800XL is only $299 in CompUSA, so I'm getting tempted. But I swore to myself I'd keep the AIW 9700 until a card with a unified architecture comes out.

That's a weird error, but I haven't done any PS3.0 programming so I wouldn't know.

BTW, is your method identical to the one in the paper? If that's your SSM in the corner of the screenshots, it seems quite different, and possibly superior if you're doing what I think you're doing.

psurge · Dec 7, 2005

Nice. I was thinking that this algorithm will be even more interesting with SM4.0 (the geometry shader should be enough to generate silhouette quads completely on the GPU).

Graham · Dec 7, 2005

psurge said:
Nice. I was thinking that this algorithm will be even more interesting with SM4.0 (the geometry shader should be enough to generate silhouette quads completely on the GPU).

Yes, absolutely. This was definitely my thought too while writing the test. There is still a lot of CPU work needed to generate the silhouette.
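For reference, the CPU side of silhouette generation usually looks something like the sketch below. It uses the standard test (an edge is on the silhouette when one of its two triangles faces the light and the other does not), assumes a closed triangle mesh and a directional light, and is not taken from the demo. `silhouette_edges` and its parameters are names invented for this example.

```python
from collections import defaultdict


def silhouette_edges(vertices, triangles, light_dir):
    """Return edges shared by one light-facing and one back-facing triangle.

    vertices:  list of (x, y, z) tuples
    triangles: list of (i0, i1, i2) index tuples, counter-clockwise winding
    light_dir: vector pointing from the surface toward the light
    """
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    # Map each undirected edge to the facing flags of its triangles.
    edge_faces = defaultdict(list)
    for i0, i1, i2 in triangles:
        normal = cross(sub(vertices[i1], vertices[i0]),
                       sub(vertices[i2], vertices[i0]))
        facing = dot(normal, light_dir) > 0.0
        for a, b in ((i0, i1), (i1, i2), (i2, i0)):
            edge_faces[(min(a, b), max(a, b))].append(facing)

    # Keep edges with exactly two triangles that disagree about facing.
    # Boundary edges (one triangle only) are ignored in this sketch.
    return sorted(edge for edge, flags in edge_faces.items()
                  if len(flags) == 2 and flags[0] != flags[1])
```

This loop has to be redone whenever the light or the mesh moves, which is the per-frame CPU cost a geometry shader could take over.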

I haven't actually read through the implementation details in the paper, but the way I go about it is almost certainly quite different. The silhouette is effectively a ring of quads. Vertex shaders are used to try and keep it exactly 2-3 pixels wide (they are also used when drawing the initial depth pass). One of the tricky problems to get around was dealing with an overlapping silhouette. Using the vertex shaders to do this 2-3 pixel offset helped a lot here, as the first pass writes one depth value and the second pass (silhouette) writes another, but writes it 'clean', i.e. no z-fighting (unless there is overlap!)
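To make the fixed-width idea concrete, here is a rough sketch of expanding one projected silhouette edge into a quad that stays a constant number of pixels wide. It is an assumption about how such an offset could be computed, not the demo's vertex shader. `edge_to_quad` is a made-up name, and the inputs are assumed to be 2D positions already in normalized device coordinates.

```python
import math


def edge_to_quad(p0, p1, viewport, width_px=2.0):
    """Expand an edge in NDC ([-1, 1] on both axes) into a quad that is
    width_px pixels wide on a render target of the given (width, height).

    Returns four NDC corners, or None for a zero-length edge.
    """
    vw, vh = viewport

    def to_pixels(p):
        return ((p[0] * 0.5 + 0.5) * vw, (p[1] * 0.5 + 0.5) * vh)

    def to_ndc(p):
        return (p[0] / vw * 2.0 - 1.0, p[1] / vh * 2.0 - 1.0)

    a, b = to_pixels(p0), to_pixels(p1)
    dx, dy = b[0] - a[0], b[1] - a[1]
    length = math.hypot(dx, dy)
    if length == 0.0:
        return None

    # Unit perpendicular in pixel space, scaled to half the quad width.
    nx = -dy / length * width_px * 0.5
    ny = dx / length * width_px * 0.5

    corners = [(a[0] + nx, a[1] + ny), (a[0] - nx, a[1] - ny),
               (b[0] - nx, b[1] - ny), (b[0] + nx, b[1] + ny)]
    return [to_ndc(c) for c in corners]
```

Doing the offset in pixel space rather than world space is what keeps the ring the same width no matter how far the edge is from the light.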

The problem with the paper seems to be that the models it shows are all quite simple, don't have any curved surfaces, and also do not show many obvious cases of overlapping angles. You can see in one of the pictures I posted that a complex model can cause a lot of overlap, in which case the algorithm explodes into tiny little flaming pieces. But then again, nothing's perfect.

Mintmaster said:
BTW, is your method identical to the one in the paper? If that's your SSM in the corner of the screenshots, it seems quite different, and possibly superior if you're doing what I think you're doing.

No, when I had the idea, I went straight ahead and did it. It took a couple of afternoons screwing about. Once done, I researched it and within a minute I had found that paper :'( Shame, but I guess that teaches me to research things first. But it's more fun, and you learn more, doing things yourself.

Mintmaster said:
martrox in another thread was mentioning that the X1800XL is only $299 in CompUSA

I upgraded this particular PC from a 9500 Pro, so it was quite a step up. That said, if I were back in the States (where I bought the card) now, I'd probably go for the XT for the extra 256MB. Although I expect the ATI driver team to perform miracles, AA still isn't always an option at 16x12, which is a shame, because I really like 16x12.

An extra 256MB would most certainly have helped here. (NOT that I'm complaining in any way, it's still an absolute beast of a card.)
