Perspective Shadow Mapping Issues

cjworkman

Newcomer
Hi, new poster (3D artist) here.

We are currently in the process of attempting to get Perspective Shadow Mapping (PSM) working on an outdoor scene where the longest line of sight hits 150 meters.

Two main problems have arisen. The first is that we are massively CPU limited. The second is that draw calls are a huge problem: this is an FPS, and every object has a few textures and usually multiple sub-meshes, which compounds our draw call count greatly. A 4 GHz machine seems to choke once about 1200 draw calls are on screen.

With just terrain in the scene, we drop from about 450 fps to about 60-80 simply by turning PSM on. It eats up enough CPU that the GPU seems to be idle half the time.

Has anyone found any good solutions, or had any experience attempting to use PSM on a medium-density outdoor environment?
 
cjworkman said:
Has anyone found any good solutions, or had any experience attempting to use PSM on a medium-density outdoor environment?

Best of luck; PSM is an utter pig to get working in any real scenario. It's just edge case after edge case after edge case.

I'm going to recommend that you look at the NVIDIA PSM sample; they actually solve most of the hard problems.
 
You do realize that, since the Z/stencil/etc. passes are not texture-based (except for alpha-tested textures), you can batch things up a LOT more effectively in those passes? You should put all the sub-meshes together, and so on. There's not much you can do about animation, of course, which means you need to keep all the animation results in memory to avoid paying the CPU cost of animation again for each pass (but that's kind of obvious). If anything, it should just be some kind of reordering with some memory movement.
Of course, that requires a bit of CPU work for the re-batching, but it's a win 99% of the time. Or is it another problem you're talking about? Also, what kind of draw calls are you using?
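
Roughly what I mean by the reordering, as a minimal sketch (the SubMesh container and names here are purely illustrative, obviously not your engine's types): since only positions matter for the Z/shadow passes, differently textured sub-meshes can simply be appended into one vertex/index pair.

#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

struct SubMesh                        // illustrative container, not engine API
{
    std::vector<Vec3>     positions;  // CPU-skinned results, kept in memory
    std::vector<uint16_t> indices;
};

// Append every sub-mesh into one shared buffer, offsetting indices as we go.
void BuildDepthBatch(const std::vector<SubMesh>& subMeshes,
                     std::vector<Vec3>&     outPositions,
                     std::vector<uint32_t>& outIndices)
{
    for (const SubMesh& sm : subMeshes)
    {
        const uint32_t base = static_cast<uint32_t>(outPositions.size());
        outPositions.insert(outPositions.end(),
                            sm.positions.begin(), sm.positions.end());
        for (uint16_t i : sm.indices)
            outIndices.push_back(base + i);   // re-index into the merged buffer
    }
    // The merged arrays can be uploaded once and drawn with a single call in
    // the Z / shadow-map pass instead of one call per sub-mesh.
}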

Uttar
P.S.: 1200 draw calls at 4 GHz - at what fps? Even if it's 60 fps, that's what a 1 GHz machine should be able to do, not a 4 GHz one, IIRC. Or are you counting other overhead in there?
 
We cannot combine the sub-meshes because we are texturing with tile sets. I believe it's a hardware issue: UV coordinates only go from 0 to 1 in the hardware driver. If we combined sub-meshes with tiled textures, we would need the UV coordinates to go as high as we want, e.g. 0-10 if a texture is tiled 10 times.

1200 draw calls runs at about 20 fps on a 4 GHz machine with the newest GeForce card. We are running a few effects (bloom and Perlin-noise clouds) and some scripting, but turning those off only increases the frame rate by about 3-4 fps.

To get an acceptable frame rate for an FPS, we had to reduce the draw calls to below 400, at which point, of course, hardly anything was left in the scene.

After more testing, it appears that we are vertex limited, which doesn't make much sense to me because these cards are supposed to push 300 million verts a second. We are running about 30 million a second (3 million a frame). Keeping everything else running and just reducing the number of verts in the scene gave us a 19 fps jump in performance.
 
cjworkman said:
1200 draw calls runs at about 20 fps on a 4 GHz machine with the newest GeForce card.


Welcome to the wonderful world of PC DirectX.

DrawIndexedPrim or DrawPrim is incredibly expensive; the only way you're going to make it run faster is to make fewer DrawPrim calls.

Longhorn is supposed to fix this issue.
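
In D3D9 terms, "fewer DrawPrim calls" just means collapsing the per-sub-mesh loop into one DrawIndexedPrimitive over merged buffers. A rough sketch, where the device, buffers and counts are placeholders rather than anything from your engine:

#include <d3d9.h>
#include <vector>

// Many small calls: this is where the CPU time goes.
void DrawManySmallBatches(IDirect3DDevice9* device,
                          const std::vector<IDirect3DVertexBuffer9*>& vbs,
                          const std::vector<IDirect3DIndexBuffer9*>&  ibs,
                          const std::vector<UINT>& vertexCounts,
                          const std::vector<UINT>& triCounts,
                          UINT stride)
{
    for (size_t i = 0; i < vbs.size(); ++i)
    {
        device->SetStreamSource(0, vbs[i], 0, stride);
        device->SetIndices(ibs[i]);
        device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0,
                                     vertexCounts[i], 0, triCounts[i]);
    }
}

// Same geometry, one call: the per-call driver overhead is paid once.
void DrawMergedBatch(IDirect3DDevice9* device,
                     IDirect3DVertexBuffer9* mergedVB,
                     IDirect3DIndexBuffer9*  mergedIB,
                     UINT vertexCount, UINT triCount, UINT stride)
{
    device->SetStreamSource(0, mergedVB, 0, stride);
    device->SetIndices(mergedIB);
    device->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0,
                                 vertexCount, 0, triCount);
}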
 
I believe it's a hardware issue: UV coordinates only go from 0 to 1 in the hardware driver.
Texture coordinates are not limited to 0..1. That would make GL_REPEAT (the default wrap mode in GL) quite useless now, wouldn't it?

You shouldn't have trouble with coordinates in the range [-1k..+1k]. Beyond that, you start getting into LSB problems when magnifying large polygons (i.e. you have fewer bits left in the mantissa to do the interpolation with).
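
So if the only thing keeping your sub-meshes apart is a per-piece tiling factor, you can bake that factor straight into the UVs when you merge and let wrap/repeat addressing do the rest. A minimal sketch, where the vertex struct and names are just for illustration:

#include <vector>

struct TexturedVertex
{
    float x, y, z;
    float u, v;
};

// Pre-multiply a sub-mesh's UVs by its tiling factor before merging.
void BakeTilingIntoUVs(std::vector<TexturedVertex>& vertices, float tileCount)
{
    for (TexturedVertex& vtx : vertices)
    {
        vtx.u *= tileCount;   // e.g. 0..1 becomes 0..8 for a texture tiled 8x
        vtx.v *= tileCount;
    }
    // With D3DTADDRESS_WRAP (or GL_REPEAT) set on the sampler, coordinates
    // well past 1.0 are fine, so the merged mesh keeps each piece's tiling.
}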
 
I don't think draw calls are as expensive as people think.

I was optimizing the penumbra wedge algorithm, and the way the authors implemented it, they made two draw calls for each silhouette in the scene.

I thought I'd get an enormous win by figuring out a way to batch them, but it was less than I expected. Turns out that my little 2 GHz Athlon XP can crank out over a quarter million calls per second. I read an NVIDIA presentation ("Batching 4EVA") that suggested no more than 1,000 calls per frame, which seems rather conservative to me.

My guess is that you are indeed more vertex limited than call limited. Have you tried optimizing the meshes? Are you using index buffers most of the time?

Remember that it's tough to get high efficiency all the time. Small batches will make the graphics card wait for the CPU, even if overall you think there are enough vertices to buffer it out. Also, large polygons will keep the vertex side waiting, so you may want to zoom out such that your scene objects are very tiny. This will measure vertex throughput more accurately.
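
If you want to confirm which side is the limit, a crude timing harness along these lines does the job; RenderFrame here is just a placeholder for your own test scene, not a real API:

#include <windows.h>
#include <cstdio>

extern void RenderFrame(int drawCalls, int vertsPerCall); // hypothetical test scene

void MeasureThroughput(int drawCalls, int vertsPerCall, int frames)
{
    LARGE_INTEGER freq, start, end;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&start);

    for (int f = 0; f < frames; ++f)
        RenderFrame(drawCalls, vertsPerCall);

    QueryPerformanceCounter(&end);
    const double seconds =
        double(end.QuadPart - start.QuadPart) / double(freq.QuadPart);

    // Vary drawCalls with verts held constant (call overhead), then verts with
    // calls held constant (vertex limit), and see which one moves the time.
    printf("%d calls x %d verts: %.1f fps, %.1fM verts/s\n",
           drawCalls, vertsPerCall, frames / seconds,
           double(drawCalls) * vertsPerCall * frames / seconds / 1e6);
}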
 
"Texture coordinates are not limited to 0..1. That would make GL_REPEAT (the default wrap mode in GL) quite useless now would it?"

I should have explained myself better. We would like to combine sub-meshes to reduce draw calls, but if I have two sub-meshes, each with a different tiling value (say one mesh tiles one texture 3 times and the other tiles a different texture 8 times), I have been told that the two cannot be combined because the hardware driver cannot handle the UV coordinates for two differently tiled meshes.

That said, we've managed to keep draw calls in check using LODs, but we are still having issues; apparently we are vertex limited now.
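
(By LODs I just mean a simple distance-based pick, roughly like the sketch below; the thresholds and the Lod/Object types are made up for illustration, not our actual code.)

#include <cmath>

struct Lod { /* vertex/index buffers for this detail level */ };

struct Object
{
    float x, y, z;
    Lod   lods[3];   // 0 = full detail, 2 = coarsest
};

// Pick a detail level by distance from the camera; far objects contribute
// far fewer verts (and far fewer draw calls once batched).
const Lod& PickLod(const Object& obj, float camX, float camY, float camZ)
{
    const float dx = obj.x - camX, dy = obj.y - camY, dz = obj.z - camZ;
    const float dist = std::sqrt(dx * dx + dy * dy + dz * dz);

    if (dist < 30.0f) return obj.lods[0];   // near: full mesh
    if (dist < 80.0f) return obj.lods[1];   // mid: reduced mesh
    return obj.lods[2];                     // far: coarsest mesh
}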

But I am starting to agree with the previous poster: unless we can reduce our draw primitive calls, we are never going to speed it up. PSM does appear to be an "utter pig", hehe :)

Does anyone have any other suggestions for self-shadowing the objects in our level besides baking? We are already pushing the card's memory limit.

BTW, we found in our testing that every 100 draw calls added costs about 2-3 fps. 800-1200 draw calls should be reasonable for a fast machine, but our hang-up is apparently somewhere else.
 