New dynamic branching demo

fellix said:
Here is the "strange" FPS oscillation with branching enabled in the new version compared to the flat FPS graph where branching is disabled:
afais the highs are when lights are "behind the camera" ;)
 
chavvdarrr said:
very interesting.
With the new version and branching off I get an almost constant 120fps in the default window (all options but branching ON).
With branching on, fps starts to jump between 100 and 150, most often around 110-115, so on average it's about the same fps, but with wide variations.
It seems to me that if branches can eliminate enough workload, even NV cards can show some small gains.

I don't notice any difference in framerate fluctuations which are small here anyway whether AA/AF enabled or disabled, branching on or off and that both with 84.25 and 84.43.

I don't even notice any other difference between the older and the new version in framerate behaviour either.
 
Ailuros said:
I don't notice any difference in framerate fluctuations which are small here anyway whether AA/AF enabled or disabled, branching on or off and that both with 84.25 and 84.43.

I don't even notice any other difference between the older and the new version in framerate behaviour either.
Are you sure you forced vsync off?
I'd expect higher fps from a 7800... perhaps your high res & AA/AF limit the fps?
 
This seems like a good chance to test the X1600 I installed yesterday in my work PC ;)

At 1024x768 no AF no AA it's 80-90 FPS with dynamic branching disabled, 80-120 FPS with dynamic branching enabled. The 120 FPS happen when looking towards an area mostly, or all, in shadow (as expected).
 
chavvdarrr said:
Are you sure you forced vsync off?
I'd expect higher fps from a 7800... perhaps your high res & AA/AF limit the fps?

Since I get 116-117 fps at 1600*1200 with 4xAA on a CRT, you'd think that vsync is disabled, unless you're aware of a CRT that manages 120Hz at 1600. Those measurements are from the spot the demo defaults to on startup. If I move around, of course the framerate changes.

69 fps at 2048*1536 is a more than decent measurement, and all of those are with the GPU clocked at 490/685MHz.
 
fellix said:
Here is the "strange" FPS oscillation with branching enabled in the new version compared to the flat FPS graph where branching is disabled:

fpsgraph8ki.png

That's expected. Without branching the light position doesn't affect the amount of pixel shader work done. All computations are done on all pixels, so the cost is pretty much constant. With dynamic branching however you get a big performance difference between when a light source lights up a lot of stuff close to you and when it's behind you or far away and affects many fewer pixels on the screen.
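
To make that concrete, here is a toy sketch in Python (not the demo's shader code). It assumes a small 2D screen grid and treats each light as a circle of influence in screen space, which is a simplification for illustration only; `shaded_pairs`, the grid size and the light positions are all made up.

```python
def shaded_pairs(width, height, lights, branching):
    """Count (pixel, light) pairs that run the full lighting math.

    lights is a list of (x, y, radius) tuples in screen space
    (a hypothetical simplification of a 3D light radius).
    """
    count = 0
    for y in range(height):
        for x in range(width):
            for lx, ly, radius in lights:
                inside = (x - lx) ** 2 + (y - ly) ** 2 <= radius ** 2
                # Without branching every pixel pays for every light.
                # With branching only pixels inside the radius do.
                if not branching or inside:
                    count += 1
    return count


WIDTH, HEIGHT = 64, 48
near = [(32, 24, 20)]    # light in the middle of the view
far = [(-100, 24, 20)]   # light well outside the view ("behind the camera")

for name, lights in (("near", near), ("far", far)):
    flat = shaded_pairs(WIDTH, HEIGHT, lights, branching=False)
    db = shaded_pairs(WIDTH, HEIGHT, lights, branching=True)
    print(f"{name}: no branching = {flat}, branching = {db}")
```

In this model the count without branching is the same wherever the light is, while the count with branching drops as the light leaves the view. That is the same shape as the flat and oscillating graphs.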
 
:!: With fluctuations like that, I would rather not have the dynamic branching... but I suppose this is an isolated case :|
 
Anyone with X1K hardware care to take the same measurement? Would be nice to compare with the "threadless" NV counterparts. :D

So, the possible conclusion is that if there are a lot of light sources in the viewport (or in the scene at all), the fps will sit at the lower level most of the time, so dynamic branching is generally not recommended in my case. But it is obvious that on less complex scenes branching is able to gain a few more frames despite the lack of fine-grained threading of the batches.

Humus, is it possible to define some more lights in your demo? I think it could be a good run for a kind of DynBranch benchmark. :D
 
Ailuros said:
69 fps at 2048*1536 is a more than decent measurement, and all of those are with the GPU clocked at 490/685MHz.
:oops: sorry, I didn't see the resolution :D

2 RoOoBo
@ 1024x768 I get 50-85fps (mostly around 55-60) with DB and 65-66fps without. That is at 525/1100, so this test seems to show the strength of the X1600...
btw which X1600 do you have, Pro or XT? Also, a graph like the one fellix posted could be interesting; I guess he made it with RivaTuner.

2 fellix
the sources are available, what are you waiting for :)
 
Humus said:
3 lights. ;) And some texture lookups (the shadow map) are within the branch. I just uploaded a new version where I've tweaked the branching to improve performance. I added an outer branch condition on all light radii that will put the bumpmap lookup and some math inside the branch. The single pass dynamic branching path is now about 40% faster than no branching on my X1800XL.
Okay, neat. I was going to ask you why you didn't put the bump normalization and reflection in the branch, but then I realized you need that same info for every light. If it was in the loop you'd have to jump over it after the first light, and I figured it'd be too much trouble.
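
As a rough illustration of the nesting being discussed, here is a Python sketch that lists which operations one pixel would execute. It is only a reading of the description above, not Humus's shader: the operation names, `shade_pixel`, and the idea that the distance tests sit outside both branches are assumptions.

```python
import math


def shade_pixel(pixel_pos, lights):
    """Return the list of operations this pixel would execute.

    lights is a list of (position, radius) pairs; all names are hypothetical.
    """
    ops = []
    in_range = [math.dist(pixel_pos, pos) < radius for pos, radius in lights]

    if any(in_range):  # outer branch: within the radius of at least one light
        # Shared work, needed once no matter how many lights hit the pixel.
        ops.append("bumpmap_lookup")
        ops.append("normalize_and_reflect")
        for i, hit in enumerate(in_range):
            if hit:  # inner branch: this particular light reaches the pixel
                ops.append(f"shadowmap_lookup[{i}]")
                ops.append(f"lighting[{i}]")
    return ops


lights = [((0, 0, 0), 5.0), ((10, 0, 0), 5.0), ((0, 10, 0), 5.0)]
print(shade_pixel((1, 0, 0), lights))        # reached by light 0 only
print(shade_pixel((100, 100, 100), lights))  # reached by no light
```

Keeping the shared work in the outer branch rather than in the per-light loop avoids having to skip it after the first light, which is the trouble mentioned above.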

Still, when you consider that the X1900XT and even the 7900GT have much more math power, reducing the time spent in the branch, I don't think this particular use of DB will proliferate. The fluctuating framerate graphs also suggest that this may not be the most useful optimization, but I guess that depends largely on the scene.

Right now I think the two useful techniques that benefit most from dynamic branching are POM and nice shadow mapping.

To highlight the advantages of DB for lighting, IMO, a better situation would be having a bunch of lights and doing attenuation on the per vertex level. A quake type scene with plasma guns and rocket launchers used by many players would look pretty sweet if everything was a dynamic light.
 
Alstrong said:
:!: With fluctuations like that, I would rather not have the dynamic branching... but I suppose this is an isolated case :|

I'd figure that the driver takes that decision for you in real time conditions ;)
 
Alstrong said:
:!: With fluctuations like that, I would rather not have the dynamic branching... but I suppose this is an isolated case :|

On efficient DB hardware the lows on the left should be (in the worst case) on par with the constant part on the right. So it should be all good (can someone with a Radeon do the same measurement?)
 
NocturnDragon said:
On efficient DB hardware the lows on the left should be (in the worst case) on par with the constant part on the right. So it should be all good (can someone with a Radeon do the same measurement?)
According to RoOoBo, that is the case: 80-90 without, 80-120 with DB
 
Ailuros said:
I'd figure that the driver takes that decision for you in real time conditions ;)


Not if I rewrite the driver!



...........



In any case, while a performance improvement is nice (80-120 instead of 80-90), I'm much more sensitive to fluctuations, which are evil.

er.... wait... this would be better for v-sync situations... right... nevermind! :oops:
 
Alstrong said:
:!: With fluctuations like that, I would rather not have the dynamic branching... but I suppose this is an isolated case :|

It looks worse on that graph than it really is, since the graph is a bit compressed. Each one of those peaks would be a full circle of the light that goes around in a square. The frame to frame fluctuations are minimal. What you see in that graph is no worse than what would happen if you turn around or walk around a corner in your average FPS game.
 
fellix said:
Anyone with X1K hardware care to take the same measurement? Would be nice to compare with the "threadless" NV counterparts. :D

My results on X1800XL AIW.

Single pass:
dynamicbrancing0.png

Without dynamic branching: 66.0fps average
With dynamic branching: 98.9fps average (+49.8%)

Multipass:
dynamicbrancing1.png

Without dynamic branching: 34.15fps average
With dynamic branching: 83.4fps average (+144.3%)
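
For anyone who wants to redo the arithmetic on their own numbers, a minimal Python sketch follows. The helper names are made up, and because it works from the rounded averages posted above, the last digit may differ slightly from the quoted percentages.

```python
def speedup_percent(baseline_fps, new_fps):
    """Percentage gain of new_fps over baseline_fps."""
    return (new_fps / baseline_fps - 1.0) * 100.0


def frame_time_ms(fps):
    """Average time per frame in milliseconds."""
    return 1000.0 / fps


single = speedup_percent(66.0, 98.9)   # single pass, averages from the post
multi = speedup_percent(34.15, 83.4)   # multipass, averages from the post

print(f"single pass: {single:+.1f}%")
print(f"multipass:   {multi:+.1f}%")
print(f"multipass frame time: {frame_time_ms(34.15):.1f} ms -> {frame_time_ms(83.4):.1f} ms")
```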

fellix said:
So, the possible conclusion is that if there are a lot of light sources in the viewport (or in the scene at all), the fps will sit at the lower level most of the time, so dynamic branching is generally not recommended in my case. But it is obvious that on less complex scenes branching is able to gain a few more frames despite the lack of fine-grained threading of the batches.

Well, if a light affects almost all pixels on the screen, then yes, performance will be a little worse than without dynamic branching for that light. Otherwise dynamic branching should be faster to a varying degree. The fewer pixels a light affects, the bigger the advantage of branching.
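
A back-of-the-envelope way to see where the crossover sits, as a Python sketch. It assumes the branch costs a fixed fraction of the lighting work per pixel; the 5% overhead used below is a made-up placeholder, not a measured figure for any card.

```python
def relative_cost(coverage, branch_overhead):
    """Cost with branching relative to no branching (1.0 means equal).

    coverage: fraction of on-screen pixels the light affects (0.0 to 1.0).
    branch_overhead: assumed cost of the branch itself, as a fraction of
    the lighting work it can skip.
    """
    return branch_overhead + coverage


def break_even_coverage(branch_overhead):
    """Coverage above which branching is slower than not branching."""
    return 1.0 - branch_overhead


OVERHEAD = 0.05  # hypothetical
for coverage in (0.1, 0.5, 0.9, 1.0):
    print(f"coverage {coverage:.0%}: relative cost {relative_cost(coverage, OVERHEAD):.2f}")
print(f"break-even coverage: {break_even_coverage(OVERHEAD):.0%}")
```

Under this model branching only loses once the light covers nearly the whole screen, and the loss is limited to the branch overhead itself.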

fellix said:
Humus, is it possible to define some more lights in your demo? I think it could be a good run for a kind of DynBranch benchmark. :D

Well, then I'd have to move some work over to the pixel shader in the single pass case due to lack of interpolators, or make it a two pass case with 3 lights per pass.
 
Additional ~50% with db is impressive. Even more so the 140+% increase in multipass.

Is there a chance we'll see db and multipass combinations in upcoming games?
 
Well, there are both multipass and single pass techniques used in games, and I don't think either method will go away anytime soon, even though I think things will shift more toward single pass for lighting. A 140% performance increase is of course impressive, but part of this is because multipass for lighting is less efficient to begin with. There's simply more redundant work being done that you can cut off with dynamic branching. With dynamic branching the multipass method gets closer to the performance of single pass, but it's still slower and there will still be computations and texture fetches that are repeated. Just by going to single pass you can reduce a lot of repeated computations, even without doing any dynamic branching. That's also why the performance increase is lower for single pass. But 50% is of course not bad at all.
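
The argument can be sketched with a toy per-pixel cost model in Python. Everything here is an assumption for illustration: the cost units are arbitrary, and `pass_overhead`, `shared` and `per_light` are invented parameters, not measurements from the demo.

```python
def single_pass_cost(num_lights, lit_lights, branching,
                     pass_overhead=1, shared=4, per_light=6):
    """Per-pixel cost when all lights are handled in one pass."""
    if not branching:
        return pass_overhead + shared + num_lights * per_light
    # Shared work (bump lookup etc.) is skipped only if no light hits the pixel.
    shared_cost = shared if lit_lights > 0 else 0
    return pass_overhead + shared_cost + lit_lights * per_light


def multipass_cost(num_lights, lit_lights, branching,
                   pass_overhead=1, shared=4, per_light=6):
    """Per-pixel cost with one pass per light; shared work repeats each pass."""
    if not branching:
        return num_lights * (pass_overhead + shared + per_light)
    # Each pass still has fixed overhead, but unlit passes skip the rest.
    return num_lights * pass_overhead + lit_lights * (shared + per_light)


LIGHTS, LIT = 3, 1  # 3 lights in the scene, 1 reaching this pixel
for name, cost in (("single pass", single_pass_cost), ("multipass", multipass_cost)):
    flat = cost(LIGHTS, LIT, branching=False)
    db = cost(LIGHTS, LIT, branching=True)
    print(f"{name}: {flat} -> {db} units ({flat / db:.2f}x)")
```

With these placeholder numbers multipass gains more from branching because it had more repeated work to skip, yet it still ends up costlier than single pass with branching.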
 
The impact of multipass (for lighting) in Far Cry performance, until the >SM2.0 patches, was horrendous from what I recall. And if I'm not entirely off track most of the cases in that game were doable even with SM2.0.

I sure hope developers will use multipass only in cases where single pass isn't feasible anymore.

Oh, and yes, of course +50% is impressive :)
 
ok here are my results.

x1800xl btw.


I paused the scene to get the results. 1600x1200 with 6xAA, 8x HQ-AF.

few lights on screen:
-----------

no branching: 56fps
with branching: 97fps

multi-pass:

no branching: 33fps
with branching: 88fps

multi-pass, no shadows:

no branching: 37fps
with branching: 92fps


a number of lights on screen:
-----------

no branching: 58fps
with branching: 64fps

multi-pass:

no branching: 33fps
with branching: 54fps

multi-pass, no shadows:

no branching: 37fps
with branching: 58fps



This is probably a good test for NVIDIA hardware, as I'd imagine the branch batch size limit doesn't have such a significant effect on the final numbers.

[edit]

it might actually be 4x AA as I have 4x forced in the driver, but 6x set in the app.
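
On the batch size point, here is a small Python sketch of why granularity can matter. It assumes a batch of pixels can skip a branch only if none of its pixels takes it; the batch sizes are arbitrary illustrations, not figures for any real GPU, and the screen is flattened to a 1D row of pixels for simplicity.

```python
def pixels_paying_for_branch(lit_mask, batch_size):
    """Count pixels that execute the branch body.

    A batch runs the body for all of its pixels if any one of them is lit.
    """
    paid = 0
    for start in range(0, len(lit_mask), batch_size):
        batch = lit_mask[start:start + batch_size]
        if any(batch):
            paid += len(batch)
    return paid


contiguous = [True] * 4 + [False] * 28          # one small lit region
scattered = [i % 8 == 0 for i in range(32)]     # same 4 lit pixels, spread out

for size in (4, 16, 32):
    print(f"batch {size:2d}: contiguous = {pixels_paying_for_branch(contiguous, size)}, "
          f"scattered = {pixels_paying_for_branch(scattered, size)}")
```

Large, coherent lit and unlit regions lose little to coarse batches, while scattered lit pixels drag whole batches through the branch.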
 