New dynamic branching demo

_xxx_ said:
Numbers please? (I have no nV card at the moment)

EDIT: would just like to know if X1x00 really does much better than the somewhat comparable nV counterpart with branching

I can't determine if it's the driver's or the application's fault that you can hardly see anything with shadows enabled, but what good would any performance numbers do until that minor issue has been fixed?

Just for the record, I rechecked a couple more scenarios and w/o branching the demo still runs slightly faster. Humus' point obviously is that dynamic branching increases performance; well, apparently in that case not on GeForces ;)
 
Ailuros said:
I can't determine if it's the driver's or the application's fault that you can hardly see anything with shadows enabled, but what good would any performance numbers do until that minor issue has been fixed?

Just for the record, I rechecked a couple more scenarios and w/o branching the demo still runs slightly faster. Humus' point obviously is that dynamic branching increases performance; well, apparently in that case not on GeForces ;)

It could be drivers. I was looking at some of the ATI OpenGL demos in their SDK last night and had a similar problem, with the lights just not showing anything.
 
Humus, the site seems to be down. I get host unreachable when I ping it. Just FYI, if you don't already know. :)
 
_xxx_ said:
So is it safe to assume that nV's branching in 79xx is not all that bad as thought (looking at pharma's results)?

pharma, the first set of numbers shows it to be faster with branching? You sure you didn't switch the numbers?
This is not exactly a showcase of dynamic branching, though.

For one thing, it's a simple "if", not a loop like all the POM techniques. For another, the number of instructions being skipped is quite small, especially considering all the texture accesses are outside the branch. And with only 2 lights, all the other overhead, like shadowmap rendering per light, limits what theoretical gains you can have. Disabling shadows greatly reduces the number of pixels that skip the lighting code. So whether your dynamic branching is good or not won't make a huge difference.
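To put rough numbers on that argument, here is a minimal sketch of the upper bound on the gain. It assumes the only saving comes from pixels that skip the branched code, and it ignores the cost of the branch itself, so real results would be lower. The function name and the example fractions are made up for illustration; they are not measured from the demo.

```python
def max_speedup(branch_cost_fraction, pixels_skipping_fraction):
    """Upper bound on frame speedup from skipping branched shader code.

    branch_cost_fraction: share of total frame time spent in code that
        sits inside the branch (0..1).
    pixels_skipping_fraction: share of pixels that actually take the
        skip path (0..1).
    Branch overhead and batch coherence are ignored (an assumption).
    """
    if not (0.0 <= branch_cost_fraction <= 1.0
            and 0.0 <= pixels_skipping_fraction <= 1.0):
        raise ValueError("both fractions must be between 0 and 1")
    saved = branch_cost_fraction * pixels_skipping_fraction
    if saved >= 1.0:
        return float("inf")  # all work skipped; bound is unlimited
    return 1.0 / (1.0 - saved)


if __name__ == "__main__":
    # Hypothetical scenarios, chosen only to show the trend.
    for cost, skipped in [(0.2, 0.5), (0.5, 0.5), (0.5, 0.9)]:
        print(f"cost={cost:.1f} skipped={skipped:.1f} "
              f"-> at most {max_speedup(cost, skipped):.2f}x")
```

With little work inside the branch, or few pixels skipping it, the bound stays close to 1x no matter how good the hardware's branching is.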

Pharma, how is it that you're getting 9 times the fillrate of a 7800GT? I think the fullscreen framerates are more accurate.
 
Razor1 said:
It's up again ;)
Yes, thank you. :)

Edit: I had to force Vsync off. Anyone else have this problem? Does the app request Vsync on?

Some numbers:
P4c 2.4@2.8
2 GB RAM @ 232 MHz (462 DDR)
GeForce 7800 GS - 84.43 Forceware
1600x1200 4xFSAA (full screen)

With everything but shadows enabled I get around 77 FPS. With shadows I get almost total darkness; the lights are not glowing but are just colours that move, and I get 68 FPS.

Dynamic branching and shadows off: 84 FPS.

Nothing enabled: 54 FPS

Btw, I had gamma correct AA and transparency aa (multisampling) enabled and trilinear/anisotropic optimizations too.

At 640x480 4xFSAA (full screen) with all but shadows enabled I get 360 FPS.
 
pharma said:
PNY 7900 GTX OC (Stock clocks @ 675/820); Nvidia Driver 84.30; CPU X2 4800

1920x1200 4xAA 16xAF
Shadows enabled, single pass enabled, animate lights enabled:
w/ branching 510 - 553 fps
w/o branching 490 - 530 fps

1920x1200 4xAA 16xAF
Shadows disabled, single pass enabled, animate lights enabled:
w/ branching 586 - 637 fps
w/o branching 590 - 670 fps

Results with single pass disabled:

1920x1200 4xAA 16xAF
Shadows enabled, single pass disabled, animate lights enabled:
w/ branching 320 - 341 fps
w/o branching 326 - 349 fps

1920x1200 4xAA 16xAF
Shadows disabled, single pass disabled, animate lights enabled:
w/ branching 345 - 374 fps
w/o branching 366 - 387 fps

Edit: Add results with single pass disabled

Pharma


I believe these results are for 640x480, since they are so similar to mine at the same resolution. To compare, I'm posting my corresponding results. (I just edited your text; I guess you won't mind :))


640x480 4xAA 16xHQAF
Shadows enabled, single pass enabled, animate lights enabled:
w/ branching 480 - 520 fps
w/o branching 450 - 470 fps

640x480 4xAA 16xHQAF
Shadows disabled, single pass enabled, animate lights enabled:
w/ branching 680 - 690 fps
w/o branching 670 - 690 fps

Results with single pass disabled (branching results varied greatly, depending on the position of the ball):

640x480 4xAA 16xHQAF
Shadows enabled, single pass disabled, animate lights enabled:
w/ branching 300 - 500 fps
w/o branching 190 - 195 fps

640x480 4xAA 16xHQAF
Shadows disabled, single pass disabled, animate lights enabled:
w/ branching 410 - 700 fps
w/o branching 225 - 230 fps

Umm, it looks like with single pass disabled the difference is quite significant. So I ran more tests with single pass off:

1680x1050 4xAA 16xHQAF
w/ branching 110-150 fps
w/o branching 57-58 fps

1280x1024 4xAA 16xHQAF
w/ branching 140-210 fps
w/o branching 79-81 fps

Mind you, my CPU is only a P4 3 GHz, so results at lower resolutions might not be comparable to systems with fancy CPUs.
 
pharma said:
I don't think this is correct -- the objects & textures within the app window are too highly detailed. To test, I changed my monitor resolution to 640x480 and ran the app -- textures were horrible and lacked detail. Could not make any adjustments on the menu (F1) as it was 1/2 off screen.

I'm sure Humus will enlighten us. When I get back later will rerun the results.

Pharma

I guess you first turned on fullscreen, changed the resolution, and then turned it off? In that case the new resolution is indeed displayed in the dialogue, but the real resolution is changed back to 640x480. I can tell this from the framerate changes.
 
Here is the repost with revised results. I have also lowered the resolution so comparisons can be made with other results posted already:

PNY 7900 GTX OC (Stock clocks @ 675/820); Nvidia Driver 84.30; CPU X2 4800

Results at 1680x1050 Full Screen enabled:

1680x1050 4xAA 16xAF
Shadows enabled, single pass enabled, animate lights enabled:
w/ branching 149 - 155 fps
w/o branching 154 - 159 fps

1680x1050 4xAA 16xAF
Shadows disabled, single pass enabled, animate lights enabled:
w/ branching 179 - 182 fps
w/o branching 193 - 198 fps

Results with single pass disabled:

1680x1050 4xAA 16xAF
Shadows enabled, single pass disabled, animate lights enabled:
w/ branching 87 - 91 fps
w/o branching 90 - 95 fps

1680x1050 4xAA 16xAF
Shadows disabled, single pass disabled, animate lights enabled:
w/ branching 95 - 99 fps
w/o branching 99 - 103 fps


Below are the results when run at 640x480 Full Screen enabled:

640x480 4xAA 16xAF
Shadows enabled, single pass enabled, animate lights enabled:
w/ branching 590 - 648 fps
w/o branching 595 - 652 fps

640x480 4xAA 16xAF
Shadows disabled, single pass enabled, animate lights enabled:
w/ branching 698 - 792 fps
w/o branching 798 - 842 fps

It definitely seems the application's branching code does not improve performance on NV cards.

Pharma
 
satein said:
And I keep getting this error prompt while changing settings in the F1 menu

Hmm, that's really strange. That error should not happen, lVec is always written in the vertex shader.

Ailuros said:
Without branching it runs a tad faster on the 7800GTX here; when I use shadows it turns out so dark that I can only see the animated lights. W/o shadows it's fine. ForceWare 84.25.

I forgot nvidia don't support 16bit depth buffers with FBOs. I've uploaded a new version that should hopefully fix this.
 
chavvdarrr said:
afais when not in full-screen the app can run only at 640x480

In windowed mode you can use any resolution. Just resize the window. Actual size can be seen in the titlebar as you resize. I only disable the resolution dropdown menu when you uncheck "fullscreen" because that widget only applies to the fullscreen mode.
 
pharma said:
A nice option would be to export the average fps to a text file, instead of trying to eyeball the range of fps.

You can disable light animation. That should keep the framerate pretty much constant (assuming you don't move around).
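For anyone who wants a logged average anyway, here is a minimal sketch of the idea, not something the demo does. It assumes you can record per-frame times in seconds from some external tool; `average_fps`, `write_fps_report`, and the report layout are hypothetical names for illustration. The average is computed as frames divided by total time, because averaging the instantaneous FPS values would overweight the fast frames.

```python
def average_fps(frame_times_s):
    """Average FPS over a run: number of frames / total elapsed seconds."""
    total = sum(frame_times_s)
    if not frame_times_s or total <= 0.0:
        raise ValueError("need at least one frame with positive duration")
    return len(frame_times_s) / total


def write_fps_report(path, frame_times_s):
    """Write average, minimum and maximum FPS to a plain text file."""
    instantaneous = [1.0 / t for t in frame_times_s if t > 0.0]
    avg = average_fps(frame_times_s)
    with open(path, "w") as f:
        f.write(f"frames: {len(frame_times_s)}\n")
        f.write(f"average fps: {avg:.1f}\n")
        f.write(f"min fps: {min(instantaneous):.1f}\n")
        f.write(f"max fps: {max(instantaneous):.1f}\n")
    return avg


if __name__ == "__main__":
    # Made-up frame times, only to exercise the functions.
    sample = [0.010, 0.008, 0.012, 0.009]
    print(f"{write_fps_report('fps_report.txt', sample):.1f}")
```

A single average like this would make the w/ and w/o branching numbers easier to compare than an eyeballed range.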
 
Humus said:
I forgot nvidia don't support 16bit depth buffers with FBOs. I've uploaded a new version that should hopefully fix this.

Yes it does :)

2048*1536, 4xAA/16xAF
w/o branching ~69 fps
with branching ~68 fps

7800GTX@490/685MHz, 84.25
 
I forgot nvidia don't support 16bit depth buffers with FBOs. I've uploaded a new version that should hopefully fix this.
I've noticed this before and have posted the question why, but got no answers.
I know they're the opposition, but do you have any links to where you got this info from? zed
 
Humus said:
You can disable light animation. That should keep the framerate pretty much constant (assuming you don't move around).

Are ya going to be adding POM to this demo? You have piqued my interest ;)
 
zed said:
I've noticed this before and have posted the question why, but got no answers.
I know they're the opposition, but do you have any links to where you got this info from? zed

I don't know where I got it from. Probably opengl.org forums. I have no idea what the reason is.

Razor1 said:
Are ya going to be adding POM to this demo? You have piqued my interest ;)

Nah, not to this demo. Maybe for another one. One of my half-done projects lying around is a parallax with distance function demo, which I guess I could try to finish some time soon.
 
Mintmaster said:
For one thing, it's a simple "if", not a loop like all the POM techniques. For another, the number of instructions being skipped is quite small, especially considering all the texture accesses are outside the branch. And with only 2 lights, all the other overhead, like shadowmap rendering per light, limits what theoretical gains you can have. Disabling shadows greatly reduces the number of pixels that skip the lighting code. So whether your dynamic branching is good or not won't make a huge difference.

3 lights. ;) And some texture lookups (the shadow map) are within the branch. I just uploaded a new version where I've tweaked the branching to improve performance. I added an outer branch condition on all light radii that will put the bumpmap lookup and some math inside the branch. The single pass dynamic branching path is now about 40% faster than no branching on my X1800XL.
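A rough way to picture the outer branch is the sketch below. It is a CPU-side Python model and only a guess at the structure, not the demo's actual shader: `shade_pixel`, the light dictionaries, the linear falloff, and the counting of "expensive ops" are all assumptions made for illustration.

```python
import math


def shade_pixel(pos, lights, use_outer_branch=True):
    """Model one pixel; return (intensity, expensive_ops).

    expensive_ops counts stand-ins for costly shader work: one for the
    bump map lookup plus shared math, one per light for the shadow map
    lookup plus lighting math.
    """
    ops = 0
    distances = [math.dist(pos, light["pos"]) for light in lights]
    in_range = [d < light["radius"] for d, light in zip(distances, lights)]

    # Outer branch: a pixel outside every light radius skips everything.
    if use_outer_branch and not any(in_range):
        return 0.0, ops

    ops += 1  # bump map lookup and shared math
    intensity = 0.0
    for light, d, inside in zip(lights, distances, in_range):
        if inside:  # inner per-light branch
            ops += 1  # shadow map lookup and lighting math
            intensity += 1.0 - d / light["radius"]  # assumed linear falloff
    return intensity, ops


if __name__ == "__main__":
    lights = [{"pos": (0.0, 0.0, 0.0), "radius": 1.0}]
    print(shade_pixel((5.0, 0.0, 0.0), lights, use_outer_branch=True))
    print(shade_pixel((5.0, 0.0, 0.0), lights, use_outer_branch=False))
```

In this model the unlit pixel costs nothing with the outer branch and one bump map lookup without it, while lit pixels cost the same either way, which is the kind of saving the tweak is after.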
 
Very interesting.
With the new version without branching I get an almost constant 120 fps in the default window (all but branching ON).
With branching on, fps starts to jump between 100 and 150, more often around 110-115, so on average it's about the same fps, but with wide variations.
It seems to me that if branches can eliminate enough workload, even NV cards can show some small gains.
 
Here is the "strange" FPS oscillation with branching enabled in the new version, compared to the flat FPS graph where branching is disabled:

fpsgraph8ki.png
 
From an NV point of view it would be a cool exercise to implement the shaders in Cg and see what effect the different profiles have on performance. It may be that the GLSL compiler is lagging behind in certain optimisations.
 