Dynamic Branching Benchmark

Discussion in 'Architecture and Products' started by JeGX, Jan 8, 2007.

  1. JeGX

    Hi all!

    Here is a small OpenGL benchmark I coded for my graphics card tests:
    http://www.ozone3d.net/demos_projects/soft_shadows_benchmark.php

    This benchmark focuses on the pixel processing units: it renders soft shadows
    with a 7x7 filter. The benchmark runs for 1 minute. You can enable or disable dynamic
    branching in the pixel shader to see its impact.

    Here are some results:
    X1950XTX / C6.9 / Branching OFF : 1805 o3Marks
    X1950XTX / C6.9 / Branching ON : 3634 o3Marks

    7950GX2 / FW91.47 / Branching OFF : 2306 o3Marks
    7950GX2 / FW91.47 / Branching ON : 2125 o3Marks

    7600GS / FW97.28 / Branching OFF : 373 o3Marks
    7600GS / FW97.28 / Branching ON : 352 o3Marks

    6600GT / FW97.28 / Branching OFF : 391 o3Marks
    6600GT / FW97.28 / Branching ON : 603 o3Marks

    6800GT / FW97.28 / Branching OFF : 583 o3Marks
    6800GT / FW97.28 / Branching ON : 821 o3Marks

    We can see the benefit of dynamic branching: on Radeon R5xx, performance doubles (ratio ~2).
    On G7x, however, performance stays almost the same (ratio ~0.9).
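    The ratios above are the branching-ON score divided by the branching-OFF score. A small Python sketch that recomputes them from the o3Marks posted here (the dictionary and function name are just for illustration):

    Code:
```python
# o3Marks posted above: (branching OFF, branching ON)
RESULTS = {
    "X1950XTX": (1805, 3634),
    "7950GX2": (2306, 2125),
    "7600GS": (373, 352),
    "6600GT": (391, 603),
    "6800GT": (583, 821),
}


def branching_ratio(off, on):
    """Speedup from enabling dynamic branching: > 1 means branching helps."""
    return on / off


for card, (off, on) in RESULTS.items():
    print(f"{card}: ratio = {branching_ratio(off, on):.2f}")
```

    This prints 2.01 for the X1950XTX, 0.92 and 0.94 for the two G7x cards, and 1.54 and 1.41 for the 6600GT and 6800GT.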

    The weird thing is that on the NV4x architecture (6600GT/6800GT), branching in the pixel shader seems to be
    more efficient (ratio ~1.4 to 1.5) than on G7x. Could it be because NV4x threads contain fewer pixels?
    If anyone has an explanation...
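    One way to reason about the thread-size question: the GPU runs pixels in batches, and if any pixel in a batch takes the expensive branch, the whole batch pays for it. The hypothetical sketch below counts how many pixels end up running the filtering loop for a given square batch size, using a made-up one-pixel-wide vertical shadow edge. The batch sizes are arbitrary and say nothing about the real NV4x or G7x hardware.

    Code:
```python
def pixels_taking_branch(edge, tile):
    """Count pixels that execute the expensive path when pixels are scheduled
    in tile x tile batches: if any pixel in a batch needs the filter, every
    pixel in that batch pays for it (a divergent branch)."""
    h, w = len(edge), len(edge[0])
    total = 0
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            rows = range(ty, min(ty + tile, h))
            cols = range(tx, min(tx + tile, w))
            if any(edge[y][x] for y in rows for x in cols):
                total += len(rows) * len(cols)
    return total


# Hypothetical 64x64 screen with a shadow edge in column 10.
edge = [[x == 10 for x in range(64)] for _ in range(64)]
for tile in (1, 4, 32):
    print(tile, pixels_taking_branch(edge, tile))
```

    Only 64 pixels actually sit on the edge, but with 4x4 batches 256 pixels run the loop and with 32x32 batches 2048 do (half the screen), so coarser batches erode the benefit of skipping work.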

    Thanks,
    JeGX
     
  2. Geeforcer

    8800 GTX:

    4492 DB Off
    5925 DB On
     
  3. Geo

    Errrm? So according to this test G80 has negative branching performance? (Edit: Nope, I didn't read it right.) But it gains less from branching than R580?
     
  4. Andrew Lauritzen

    I'm not sure that I'd jump to any conclusions yet... it's possible that the 8800 is simply bottlenecked elsewhere when DB is enabled.

    Specifically what is the demo doing? What is it branching on? Is the "7x7" convolution filter PCF, or some post-processing screen-space shader?
     
  5. JeGX

    The source code of the pixel shader (ps_7x7_bluring_kernel_v3b_tex.glsl) is in the data/ directory.
    Here is the branching code:

    Code:
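    // Take the expensive path only on a shadow edge: the product is non-zero
    // only when shadowColor (as computed before this branch, not shown) is
    // neither 0.0 nor 1.0 and lambertTerm is non-zero.
    if( (shadowColor - 1.0) * shadowColor * lambertTerm != 0.0 )
    {
        float kernel_sum = 0.0;
        shadowColor = 0.0;

        // KERNEL_SIZE shadow-map lookups (49 for the 7x7 kernel), each
        // weighted by kernel[i]; offset[i] is the per-tap texture offset.
        for( int i = 0; i < KERNEL_SIZE; i++ )
        {
            kernel_sum += kernel[i];
            shadowColor += shadow2D(shadowMap, shadowUV.xyz + offset[i]).x * kernel[i];
        }

        // Normalize by the total kernel weight.
        shadowColor /= kernel_sum;
    }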
    
    In short, if the current fragment lies on a shadow edge, the pixel shader
    filters the shadow map (KERNEL_SIZE texture lookups) with a mean filter.
    The basic idea comes from an NVIDIA demo.
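    To make the control flow concrete, here is a hypothetical CPU model of that branch in Python. It assumes the shadow map is a 2D list of depths, that each lookup behaves like shadow2D with GL_NEAREST and a less-or-equal depth compare (1.0 = lit, 0.0 = shadowed), that the kernel is a uniform box as described above, and that `coarse` stands in for whatever shadowColor holds before the branch. None of these names come from the benchmark.

    Code:
```python
def shadow_compare(depth_map, x, y, ref_depth):
    """Nearest-texel depth compare (like shadow2D with GL_NEAREST and a
    less-or-equal compare): 1.0 if lit, 0.0 if occluded. Clamps to edge."""
    h, w = len(depth_map), len(depth_map[0])
    x = min(max(x, 0), w - 1)
    y = min(max(y, 0), h - 1)
    return 1.0 if ref_depth <= depth_map[y][x] else 0.0


def filtered_shadow(depth_map, x, y, ref_depth, coarse, lambert, radius=3):
    """Return (shadow, lookups). radius=3 gives the 7x7 kernel (49 taps)."""
    # Same test as the shader: skip unless coarse is strictly between 0 and 1
    # and lambert is non-zero.
    if (coarse - 1.0) * coarse * lambert == 0.0:
        return coarse, 0
    kernel_sum = 0.0
    shadow = 0.0
    lookups = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            weight = 1.0  # uniform (mean) kernel
            kernel_sum += weight
            shadow += shadow_compare(depth_map, x + dx, y + dy, ref_depth) * weight
            lookups += 1
    return shadow / kernel_sum, lookups
```

    Fully lit, fully shadowed and unlit fragments cost no lookups at all, while edge fragments pay all 49, which is exactly the work the branch is meant to save.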
     
  6. Graham

    Ahh...

    No UI pops up; it just launches straight into the benchmark, so I can't get any data.
     
  7. hoom

    I get the same.
    The readme says it's version 1.5, though, and the exe is dated 7 Jan.
     
  8. ChrisRay

    From my system. Note that I took the first two sets of results with 16x CSAA and 16xAF enabled.

    EVGA 680i SLI
    E6300 @ 2.8 GHz
    GeForce 8800 GTX SLI @ stock clocks


    No SLI, 16xCSAA/16xAF:

    Code:
    No branching: 3170
    With branching: 3965

    SLI, 16xCSAA/16xAF:

    Code:
    No branching: 6098
    With branching: 7568

    SLI, AA/AF disabled:

    Code:
    No branching: 7946
    With branching: 10328
     
    #8 ChrisRay, Jan 9, 2007
    Last edited by a moderator: Jan 9, 2007
  9. Geeforcer

    I had this problem as well - I just forced Windows to run the program in 640x480, which does bring up the frontend popup. The benchmarking portion will still run at 1280x1024, so you can get the results.
     
  10. hoom

    Cool, that fixed it :)
    DB off 1108
    DB on 2254

    E6600 @ stock
    X1900GT, also @ stock, Cat 6.12

    There is some funny artifacting where two shadows overlap with DB on, though.
    The artifacting is not present with DB off.

    Edit: screenshots. It looks particularly bad where the rotating doughnuts approach and leave the area where the two sides' shadows meet, and in the middle under the balls.
    [Screenshot] [Screenshot]
     
    #10 hoom, Jan 9, 2007
    Last edited by a moderator: Jan 9, 2007
  11. CarstenS

    How did you do that?

    Edit: Windows 2000 compatibility mode also seems to work for getting the GUI back...
     
    #11 CarstenS, Jan 9, 2007
    Last edited by a moderator: Jan 9, 2007
  12. pocketmoon66

    A cool thing to try is varying the kernel radius with the distance to the nearest occluder :) You can also take the distance to the light into account and estimate the penumbra size. As arrrse has pointed out, it's a tricky technique to make 100% robust!
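    As a sketch of that idea, the similar-triangles estimate popularized by PCSS (named here for illustration; it is not what the benchmark does) gives a penumbra width that grows with the receiver-to-blocker distance and the light size. The function names, the `texels_per_unit` scale and the cap at radius 3 (the benchmark's 7x7 kernel) are assumptions.

    Code:
```python
def penumbra_width(d_receiver, d_blocker, light_size):
    """Similar-triangles penumbra estimate: wider when the receiver is far
    behind the blocker or the light is large. Depths are light-space distances."""
    return (d_receiver - d_blocker) * light_size / d_blocker


def kernel_radius(d_receiver, d_blocker, light_size, texels_per_unit, max_radius=3):
    """Map the penumbra width to a filter radius in shadow-map texels,
    clamped to [0, max_radius] (max_radius=3 means at most a 7x7 kernel)."""
    width = penumbra_width(d_receiver, d_blocker, light_size)
    radius = int(round(0.5 * width * texels_per_unit))
    return max(0, min(radius, max_radius))


print(kernel_radius(2.0, 1.0, 0.5, 8))   # moderate penumbra
print(kernel_radius(10.0, 1.0, 0.5, 8))  # clamped to the 7x7 limit
```

    In a shader the blocker depth would itself come from an averaged search over the shadow map, which is one of the places this gets hard to make robust.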
     
  13. Bludd

    GF7800 AGP FW 93.71
    DBranching on: 591
    DBranching off: 635

    :(
     
  14. Andrew Lauritzen

    Here's an image showing roughly the amount of work done at each pixel (equivalently, the amount avoided by branching); brighter means more shadow map reads: Branching.

    A few notes:

    1) ATI is potentially doing less work here, as it does not support hardware bilinear PCF. Note the blocky shadows. In any case, for custom PCF kernels like this it's best to do the bilinear weighting of the border pixels yourself and avoid the redundant reads of doing a bilinear lookup at every pixel in the kernel. You can even get fancy and adjust the bilinear fraction based on your kernel weights, but that may or may not be worth it.

    2) I wouldn't necessarily conclude that ATI's branching is more efficient than the 8800 series', as 49 texture reads may not be enough to fully bottleneck an 8800 (in my experience) :) In particular, it looks like you're doing multipass lighting, which is a big vertex-transform (perhaps not in this scene) and fill-rate killer.

    3) The method used to detect whether you're in a shadow-edge region is a bit error-prone. I remember the original NVIDIA paper, and it was error-prone then as well ;) It's certainly possible to get a more continuous and conservative estimate (VSM will give you one, for example), and that might play even nicer with the branching hardware in both cards.
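    For reference, the estimate VSM provides is the one-tailed Chebyshev bound computed from filtered depth moments. A minimal Python sketch, assuming `m1` and `m2` are the filtered mean depth and mean squared depth over the kernel footprint and `t` is the receiver depth (the names and the variance floor are illustrative):

    Code:
```python
def vsm_visibility(m1, m2, t, min_variance=1e-4):
    """Chebyshev upper bound on the fraction of the filter region that is lit,
    from moments m1 = E[d], m2 = E[d^2] and receiver depth t."""
    if t <= m1:
        return 1.0
    variance = max(m2 - m1 * m1, min_variance)  # floor avoids division blow-ups
    d = t - m1
    return variance / (variance + d * d)


# Half the footprint sees an occluder at depth 0.2, half sees depth 1.0.
depths = [0.2] * 8 + [1.0] * 8
m1 = sum(depths) / len(depths)
m2 = sum(d * d for d in depths) / len(depths)
print(vsm_visibility(m1, m2, 0.9))  # upper bound; the true lit fraction is 0.5
```

    The result varies smoothly with the depth distribution instead of jumping between 0 and 1, and it never underestimates the lit fraction, which is what "conservative" refers to here.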

    That said, interesting demo and results. I suspect, though, that in comparing ATI and NVIDIA we're testing texture reads/bandwidth more than dynamic branching efficiency. The latter is probably best tested with something synthetic like GPUBench, given the large number of potentially confounding factors.
     
    #14 Andrew Lauritzen, Jan 9, 2007
    Last edited by a moderator: Jan 9, 2007
  15. Miksu

    X1950Pro & Cat 6.12:

    DB off: 1288
    DB on: 2583
     
  16. JeGX

    Thanks for the tip, pocketmoon66. I'll look at it when I add soft shadows to my main software.
     
  17. JeGX

    You're right, AndyTX; that's why I set the shadow map filtering to GL_NEAREST, to disable NVIDIA's hardware PCF.

    Yes, with my simple mean filter the shadow is a bit blocky:
    [Image]

    But with help from the hardware (NVIDIA only, GL_LINEAR) and the same 7x7 filter, the shadow looks better:
    [Image]
     
  18. Andrew Lauritzen

    Ok, cool. I didn't have an NVIDIA machine here handy to test with :)

    Sure. What I'm saying is that you can get the same linear filtering by sampling an 8x8 rectangle and applying the proper bilinear weights to the border pixels (all of the interior pixel bilinear weights will sum to one). This will work on all cards and for sufficiently large filter regions will be as fast or faster than hardware bilinear (i.e. GL_LINEAR).
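    This can be checked numerically. The Python sketch below compares a 7x7 kernel in which every tap is a bilinear PCF lookup (four reads each, 196 in total) against 8x8 plain reads where only the border row and column carry the bilinear fractions. It assumes a grid of precomputed 0/1 depth-comparison results and non-negative coordinates; the function names are illustrative.

    Code:
```python
def bilinear_pcf(lit, x, y):
    """One hardware-style bilinear PCF lookup: the four neighbouring 0/1
    comparison results blended by the fractional part of (x, y)."""
    x0, y0 = int(x), int(y)  # floor, since coordinates are non-negative
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * lit[y0][x0] + fx * (1 - fy) * lit[y0][x0 + 1]
            + (1 - fx) * fy * lit[y0 + 1][x0] + fx * fy * lit[y0 + 1][x0 + 1])


def box_of_bilinear(lit, x, y, n=7):
    """n x n box kernel where every tap is a bilinear lookup: 4*n*n reads."""
    taps = (bilinear_pcf(lit, x + i, y + j) for j in range(n) for i in range(n))
    return sum(taps) / (n * n)


def box_with_border_weights(lit, x, y, n=7):
    """Same result from (n+1) x (n+1) plain reads: the first and last texel in
    each direction get the bilinear fractions, interior texels get weight 1."""
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    wx = [1 - fx] + [1.0] * (n - 1) + [fx]
    wy = [1 - fy] + [1.0] * (n - 1) + [fy]
    total = sum(wy[j] * wx[i] * lit[y0 + j][x0 + i]
                for j in range(n + 1) for i in range(n + 1))
    return total / (n * n)
```

    Each interior texel is covered by two adjacent bilinear taps whose weights (1 - f) and f add up to one, so both functions agree to floating-point precision for any sub-texel offset, while the second needs 64 reads instead of 196.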
     
  19. TG01

    Sweet .. :)
    my X1900XT256MB C6.12 gets

    1944 DB OFF
    3720 DB ON
     
  20. AlexV

    Couldn't Fetch4 help you to some extent in this particular situation?
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.