Dynamic Branching Benchmark

Discussion in 'Architecture and Products' started by JeGX, Jan 8, 2007.

  1. TG01

    Newcomer

    Sleep(0) : A value of zero causes the thread to relinquish the remainder of its time slice to any other thread of equal priority that is ready to run. If there are no other threads of equal priority ready to run, the function returns immediately, and the thread continues execution. (MSDN)

    Sleep(1) will suspend the thread for about 1 ms each time it is called, etc.

    Since you're the one programming it, you must know how this affects performance in any given app... right?
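    For readers who want to try the difference outside Windows, here is a minimal C++ sketch. It assumes std::this_thread::yield() as a rough stand-in for Sleep(0) and std::this_thread::sleep_for(1 ms) as a stand-in for Sleep(1). These are not identical semantics: yield() only hints that the implementation may reschedule and does not restrict the hand-off to threads of equal priority, and sleep_for() blocks for at least the requested time, possibly longer. `IdlePolicy` and `run_empty_loop` are hypothetical names, not code from the oZone3D benchmark.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

// Hypothetical idle policies for this sketch (not taken from the benchmark).
enum class IdlePolicy { None, Yield, Sleep1ms };

// Runs `iterations` passes of an otherwise empty "frame" loop and returns the
// wall-clock time spent, so the cost of each idle policy can be compared.
std::chrono::milliseconds run_empty_loop(int iterations, IdlePolicy policy) {
    assert(iterations >= 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        // ... per-frame rendering work would go here ...
        switch (policy) {
        case IdlePolicy::None:
            break;  // no idle call: a real render loop like this keeps a core busy
        case IdlePolicy::Yield:
            std::this_thread::yield();  // rough analogue of Sleep(0)
            break;
        case IdlePolicy::Sleep1ms:
            // rough analogue of Sleep(1); blocks for at least 1 ms
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            break;
        }
    }
    return std::chrono::duration_cast<std::chrono::milliseconds>(
        std::chrono::steady_clock::now() - start);
}
```

    Because sleep_for() never returns early, 20 iterations with Sleep1ms take at least 20 ms even when the loop body does nothing, while the Yield variant returns almost immediately when no other thread is waiting.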
     
  2. Zengar

    Regular

    For an empty loop, Sleep(1) is long. For a real rendering loop where each frame takes much more time, Sleep(1) is next to nothing, while still freeing CPU time for other applications. Probably :)
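    The point above is simple arithmetic. A minimal C++ sketch, assuming the sleep lasts exactly as long as requested (on Windows, Sleep can last longer than requested depending on the system timer resolution); `fps_with_sleep` and `fps_loss` are hypothetical names:

```cpp
#include <cassert>

// Frames per second when every frame does `work_ms` of work and then sleeps
// for `sleep_ms`. Assumes the sleep lasts exactly as long as requested,
// which real schedulers do not guarantee.
double fps_with_sleep(double work_ms, double sleep_ms) {
    assert(work_ms >= 0.0 && sleep_ms >= 0.0 && work_ms + sleep_ms > 0.0);
    return 1000.0 / (work_ms + sleep_ms);
}

// Fraction of the frame rate lost by adding the sleep to each frame.
// Algebraically this is sleep_ms / (work_ms + sleep_ms).
double fps_loss(double work_ms, double sleep_ms) {
    assert(work_ms > 0.0);
    return 1.0 - fps_with_sleep(work_ms, sleep_ms) / fps_with_sleep(work_ms, 0.0);
}
```

    With 20 ms of work per frame, a 1 ms sleep costs 1/21 of the frame rate (under 5%); with a near-empty 0.01 ms loop it costs about 99%.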
     
  3. Andrew Lauritzen

    Andrew Lauritzen Moderator
    Moderator Veteran

    On a related note, I've found that turning on vsync gives the driver a better chance to give some time back to the OS, but even then, some previous (unnamed :)) cards still took 100% CPU.

    In any case, this isn't so much of an issue with multi-core CPUs becoming the norm. Vista/DX10 might handle it better as well.
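    A simplified model of why vsync can free the CPU: with double-buffered vsync, a frame that finishes early waits for the next refresh, so the frame time is rounded up to a whole number of refresh intervals. The C++ sketch below uses hypothetical function names and ignores triple buffering and driver-side frame queuing. Whether the waiting time actually goes back to the OS depends on the driver blocking rather than busy-waiting, which is exactly what varies between cards.

```cpp
#include <cassert>
#include <cmath>

// Effective frame time under double-buffered vsync: a frame that misses a
// refresh waits for the next one, so frame time is rounded up to a whole
// number of refresh intervals. (Simplified: no triple buffering or queuing.)
double vsync_frame_ms(double work_ms, double refresh_hz) {
    assert(work_ms > 0.0 && refresh_hz > 0.0);
    const double interval_ms = 1000.0 / refresh_hz;
    return std::ceil(work_ms / interval_ms) * interval_ms;
}

// Share of each frame spent waiting for the refresh. This time can only be
// handed back to the OS if the driver blocks instead of spinning.
double vsync_wait_fraction(double work_ms, double refresh_hz) {
    return 1.0 - work_ms / vsync_frame_ms(work_ms, refresh_hz);
}
```

    For example, 5 ms of work at 60 Hz gives a 16.7 ms frame with about 70% of it spent waiting, while 20 ms of work misses one refresh and is rounded up to 33.3 ms.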
     
  4. Arun

    Arun Unknown.
    Legend

    Another option is Sleep(0) - that'll misbehave in FreeBSD and perhaps some other OSs though.


    Uttar
     
  5. TG01

    Newcomer

    In that case a single Sleep(1) would have a really minor effect.
    Remember it's the OS that hands out chunks of CPU time to each running process based on its priority. So even if you try really hard to get 100% CPU, in reality you cannot if other processes are running.
    (Then again, DX9 code is integrated into the kernel itself, which may complicate things a little further.)
     
  6. JeGX

    Newcomer

    Interesting discussion :wink:

    Just for you guys, here are three variants of the benchmark exe:
    - oZone3D_SoftShadows_Benchmark_VSYNC_OFF_Sleep(0).exe : 100 %CPU
    - oZone3D_SoftShadows_Benchmark_VSYNC_OFF_Sleep(1).exe : 100 % CPU
    - oZone3D_SoftShadows_Benchmark_VSYNC_ON_Sleep(0).exe : 5 % CPU

    LINK: http://www.ozone3d.net/public/downloads/SoftShadows_Benchmark_New_Exe.zip
    Just put these executables in the benchmark directory.

    From msdn (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/sleepex.asp),
    a value of Sleep(0) causes the thread to relinquish the remainder of its time slice
    to any other thread of equal priority that is ready to run.

    For GLQuake, I guess vsync is enabled, which might explain the low CPU usage.

    JeGX
     
  7. nyt

    nyt
    Newcomer

    X1900XTX at 672/859, CAT7.1, E6400@3280
    DB OFF 1938, ON 3897
    Both cores at ~50%
    Shadows are indeed very blocky.
     
  8. JeGX

    Newcomer

    Hi,

    At the beginning of the thread, I wondered why dynamic branching performance on GeForce 7 was worse than on GeForce 6 or 8. I believe I've got the answer: the ForceWare drivers. Here are some new results (benchmark v1.5.4), where ratio = Branching_ON / Branching_OFF:

    7600GS - Fw 84.21 - Branching OFF: 496 o3Marks - Branching ON: 773 o3Marks - Ratio = 1.5
    7600GS - Fw 91.31 - Branching OFF: 509 o3Marks - Branching ON: 850 o3Marks - Ratio = 1.6
    7600GS - Fw 91.36 - Branching OFF: 508 o3Marks - Branching ON: 850 o3Marks - Ratio = 1.6
    7600GS - Fw 91.37 - Branching OFF: 509 o3Marks - Branching ON: 850 o3Marks - Ratio = 1.6

    7600GS - Fw 91.45 - Branching OFF: 509 o3Marks - Branching ON: 472 o3Marks - Ratio = 0.9
    7600GS - Fw 91.47 - Branching OFF: 509 o3Marks - Branching ON: 472 o3Marks - Ratio = 0.9
    7600GS - Fw 93.71 - Branching OFF: 508 o3Marks - Branching ON: 474 o3Marks - Ratio = 0.9
    7600GS - Fw 97.92 - Branching OFF: 505 o3Marks - Branching ON: 478 o3Marks - Ratio = 0.9
    7600GS - Fw 100.95 - Branching OFF: 508 o3Marks - Branching ON: 480 o3Marks - Ratio = 0.9

    My conclusion is that dynamic branching in OpenGL works fine (i.e., performance is better than without dynamic branching: ratio > 1) with ForceWare <= 91.37. With drivers >= 91.45, the ratio drops below 1. It seems a "little" bug slipped into the driver code that handles dynamic branching starting with ForceWare 91.45. I've also run the test with the simple soft shadows demo provided with the NV SDK 9.5, and the results are the same. So what do you think of this conclusion? Have I got it completely wrong, or is there really a bug in the ForceWare drivers :?:
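    The ratios in the 7600GS table can be recomputed directly. A small C++ sketch using the numbers from the post; `DriverResult`, `branching_ratio` and `first_regression` are hypothetical helper names:

```cpp
#include <cassert>
#include <string>
#include <vector>

struct DriverResult {
    std::string forceware;  // driver version as listed in the post
    int branching_off;      // o3Marks with dynamic branching disabled
    int branching_on;       // o3Marks with dynamic branching enabled
};

// ratio = Branching_ON / Branching_OFF, as defined in the post.
double branching_ratio(const DriverResult& r) {
    assert(r.branching_off > 0);
    return static_cast<double>(r.branching_on) / r.branching_off;
}

// Returns the first driver version whose ratio falls below 1.0, or an empty
// string if dynamic branching never becomes a net loss.
std::string first_regression(const std::vector<DriverResult>& results) {
    for (const auto& r : results) {
        if (branching_ratio(r) < 1.0) return r.forceware;
    }
    return "";
}

// 7600GS results from the post, in driver-version order.
const std::vector<DriverResult> kResults7600GS = {
    {"84.21", 496, 773}, {"91.31", 509, 850}, {"91.36", 508, 850},
    {"91.37", 509, 850}, {"91.45", 509, 472}, {"91.47", 509, 472},
    {"93.71", 508, 474}, {"97.92", 505, 478}, {"100.95", 508, 480},
};
```

    To two decimals, the ratios come out as 1.56 for 84.21, about 1.67 for 91.31 to 91.37, and 0.93 to 0.95 from 91.45 onward, so the one-decimal figures in the table appear to be truncated rather than rounded. The conclusion (ratio > 1 up to 91.37, below 1 from 91.45) is unchanged.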
     
  9. Geeforcer

    Geeforcer Harmlessly Evil
    Veteran

    This is very interesting. The question is, how in the world could Nvidia not notice that, esp. with all the complaints about DB performance?
     
  10. chavvdarrr

    Veteran

    What if that high performance was due to a bug, i.e. incorrect handling of DB in certain situations?

    And after all, how many real-world DB examples do we have?
    Maybe NV decided to lower the results and keep it secret until a real app becomes available, in order to claim a "driver improvement".
     
  11. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    Hmm, doesn't that resemble the old line "it's not a bug, it's a feature", then? :lol:

    So now there are two odd cases with DB on presumably "inferior" NV hardware: the only slightly positive ratio on the GF6 series, and now the GF7. Well, both are a far cry from what the R500 chippery is showing, and G80 is, in my estimation, fillrate/bandwidth limited in this test.
     
  12. nAo

    nAo Nutella Nutellae
    Veteran

    To improve DB performance you need smaller batches, and to get smaller batches you need to use more registers, as long as you can still hide texturing latency.
    I wouldn't be surprised if the driver artificially inflated the number of registers used by a shader to improve DB performance.
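    The register/batch trade-off above can be sketched with a deliberately simplified model, assuming a fixed register file shared by all fragments in flight and a Little's-law style latency check. The register-file size, latency and issue rate used below are made-up numbers, not real G7x figures, and `fragments_in_flight` / `hides_texture_latency` are hypothetical names:

```cpp
#include <cassert>

// Simplified model: the register file is a fixed pool shared by every fragment
// in flight, so each extra register per fragment shrinks the batch.
int fragments_in_flight(int register_file_size, int registers_per_fragment) {
    assert(register_file_size > 0 && registers_per_fragment > 0);
    return register_file_size / registers_per_fragment;
}

// Little's-law style check: to cover a texture fetch of `latency_cycles`, the
// pipeline needs at least latency_cycles * fragments_per_cycle fragments in flight.
bool hides_texture_latency(int in_flight, int latency_cycles, int fragments_per_cycle) {
    return in_flight >= latency_cycles * fragments_per_cycle;
}
```

    With these made-up numbers (a 4096-entry pool, a 200-cycle fetch, 2 fragments per cycle), going from 4 to 8 registers per fragment halves the batch from 1024 to 512 fragments and still covers the fetch, but 16 registers (256 fragments) would no longer hide it. That is the trade-off described above.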
     
  13. Bludd

    Bludd Experiencing A Significant Gravitas Shortfall
    Veteran

    And now:

    GF8GTX FW 97.92

    DB ON: 5473
    DB OFF: 4164
     
  14. fellix

    Veteran

    I've always wondered how much the fixed batch size (1K fragments) of the GF7 marchitecture contributes compared with the previous 4K one [NV40]. Not that 1K is much better (if at all) than 4K, but having more than one program counter in the fragment core is sure to be helpful. :wink:
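    To see why a smaller batch size can help, here is a toy C++ experiment under loudly stated assumptions. Fragments are grouped into square screen tiles (32x32 = 1K and 64x64 = 4K fragments, matching the sizes quoted above). The expensive branch is taken only inside a hypothetical 16-pixel penumbra ring. Any tile containing both kinds of fragments is counted as divergent, meaning all of its fragments pay for both paths. Real GPUs do not necessarily form batches this way; `in_penumbra` and `divergent_fraction` are illustrative names only.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scene: fragments within `band` pixels of a circle's edge are in
// the penumbra and take the expensive soft-shadow branch.
bool in_penumbra(int x, int y, double cx, double cy, double r, double band) {
    const double d = std::hypot(x - cx, y - cy);
    return std::fabs(d - r) < band;
}

// Fraction of all screen fragments that sit in a divergent tile, i.e. a
// tile x tile batch where some fragments take the branch and some do not.
double divergent_fraction(int width, int height, int tile) {
    assert(tile > 0 && width % tile == 0 && height % tile == 0);
    const double cx = width / 2.0, cy = height / 2.0;
    const double r = height / 4.0, band = 8.0;  // ring is 16 pixels wide
    long divergent = 0;
    for (int ty = 0; ty < height; ty += tile) {
        for (int tx = 0; tx < width; tx += tile) {
            int taken = 0;  // fragments in this tile that take the branch
            for (int y = ty; y < ty + tile; ++y)
                for (int x = tx; x < tx + tile; ++x)
                    taken += in_penumbra(x, y, cx, cy, r, band) ? 1 : 0;
            if (taken != 0 && taken != tile * tile)
                divergent += static_cast<long>(tile) * tile;
        }
    }
    return static_cast<double>(divergent) / (static_cast<double>(width) * height);
}
```

    On a 1024x768 screen, divergent_fraction(1024, 768, 32) can never exceed divergent_fraction(1024, 768, 64). Every divergent 32x32 tile lies inside a divergent 64x64 tile, but a divergent 64x64 tile can also contain coherent 32x32 tiles, which then stop paying for both paths.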
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.