SM 3.0, yet again.

Discussion in 'Architecture and Products' started by Frank, Mar 7, 2005.

  1. Frank

    Frank Certified not a majority
    Veteran

    Joined:
    Sep 21, 2003
    Messages:
    3,187
    Likes Received:
    59
    Location:
    Sittard, the Netherlands
    Ok. But Tridam's results showed that the batches are expanded to cover the whole area that uses the shader if there are pixels that run the other branch, while there is still a penalty of 9 clocks for each branch instruction.

    So, to what extent wouldn't it be better to use a single (linear) shader or multipassing if branches are actually taken? And if they aren't, why not use a different shader for that frame? That would save you at least 9 clocks per pixel.

    I'm still trying to understand in what cases that would be useful.
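
One rough way to frame the question above is a break-even estimate. The sketch below is only a toy model: the 9-clock penalty is the figure quoted in the post, while the path costs, the function names, and the assumption of perfectly coherent batches (no batch ever runs both paths) are all hypothetical.

```python
BRANCH_PENALTY_CLOCKS = 9  # per-branch-instruction figure quoted in the post


def linear_cost(expensive_clocks):
    """Clocks per pixel for a linear shader that always runs the expensive path."""
    return expensive_clocks


def branched_cost(skip_fraction, expensive_clocks, cheap_clocks=0,
                  penalty=BRANCH_PENALTY_CLOCKS):
    """Average clocks per pixel when skip_fraction of pixels take the cheap path.

    Assumption: batches are perfectly coherent, so no batch runs both paths.
    """
    return (penalty
            + skip_fraction * cheap_clocks
            + (1.0 - skip_fraction) * expensive_clocks)


def break_even_skip_fraction(expensive_clocks, cheap_clocks=0,
                             penalty=BRANCH_PENALTY_CLOCKS):
    """Skip fraction at which the branched shader ties with the linear one."""
    if expensive_clocks <= cheap_clocks:
        raise ValueError("the expensive path must cost more than the cheap path")
    return penalty / (expensive_clocks - cheap_clocks)


if __name__ == "__main__":
    # Hypothetical 60-clock lighting path with nothing to do on the cheap path.
    print(break_even_skip_fraction(60))
    print(branched_cost(0.5, 60), "vs", linear_cost(60))
```

With those made-up numbers the branch pays for itself once more than 15% of pixels can skip the expensive path. Incoherent batches would push that threshold higher, which this model deliberately ignores.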
     
  2. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    Batch size is only increased if more quads take the same branch.
     
  3. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    And then there's the demo that 99160 just posted that shows a dramatic performance improvement by implementing dynamic branching.

    So it's clear that it is possible to extract a performance improvement through dynamic branching versus a "compute all paths and choose the correct result" approach.

    And it is further clearly obvious that any multipass technique that one chooses to use will require one to pass the geometry multiple times to the video card, and thus will not run nearly as fast as either of the above solutions in a geometry-limited case.
     
  4. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Given how often things are geometry limited in the first place (not particularly frequently!), it doesn't seem like there are that many occasions that are going to be geometry limited if the rendering requirements are asking for an extra pass.
     
  5. John Reynolds

    John Reynolds Ecce homo
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    4,491
    Likes Received:
    267
    Location:
    Westeros
    And Dave, the perennial tease, has changed his sig again. :p
     
  6. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Except that extra passes increase the geometry limitations, since each pass is inherently shorter than one single pass.
     
  7. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Yes, but you must have reached some other limitations to require that extra pass and in many circumstances those are likely to be more limiting than a geometry pass.
     
  8. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Like what? And why?

    After all, here's a specific instance where you could get geometry-limited through multipassing very quickly:

    Imagine Humus' stencil-based dynamic branching multipass demo that he did some time ago. The basic idea is that you don't render if the pixel in question is some distance away from the light source. This rendering is done in two passes per light source (1. Check for visibility. 2. Render visible pixels.)

    Now, in this situation, if you are even getting close to geometry-limited in, say, a 4-light scene, this technique multiplies the required geometry rendered by a factor of eight, for approximately the same total pixel processing (as true dynamic branching), and therefore is rather likely to reduce performance.
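
The factor of eight above is just a count of how many times the scene geometry is submitted. A minimal sketch of that arithmetic, assuming the dynamic-branching version handles every light in a single pass (the function names are made up for illustration):

```python
def multipass_geometry_submissions(num_lights, passes_per_light=2):
    """Stencil-based approach: one visibility pass plus one render pass per light."""
    return num_lights * passes_per_light


def single_pass_geometry_submissions():
    """Assumption: true dynamic branching handles all lights in one pass."""
    return 1


def geometry_multiplier(num_lights, passes_per_light=2):
    """How many times more geometry the multipass approach submits."""
    return (multipass_geometry_submissions(num_lights, passes_per_light)
            // single_pass_geometry_submissions())


if __name__ == "__main__":
    print(geometry_multiplier(4))  # the 4-light scene from the post
```

This only counts submissions. It says nothing about how expensive each submission is, so it is an upper bound on the extra geometry work rather than a measured slowdown.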
     
  9. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    It's not just extra geometry, it's bandwidth as well in many cases, since the latter passes may have to use stencil/z-test or blending to combine results.
     
  10. Frank

    Frank Certified not a majority
    Veteran

    Joined:
    Sep 21, 2003
    Messages:
    3,187
    Likes Received:
    59
    Location:
    Sittard, the Netherlands
    Has anyone done some more testing in the last months? I would really like to see the results of those tests. That might be better than speculating.
     
  11. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    It's nice to note that having SM3.0 support is suddenly cool again, LOL :roll: :lol:
     
  12. Frank

    Frank Certified not a majority
    Veteran

    Joined:
    Sep 21, 2003
    Messages:
    3,187
    Likes Received:
    59
    Location:
    Sittard, the Netherlands
    Did you check the dates? :D
     
  13. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    No, I didn't. I can just smell it in the air.. ;)
     
  14. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    18,992
    Likes Received:
    3,532
    Location:
    Winfield, IN USA
    Not to me, I still don't see much need/use for it. :)
     
  15. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    Just wait for the reviews.. a new spin is coming :)
     
  16. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    Yep, full reverse spin I bet. Once ATI has SM3.0, I think all the SM3.0 naysaying will disappear, and all of a sudden a whole crop of "only possible with SM3.0" scenarios will appear. And people who in the past were exclaiming that there was no big deal between SM2.0b and SM3.0 will suddenly be at the head of the bandwagon, especially if ATI's performs better. For example, if ATI's dynamic branching performs better, then dynamic branching support will suddenly be an Achilles' heel, despite the fact that previously it wasn't, and the real-life scenarios where it was used were few and far between. Now, such support will be seen as *crucial*.

    :popcorn mode engaged:

    (Who remembers how horrible it was to waste *two* MB slots and how terrible it was not to support small form factor PCs, until.....)
     
  17. ANova

    Veteran

    Joined:
    Apr 4, 2004
    Messages:
    2,226
    Likes Received:
    10
    I know I won't, at least not unless ATI's solution manages to show something actually worthwhile pertaining to SM3's use, which nvidia certainly hasn't done yet.
     
  18. AndrewM

    Newcomer

    Joined:
    May 28, 2003
    Messages:
    219
    Likes Received:
    2
    Location:
    Brisbane, QLD, Australia
    That is exactly what DemoCoder was talking about.

    :)
     
  19. Humus

    Humus Crazy coder
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,217
    Likes Received:
    77
    Location:
    Stockholm, Sweden
    A factor of 8 is quite a stretch. If you're limited by vertex fetch, then you can in some cases get near that. In normal situations, where the shader is of decent length and you're more limited by the vertex shader, you won't get anywhere near a factor of 8.

    First of all, the visibility pass is very cheap; it's more or less just a transform. Compared to the lighting shader it's short: in my demo it's maybe half the instructions of the lighting shader. For more advanced lighting the relative cost of the visibility shader goes down even further.

    Also, it's not like you can pop my lighting vertex shader right into a ps3.0 dynamic branching case. The vertex shader needed for that case of course gets larger, since you still have to do computations for all lights and pass the results to the fragment shader. So it's not cutting the workload by 75%, but rather by something closer to 30%-50%. So realistically we're not talking about a factor of 8 but rather something in the range of 2-3.
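
To see how weighting passes by vertex shader length pulls the factor down, here is a small sketch. Only the "visibility pass is about half the lighting shader" ratio comes from the post. The cost units, the function names, and especially the single-pass shader cost are hypothetical, and the single-pass cost is picked by hand, so the result illustrates the reasoning rather than reproducing any measurement.

```python
def multipass_vertex_work(num_lights, lighting_cost=1.0, visibility_ratio=0.5):
    """Vertex work for one visibility pass plus one lighting pass per light.

    visibility_ratio=0.5 reflects the 'maybe half the instructions' remark;
    lighting_cost is an arbitrary unit.
    """
    return num_lights * (visibility_ratio * lighting_cost + lighting_cost)


def vertex_work_factor(num_lights, single_pass_cost,
                       lighting_cost=1.0, visibility_ratio=0.5):
    """Ratio of multipass vertex work to a single all-lights vertex shader.

    single_pass_cost is a hypothetical figure for the larger vertex shader
    that computes data for every light at once.
    """
    return (multipass_vertex_work(num_lights, lighting_cost, visibility_ratio)
            / single_pass_cost)


if __name__ == "__main__":
    # 4 lights; assume the all-lights vertex shader costs 2 to 3 lighting units.
    for assumed_cost in (2.0, 2.5, 3.0):
        print(assumed_cost, vertex_work_factor(4, assumed_cost))
```

If the all-lights vertex shader costs between two and three times the single-light one, the ratio comes out between 3 and 2 rather than 8. That matches the range quoted above only because of the assumed costs.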
     
  20. Geo

    Geo Mostly Harmless
    Legend

    Joined:
    Apr 22, 2002
    Messages:
    9,116
    Likes Received:
    215
    Location:
    Uffda-land
    I think ATI has been pretty consistent in their message... it's the hallelujah chorus that at times has been off message.
     