Faking dynamic branching - technical discussion

Discussion in 'Rendering Technology and APIs' started by Mintmaster, Jul 3, 2004.

  1. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
    The limitation I'm talking about is the size of the batch of quads.
     
  2. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    Since the size of the batches is probably related to latency hiding, I would expect that it would be invariable. Perhaps in a future architecture, they'll reduce the size of the batches if latency hiding isn't necessary (i.e. no nearby texture instructions). This would probably pave the way toward unification of pixel and vertex pipelines.
     
  3. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    I've heard this batch theory from other sources, so I believe it's true, but how is this supposed to work? You can only know whether you have to do both branches after you've tested the if condition for all pixels in the batch. Wouldn't that add an enormous latency?
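
    To make the question concrete, here is a minimal sketch of what a batch-level branch decision could look like. It is an illustrative model only, not a description of how NV40 actually works: the function name `run_batch` and the idea of counting "passes" are assumptions made for this example. The condition is evaluated for every pixel in the batch first, and only then is it known whether one side or both sides of the branch must be executed.

```python
def run_batch(pixels, cond, then_fn, else_fn):
    """Hypothetical lockstep model: return (results, passes) for one batch."""
    # The condition has to be known for every pixel before the decision is made.
    mask = [cond(p) for p in pixels]
    results = [None] * len(pixels)
    passes = 0

    if any(mask):  # at least one pixel takes the 'then' side
        passes += 1
        for i, p in enumerate(pixels):
            if mask[i]:
                results[i] = then_fn(p)

    if not all(mask):  # at least one pixel takes the 'else' side
        passes += 1
        for i, p in enumerate(pixels):
            if not mask[i]:
                results[i] = else_fn(p)

    return results, passes
```

    In this model a coherent batch costs one pass and a mixed batch costs two, which is why the wait for the whole batch's condition looks like a potential latency problem.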
     
  4. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
    Not really. The branching cost is around 9 cycles. Maybe one pipeline pass is used to test the if condition, and a decision is taken after that.

    Honestly, I did these tests some weeks ago and I still don't have a satisfactory explanation.

    For example, suppose a decision is taken for every batch of 4096/8192 pixels:
    I create a block of 4100/8196 pixels using one branch, followed by a similar block using the other branch.
    The first 4096/8192 pixels should be computed at full speed (only one branch). However, the next pixels shouldn't, because the second batch contains 4 pixels using branch 1 and 4092/8188 pixels using branch 2.

    However, it doesn't work that way. It seems that the first batch is expanded to 4100/8196 pixels. I can't explain that.
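
    The expectation described in this post can be sketched as a small simulation. It assumes a fixed batch size and a linear pixel order, which is exactly the hypothesis the measurements appear to contradict; the helper names `branch_ids` and `count_batches` are made up for this illustration.

```python
def branch_ids(block_size, num_blocks):
    """Pixel stream in which consecutive blocks alternate between branch 0 and branch 1."""
    return [(i // block_size) % 2 for i in range(block_size * num_blocks)]


def count_batches(stream, batch_size):
    """Cut the stream into fixed-size batches and return (uniform, mixed) counts."""
    uniform = mixed = 0
    for start in range(0, len(stream), batch_size):
        batch = stream[start:start + batch_size]
        if len(set(batch)) == 1:
            uniform += 1  # every pixel takes the same branch
        else:
            mixed += 1    # both branches would have to be executed
    return uniform, mixed


# Two blocks of 4100 pixels against an assumed fixed batch of 4096 pixels.
stream = branch_ids(4100, 2)
second_batch = stream[4096:8192]
print(second_batch.count(0), second_batch.count(1))
print(count_batches(stream, 4096))
```

    Under these assumptions the second batch mixes 4 pixels from the first block with 4092 pixels from the second, so it should pay for both branches. The measured behaviour suggests the hardware does not split the stream this way.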
     
  5. Drak

    Newcomer

    Joined:
    May 16, 2004
    Messages:
    71
    Likes Received:
    0
    Tridam,

    Are you saying that if the number of pixels submitted for rendering is greater than 1024, then the hardware (+ driver software?) can rearrange the quads in a batch so that a batch contains only pixels taking the same branch? If, on the other hand, the number of pixels submitted for rendering is less than 1024, then there is no sorting of quads and all the quads are executed in the same batch?

    Also, are you saying that the 4 quads of 4 pipelines in an Ultra won't execute quads separately until they are completed, but instead run the same instruction for the whole batch before moving on to the next? Is that for branches specifically or for pixel shaders in general?
     
  6. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
    No, rearranging pixels is of course not possible. I'm saying that the batch size doesn't seem to be fixed and seems to be 1024 quads (actually I think that it is 2048 quads) or more. I can't explain that.

    For pixel shaders in general.
     
  7. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,493
    Likes Received:
    474
    It would add latency, so 1024 quads seems like a lot. Tridam, are all of the quads in your test from a single triangle or from multiple triangles? I don't know that it should matter, but the results seem a little strange.
     
  8. Evildeus

    Veteran

    Joined:
    May 24, 2002
    Messages:
    2,657
    Likes Received:
    2
    Tridam,
    Next time you have a 6800 in your hands, could you test your code once more with newer drivers? It would be interesting to see if there are any improvements :)
     
  9. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
    I'm doing fillrate tests on fullscreen with 2 triangles.

    However, I've also done some tests with more triangles. With small triangles, the results say that both branches are computed for every pixel.
     
  10. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
    I have a 6800 here. I've tested my code with different driver revs and the result is the same.
     
  11. Evildeus

    Veteran

    Joined:
    May 24, 2002
    Messages:
    2,657
    Likes Received:
    2
    Ok, thanks :)
     
  12. nelg

    Veteran

    Joined:
    Jan 26, 2003
    Messages:
    1,557
    Likes Received:
    42
    Location:
    Toronto
    If this remains so, it seems like a stretch to say that the NV40 can do dynamic branching.
     