optimized by post-T&L vertex cache

Discussion in 'Architecture and Products' started by ultrafly, Feb 20, 2003.

  1. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,563
    Likes Received:
    171
    Location:
    In the Island of Sodor, where the steam trains lie
    Sorry, I also meant to ask how large the triangles are in the "large" case. Are any being clipped or culled off-screen?
    I've been trying to work out whether it might be due to fill-rate behaviour, but I need a bit more information. Do you know the frame rates for the four situations? What's the resolution of the image?

    This probably won't have much effect, but could you try removing your "FIFO pre-load" of the vertices or, if you still want it to have some effect on XBox (or perhaps all Nvidia chips?), preload just, say, vertices 1 to 4? I suspect it's not helping you with the ATI HW.
     
  2. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    Do you just do two strips next to each other (+ the first cache filling "strip")?

    What I'm interested in is what happens after vertex 44 in your case: a new PrimeVertexCache, or a new strip like 44, 30, 30, 45, 31, 46, ...?
    If it's the second case, how many parallel strips do you do between each PrimeVertex? (I would call the example you gave 1+2 strips.)
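
    To make the "parallel strips joined by repeated indices" idea concrete, here is a minimal sketch. It is not ultrafly's code and does not reproduce the exact 44, 30, 30, 45, ... ordering quoted above. The row-major vertex layout and the function name grid_strip_indices are assumptions made for illustration only.

```python
def grid_strip_indices(rows, cols):
    """Index list for one long triangle strip over a rows x cols vertex grid.

    Assumed layout: vertex (r, c) has index r * cols + c.
    Each pair of adjacent rows forms one strip. Consecutive strips are joined
    by repeating the last index of the previous strip and the first index of
    the next one, which creates degenerate (zero-area) triangles at the seam.
    """
    indices = []
    for r in range(rows - 1):
        strip = []
        for c in range(cols):
            strip.append(r * cols + c)        # vertex on the upper row
            strip.append((r + 1) * cols + c)  # vertex on the lower row
        if indices:
            indices.append(indices[-1])       # repeat previous strip's last index
            indices.append(strip[0])          # repeat this strip's first index
        indices.extend(strip)
    return indices


print(grid_strip_indices(3, 3))
# [0, 3, 1, 4, 2, 5, 5, 3, 3, 6, 4, 7, 5, 8]
```

    Each strip has an even number of indices and each seam adds two more, so the winding order of the real triangles stays the same from strip to strip.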

    Btw, what are the frame rates in the two small-poly cases?

    I have a theory, but don't know if it holds water.
     
  3. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    Both before and after reducing the size of the triangles, all triangles are on screen. No clipping and no culling.
    I think the fill-rate load of the optimized code is the same as that of the unoptimized code, and it changes by the same amount in both when the triangle size changes.
     
  4. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    That was just a sample.
    In my code there are 100 parallel strips between each PrimeVertex.

    What do you mean by "two small-poly cases"?

    I am very interested in your theory. Please share it with me, thanks.
     
  5. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    up...............
     
  6. ERP

    ERP
    Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
    OK I'm guessing here.....

    Your performance probably becomes partially setup bound as you reduce the tri size, and the optimised version has more degenerate tris in it than the non-optimised version. Abrash's suggestion is specifically for NV2X, which has a special case for degenerate setup that will in most cases prevent this; R9XXX may or may not.
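
    As a rough illustration of where the extra degenerate tris come from, the sketch below expands a stitched strip into triangles and counts how many have a repeated index. The sample strip (two rows of quads on a 3x3 grid, joined by repeated indices) and the function names are assumptions for illustration; they are not taken from the code being discussed.

```python
def strip_triangles(indices):
    """Expand a triangle strip into (a, b, c) index triples, one per window."""
    return [tuple(indices[i:i + 3]) for i in range(len(indices) - 2)]


def count_degenerate(indices):
    """Return (real, degenerate) triangle counts for a strip.

    A triangle counts as degenerate when at least two of its indices match.
    """
    real = degenerate = 0
    for a, b, c in strip_triangles(indices):
        if a == b or b == c or a == c:
            degenerate += 1
        else:
            real += 1
    return real, degenerate


# Two 3-column strips stitched with the repeated indices 5 and 3.
strip = [0, 3, 1, 4, 2, 5, 5, 3, 3, 6, 4, 7, 5, 8]
print(count_degenerate(strip))  # (8, 4)
```

    Every seam between strips costs four degenerate triangles here, so cutting the mesh into more, narrower strips raises the degenerate count even though the number of visible triangles is unchanged.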
     
  7. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    When I reduce the tri size, the only thing that changes is the fill rate; the T&L throughput does not change. The optimised code should still be faster after the size change, but the result is the reverse.
     
  8. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    Do you mean the optimized code with reduced tri size is slower than the unoptimized code with small tris, or slower than the optimized code with large tris?
     
  9. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    The result of my test is: the optimized code with big tris is faster than the unoptimized code with big tris, and the optimized code with small tris is slower than the unoptimized code with small tris. All tris are visible on screen, with no clipping and no culling.
     
  10. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    Then ERP's explanation is perfectly possible.
     
  11. Hyp-X

    Hyp-X Irregular
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,170
    Likes Received:
    5
    Ok.

    Would you be kind enough to tell us how the post-T&L vertex cache of the R300 operates?

    1. How many vertices can the cache hold? (I know that the DX9 optimization docs say to query the driver, but the result is "not-supported".)

    2. Does the cache have a FIFO or LRU organization, or something else?

    3. Does the R300 benefit from using optimized triangle strips instead of optimized triangle lists?

    4. Are degenerate triangles rejected at the triangle setup rate, or are they rejected much faster by detecting repeated indices?

    5. Is the content of the cache preserved between DIP calls if there are no render state / VB changes between them?

    If you cannot give the answers (or get them), please say so, I'll try dev-support next.
    (But I think it would be better to publish the info - it's in ATI's interest!)
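
    Why question 2 matters can be seen with a small simulation. This is only a sketch: the cache size of 3 and the index stream are made up to keep the example short, and nothing here describes how the R300 (or any other chip) actually behaves.

```python
from collections import OrderedDict, deque


def fifo_misses(indices, size):
    """Count misses for a FIFO cache: a hit does not change eviction order."""
    cache = deque()
    misses = 0
    for i in indices:
        if i not in cache:
            misses += 1
            cache.append(i)
            if len(cache) > size:
                cache.popleft()  # evict the oldest inserted vertex
    return misses


def lru_misses(indices, size):
    """Count misses for an LRU cache: a hit marks the vertex as most recent."""
    cache = OrderedDict()
    misses = 0
    for i in indices:
        if i in cache:
            cache.move_to_end(i)
        else:
            misses += 1
            cache[i] = True
            if len(cache) > size:
                cache.popitem(last=False)  # evict the least recently used vertex
    return misses


stream = [0, 1, 2, 0, 3, 0]  # hypothetical index stream
print(fifo_misses(stream, 3), lru_misses(stream, 3))  # 5 4
```

    With FIFO, re-using vertex 0 does not stop it from being evicted when vertex 3 arrives, so the last reference misses; with LRU it stays resident. An index ordering tuned for one policy is therefore not automatically good for the other.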
     
  12. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    I tested the cost of degenerate triangles on the R300. The results:

    First: 20000 normal triangles + 396 degenerate triangles, 515 fps
    Second: 2 normal triangles + 20394 degenerate triangles, 578 fps

    The result is disappointing.
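
    For comparison, the two frame rates can be converted to frame times. This is just arithmetic on the numbers posted above; it assumes the whole frame time is attributable to the submitted triangles, which ignores any fixed per-frame overhead, so the per-triangle figures are upper bounds at best.

```python
def frame_ms(fps):
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps


# Both runs submit the same total number of triangles.
TOTAL_TRIS = 20000 + 396
assert TOTAL_TRIS == 2 + 20394

mostly_normal_ms = frame_ms(515)      # 20000 normal + 396 degenerate
mostly_degenerate_ms = frame_ms(578)  # 2 normal + 20394 degenerate

# Naive per-triangle cost in nanoseconds (1 ms = 1e6 ns).
ns_per_tri_normal = mostly_normal_ms * 1e6 / TOTAL_TRIS
ns_per_tri_degenerate = mostly_degenerate_ms * 1e6 / TOTAL_TRIS

print(f"{mostly_normal_ms:.2f} ms vs {mostly_degenerate_ms:.2f} ms per frame")
print(f"{ns_per_tri_normal:.1f} ns vs {ns_per_tri_degenerate:.1f} ns per triangle")
```

    That works out to roughly 1.94 ms versus 1.73 ms per frame, so in this test the almost entirely degenerate batch is only about 11% cheaper than the almost entirely normal one.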
     
  13. Dio

    Dio
    Veteran

    Joined:
    Jul 1, 2002
    Messages:
    1,758
    Likes Received:
    8
    Location:
    UK
    When it comes to ATI hardware, I only comment on things I have read on the public internet (I sleep better that way), so I'm afraid you'll have to try dev support unless one of the other ATI guys here can answer.

    I would observe that if a DX9 query says 'not supported' then you cannot assume anything.
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.