optimized by post-T&L vertex cache

Discussion in 'Architecture and Products' started by ultrafly, Feb 20, 2003.

  1. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    Hi.

    I am using the method described by Mike Abrash in "Xbox Vertex Performance", but I am seeing some strange results.

    I assume the post-T&L vertex cache holds 11 vertices on my R9500 Pro 64 MB.

    First, with 20,000 triangles I get 255 fps with the optimized code and 177 fps with the unoptimized code. But when I reduce the size of the triangles (without changing their number), the unoptimized code runs faster than the optimized code. Why?

    Second, I found that D3DFILL_SOLID is faster than D3DFILL_WIREFRAME. Why?

    Sorry for my poor English.
     
  2. Dio

    Dio
    Veteran

    Joined:
    Jul 1, 2002
    Messages:
    1,758
    Likes Received:
    8
    Location:
    UK
    Solid generates fewer primitives than wireframe. For every triangle sent, in wireframe you get 3 lines.
     
  3. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    Why is the unoptimized code faster than the optimized code after I change the triangle size?
     
  4. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    Can anyone help me? :?:
     
  5. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    My optimization method is as follows.

    For example, suppose the post-T&L vertex cache holds 15 vertices,
    and the triangles are laid out like this:
    30  31  32  33  34  35  36  37  38  39  40  41  42  43  44
    | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ |
    15  16  17  18  19  20  21  22  23  24  25  26  27  28  29
    | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ | \ |
    0   1   2   3   4   5   6   7   8   9   10  11  12  13  14


    The index list of the optimized code is:
    0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 0, 0, 15, 1, 16, 2, 17, 3, 18, 4, 19, 5, 20, 6, 21, 7, 22, 8, 23, 9, 24, 10, 25, 11, 26, 12, 27, 13, 28, 14, 29, 29, 15, 15, 30, 16...

    The index list of the unoptimized code is:
    0, 15, 1, 16, 2, 17, 3, 18, 4, 19, 5, 20, 6, 21, 7, 22, 8, 23, 9, 24, 10, 25, 11, 26, 12, 27, 13, 28, 14, 29, 29, 15, 15, 30, 16...
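
Both index lists follow a regular pattern, so they can be generated from the grid width. The following is a minimal sketch, assuming the elided part of each list simply continues row by row in the same way; the function names are hypothetical and not from the post.

```python
def plain_strip(width, rows):
    """Stitched triangle strip over `rows` vertex rows of `width` vertices."""
    indices = []
    for r in range(rows - 1):
        lower = r * width          # first vertex of the lower row
        upper = lower + width      # first vertex of the row above it
        row = []
        for i in range(width):
            row += [lower + i, upper + i]
        if indices:
            # Repeat the last index and the next first index to join rows.
            indices += [indices[-1], row[0]]
        indices += row
    return indices


def prefill(width):
    """Degenerate run 0, 1,1, 2,2, ..., 0 that touches the first row once."""
    run = [0]
    for i in range(1, width):
        run += [i, i]
    return run + [0]


def prefilled_strip(width, rows):
    """The 'optimized' order: prefill run followed by the plain strip."""
    return prefill(width) + plain_strip(width, rows)


if __name__ == "__main__":
    print(prefilled_strip(15, 3)[:12])
    print(plain_strip(15, 3)[:12])
```

With a width of 15 the two functions reproduce the leading indices quoted in the post, including the `14, 14, 0, 0, 15` and `14, 29, 29, 15, 15, 30` transitions.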
     
  6. Tagrineth

    Tagrineth SNAKES... ON A PLANE
    Veteran

    Joined:
    Feb 14, 2002
    Messages:
    2,512
    Likes Received:
    9
    Location:
    Sunny (boring) Florida
    I'm no software guru, but it looks like you're reusing a LOT of vertices. Why?

    The unoptimised is using half as many entries for the same data...
     
  7. Dio

    Dio
    Veteran

    Joined:
    Jul 1, 2002
    Messages:
    1,758
    Likes Received:
    8
    Location:
    UK
    The no-optimised order certainly isn't _bad_.

    I presume what you're trying to do with the optimal order is to fill in cache entries per line, then use them, then make sure the next line's LRU is updated so they don't get evicted. Problem is your line is too large; you've got twice as many vertices in your cache (if you check, you will see you need 30 vertices for the size you have).

    I'm not sure I'd recommend sending that many degenerate triangles (more than 50%). I'd stick with the 'no-optimised order' myself, with a shorter line repeat rate maybe.
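
The share of degenerate triangles can be checked by expanding the strip and counting triangles that repeat an index. This is a rough sketch for the first strip row only, using the 15-wide example from the earlier post; the helper names are made up for illustration.

```python
def strip_triangles(indices):
    """Every three consecutive strip indices form one triangle."""
    return [tuple(indices[i:i + 3]) for i in range(len(indices) - 2)]


def degenerate_share(indices):
    """Return (degenerate_count, total_count) for a triangle strip."""
    tris = strip_triangles(indices)
    degenerate = sum(1 for t in tris if len(set(t)) < 3)
    return degenerate, len(tris)


WIDTH = 15
# Prefill run 0, 1,1, 2,2, ..., 14,14, 0 as listed in the thread.
prefill = [0] + [i for i in range(1, WIDTH) for _ in range(2)] + [0]
# First real strip row 0, 15, 1, 16, ..., 14, 29.
row = [v for i in range(WIDTH) for v in (i, WIDTH + i)]

if __name__ == "__main__":
    print(degenerate_share(prefill + row))  # prefilled order
    print(degenerate_share(row))            # plain order
```

For this single row the prefilled order yields 30 degenerate triangles out of 58, against 0 out of 28 for the plain order. If only the first row is prefilled, as the listing suggests, the share falls as more rows are added.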
     
  8. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    The cache is FIFO, not LRU.

    After
    0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 0

    the cache (15 vertices) holds:
    0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14

    After
    0, 15, 1, 16, 2, 17, 3, 18, 4, 19, 5, 20, 6, 21, 7, 22, 8, 23, 9, 24, 10, 25, 11, 26, 12, 27, 13, 28, 14, 29

    the cache (15 vertices) holds:
    15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29

    After
    15, 30, 16, 31, 17, 32, 18, 33, 19, 34, 20, 35, 21, 36, 22, 37, 23, 38, 24, 39, 25, 40, 26, 41, 27, 42, 28, 43, 29, 44

    the cache (15 vertices) holds:
    30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44

    .............................

    Every vertex is processed only once.

    Unoptimized code:

    After
    0, 15, 1, 16, 2, 17, 3, 18, 4, 19, 5, 20, 6, 21, 7, 22, 8, 23, 9, 24, 10, 25, 11, 26, 12, 27, 13, 28, 14, 29

    the cache (15 vertices) holds:
    22, 8, 23, 9, 24, 10, 25, 11, 26, 12, 27, 13, 28, 14, 29

    When the second line is processed:
    15, 30, 16, 31, 17, 32, 18, 33, 19, 34, 20, 35, 21, 36, 22, 37, 23, 38, 24, 39, 25, 40, 26, 41, 27, 42, 28, 43, 29, 44

    15: not in cache, so it is reloaded and processed again
    30: not in cache, so it is loaded and processed
    ...............

    So a vertex can be processed more than once.
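
The cache contents listed above can be reproduced with a small simulation. This is only a sketch under the poster's assumptions: a 15-entry FIFO in which a hit leaves the order untouched. The function and variable names are hypothetical, and real hardware may behave differently.

```python
from collections import deque


def run_fifo(indices, cache_size):
    """Return (miss_count, cache_contents_oldest_first) for a FIFO cache."""
    cache = deque()
    misses = 0
    for v in indices:
        if v in cache:
            continue              # hit: a FIFO does not reorder on reuse
        misses += 1               # miss: the vertex is transformed again
        cache.append(v)
        if len(cache) > cache_size:
            cache.popleft()       # evict the oldest entry
    return misses, list(cache)


WIDTH, SIZE = 15, 15
prefill = [0] + [i for i in range(1, WIDTH) for _ in range(2)] + [0]
row1 = [v for i in range(WIDTH) for v in (i, WIDTH + i)]   # 0, 15, 1, 16, ...
row2 = [v + WIDTH for v in row1]                           # 15, 30, 16, 31, ...
stitch = [row1[-1], row2[0]]                               # 29, 15

optimized = prefill + row1 + stitch + row2
plain = row1 + stitch + row2

if __name__ == "__main__":
    print("optimized misses:", run_fifo(optimized, SIZE)[0])
    print("plain misses:", run_fifo(plain, SIZE)[0])
```

Under these assumptions the prefilled order misses 45 times for the 45 distinct vertices, while the plain order misses 60 times, which matches the reasoning in the post.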
     
  9. Dio

    Dio
    Veteran

    Joined:
    Jul 1, 2002
    Messages:
    1,758
    Likes Received:
    8
    Location:
    UK
    Where do you get the information that the R9500 Pro cache is FIFO?
     
  10. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    That is only my assumption. I couldn't find any information about ATI's vertex cache.
    The vertex cache on NVIDIA GPUs is FIFO, so I assumed ATI's vertex cache is FIFO as well.
     
  11. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    Is ATI's vertex cache FIFO or LRU? :shock: :shock: :shock:
     
  12. Dio

    Dio
    Veteran

    Joined:
    Jul 1, 2002
    Messages:
    1,758
    Likes Received:
    8
    Location:
    UK
    I wouldn't assume one way or the other... I must admit I don't know myself, maybe one of the other ATI chaps on here might?
     
  13. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    Thanks for your reply. :idea:
     
  14. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    Probably a FIFO, as that should actually perform better than an LRU (at least according to Hoppe).

    I was a bit confused by this. I'm assuming that you are forming triangles but why aren't these ordered more like..... 0,1,15; 1, 16, 15; 1, 2, 16;... ?
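
For comparison, the plain triangle-list ordering asked about here can be generated as follows. This is a sketch only: the vertex order within each triangle is taken from the three triangles quoted above, and the function name is made up.

```python
def grid_triangle_list(width, rows):
    """Indexed triangle list for a grid of `rows` vertex rows, `width` wide."""
    tris = []
    for r in range(rows - 1):
        for i in range(width - 1):
            a = r * width + i      # lower-left corner of the quad
            b = a + 1              # lower-right
            c = a + width          # upper-left
            d = c + 1              # upper-right
            tris += [(a, b, c), (b, d, c)]
    return tris


if __name__ == "__main__":
    print(grid_triangle_list(15, 2)[:3])
```

A list built this way carries three indices per triangle and needs no degenerate triangles, whereas the strip in the earlier posts shares indices between neighbouring triangles.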
     
  15. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    Why is the unoptimized code faster than the optimized code in my project after I change the triangle size?

    Thanks.
     
  16. Dio

    Dio
    Veteran

    Joined:
    Jul 1, 2002
    Messages:
    1,758
    Likes Received:
    8
    Location:
    UK
    Tristrips.

    I think the algorithm's valid, but requires a lot of assumptions about how the hardware works. Personally, I wouldn't use an algorithm of this kind - I'd take a few % inefficiency in exchange for portability, but what do I know? :)
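
How much the scheme depends on the replacement policy can be seen by running the same index lists through both a FIFO and an LRU model. This is a sketch with a 15-entry cache and three vertex rows of the 15-wide example; neither model is confirmed for the R9500 Pro, and the helper names are hypothetical.

```python
from collections import OrderedDict, deque


def fifo_misses(indices, size):
    """Count misses for a FIFO cache (hits do not change eviction order)."""
    cache, misses = deque(), 0
    for v in indices:
        if v not in cache:
            misses += 1
            cache.append(v)
            if len(cache) > size:
                cache.popleft()
    return misses


def lru_misses(indices, size):
    """Count misses for an LRU cache (hits move the entry to the newest slot)."""
    cache, misses = OrderedDict(), 0
    for v in indices:
        if v in cache:
            cache.move_to_end(v)
        else:
            misses += 1
            cache[v] = None
            if len(cache) > size:
                cache.popitem(last=False)   # evict least recently used
    return misses


WIDTH, SIZE = 15, 15
prefill = [0] + [i for i in range(1, WIDTH) for _ in range(2)] + [0]
row1 = [v for i in range(WIDTH) for v in (i, WIDTH + i)]
row2 = [v + WIDTH for v in row1]
plain = row1 + [row1[-1], row2[0]] + row2
optimized = prefill + plain

if __name__ == "__main__":
    for name, seq in (("optimized", optimized), ("plain", plain)):
        print(name, "FIFO:", fifo_misses(seq, SIZE), "LRU:", lru_misses(seq, SIZE))
```

In this model the prefilled order needs 45 transforms with a FIFO but 74 with an LRU, while the plain order needs 60 either way. So the reordering only pays off if the FIFO assumption and the assumed cache size both hold.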
     
  17. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    I think the method should be efficient.
    But I don't understand why reducing the size of the triangles reverses the result.
     
  18. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    But surely they don't even form tristrips. The vals given would make triangles (0,1,1) (1,1,2) (1,2,2)... etc!
     
  19. Dio

    Dio
    Veteran

    Joined:
    Jul 1, 2002
    Messages:
    1,758
    Likes Received:
    8
    Location:
    UK
    Yep. It uses degenerate triangles to pre-fill a FIFO style cache. Then the second row does the actual drawing, and pre-fills the second line of the cache.

    My uncertainty about its efficiency is that I'm not sure that many degenerate triangles are at all a good idea.
     
  20. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    Yes, I use degenerate triangles to fill the post-T&L vertex cache by sending the vertices in a special order.
     