Nvidia, pre-T&L or post-T&L Cache?

Discussion in 'Architecture and Products' started by ultrafly, Oct 9, 2002.

  1. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    Is NV25's vertex cache a pre-T&L or a post-T&L cache?

    Thx.
     
  2. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,172
    Location:
    La-la land
    Can it even be post-T&L considering vertex shaders and such, and that a model probably won't look the same from one frame to the next?

    *G*
     
  3. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
  4. RussSchultz

    RussSchultz Professional Malcontent
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,855
    Likes Received:
    55
    Location:
    HTTP 404
    The cache is post-T&L. It's for skipping vertices that are shared between triangles, not for remembering vertices from the last frame.

    It's relatively small, as shown in the previous post, which is why fans and strips are such good practice for speeding up geometry. They reuse a lot of vertices, and if your strip zig-zags are short enough you can get some triangles much more cheaply.
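
    The reuse described above can be sketched as a small simulation. This is only an illustration: the FIFO replacement policy and the cache size of 16 are assumptions made for the example, not confirmed details of any NVIDIA part, and `count_transforms` is a hypothetical helper name.

```python
from collections import deque

def count_transforms(indices, cache_size):
    """Count vertex transforms for an index stream, assuming a FIFO post-T&L cache."""
    cache = deque(maxlen=cache_size)  # assumed FIFO: the oldest entry is evicted first
    transforms = 0
    for idx in indices:
        if idx not in cache:          # miss: the vertex has to be transformed
            transforms += 1
            cache.append(idx)
    return transforms

# Two triangles sharing an edge (a quad): 6 indices, only 4 unique vertices.
quad = [0, 1, 2, 2, 1, 3]
print(count_transforms(quad, cache_size=16))  # 4: the shared vertices are transformed once
print(count_transforms(quad, cache_size=0))   # 6: with no cache every index is transformed
```

    The cache only looks at the current draw's index stream, which is why it has nothing to do with remembering vertices from an earlier frame.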
     
  5. ultrafly

    Newcomer

    Joined:
    Oct 9, 2002
    Messages:
    56
    Likes Received:
    0
    Location:
    ShenZhen,China
    Do you mean the NVIDIA GPU has no pre-T&L cache?
     
  6. Kristof

    Regular Alpha

    Joined:
    Jan 30, 2002
    Messages:
    733
    Likes Received:
    1
    Location:
    Abbots Langley
    There is probably a pre-TnL input "buffer" but not a true cache: it matches the data fetches, since a burst data fetch may contain multiple vertices or parts of vertex data. A post-TnL cache makes the most sense, as explained, because one vertex is used by a lot of triangles. Placing the cache post-TnL means the costly vertex processing is done only once; if you placed the cache at the front end you'd still be stuck doing the vertex processing multiple times.

    K-
     
  7. Zephyr

    Newcomer

    Joined:
    Aug 18, 2002
    Messages:
    74
    Likes Received:
    0
    Another important question:

    Does NV2x need indexed primitives in order to use its vertex cache?

    thx
     
  8. RussSchultz

    RussSchultz Professional Malcontent
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,855
    Likes Received:
    55
    Location:
    HTTP 404
    I don't know for sure, but it does make some amount of sense.

    Indexed primitives definitively show that you're reusing vertex[100] (for example).

    Simply duplicating the vertex's value would require costly compares to determine whether that vertex was identical to a cached one, which may be an even tradeoff or worse.
     
  9. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,298
    Likes Received:
    137
    Location:
    On the path to wisdom
    Either that or they need to be part of the same primitive (strip, fan)
     
  10. Zephyr

    Newcomer

    Joined:
    Aug 18, 2002
    Messages:
    74
    Likes Received:
    0
    I have got contradictory answers about it.

    DX8.1SDK:

    1, Use triangle strips instead of lists and fans. For optimal vertex cache performance, arrange strips to reuse triangle vertices sooner, rather than later.

    2, Draw using indexed primitives. This may allow for more efficient vertex caching within hardware.

    However,

    "Vertex caches are only available when using indexing!" can be found in many places in developer.nvidia.com.

    That is to say, the vertex cache implementation in NV2x is not a truly transparent VC, right? If so, I think it is not a smart choice.
     
  11. RussSchultz

    RussSchultz Professional Malcontent
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,855
    Likes Received:
    55
    Location:
    HTTP 404
    I don't see any contradiction at all. The DX documentation is general hints for all hardware.

    NVIDIA is only talking about their hardware. Everybody else's may or may not require indexed vertices.

    About it being a smart choice, ponder whether or not you can effectively (and cost effectively) determine if a vertex is identical to one in the cache.

    I don't know for sure, but one can imagine that comparing an index number is much cheaper than comparing all the parameters of the pre-transform vertex for identity.
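
    A rough way to put numbers on that comparison cost is sketched below. The vertex layout, the 16-bit index, and the 16-entry cache are all assumptions chosen for illustration, and `bytes_compared_per_lookup` is a hypothetical helper; real hardware would do the compares in parallel, so this is a proxy for comparator width rather than for time.

```python
import struct

# Assumed vertex layout (not from the thread): position, normal, one UV set.
VERTEX_FORMAT = "<3f3f2f"
INDEX_FORMAT = "<H"        # assumed 16-bit index
CACHE_ENTRIES = 16         # assumed cache size

def bytes_compared_per_lookup(tag_format, entries=CACHE_ENTRIES):
    """Bytes compared if the incoming tag is checked against every cache entry."""
    return struct.calcsize(tag_format) * entries

print(bytes_compared_per_lookup(INDEX_FORMAT))   # 32 bytes when matching on the index
print(bytes_compared_per_lookup(VERTEX_FORMAT))  # 512 bytes when matching on vertex data
```

    Under these assumptions, matching on full vertex data needs 16 times as much comparison logic as matching on an index.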
     
  12. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    I don't see that what you are asking for is practical.

    IIRC with strips or fans you simply pack the vertices contiguously and, with the exception of the first two vertices, each vertex implicitly defines a triangle. By default you get N/(N+2) triangles per vertex for a strip/fan with N triangles (i.e. N+2 vertices).

    Obviously the strip/fan scheme on its own can't achieve better than 1 Tris/Vert, whereas indexed triangles typically can do much better, say around 2 Tris/Vert.
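
    The two ratios can be checked with a little arithmetic. This sketch assumes a strip or fan of N triangles uses N+2 vertices, and, for the indexed case, a regular grid where every unique vertex is processed exactly once (an ideal cache); both function names are made up for the example.

```python
def strip_tris_per_vert(num_triangles):
    """Triangles per vertex for one strip or fan: N triangles need N + 2 vertices."""
    return num_triangles / (num_triangles + 2)

def grid_indexed_tris_per_vert(cols, rows):
    """Triangles per unique vertex for a cols x rows grid of quads (ideal reuse)."""
    triangles = 2 * cols * rows            # two triangles per quad
    vertices = (cols + 1) * (rows + 1)     # each grid corner counted once
    return triangles / vertices

print(strip_tris_per_vert(10))               # below 1, and approaches 1 as strips get longer
print(grid_indexed_tris_per_vert(100, 100))  # close to 2 for a large grid
```

    The grid figure is an upper bound; a small real cache lands somewhere between the two.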

    The question is, "How would you make a cache scheme work with raw strip/fan data?". You'd have to compare every incoming vertex (which is a lot of data) with the original vertex data of all the verts you have in the cache in order to identify a match. Frankly, that would be a silly waste of silicon when all the developer has to do is use indexed triangle lists - the HW then only has to compare indices.
     
  13. Zephyr

    Newcomer

    Joined:
    Aug 18, 2002
    Messages:
    74
    Likes Received:
    0
    Sure, an implementation using indexed primitives is cheaper, but old games or applications, even if they used well-meshed data, cannot get any benefit from the vertex cache in NV2x.
     
  14. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    To get a good tri/vert ratio from strips/fans you need to have long strips which could imply that, for a decent sized model, a particular vertex won't reappear until quite a number of vertices have passed through the system.

    The "cache" on the chips is typically quite small and so in these situations it's unlikely to help much. Note that mesh models usually have to undergo some processing to get the order 'optimal'. For example, a recently published paper used the equivalent of a space-filling curve to re-order the polys, which gave good results irrespective of the cache size. (OTOH I think one of the IHVs' tools for mesh optimisation is tuned to a specific cache size/behaviour.)
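
    To see how much triangle order matters with a small cache, here is a toy comparison on a regular grid. It is a sketch under stated assumptions: a FIFO cache of 16 entries, a 60 x 60 grid, and a simple "narrow bands" ordering that stands in for real reordering schemes. It is not the space-filling-curve method from the paper, and all names are hypothetical.

```python
from collections import deque

def acmr(indices, cache_size):
    """Average cache miss ratio: vertex transforms per triangle (assumed FIFO cache)."""
    cache = deque(maxlen=cache_size)
    misses = 0
    for idx in indices:
        if idx not in cache:
            misses += 1
            cache.append(idx)
    return misses / (len(indices) // 3)

def quad(x, y, width):
    """Six indices (two triangles) for the grid cell at column x, row y."""
    v00 = y * (width + 1) + x
    v10 = v00 + 1
    v01 = v00 + width + 1
    v11 = v01 + 1
    return [v00, v10, v01, v10, v11, v01]

def row_major(width, height):
    """Emit cells one full-width row at a time."""
    return [i for y in range(height) for x in range(width) for i in quad(x, y, width)]

def banded(width, height, band):
    """Emit cells in narrow vertical bands so the previous row is still cached."""
    out = []
    for x0 in range(0, width, band):
        for y in range(height):
            for x in range(x0, min(x0 + band, width)):
                out.extend(quad(x, y, width))
    return out

CACHE_SIZE = 16   # assumed
W = H = 60
print(acmr(row_major(W, H), CACHE_SIZE))   # slightly above 1 transform per triangle
print(acmr(banded(W, H, 6), CACHE_SIZE))   # noticeably lower for the same mesh
```

    The band width of 6 is picked so that two rows of a band fit in the assumed cache, which is exactly the kind of tuning to a specific cache size mentioned above.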
     
  15. Zephyr

    Newcomer

    Joined:
    Aug 18, 2002
    Messages:
    74
    Likes Received:
    0
    Strips and fans, especially strips, are the key to saving both memory size and transfer size. Indexing only saves memory size and even enlarges transfer size. With well-meshed strips, indexed and non-indexed vertices could get the same vertex cache hit rates if the vertex cache didn't need indexed primitives.

    Of course, I admit that the cost of a vertex cache implementation supporting non-indexed primitives is higher, but it is a truly transparent implementation.
     
  16. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    690
    Likes Received:
    425
    Location:
    Slovenia
    No IHV does that (prove me wrong if anyone does). It's just way too far from being practical.
     
  17. Zephyr

    Newcomer

    Joined:
    Aug 18, 2002
    Messages:
    74
    Likes Received:
    0
    I think the 24 elements (18 effectively) in the vertex cache of NV2x can also help the geometry throughput a little, even if not by much.

    And yes, NVTriStrip v1.1 is such a tool.
     
  18. Zephyr

    Newcomer

    Joined:
    Aug 18, 2002
    Messages:
    74
    Likes Received:
    0
    I just want to get a "true" confirmation that vertex cache in NV2x does need indexed primitives and cannot work with non-indexed primitives!
     
  19. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    What you are forgetting is that an indexed system with cache can greatly reduce the transfer of data into the chip and, as we all should know, bandwidth is a valued commodity. Your proposed scheme would not have this benefit.

    IMHO there'd be a < 5% chance of a "compare vertex data for cache matches" unit being present in graphics hardware.
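
    The bandwidth point can be illustrated with a back-of-the-envelope count for a grid mesh. Everything here is an assumption for the sake of the example: 32-byte vertices, 16-bit indices, one non-indexed strip per grid row, and an ideal cache that fetches each unique vertex only once in the indexed case.

```python
VERTEX_BYTES = 32   # assumed: position + normal + one UV set as 32-bit floats
INDEX_BYTES = 2     # assumed: 16-bit indices

def strip_bytes(width, height):
    """Non-indexed: one strip per grid row, 2 * (width + 1) vertices each."""
    return height * 2 * (width + 1) * VERTEX_BYTES

def indexed_bytes(width, height):
    """Indexed list where every unique vertex is fetched exactly once (ideal case)."""
    unique_vertices = (width + 1) * (height + 1)
    indices = 6 * width * height           # two triangles per quad, three indices each
    return unique_vertices * VERTEX_BYTES + indices * INDEX_BYTES

print(strip_bytes(100, 100))    # 646400 bytes of vertex data
print(indexed_bytes(100, 100))  # 446432 bytes of vertices plus indices
```

    With a small real cache the indexed figure would be higher than this ideal, so treat it as a best case rather than a measurement.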
     
  20. ERP

    ERP Moderator
    Moderator Veteran

    Joined:
    Feb 11, 2002
    Messages:
    3,669
    Likes Received:
    49
    Location:
    Redmond, WA
    It has both a pre and post transform cache.

    I am unclear on exactly how the pre-transform cache works. I'd assume its primary job is collecting vertex attributes from the input streams, although it may actually behave as a more conventional cache.

    The post-transform cache has no effect on non-indexed vertices, and as SimonF says I would be surprised if any hardware did anything different.
     