Nvidia: pre-T&L or post-T&L cache?

Can it even be post-T&L considering vertex shaders and such, and that a model probably won't look the same from one frame to the next?

*G*
 
The cache is post-T&L. It's for skipping vertices that are shared between triangles, not for remembering vertices from the last frame.

It's relatively small, as shown in the previous post, which is why fans and strips are such good practice for speeding up geometry. They reuse a lot of vertices, and if your strip zig-zags are short enough you can get some triangles much more cheaply.
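As a rough illustration of the point above, here is a minimal sketch of a post-T&L cache keyed by vertex index. The FIFO replacement policy, the 16-entry size and the name count_transforms are assumptions made for the example, not documented NVIDIA behaviour.

```python
from collections import deque

def count_transforms(indices, cache_size):
    """Count how many vertices get transformed for a given index stream."""
    cache = deque(maxlen=cache_size)  # assumed FIFO: oldest entry drops out when full
    transforms = 0
    for idx in indices:
        if idx not in cache:
            transforms += 1      # miss: the vertex goes through T&L
            cache.append(idx)    # the transformed result is remembered by index
    return transforms

# Two triangles sharing an edge: (0, 1, 2) and (2, 1, 3).
quad = [0, 1, 2, 2, 1, 3]
print(count_transforms(quad, cache_size=16))  # 4 transforms instead of 6
```

The two shared vertices are transformed once each, which is where the "cheaper" triangles come from.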
 
RussSchultz said:
The cache is post-T&L. It's for skipping vertices that are shared between triangles, not for remembering vertices from the last frame.

It's relatively small, as shown in the previous post, which is why fans and strips are such good practice for speeding up geometry. They reuse a lot of vertices, and if your strip zig-zags are short enough you can get some triangles much more cheaply.

You mean the NVIDIA GPU has no pre-T&L cache?
 
There is probably a pre-TnL input "buffer" but not a true cache: it matches the data fetches, since a burst fetch may contain multiple vertices or parts of vertex data. A post-TnL cache makes the most sense, as explained, because one vertex is used by a lot of triangles. With the cache placed post-TnL, the costly vertex processing is done only once; with the cache at the front end you'd still be stuck doing the vertex processing multiple times.

K-
 
Another important question:

Does NV2x need indexed primitives to use its vertex cache or not?

thx
 
I don't know for sure, but it does make some amount of sense.

Indexed primitives definitively show that you're reusing vertex[100] (for example).

Simply duplicating the vertex's value would require costly compares to determine whether that vertex was identical, which may be an even tradeoff or worse.
 
Zephyr said:
Another important question:

Does NV2x need indexed primitives to use its vertex cache or not?

thx
Either that or they need to be part of the same primitive (strip, fan)
 
I have got contradictory answers about it.

DX8.1SDK:

1, Use triangle strips instead of lists and fans. For optimal vertex cache performance, arrange strips to reuse triangle vertices sooner, rather than later.

2, Draw using indexed primitives. This may allow for more efficient vertex caching within hardware.

However,

"Vertex caches are only available when using indexing!" can be found in many places in developer.nvidia.com.

That is to say the implementation of vertex cache in NV2x is not a true transparent VC, right? If so, I think it is not a smart choice.
 
I don't see any contradiction at all. The DX documentation is general hints for all hardware.

NVIDIA is only talking about their hardware. Everybody else's may or may not require indexed vertices.

About it being a smart choice, ponder whether or not you can effectively (and cost effectively) determine if a vertex is identical to one in the cache.

I don't know for sure, but one can imagine that comparing an index number is much cheaper than comparing all the parameters of the pre-transformed vertex for identity.
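To put rough numbers on that, here is a back-of-the-envelope sketch. The vertex layout (position, normal, one UV set) and the fully associative lookup are assumptions chosen for illustration; the 24-entry size is the NV2x figure quoted elsewhere in this thread.

```python
FLOATS_PER_VERTEX = 3 + 3 + 2   # assumed layout: position, normal, one UV set
CACHE_ENTRIES = 24              # cache size quoted in this thread for NV2x

def compares_by_index(entries):
    # One integer compare per cache entry.
    return entries

def compares_by_data(entries, floats_per_vertex):
    # Worst case: every attribute of every cached vertex is compared.
    return entries * floats_per_vertex

print(compares_by_index(CACHE_ENTRIES))                    # 24
print(compares_by_data(CACHE_ENTRIES, FLOATS_PER_VERTEX))  # 192
```

Matching on data also means keeping the original, untransformed attributes around for every cached vertex, on top of the transformed result.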
 
That is to say the implementation of vertex cache in NV2x is not a true transparent VC, right? If so, I think it is not a smart choice.
I don't see that what you are asking for is practical.

IIRC with strips or fans you simply pack the vertices contiguously and, with the exception of the first two vertices, each vertex implicitly defines a triangle. By default you get N/(N+2) triangles per vertex for a strip/fan with N triangles.

Obviously the strip/fan scheme on its own can't achieve better than 1 Tris/Vert, whereas indexed triangles typically can do much better, say around 2 Tris/Vert.
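A quick worked check of those ratios, as a sketch: a strip of N triangles is counted as N+2 vertices, and the indexed case assumes a regular w x h grid of quads in which every vertex is transformed exactly once (an ideal cache). The function names are made up for the example.

```python
def strip_ratio(num_triangles):
    # A strip of N triangles submits N + 2 vertices.
    return num_triangles / (num_triangles + 2)

def indexed_grid_ratio(w, h):
    # A w x h grid of quads has 2*w*h triangles and (w+1)*(h+1) unique vertices.
    return (2 * w * h) / ((w + 1) * (h + 1))

print(strip_ratio(10))               # 10/12, below 1
print(strip_ratio(1000))             # approaches 1 but never reaches it
print(indexed_grid_ratio(100, 100))  # 20000/10201, close to 2
```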

The question is, "How would you make a cache scheme work with raw strip/fan data?". You'd have to compare every incoming vertex (which is a lot of data) with the original vertex data of all the verts you have in the cache in order to identify a match. Frankly, that would be a silly waste of silicon when all the developer has to do is use indexed triangle lists - the HW then only has to compare indices.
 
RussSchultz said:
About it being a smart choice, ponder whether or not you can effectively (and cost effectively) determine if a vertex is identical to one in the cache.

I don't know for sure, but one can imagine that comparing an index number is much cheaper than comparing all the parameters of the pre-transformed vertex for identity.

Sure, an implementation using indexed primitives is cheaper, but old games or applications, even if they use well-meshed data, cannot get any benefit from the vertex cache in NV2x.
 
Zephyr said:
Sure, an implementation using indexed primitives is cheaper, but old games or applications, even if they use well-meshed data, cannot get any benefit from the vertex cache in NV2x.
To get a good tri/vert ratio from strips/fans you need to have long strips which could imply that, for a decent sized model, a particular vertex won't reappear until quite a number of vertices have passed through the system.

The "cache" on the chips is typically quite small and so in these situations it's unlikely to help much. Note that mesh models usually have to undergo some processing to get the order 'optimal'. For example, a recently published paper used the equivalent of a space-filling curve to re-order the polys, which gave good results irrespective of the cache size. (OTOH I think one of the IHVs' tools for mesh optimisation is tuned to a specific cache size/behaviour).
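To show how much triangle order matters with a small cache, here is a sketch that pushes a row-major indexed triangle list for a grid through a 24-entry cache. The FIFO policy, the grid layout and the helper names are assumptions for illustration; this is not the space-filling-curve method from the paper, just a naive ordering.

```python
from collections import deque

def grid_triangle_list(w, h):
    """Row-major indexed triangle list for a w x h grid of quads."""
    indices = []
    for y in range(h):
        for x in range(w):
            v00 = y * (w + 1) + x      # lower-left corner of the quad
            v10 = v00 + 1
            v01 = v00 + (w + 1)        # same column, next row of vertices
            v11 = v01 + 1
            indices += [v00, v10, v01, v10, v11, v01]
    return indices

def count_transforms(indices, cache_size=24):
    cache = deque(maxlen=cache_size)   # assumed FIFO replacement
    misses = 0
    for idx in indices:
        if idx not in cache:
            misses += 1
            cache.append(idx)
    return misses

for w, h in [(4, 20), (100, 10)]:
    unique = (w + 1) * (h + 1)
    print(w, h, count_transforms(grid_triangle_list(w, h)), unique)
```

On the narrow grid the previous row of vertices is still in the cache when it is needed again, so each vertex is transformed once. On the wide grid it has long since been evicted, and most vertices end up transformed about twice.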
 
Simon F said:
That is to say the implementation of vertex cache in NV2x is not a true transparent VC, right? If so, I think it is not a smart choice.
I don't see that what you are asking for is practical.

IIRC with strips or fans you simply pack the vertices contiguously and, with the exception of the first two vertices, each vertex implicitly defines a triangle. By default you get N/(N+2) triangles per vertex for a strip/fan with N triangles.

Obviously the strip/fan scheme on its own can't achieve better than 1 Tris/Vert, whereas indexed triangles typically can do much better, say around 2 Tris/Vert.

The question is, "How would you make a cache scheme work with raw strip/fan data?". You'd have to compare every incoming vertex (which is a lot of data) with the original vertex data of all the verts you have in the cache in order to identify a match. Frankly, that would be a silly waste of silicon when all the developer has to do is use indexed triangle lists - the HW then only has to compare indices.

Strips and fans, especially strips, are the key to saving both memory size and transfer size. Indexing only saves memory size and even enlarges transfer size. With well-meshed strips, indexed and non-indexed vertices could get the same vertex cache hit rates, if the vertex cache did not need indexed primitives.

Of course, I admit that the cost of a vertex cache implementation supporting non-indexed primitives is higher, but it is a truly transparent implementation.
 
Simon F said:
Zephyr said:
Sure, an implementation using indexed primitives is cheaper, but old games or applications, even if they use well-meshed data, cannot get any benefit from the vertex cache in NV2x.
To get a good tri/vert ratio from strips/fans you need to have long strips which could imply that, for a decent sized model, a particular vertex won't reappear until quite a number of vertices have passed through the system.

The "cache" on the chips is typically quite small and so in these situations it's unlikely to help much. Note that mesh models usually have to undergo some processing to get the order 'optimal'. For example, a recently published paper used the equivalent of a space-filling curve to re-order the polys, which gave good results irrespective of the cache size. (OTOH I think one of the IHVs' tools for mesh optimisation is tuned to a specific cache size/behaviour).

I think the 24 elements (18 effectively) in the NV2x vertex cache can still help geometric throughput a little, even if not by much.

And yes, NVTriStrip v1.1 is such a tool.
 
MDolenc said:
No IHV does that (prove me wrong if anyone does that). It's just way too far from being practical.

I just want to get a "true" confirmation that vertex cache in NV2x does need indexed primitives and cannot work with non-indexed primitives!
 
Zephyr said:
Strips and fans, especially strips, are the key to saving both memory size and transfer size. Indexing only saves memory size and even enlarges transfer size. With well-meshed strips, indexed and non-indexed vertices could get the same vertex cache hit rates, if the vertex cache did not need indexed primitives.
What you are forgetting is that an indexed system with cache can greatly reduce the transfer of data into the chip and, as we all should know, bandwidth is a valued commodity. Your proposed scheme would not have this benefit.

IMHO there'd be a < 5% chance of a "compare vertex data for cache matches" unit being present in graphics hardware.
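The bandwidth point can be sketched with some rough arithmetic for a w x h grid of quads. The 32-byte vertex, 16-bit indices and one-strip-per-row layout are assumptions, and the indexed figure is idealised: it assumes every unique vertex is fetched exactly once, which a small real cache will not quite achieve.

```python
VERTEX_BYTES = 32   # assumed vertex size
INDEX_BYTES = 2     # assumed 16-bit indices

def strip_bytes(w, h):
    # Non-indexed: one strip per row of quads, 2*(w+1) full vertices each.
    return h * 2 * (w + 1) * VERTEX_BYTES

def indexed_bytes(w, h):
    # Indexed, ideal cache: each unique vertex once, plus a triangle-list index buffer.
    vertices = (w + 1) * (h + 1) * VERTEX_BYTES
    indices = 3 * (2 * w * h) * INDEX_BYTES
    return vertices + indices

print(strip_bytes(100, 100))    # 646400
print(indexed_bytes(100, 100))  # 446432
```

Under these assumptions the indexed mesh moves noticeably less data even though it carries an index buffer, because interior vertices are not sent once per row.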
 
It has both a pre- and post-transform cache.

I am unclear on exactly how the pre-transform cache works. I'd assume its primary job is collecting vertex attributes from the input streams, although it may actually behave as a more conventional cache.

The post-transform cache has no effect on non-indexed vertices, and as SimonF says I would be surprised if any hardware did any differently.
 