Drawing many moving objects with OpenGL

Your advice is correct.
Thanks for the confirmation. It made sense logically, but I figured there was a chance newer input assemblers could have started buffering the last cache line and multiplexing from both lines for unaligned accesses.
Cache line aligning your vertex data is always a good idea.
I was wondering: is this common practice in the industry? My guess is that the increased memory footprint makes people hesitate over whether to do it or not. More (or more detailed) assets versus faster rendering of those assets.
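To put a rough number on that footprint trade-off, here is a minimal C sketch. It assumes a hypothetical 28-byte vertex (position, a packed normal, texture coordinates, RGBA8 color) and a 64-byte cache line; neither the layout nor the line size comes from the posts above. Padding to 32 bytes costs 4 bytes per vertex (about 14%), but as long as the buffer itself starts on a 64-byte boundary, exactly two vertices fit in each line and none straddles one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tightly packed vertex: 3*4 + 4 + 2*4 + 4 = 28 bytes,
   which does not divide a 64-byte cache line evenly. */
typedef struct {
    float    position[3];
    uint32_t normal;      /* e.g. a GL_INT_2_10_10_10_REV packed normal */
    float    texcoord[2];
    uint32_t color;       /* RGBA8 */
} VertexTight;

/* Same attributes padded to 32 bytes, so two vertices fill one
   64-byte line exactly (assuming a 64-byte aligned buffer start). */
typedef struct {
    float    position[3];
    uint32_t normal;
    float    texcoord[2];
    uint32_t color;
    uint32_t pad;         /* unused, only there to reach 32 bytes */
} VertexPadded;

static_assert(sizeof(VertexTight) == 28, "tight vertex should be 28 bytes");
static_assert(sizeof(VertexPadded) == 32, "padded vertex should be 32 bytes");
static_assert(64 % sizeof(VertexPadded) == 0, "padded stride must divide the assumed 64-byte line");

/* Extra buffer memory spent on padding for a mesh of vertex_count vertices. */
size_t padding_overhead_bytes(size_t vertex_count)
{
    return vertex_count * (sizeof(VertexPadded) - sizeof(VertexTight));
}
```

For a 100,000-vertex mesh, `padding_overhead_bytes(100000)` returns 400000, i.e. roughly 390 KiB of extra buffer memory for that one mesh.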
 
Indexed geometry is the common case. This means that indices are accessed linearly, but vertices are not. A good vertex cache optimizer tries to order repeated accesses to a vertex so that the post-transform cache (parameter cache) still contains the vertex, meaning that the VS execution can be skipped. As vertices are grouped by triangles (connected to other triangles), this also does a rather good job of ordering the vertex buffer memory accesses in a cache-friendly manner (assuming the 3 triangle vertices are stored next to each other in memory). However, there are always some outlier vertices that need a new cache line load, and that cache line is then not accessed again for a long time (so it gets evicted before the rest of its data is used). In this case it is highly preferable that the vertex does not cross a cache line boundary, since it would then touch two cache lines instead of just one.
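The "two cache lines instead of one" point is easy to check with a little arithmetic. Below is a minimal C sketch, assuming a 64-byte cache line and vertices stored back to back at a fixed stride; `straddles_line` and `count_straddling_vertices` are hypothetical helpers for illustration, not part of any graphics API. It counts how many vertices in a buffer cross a line boundary for a given stride and starting offset.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the byte range [offset, offset + size) touches more than
   one cache line of line_size bytes, 0 otherwise. */
int straddles_line(size_t offset, size_t size, size_t line_size)
{
    assert(size > 0 && line_size > 0);
    size_t first_line = offset / line_size;
    size_t last_line  = (offset + size - 1) / line_size;
    return first_line != last_line;
}

/* Counts how many of vertex_count vertices, stored with the given stride
   starting at base_offset, cross a cache-line boundary. */
size_t count_straddling_vertices(size_t base_offset, size_t stride,
                                 size_t vertex_size, size_t vertex_count,
                                 size_t line_size)
{
    size_t count = 0;
    for (size_t i = 0; i < vertex_count; ++i) {
        count += (size_t)straddles_line(base_offset + i * stride,
                                        vertex_size, line_size);
    }
    return count;
}
```

With a 28-byte stride starting at offset 0, 6 of the first 16 vertices straddle a 64-byte line; with a 32-byte stride none do, unless the buffer itself starts mid-line (a 16-byte base offset makes every other vertex straddle again). So both the stride and the base alignment of the buffer matter.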

Sometimes it is actually better to replicate some vertices in memory, even when index buffering is used. You should piggyback an outlier vertex onto a cache line that contains another outlier vertex, one that is not accessed at the same time as the other vertices in that cache line.
 
Indexed geometry is the common case. This means that indices are accessed linearly, but vertices are not.
I know, but the more advanced hardware I was supposing might exist is much more complicated to explain in the case of indexed primitives. When you guys mentioned Tipsify, it reminded me of reading that paper... Every time I recall it, I think of the snaky dragon model they used for their benchmarks. That model is what I had in mind when thinking about more advanced vertex/triangle ordering for faster rendering with respect to overdraw.
 
Hi! For various reasons, this was put on the back burner for a while, but I'm currently deploying it on the machines for which it was developed. I don't have any real benchmarks for now (I might within a couple of weeks) but so far it seems to be performing within acceptable parameters, as Data would say. I'll keep you updated. :)
 
What video cards do the machines have? I'm still curious why you didn't see a performance improvement with front-to-back rendering, and was wondering whether the video card or drivers have something to do with that.
 
They have GF100-based Teslas if I remember correctly… but they're currently undergoing maintenance, so benchmarks may have to wait a bit more than expected.
 