The order in which the triangles get rasterized

Rainsing

Newcomer
Hi there,

I am on a quest to improve our transparency sorting within a single mesh. The idea is to reorder the vertex indices at export time, so that some triangles always get rendered before others.
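To make the idea concrete, here is a minimal sketch of the export-time reordering, assuming a flat triangle-list index buffer and one priority value per triangle (lower draws first). The function name `reorder_triangles` and the `priorities` input are hypothetical; how the priorities are chosen depends on the mesh.

```python
def reorder_triangles(indices, priorities):
    """Return a new index list with triangles stably sorted by priority.

    indices:    flat triangle list, three vertex indices per triangle
    priorities: one number per triangle; lower values are drawn first
    """
    if len(indices) % 3 != 0:
        raise ValueError("index count must be a multiple of 3")
    triangles = [tuple(indices[i:i + 3]) for i in range(0, len(indices), 3)]
    if len(triangles) != len(priorities):
        raise ValueError("need exactly one priority per triangle")
    # sorted() is stable, so triangles with equal priority keep their
    # original relative order.
    order = sorted(range(len(triangles)), key=lambda t: priorities[t])
    return [v for t in order for v in triangles[t]]


if __name__ == "__main__":
    indices = [0, 1, 2,  2, 1, 3,  4, 5, 6]
    priorities = [1, 0, 1]  # the middle triangle should be drawn first
    print(reorder_triangles(indices, priorities))
```

Only the index buffer changes; the vertex data stays where it is.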

A precondition for this approach is that the GPU preserves the order in which the triangles are issued from the CPU when it rasterizes them.

Since modern GPUs are highly parallel hardware, I'm not sure whether this holds true under all circumstances. I am by no means an expert on GPU architectures, so maybe the answer to this question is apparent... hmm, I don't know.
 
The GPU will maintain the order you specify if blending is enabled. Some GPUs might perform some reordering of opaque surfaces, but the resulting visibility must be correct.
 
Note that GPUs will maintain the order of the actual fixed function blending operations, but there are no guarantees about shader execution ordering. In particular, if you introduce side-effects with UAV/image writes from shader stages they will not be ordered.
 
Andrew is correct. The semantics of blending say the results have to be applied in order, not that the calculations need to be. And even at the top of the pipe, there is no promise of order. A tiled renderer will have a different *total* order than a simple forward renderer.
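A toy CPU model of that distinction, for a single pixel and a single color channel. It is only a sketch: `shade`, `blend_over` and `render_pixel` are made-up names, and the shuffle merely stands in for fragment shaders finishing in an arbitrary order. The shading happens in any order, but the blends are applied in submission order, so the result does not depend on when each shader finished.

```python
import random


def shade(fragment):
    # Stand-in for a fragment shader: returns (color, alpha).
    return fragment["color"], fragment["alpha"]


def blend_over(dst, src_color, src_alpha):
    # Classic "over" blend: src * a + dst * (1 - a).
    return src_color * src_alpha + dst * (1.0 - src_alpha)


def render_pixel(fragments, background, seed=0):
    # Shader execution: completes in an arbitrary (here: shuffled) order.
    work = list(enumerate(fragments))
    random.Random(seed).shuffle(work)
    shaded = {index: shade(fragment) for index, fragment in work}

    # Blending: results are applied strictly in submission order.
    dst = background
    for index in range(len(fragments)):
        color, alpha = shaded[index]
        dst = blend_over(dst, color, alpha)
    return dst


if __name__ == "__main__":
    frags = [{"color": 1.0, "alpha": 0.5}, {"color": 0.0, "alpha": 0.5}]
    # Same submission order, different shader completion orders.
    print([render_pixel(frags, 0.0, seed=s) for s in range(3)])
    # A different submission order gives a different pixel value.
    print(render_pixel(list(reversed(frags)), 0.0))
```

The second print shows why the submission order matters in the first place: "over" blending is not commutative.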
 
Thanks a lot guys! Your replies answered my question well. Then I guess there must exist some queuing mechanism in the output merger stage to guarantee the blend order?
 
Easiest way is to make sure that only a single write to a particular pixel is outstanding at once. If a second write comes in, then the device could stall until the previous write is finished. Again, this is independent of the shader execution order.
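A rough sketch of that rule as a toy timing model, not a description of any real hardware. It assumes writes are issued in order, at most one per cycle, and that each write stays outstanding for a fixed `latency`; the function name `issue_cycles` is hypothetical.

```python
def issue_cycles(writes, latency):
    """Return the cycle at which each write is issued.

    writes:  pixel ids in submission order
    latency: cycles a write stays outstanding after it is issued
    """
    busy_until = {}  # pixel -> cycle at which its outstanding write retires
    cycle = 0
    issued = []
    for pixel in writes:
        # Stall while an earlier write to the same pixel is still outstanding.
        cycle = max(cycle, busy_until.get(pixel, 0))
        issued.append(cycle)
        busy_until[pixel] = cycle + latency
        cycle += 1  # at most one write issued per cycle
    return issued


if __name__ == "__main__":
    # The second write to pixel "a" has to wait for the first one to retire.
    print(issue_cycles(["a", "b", "a", "c"], latency=3))
```

Writes to different pixels never wait on each other in this model; only a repeated pixel causes a stall.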
 
Yes, and going back to the first post in this thread this is how the most solid (especially in terms of image quality) order-independent transparency algorithms currently work. A first pass takes care of recording all fragments that contribute to a pixel via atomic ops, while a second full screen pass takes care of processing the full set of per-pixel fragments. The full screen pass maps one "thread" to each pixel and therefore data races are avoided. Unfortunately the first pass is what makes these algorithms require an amount of memory proportional to the number of transparent fragments on the screen. Most game developers prefer to know in advance how much memory a given algorithm requires :)
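The two passes can be sketched on the CPU like this. It is a simplified model under a few assumptions: one color channel, larger depth means farther away, and a plain loop stands in for the atomic appends a GPU implementation would use. The names `record_fragments` and `resolve` are made up for the example.

```python
from collections import defaultdict


def record_fragments(fragments):
    """Pass 1: append every transparent fragment to its pixel's list.

    fragments: iterable of (pixel, depth, color, alpha) in arrival order.
    Memory grows with the total number of fragments recorded.
    """
    per_pixel = defaultdict(list)
    for pixel, depth, color, alpha in fragments:
        per_pixel[pixel].append((depth, color, alpha))
    return per_pixel


def resolve(per_pixel, background):
    """Pass 2: one 'thread' per pixel sorts its fragments and composites."""
    image = {}
    for pixel, frags in per_pixel.items():
        dst = background
        # Composite far-to-near (assumes larger depth = farther away).
        for depth, color, alpha in sorted(frags, key=lambda f: f[0], reverse=True):
            dst = color * alpha + dst * (1.0 - alpha)
        image[pixel] = dst
    return image


if __name__ == "__main__":
    frags = [((0, 0), 0.2, 1.0, 0.5), ((0, 0), 0.8, 0.0, 0.5)]
    # The arrival order of the fragments no longer affects the result.
    print(resolve(record_fragments(frags), 0.0))
    print(resolve(record_fragments(reversed(frags)), 0.0))
```

Each pixel's list is only touched by one iteration of the second pass, which is the property that avoids data races on the GPU.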

OIT algorithms that can re-use operations supported in the back-end/ROP of the GPU (e.g. stochastic transparency) can take advantage of proper ordering and don't necessarily require unbounded memory. On the other hand, to the best of my knowledge, the set of operations supported in the ROPs is too limited to design an OIT algorithm that is fast, requires a fixed amount of memory and provides very good image quality, all at the same time.
 