RoOoBo said:
I don't understand how that works. If you don't have shaded vertices that can be rasterized into fragments, you don't have fragments to shade. Prioritizing fragments implies that there may be stalls when all the queued fragments have been processed and new vertices have to be shaded. It may take many cycles until the new vertices are assembled into triangles, the triangles are rasterized and the new fragments reach the shader. If the vertex programs are long, that stall may be hundreds of cycles long.
This kind of stall can happen, but it's infrequent. You want to optimise for the most frequent case.
There is no additional work performed when vertices are prioritized, as all those vertices have to be shaded anyway. The rendering of a batch ends after the last fragment from the last triangle assembled from the last vertices is fully processed, so there is no way you can finish the batch before the last vertices are shaded.
This policy would lead to two problems:
- the first is complex vertices holding up fragments that are ready to go through, starving the rest of the pipeline (this isn't such a big problem anyway)
- the second is that you very quickly and often fill up the pixel queues: you rasterize lots of triangles without processing their fragments, because you are busy processing all the vertices first. This is a stall, and when one triangle produces 20 pixels on average it is very likely to happen, so it is something you want to avoid. When the pixel queue fills up, you must stop all vertex processing, process some pixels, go back to vertices, fill up, stall, and so on
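To make the second problem concrete, here is a rough toy model, not how any real hardware schedules work. It assumes a single shader unit that processes one vertex or one fragment per cycle, no rasterization latency, 3 vertices per triangle, and the 20 pixels per triangle mentioned above. The function name `simulate`, the queue capacity of 64 and the "blocked" counter are all made up for illustration.

```python
def simulate(policy, num_vertices=30, frags_per_triangle=20, frag_queue_capacity=64):
    """Toy unified-shader scheduler (hypothetical model, one work item per cycle)."""
    if frag_queue_capacity < frags_per_triangle:
        raise ValueError("queue must hold at least one triangle's fragments")
    pending = num_vertices          # vertices still waiting to be shaded
    frag_queue = shaded = cycles = blocked = peak = 0
    while pending > 0 or frag_queue > 0:
        # Is there room in the fragment queue for one more triangle's output?
        room = frag_queue + frags_per_triangle <= frag_queue_capacity
        if policy == "vertex_first":
            pick_vertex = pending > 0 and room
            if pending > 0 and not room:
                blocked += 1        # vertex work suspended because the queue is full
        else:                       # "fragment_first"
            pick_vertex = frag_queue == 0
        if pick_vertex:
            pending -= 1
            shaded += 1
            if shaded % 3 == 0:     # every third vertex completes a triangle
                frag_queue += frags_per_triangle
                peak = max(peak, frag_queue)
        else:
            frag_queue -= 1         # shade one queued fragment
        cycles += 1
    return {"cycles": cycles, "blocked": blocked, "peak_queue": peak}


if __name__ == "__main__":
    for policy in ("vertex_first", "fragment_first"):
        print(policy, simulate(policy))
```

In this toy both policies finish the 10-triangle batch in the same 230 cycles, since the total amount of work is identical. The difference is that the vertex-first run spends 136 of those cycles with vertex work suspended on a full fragment queue, while the fragment-first run never fills the queue. The model deliberately ignores the latency between shading a vertex and its fragments arriving, so it says nothing about the stall described in the quoted post.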
However, I don't see how executing fragments first solves that problem, as you still have to process vertices in those large groups and wait until the first vertices in the group generate new fragments.
I think this kind of stall is unavoidable but it's more desirable than starving the whole pipeline or having to flush vertices.
If the batch is vertex limited, the fragment stages are going to be underutilised whatever you do, and fragments will only execute when the vertex queues are full. If the fragment queues become full while the vertex queue is still being filled (unlikely, as the fragment queues are considerably larger) it won't matter, as the render time will be determined by the number of vertices to shade and the temporary burst of fragments will be hidden by other vertices generating fewer fragments.
Yes, that's true, but batches tend to be more often fragment limited than vertex limited.
I think there's no perfect solution for every case, but this solution works best in the average case and it's easy to implement. As Dave pointed out, this is a high-level view of things; the actual low-level implementation might be slightly different and use some logic to prioritize vertices in the corner cases where that makes more sense.