That's a very interesting formulation.

Nom De Guerre said:
No, not with the latest graphics chips and not until we know where the vertices are between vertex shading and rasterization, since that is the stage (the "hole", if you like) where backface culling currently happens in the GPU.
Does this mean the vertex shaders do get executed, but as soon as the positions of all three vertices are written, the backface test is performed and the shader is interrupted? This suddenly makes a lot of sense! Reorder the vertex shader so that writing the position happens as early as possible. The backface culling unit can then watch the vertex cache for triangles that have their positions computed and stop every shader unit that's working on the vertices of a backfacing triangle (so it can do more useful stuff). This way the shader doesn't have to be split, so there's no overhead caused by dependencies (requiring the same instructions in both parts) and no extra setup. So there's no worst case either, compared to just brute-force executing everything. A win-win situation, am I right?
I don't think this can be translated to software easily though. I could perform a check after writing the position to decide whether to invoke the backface culling, but that seems really hard to implement and does add overhead. Anyway, I now have some new insights into the problem and there are a few things I'm going to experiment with... thanks!
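For what it's worth, here is a rough sketch of what that "check after writing the position" might look like in software. Everything in it is an assumption made for illustration: `Vec4`, `signedArea2`, `isBackfacing` and `shadeTriangle` are hypothetical names, front faces are assumed to be counter-clockwise with y pointing up, and all three vertices are assumed to have w > 0 (no clipping). Note that it does run the shader as two callables, position first and the rest afterwards, so it is not the interrupt-in-place scheme described for hardware.

```cpp
#include <array>
#include <cassert>

// Hypothetical clip-space position type, for illustration only.
struct Vec4 { float x, y, z, w; };

// Twice the signed area of the triangle after the perspective divide.
// Assumes w > 0 for all three vertices (triangle fully in front of the eye).
inline float signedArea2(const Vec4& a, const Vec4& b, const Vec4& c)
{
    assert(a.w > 0.0f && b.w > 0.0f && c.w > 0.0f);
    const float ax = a.x / a.w, ay = a.y / a.w;
    const float bx = b.x / b.w, by = b.y / b.w;
    const float cx = c.x / c.w, cy = c.y / c.w;
    return (bx - ax) * (cy - ay) - (cx - ax) * (by - ay);
}

// Assumes counter-clockwise winding means front-facing (y up).
// Zero-area triangles are treated as culled as well.
inline bool isBackfacing(const Vec4& a, const Vec4& b, const Vec4& c)
{
    return signedArea2(a, b, c) <= 0.0f;
}

// Computes the three positions first, then runs the remaining shader
// work only if the triangle survives the backface test.
// Returns true if the triangle was kept.
template <typename PositionFn, typename RestFn>
bool shadeTriangle(const std::array<int, 3>& indices,
                   PositionFn position, RestFn rest)
{
    const Vec4 p0 = position(indices[0]);
    const Vec4 p1 = position(indices[1]);
    const Vec4 p2 = position(indices[2]);

    if (isBackfacing(p0, p1, p2))
        return false;              // skip lighting, texcoords, etc.

    for (int i : indices)
        rest(i);                   // remaining per-vertex work
    return true;
}
```

The test itself is only a handful of multiplies per triangle. The sketch leaves out a vertex cache, so a vertex shared between a culled and a visible triangle would get its position computed more than once, and that bookkeeping is probably where the real overhead would come from.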