Just curious: what are the chances of a polygon sorter sitting between the T&L/vertex shaders and the pixel pipelines? Basically something simple that sorts only on Z depth (respecting the current Z depth test, of course). The reason I'm asking is that it seems the pixel pipelines are going to spend longer and longer finishing each polygon, while the vertex shaders can easily push polygons through faster into a buffer waiting to be rendered. If those buffered polygons were drawn front to back, the Z test could reject hidden pixels before the pixel pipelines spend time shading them.
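To make the payoff concrete, here is a toy model, just a sketch under heavy simplifying assumptions: the screen is a single row of pixels, every polygon has one constant depth, and the Z test is a plain "less than" that runs before shading. The names `Polygon`, `shaded_fragments` and `sort_front_to_back` are hypothetical. It counts how many fragments the pixel pipelines actually have to shade.

```python
from dataclasses import dataclass

@dataclass
class Polygon:
    depth: float   # hypothetical: one constant Z per polygon
    pixels: range  # hypothetical: screen pixels the polygon covers

def shaded_fragments(polys, width):
    """Count fragments that pass an early Z test (less-than) and get shaded."""
    zbuf = [float("inf")] * width
    shaded = 0
    for p in polys:
        for x in p.pixels:
            if p.depth < zbuf[x]:  # early depth test before shading
                zbuf[x] = p.depth
                shaded += 1        # pixel pipeline only does work here
    return shaded

def sort_front_to_back(polys):
    # The "polygon sorter": nearest polygon first
    return sorted(polys, key=lambda p: p.depth)

# Three overlapping full-width polygons submitted back to front (worst case)
polys = [Polygon(0.9, range(8)), Polygon(0.5, range(8)), Polygon(0.1, range(8))]
print(shaded_fragments(polys, 8))                      # 24
print(shaded_fragments(sort_front_to_back(polys), 8))  # 8
```

In the worst case every layer gets shaded and then overwritten; after sorting, only the nearest layer is shaded. Real scenes overlap far less neatly, so the actual gain would be smaller.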
Obviously it could only sort polygons that share the same display state, and it couldn't reorder alpha-blended polygons or any other polygons that depend on the destination pixel. Also, the way polygons are sent to the renderer might make any gains nil, or even cause losses if the pipeline has to stall waiting for sorting to finish (though it should just grab whatever is on top). Still, if the vertex shaders are fast enough, they could build up a buffer of polygons large enough to be worth sorting. Of course, this won't work well if lots of state changes are executed.
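Those restrictions could be sketched as a small reorder buffer. This is purely hypothetical (`Poly`, `PolygonSorter`, the integer `state` id and the `capacity` limit are all made up for illustration) and assumes a "less than" depth test with no exactly-equal depths, so reordering opaque polygons doesn't change the final image. Opaque polygons with the same state are held and released front to back; a state change, a blended polygon, or a full buffer forces everything pending out first.

```python
from dataclasses import dataclass

@dataclass
class Poly:
    name: str
    depth: float   # hypothetical per-polygon sort key (e.g. nearest Z)
    state: int     # hypothetical display-state id (shader, textures, depth func...)
    blended: bool  # True if the result depends on the destination pixel

class PolygonSorter:
    """Hypothetical buffer between the vertex shaders and the pixel pipelines."""

    def __init__(self, capacity=64):
        self.capacity = capacity  # how many polygons the buffer can hold
        self.pending = []         # opaque polygons waiting to be sorted
        self.state = None         # display state of the pending polygons
        self.out = []             # order polygons reach the pixel pipelines

    def flush(self):
        # Release pending opaque polygons front to back (stable sort)
        self.pending.sort(key=lambda p: p.depth)
        self.out.extend(self.pending)
        self.pending.clear()

    def submit(self, poly):
        if poly.state != self.state:
            self.flush()  # never reorder across a state change
            self.state = poly.state
        if poly.blended:
            self.flush()  # blended polygons keep submission order
            self.out.append(poly)
            return
        self.pending.append(poly)
        if len(self.pending) >= self.capacity:
            self.flush()  # buffer full: release what we have

    def finish(self):
        self.flush()
        return [p.name for p in self.out]

s = PolygonSorter()
s.submit(Poly("far", 0.9, state=1, blended=False))
s.submit(Poly("near", 0.1, state=1, blended=False))
s.submit(Poly("glass", 0.5, state=1, blended=True))
s.submit(Poly("mid", 0.5, state=2, blended=False))
print(s.finish())  # ['near', 'far', 'glass', 'mid']
```

Note how the blended polygon and the state change each cut the sortable run short. With frequent state changes the buffer rarely holds more than one or two polygons, which is exactly the case where sorting buys nothing.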