LeStoffer,
- all you need to store per primitive is its bounding box, pixel shader ID, vertex shader ID, and a reference to its input data set, so the bandwidth and space required for scene capture are negligible compared to storing post-transform vertices (which I think is how PowerVR operates).
- sorting primitives is efficient because there are orders of magnitude fewer primitives than vertices.
- The sort presents primitives in a roughly optimal (front-to-back) order for a hierarchical-Z rendering system, so a fair number of occluded pixels won't be shaded.
- Since the first pass is only concerned with determining primitive position, its vertex shaders need not compute texture coordinates, vertex normals, or lighting information. In pass 2, bounding box info is available per primitive, so it can be used to cull entire primitives at a time. For culled primitives, texture coordinates, normals, lighting, etc. are never computed. If you store post-transform vertices instead, the full vertex shader must run for every vertex in the scene, even if the vertex does not belong to any visible triangle.
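A minimal Python sketch of the pass-2 side, assuming smaller z means closer and a single-level coarse-Z grid standing in for a full hierarchical-Z buffer. `PrimitiveRecord`, `is_occluded`, and `pass2_order` are hypothetical names; the record fields mirror the per-primitive data listed in the first point. Each coarse-Z tile holds the farthest depth already written there, so a primitive whose nearest depth lies behind that value in every tile its box overlaps can be skipped before its full vertex shader ever runs.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class PrimitiveRecord:
    # Screen-space bounding box from pass 1 (assumption: smaller z = closer).
    min_x: float
    min_y: float
    min_z: float
    max_x: float
    max_y: float
    max_z: float
    vertex_shader_id: int  # hypothetical state handles
    pixel_shader_id: int
    input_ref: int         # hypothetical index of the primitive's input data


def is_occluded(rec: PrimitiveRecord, coarse_z: List[List[float]], tile: int) -> bool:
    """True if every coarse-Z tile the box overlaps already holds a farthest
    depth nearer than the box's nearest depth, or the box is off screen."""
    rows, cols = len(coarse_z), len(coarse_z[0])
    tx0 = max(0, math.floor(rec.min_x) // tile)
    ty0 = max(0, math.floor(rec.min_y) // tile)
    tx1 = min(cols - 1, math.floor(rec.max_x) // tile)
    ty1 = min(rows - 1, math.floor(rec.max_y) // tile)
    if tx0 > tx1 or ty0 > ty1:
        return True  # box lies entirely outside the screen
    return all(rec.min_z > coarse_z[ty][tx]
               for ty in range(ty0, ty1 + 1)
               for tx in range(tx0, tx1 + 1))


def pass2_order(records: List[PrimitiveRecord], coarse_z: List[List[float]],
                tile: int) -> List[PrimitiveRecord]:
    """Sort front to back by nearest depth and drop primitives whose box is
    fully hidden; only survivors would get the full vertex shader."""
    ordered = sorted(records, key=lambda r: r.min_z)
    return [r for r in ordered if not is_occluded(r, coarse_z, tile)]
```

Here `coarse_z` is treated as fixed for simplicity; in a real renderer it would be updated as surviving primitives are rasterized, so the occlusion test would be interleaved with drawing rather than done up front.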
AFAICS the geometry read bandwidth of this scheme is similar to that of storing post-transform vertices. But it writes very little scene data, and it limits full vertex shading to vertices "likely" to be visible.
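The write-side difference can be put into rough numbers. The record and vertex layouts and the scene counts below are purely hypothetical, chosen only to show the scaling: capture writes grow with the primitive count, post-transform writes with the vertex count.

```python
import struct

# Hypothetical per-primitive capture record: bbox (6 floats) + VS id + PS id + input ref.
RECORD_FMT = "<6f3I"
# Hypothetical post-transform vertex: clip-space xyzw, one uv pair, normal, RGBA8 colour.
VERTEX_FMT = "<4f2f3f4B"


def scene_write_bytes(num_primitives: int, num_vertices: int):
    """Bytes written during scene capture under each scheme."""
    capture = num_primitives * struct.calcsize(RECORD_FMT)
    post_transform = num_vertices * struct.calcsize(VERTEX_FMT)
    return capture, post_transform


capture, post = scene_write_bytes(num_primitives=2_000, num_vertices=500_000)
print(capture, post)  # 72000 20000000
```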
I guess that instead of using an HZ buffer, you could tile the primitives and proceed a la PowerVR (saving even more frame-buffer/z-buffer bandwidth). I think MfA proposed something like this, with hierarchical, program-specifiable bounding boxes, a while ago.
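Binning the pass-1 bounding boxes into screen tiles, as a PowerVR-style tiler would, might look like the following Python sketch. `bin_boxes` and the tile size are illustrative assumptions; binning is conservative (by bounding box), so a tile's list can include primitives that don't actually cover any of its pixels.

```python
import math
from collections import defaultdict


def bin_boxes(boxes, tile, width, height):
    """Map each screen tile (tx, ty) to the indices of the boxes overlapping it.
    boxes: list of (min_x, min_y, max_x, max_y) in pixels."""
    cols = (width + tile - 1) // tile
    rows = (height + tile - 1) // tile
    bins = defaultdict(list)
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        # Clamp the covered tile range to the screen; off-screen boxes yield empty ranges.
        tx0 = max(0, math.floor(x0) // tile)
        ty0 = max(0, math.floor(y0) // tile)
        tx1 = min(cols - 1, math.floor(x1) // tile)
        ty1 = min(rows - 1, math.floor(y1) // tile)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[(tx, ty)].append(i)
    return bins
```

Each tile can then be rendered on chip from its own list, which is where the extra frame-buffer/z-buffer savings would come from.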
Edit: Inspirer, what I'm suggesting is this:
(Pass 1)
For each primitive (all vertices processed in a given GPU state):
For the first vertex:
- run the vertex program, obtain screen-space (x,y,z). Set Min = Max = (x,y,z).
For each following vertex:
- run the vertex program, obtain screen-space (x,y,z).
Minx = min(Minx, x);
Miny = min(Miny, y);
Minz = min(Minz, z);
Maxx = max(Maxx, x);
Maxy = max(Maxy, y);
Maxz = max(Maxz, z);
After the last vertex in the primitive, (Min, Max) is your bounding box.
Output bounding box and state information for each primitive.
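The pass-1 loop above translates almost line for line into Python. `primitive_bbox`, `pass1`, and the toy `project` function (a stand-in for a position-only vertex program) are hypothetical names, and GPU state is modelled as an opaque tuple.

```python
from typing import Callable, Iterable, List, Tuple

Vec3 = Tuple[float, float, float]


def primitive_bbox(vertices: Iterable,
                   position_program: Callable[[object], Vec3]) -> Tuple[Vec3, Vec3]:
    """Run the position-only program on every vertex of one primitive and
    return its screen-space bounding box as (Min, Max)."""
    it = iter(vertices)
    sentinel = object()
    first = next(it, sentinel)
    if first is sentinel:
        raise ValueError("primitive has no vertices")
    # First vertex: Min = Max = (x, y, z)
    lo = list(position_program(first))
    hi = list(lo)
    # Each following vertex grows the box component-wise
    for v in it:
        p = position_program(v)
        for k in range(3):
            lo[k] = min(lo[k], p[k])
            hi[k] = max(hi[k], p[k])
    return tuple(lo), tuple(hi)


def pass1(primitives) -> List:
    """primitives: iterable of (vertices, position_program, state) triples.
    Emits one (bounding box, state) record per primitive."""
    return [(primitive_bbox(verts, prog), state) for verts, prog, state in primitives]


def project(v):
    # Toy position-only program: divide x and y by depth, keep z.
    return (v[0] / v[2], v[1] / v[2], v[2])


tri = [(1.0, 2.0, 2.0), (4.0, 4.0, 4.0), (-2.0, 1.0, 1.0)]
print(primitive_bbox(tri, project))  # ((-2.0, 1.0, 1.0), (1.0, 1.0, 4.0))
```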