GRIN, somehow I find this funny:
The delay stream does not capture the entire scene. Our tests suggest that a sliding window of 50K–150K triangles is sufficient to reduce the depth complexity close to the theoretical optimum.
Well DUH, after culling and clipping most scenes do not even contain 150K triangles, so when they hit the theoretical optimum they are probably storing the entire scene anyway.
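For those who haven't read the paper: the way I read that quote, the delay stream is basically a big FIFO of already transformed triangles. Incoming triangles feed the occlusion data and go into the window, and a triangle only hits the rasteriser once it falls out the other end, by which point later occluders may already have flagged it as hidden. A rough sketch of that idea only; the tiny capacity and the occlusion hooks below are placeholders, not their actual design:

```cpp
#include <cstdio>
#include <deque>

// Hypothetical triangle record after transform/cull/clip.
struct Triangle { int id; float minZ, maxZ; };

// Placeholder hooks: real hardware would test/update an LRZ-buffer here.
bool occluded(const Triangle&)        { return false; }  // conservative stub
void updateOcclusion(const Triangle&) {}
void rasterize(const Triangle& t)     { std::printf("raster tri %d\n", t.id); }

// Sliding-window delay stream: triangles entering the window update the
// occlusion information; triangles leaving it get tested against occlusion
// data that by then also covers everything submitted after them.
class DelayStream {
public:
    explicit DelayStream(size_t capacity) : capacity_(capacity) {}

    void submit(const Triangle& t) {
        updateOcclusion(t);          // t can occlude triangles still waiting
        window_.push_back(t);
        if (window_.size() > capacity_) {
            emit(window_.front());   // oldest triangle falls out of the window
            window_.pop_front();
        }
    }

    void flush() {                   // end of frame: drain the window
        for (const Triangle& t : window_) emit(t);
        window_.clear();
    }

private:
    void emit(const Triangle& t) {
        if (!occluded(t)) rasterize(t);   // second chance to reject it
    }

    std::deque<Triangle> window_;
    size_t capacity_;
};

int main() {
    DelayStream stream(3);           // tiny window just for illustration
    for (int i = 0; i < 6; ++i) stream.submit({i, 0.f, 1.f});
    stream.flush();
}
```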
Occlusion information cannot be generated for primitives that use per-pixel rejection tests, e.g., alpha or stencil tests, until these tests have been executed. Furthermore, occlusion tests cannot be applied for primitives that modify their depth values in the pixel shader or update the stencil buffer when the depth test fails.
Hmm, fails with stencil tests... where have we heard that before?
Our LRZ-buffer uses 8x8 pixel tiles. Each tile requires 32 bits of video memory as the minimum and maximum depth values are stored as 16-bit floating point numbers. Both of the values are needed to allow changing the depth comparison mode during the frame. The LRZ-buffer is stored in video memory and accessed through a single-ported 32KB on-chip cache, split into eight banks to facilitate multiple simultaneous accesses. Paging to video memory is performed with a granularity of 256 bytes. Each depth buffer has its own separate LRZ-buffer.
This is IMHO the interesting bit. They use depth areas of 8x8 pixels for which they store depth values with limited accuracy. What this means is that before you can update one of these depth values you need a triangle that covers at least a complete 8x8 tile. In other words, if your occluding geometry is finely tessellated it cannot occlude, because the area each occluder covers is smaller than the tiles and hence the info cannot be updated (if only 50% of a tile is covered by a close-by triangle you cannot change the depth values stored for that tile area). They fix this using a tile cache which actually stores full-detail Z values, but only for a limited area, so they assume there is enough spatial locality for this to work. If a detail tile gets flushed out of the cache they only keep the near/far values, not all the detail, to save bandwidth.
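Just to illustrate the complaint: here is how I picture the coarse tile test working. A tile only stores a min/max depth pair, so the stored range can only be tightened by a triangle that covers the whole 8x8 area. Everything in this sketch (plain floats instead of 16-bit ones, no detail-tile cache) is my own simplification, not their hardware:

```cpp
#include <algorithm>
#include <cstdio>

// Minimal sketch of the coarse 8x8-tile depth test as I read the quote.
// The real chip stores the two bounds as 16-bit floats and keeps a cache of
// full-detail tiles; none of that is modelled here, just the idea.

struct LrzTile {
    float minZ = 0.0f;   // bounds start wide open (conservative)
    float maxZ = 1.0f;
};

struct TriInTile {
    float minZ, maxZ;    // depth range of the triangle inside the tile
    bool coversTile;     // does it cover the full 8x8 area?
};

// Conservative occlusion test, assuming a LESS depth comparison: the triangle
// can only be rejected if even its nearest point is behind everything stored.
bool tileRejects(const LrzTile& tile, const TriInTile& tri) {
    return tri.minZ >= tile.maxZ;
}

// Conservative update: with only min/max per tile, the stored range can only
// be tightened when the triangle covers the whole tile; a partial cover could
// leave any of the old depths in place, so the bounds must stay as they are.
void tileUpdate(LrzTile& tile, const TriInTile& tri) {
    if (!tri.coversTile) return;                       // the complaint above
    tile.maxZ = std::min(tile.maxZ, tri.maxZ);         // every pixel now <= this
    tile.minZ = std::min(tile.minZ, tri.minZ);
}

int main() {
    LrzTile tile;
    TriInTile bigOccluder{0.2f, 0.3f, true};
    TriInTile smallOccluder{0.2f, 0.3f, false};
    TriInTile farTri{0.5f, 0.6f, false};

    tileUpdate(tile, smallOccluder);                   // no effect: partial cover
    std::printf("after small occluder, reject far tri? %d\n", tileRejects(tile, farTri));
    tileUpdate(tile, bigOccluder);                     // tightens the tile range
    std::printf("after big occluder, reject far tri? %d\n", tileRejects(tile, farTri));
}
```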
Transparent surfaces were excluded from all scenes in order to make the optimal depth complexity exactly 1.0.
Erhm? How exactly should I understand that: did they ignore all transparent polygons completely and take them out of the stream, or did they keep them in the stream? For a scene such as the ones in 3DMark2003 a lot of the follow-on passes are transparent, so they would have a massive impact on the results. This scene also uses stencil tests, about which they said the system fails; did they ignore those as well? 36K tris is also not very representative of a 3DMark03 scene IMHO; did they specifically select this frame because it happened to work? Notice how all the frames contain a nice big occluder close to the camera (car, dragon, character). Obviously in those cases your system is going to work, but how about a more random scene?
PowerVR [2000] captures the geometry of the entire frame and uses tile-based rendering with internal buffers. Occlusion culling is implemented by using on-chip sorting and thus the culling efficiency is not affected by the order of the input primitives. The majority of bandwidth problems are avoided but the limited amount of video memory makes capturing large scenes impractical.
Since when has the amount of memory been a problem, now that we have cards with 256MB (even 512MB) of memory?
Also, in their whole section about "Order-independent transparency" they completely fail to mention that PowerVR delivered this functionality in a mass-market console product known as DreamCast...
The fundamental nature of order-independent transparency is that all visible data must be collected before the processing can begin. This is exactly what a sufficiently long delay stream does.
So do they store the whole scene geometry or not? They said this was impractical before...
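To be fair, the "collect everything first" part is just what per-pixel sorting forces on you, whatever the hardware looks like: every transparent fragment that lands on a pixel has to be known before you can blend in depth order. A toy single-pixel illustration of that, nothing to do with how their chip would actually buffer it:

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Each pixel has to gather every transparent fragment hitting it before any
// blending can happen, because blending is only correct in depth order.
struct Fragment { float z; float r, g, b, a; };

// Blend back-to-front (larger z = farther) over an opaque background colour.
void resolvePixel(std::vector<Fragment> frags, const float bg[3]) {
    std::sort(frags.begin(), frags.end(),
              [](const Fragment& x, const Fragment& y) { return x.z > y.z; });
    float out[3] = {bg[0], bg[1], bg[2]};
    for (const Fragment& f : frags) {
        out[0] = f.r * f.a + out[0] * (1 - f.a);
        out[1] = f.g * f.a + out[1] * (1 - f.a);
        out[2] = f.b * f.a + out[2] * (1 - f.a);
    }
    std::printf("final pixel: %.2f %.2f %.2f\n", out[0], out[1], out[2]);
}

int main() {
    float bg[3] = {0, 0, 0};
    // Fragments arrive in submission order, not depth order.
    resolvePixel({{0.3f, 1, 0, 0, 0.5f}, {0.7f, 0, 1, 0, 0.5f}}, bg);
}
```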
I didn't read the AA section in detail, but:
Our approach is to conservatively assume that all triangle edges are discontinuity edges and to use a simple geometric hashing algorithm for detecting shared non-sharp edges (Figure 6).
This seems to imply they only consider triangle edges for AA, similar to the Matrox approach, meaning that triangle intersections will not get AA, which people have concluded is not acceptable?
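For what it's worth, this is how I imagine the edge hashing could work: key each edge by its two (quantised) endpoints, count how many triangles reference it, and keep the unmatched ones as discontinuity edges. The quantisation and key format below are my own guesses, not what the paper describes, but notice there is nothing here that would catch an edge created by two triangles intersecting:

```cpp
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

// Guessed scheme: hash mesh edges by their endpoint positions and count how
// many triangles reference each one; only mesh edges can ever be matched.
struct Vec3 { float x, y, z; };
struct Tri  { Vec3 a, b, c; };

static uint64_t quantize(const Vec3& v) {
    auto q = [](float f) { return uint64_t(int64_t(f * 1024.0f)) & 0x1FFFFF; };
    return (q(v.x) << 42) | (q(v.y) << 21) | q(v.z);
}

// Order-independent key so (a,b) and (b,a) map to the same edge.
static uint64_t edgeKey(const Vec3& a, const Vec3& b) {
    uint64_t ka = quantize(a), kb = quantize(b);
    return (ka < kb) ? (ka * 1000003u) ^ kb : (kb * 1000003u) ^ ka;
}

int main() {
    // Two triangles sharing the edge (0,0,0)-(1,0,0); their other edges are
    // each referenced only once.
    std::vector<Tri> tris = {
        {{0, 0, 0}, {1, 0, 0}, {0, 1, 0}},
        {{0, 0, 0}, {1, 0, 0}, {0, -1, 0}},
    };

    std::unordered_map<uint64_t, int> edgeCount;
    for (const Tri& t : tris) {
        const Vec3* v[3] = {&t.a, &t.b, &t.c};
        for (int i = 0; i < 3; ++i)
            ++edgeCount[edgeKey(*v[i], *v[(i + 1) % 3])];
    }

    // Edges seen by only one triangle stay flagged as discontinuity edges;
    // shared edges would additionally need a crease-angle test before being
    // dropped from the AA candidate list.
    int needAA = 0;
    for (const auto& kv : edgeCount)
        if (kv.second == 1) ++needAA;
    std::printf("edges needing AA: %d of %zu\n", needAA, edgeCount.size());
}
```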
Also, for VillageMark they end up with an overdraw of 1.18 for a scene with all transparency removed, which means they are still processing 18% too many pixels. I mean, they say VillageMark is 1K polys; that should fit in their delay stream completely, so the result should have been perfect? Why do they still process 18% too many pixels in such a trivial case?
Their "compression" seems pretty basic: it looks like they have a window of 4 vertices, and if you happen to re-use the same vertex a couple of times in quick succession they will store an index rather than re-store the vertex data... not exactly an impressive compression scheme; it's actually similar to a very, very small vertex cache at the back end of your vertex shader. They then say that a vertex format is usually small anyway. Yeah, right... with people starting to store full float values in texture coordinates that argument goes right out of the window when using complex shaders.
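Something along these lines is what I understand their scheme to be; the token sizes and the FIFO replacement below are guesses on my part, but it shows how little you win once the per-vertex data gets fat:

```cpp
#include <cstdio>
#include <cstring>

// Guessed version of the described compression: remember the last four
// vertices and, when a vertex repeats within that window, emit a small index
// token instead of the full vertex. Token sizes and replacement policy are
// assumptions, not the paper's exact format.

struct Vertex {
    float data[16];                      // position + a pile of texcoords etc.
    bool operator==(const Vertex& o) const {
        return std::memcmp(data, o.data, sizeof data) == 0;
    }
};

struct Encoder {
    Vertex window[4] = {};
    int next = 0;                        // slot to overwrite next (FIFO)
    size_t rawBytes = 0, packedBytes = 0;

    void encode(const Vertex& v) {
        rawBytes += sizeof(Vertex);
        for (int i = 0; i < 4; ++i) {
            if (window[i] == v) {        // hit: emit a tiny back-reference
                packedBytes += 1;        // assume one byte for the index token
                return;
            }
        }
        packedBytes += 1 + sizeof(Vertex);  // miss: marker byte + full vertex
        window[next] = v;
        next = (next + 1) % 4;
    }
};

int main() {
    Encoder enc;
    Vertex a{{1}}, b{{2}}, c{{3}}, d{{4}};
    // A strip-like pattern reuses recent vertices; anything older than the
    // 4-entry window has to be stored again in full.
    for (const Vertex& v : {a, b, c, b, c, d, a}) enc.encode(v);
    std::printf("raw %zu bytes, packed %zu bytes\n", enc.rawBytes, enc.packedBytes);
}
```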
All just IMHO...
K-