Just how impractical is an analytic coverage calculation, assuming it would provide acceptable AA results for geometry edges? (Ignoring all the discussion about "proper" AA and whether box coverage, sinc, or whatever gives the ideal result; I assume analytic pixel/box coverage would give results comparable to high-sample MSAA anyway.)
I think this is what Humus was suggesting. Or something similar...
I would think an early thing to get out of the way would be avoiding the coverage calculation for the screenful of pixels that are nowhere near a primitive edge. Could that be accomplished with a lookup table using the X,Y intercepts of the projected edge, or perhaps by comparing coordinates along the primitive edge against a table of pixel-centre coordinates, to eliminate pixels that aren't even close to being involved?
Currently the rasteriser solves this problem with dedicated hardware: for each fragment it generates, it determines whether an edge is involved and, if so, produces mask information based upon the geometry sampling points. So, yeah, this is done "early", before the ROPs even see the fragment and have to think about AA.
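To illustrate the sort of per-fragment edge test the rasteriser performs, here's a minimal sketch using Pineda-style edge functions (the function names and sample positions are my own invention, not any particular hardware's):

```python
# Sketch of a Pineda-style edge test: each of a fragment's sample points
# is evaluated against the three triangle edges; samples on the positive
# side of all three edges are covered. A fragment whose samples disagree
# straddles an edge and needs a partial-coverage mask.

def edge(ax, ay, bx, by, px, py):
    # Signed area of (a, b, p); > 0 when p is to the left of edge a->b.
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def coverage_mask(tri, samples):
    """Return a bitmask of the samples inside the CCW triangle `tri`."""
    (ax, ay), (bx, by), (cx, cy) = tri
    mask = 0
    for i, (px, py) in enumerate(samples):
        inside = (edge(ax, ay, bx, by, px, py) >= 0 and
                  edge(bx, by, cx, cy, px, py) >= 0 and
                  edge(cx, cy, ax, ay, px, py) >= 0)
        if inside:
            mask |= 1 << i
    return mask

# Four illustrative sample points within a unit pixel at the origin.
SAMPLES = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]

tri = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]  # CCW, covers the whole pixel
full = coverage_mask(tri, SAMPLES)          # all four sample bits set
```

A fragment where `coverage_mask` comes back all-ones or all-zeros needs no AA work; anything in between is an edge fragment.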
Second, how expensive is the actual calculation of coverage inside a box relative to other modern GPU operations? I'm thinking that finding two intercepts in the box region and calculating the area of the triangle or trapezoid created is pretty straightforward, but is this more calculation than, say, a typical shader operation on a single pixel?
What you're talking about is analogous to viewport clipping (though that doesn't calculate area), but with the viewport shrunk to a single pixel on the screen - a process that is performed by dedicated hardware. I'd guess it would be phenomenally expensive, because each triangle can cover hundreds of screen pixels.
Third, what other penalties go with that kind of approach?
Unfortunately in this thread we haven't got as far as quantifying the error of 4xMSAA or 8xMSAA on poly edges. When does MSAA get good enough? What's the margin between still and moving images like?
Anyway, you would need to come up with a way to record "triangle area" per screen pixel in the render target (i.e. in memory), collate each triangle against Z (or a range of Z! since if you're measuring the area within a pixel you have to take account of how depth varies across the fragment). Then the ROPs would need to be able to "average" over your arbitrary number of fragments, clipping the fragments against each other according to Z (which is sort of a viewport clipping problem, all over again). You could limit the number of triangles (fragments) per pixel - e.g. 8, to make things less ruinous.
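To make the per-pixel collate-and-average step concrete, a toy version might look like the following A-buffer-style resolve. It's my own simplification: it sorts fragments by Z and consumes coverage area front to back as if fragments were opaque, rather than clipping them against each other geometrically (or handling a Z range across the fragment):

```python
# Sketch of a per-pixel resolve over a list of (coverage_area, z, colour)
# fragments. Simplification: fragments are treated as opaque and their
# areas are consumed front-to-back until the pixel is full.

def resolve(fragments, background=(0.0, 0.0, 0.0)):
    """fragments: list of (area, z, (r, g, b)) with area in [0, 1]."""
    remaining = 1.0
    colour = [0.0, 0.0, 0.0]
    for area, _z, rgb in sorted(fragments, key=lambda f: f[1]):
        w = min(area, remaining)        # area this fragment contributes
        for c in range(3):
            colour[c] += w * rgb[c]
        remaining -= w
        if remaining <= 0.0:
            break
    for c in range(3):                  # uncovered area shows background
        colour[c] += remaining * background[c]
    return tuple(colour)
```

Even in this crude form, the per-pixel storage (a variable-length fragment list) and the sort are exactly the costs that make the approach look so painful next to a plain Z-buffer.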
Overall, it sounds to me like the death of all the "efficiencies" gained with Z-buffer based rendering.
I'm about as layman as it gets, but I would guess this would be done early in the pipeline, after triangle setup and Z-rejection? Would this result in another value being tagged to each pixel (texel? another term? you could have multiple render-pixels for a single screen pixel at this point, each with unique color/Z values corresponding to the primitive with which it is associated).
"Fragment" (teehee, I actually forgot that word in the first version of this reply, so went back and re-worded for clarity - hope it's clear).
Seems this would add to bandwidth usage, as you need to carry these values along with Z and color? And would that alter the way early Z-rejection is done (rejecting only pixels with full occlusion)? Would the coverage calculation be done before or after Z-rejection?
The bandwidth cost would seem to be pretty hideous. I suppose you could approximate things. If you're going to limit the number of triangles per pixel, you might also limit the clipping coordinates to a fixed set of points around the four sides of the pixel - say 8 per side. Triangle A has its vertices at points 5, 8 and 13, with Z of 0.5, 0.5 and 0.5001; B has its vertices at 4, 7 and 21, with Z of 0.5, 0.5 and 0.5001.
Code:
32------1------2------3------4------5------6------7------8
|                            bbbbbbbbbaaaaaaaaaaaaaaaaaaa|
31                           bbbbbbbbbbbbaaaaaaaaaaaaaaaa9
|                           bbbbbbbbbbbbbbbaaaaaaaaaaaaaa|
30                          bbbbbbbbbbbbbbbbbaaaaaaaaaaa10
|                          bbbbbbbbbbbbbbbbb  aaaaaaaaaaa|
29                         bbbbbbbbbbbbbbb      aaaaaaaa11
|                         bbbbbbbbbbbbbb          aaaaaaa|
28                        bbbbbbbbbbbb              aaaa12
|                        bbbbbbbbbbb                  aaa|
27                       bbbbbbbbbb                     13
|                       bbbbbbbb                         |
26                      bbbbbbb                          14
|                      bbbbb                             |
25                     bbbb                              15
|                     bb                                 |
24-----23-----22-----21-----20-----19-----18-----17-----16
Looks like fun
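A quick sketch of decoding that 32-point perimeter scheme back into pixel-space coordinates (numbering taken from the diagram: 1-8 across the top, 9-16 down the right side, 17-24 along the bottom, 25-32 up the left), plus the shoelace area of triangle A as a sanity check:

```python
# Decode a perimeter point index (1..32, 8 points per pixel side,
# numbered as in the diagram above) into (x, y) on the unit pixel edge.

def decode(i):
    if 1 <= i <= 8:        # across the top; point 8 = top-right corner
        return (i / 8.0, 1.0)
    if 9 <= i <= 16:       # down the right side; 16 = bottom-right corner
        return (1.0, 1.0 - (i - 8) / 8.0)
    if 17 <= i <= 24:      # along the bottom, right to left; 24 = corner
        return (1.0 - (i - 16) / 8.0, 0.0)
    if 25 <= i <= 32:      # up the left side; 32 = top-left corner
        return (0.0, (i - 24) / 8.0)
    raise ValueError("perimeter index out of range")

def tri_area(indices):
    """Shoelace area of a triangle given by three perimeter indices."""
    (x0, y0), (x1, y1), (x2, y2) = (decode(i) for i in indices)
    return abs(x0 * (y1 - y2) + x1 * (y2 - y0) + x2 * (y0 - y1)) / 2.0

# Triangle A from the example has vertices at points 5, 8 and 13:
area_a = tri_area((5, 8, 13))  # a sliver covering the top-right corner
```

Each vertex then needs only 5 bits instead of two coordinates, which is the whole point of the approximation - at the cost of snapping every edge crossing to one of 32 positions.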
Oh, and would things like normal-mapped true geometry tessellation screw up any such approach?
Tessellation is geometry anyway, so that would just be geometry. Normal mapping is a texture-filtering/shader-antialiasing problem.
I'm guessing the reply will be "more expensive than MSAA." But wouldn't it essentially equal infinite-sample AA once the calculation is done? How does it compare to 16x or better AA? And I would think hardware acceleration would certainly be possible - a fixed-function unit, more or less, that is tagged onto any architecture.
In the race to "good-enough" real-time AA, MSAA looks like it's destined always to win!
I'd really like to see AA quality quantified, for moving objects. The HQV video testing software has some "objective" tests for the visual quality of digital video replay - surely the 3D industry could stand to have some similar tools...
Jawed