Would (pseudo)analytic AA be practical in HW?

no_way

Just a random thought: would area-sampling AA be feasible to implement in HW at all? I don't mean analytic area sampling (though even that might be doable with today's powerful FP units), but a pretty simple coverage-mask implementation. For example, could something like this work (sorry, I'm thinking in software-developer terms, not a HW guru :p):
For simplicity, assume a deferred renderer, so all your triangles are already binned/sorted for this tile. I don't know how triangle setup units work in current HW, but I'm assuming you have floating-point eye-space X, Y, Z coordinates for every triangle.
Now take a pixel (I mean a final image pixel, not a "fragment") to be rendered.
Let's make a "zoomed-up" Z-buffer copy of that pixel in a temp buffer, say a 16x16 grid. Also, let's have 256 8-bit counters allocated and reset, and 256 registers :eek: for storing triangle colour values.
Now start rasterizing triangles into that buffer in front-to-back order (in HW this would probably need a second triangle setup engine working in parallel with the "global" one; for a software implementation I'd still use the same subroutines ;) ). For the first triangle, assign counter #0 and just write out depth values, incrementing that counter for each subpixel written. For the next one take ctr #1 and render again, testing depth values. So, in the end you have a bunch of counters indicating the coverage of each triangle. One could also exit early once all subpixels have been touched and all of the next triangle's Z values would fail the depth test anyway.
Now, for each visible triangle, run the pixel shader once to get its colour value, and for the final pixel write out the coverage-weighted colour of all visible tris (or all 256 tris in HW; zero coverage just wouldn't contribute anything).
In the end, it works out to be a kind of coverage-mask AA that reuses the triangle rasterization engine for the coverage calculation.
In software it should be pretty easy to implement, and optimization would be fairly trivial; for hardware... well... I'll let the HW wizzes here tear it apart :p
But even if this particular thing wouldn't work, would area sampling be doable with today's HW? I'm asking because area sampling would be "perfect" AA if we had single-coloured polygons; pixel shading makes things a bit more complicated, of course.
 
With any type of AA, it seems natural to me to separate texture filtering from edge anti-aliasing (edge and texture AA are fundamentally different; different algorithms are more efficient for the two). That is, if you do a coverage-mask-like technique, you could calculate the texture colour for that part of the mask just once, at the center of the covered area (which shouldn't be too hard to find).

Of course, such an AA approach does lead to natural problems with IMRs, but we can let hardware vendors cross that bridge when they come to it. I think we're still a ways away from such a technique.
 
Chalnoth said:
With any type of AA, it only seems natural to me to go ahead and separate texture filtering from edge anti-aliasing (edge and texture AA are fundamentally different...different algorithms are more efficient for the two...). That is, if you do a coverage-mask-like technique, you could go ahead and calculate the texture color for that part of the mask at the center of the area (Shouldn't be too hard to find), just once.

Some of this is true (the part about efficiency); however, the problem is obviously quite complex, and in particular separate texture anti-aliasing is more difficult than it first appears:

1. SSAA (with edge and texture filtering effectively combined) should guarantee correct AA on edges and textures, regardless of the content of the shader program and associated textures; MSAA does not. This is because in SSAA implementations the filtering takes place on colour values in screen space, after all the sample values have been calculated appropriately in the shader.

Now consider MSAA where you are performing (bilinear/trilinear/anisotropic) filtering in the shader for one texture sample point (in your case above taken at the centroid of the covered samples). It doesn't necessarily give correct, or even near-correct, results for the value of the texture at that point, and by extension the remainder of the shader calculation. Depending on the error introduced at the sampling stage the final value produced may be completely wrong.

This can easily be seen if you consider a shader that is sampling a normal map - linear interpolation of normals is not correct, so an MSAA implementation with current filtering techniques would not give an appropriate value for the texture at the sampling point. By point sampling the normal map you can guarantee a correct normal, but also by extension no texture filtering. With more advanced sampling techniques and foreknowledge of the type of data you are manipulating you could potentially get around this problem by writing a better filter, but obviously not with MSAA implementations on legacy hardware (GF3/4 etc).

SSAA, by calculating the full shader for each point-sampled texture position and doing the filtering on the final calculated fragment color values from these points, gives a correct result with appropriate texture filtering.


2. Sampling at the centroid may introduce artifacts.

As different sample points become covered, the centroid 'pops' from place to place, so where two triangles join along a shared edge I think you would get shimmer from the centroids mismatching between the adjacent triangles. Alternatively, you could always sample at a center point that assumes all samples are covered, but then you can end up sampling from a point 'outside' the triangle. This can also cause artifacts unless the geometry is textured with this condition in mind.
 
Yes, area sampling is feasible for HW, but not with your 16x16 method. :eek: is an appropriate reaction to 256 8-bit registers plus colour values. More reasonable would be four or so registers and an overflow buffer, since most pixels won't have any more complexity than that.
 
3dcgi said:
:eek: is an appropriate reaction for 256 8-bit registers and color values.
Like I said, for me
Code:
LocalAlloc(0,sizeof(BYTE) * 256)
is no biggie 8).
How many gates does a 1-bit register take? About 20? That would make roughly 40k gates (256 x 8 bits x 20) for the coverage counters alone... and 90% of them would sit idle most of the time. Waste indeed.
Edit: got the 20-gate figure from here. Dunno if those students use the same techniques as NV or ATI :p
 