Anti-Aliasing... so far

arjan de lumens said:
As for Mali: the 16x AA mode in Mali110/55 currently only uses an 8x8 grid; Mali200 is 16-queen. All Mali cores support "transparency antialiasing" (SS and MS variants) too and have been doing so for a long time.

Thank you for the more detailed explanation. The above is highly interesting (especially the adaptive AA part).

Is that 8x8 grid achieved by rotating two sets of 4x samples by completely different (fixed) angles?
 
MfA said:
I disagree.

If a fragment which covers the footprint of an MSAA sample gets bumped off the list, you can simply add its area to the owner of that MSAA sample (i.e. the surface which actually covers the centroid).
I can't quite figure out what you are saying, but it appears to be one of:
  • If a fragment is kicked off the list, assume that the fragment doing the kicking covers the same area. For N slots, this fails when the pixel contains a vertex of valence N+1, as in this case you have N+1 nonoverlapping polygons geometrically covering the pixel - there is no a priori reason why these polygons won't demand 1 sample each.
  • If a fragment is kicked off the list, assume that its samples can be reassigned to the surface owning the pixel center. This has horrible invariance problems if you ever submit the same geometry twice (which AFAIK quite many games do).
  • Use magic to determine which samples are covered by which polygon??
At worst you will just get back to the situation where the owner of each MSAA sample has complete coverage within that sample, and no coverage outside it ... which effectively gives you the same result as MSAA. At best not every MSAA sample is covered and/or there are not enough visible surfaces within the pixel to have reduced the accuracy of the fragment coverage information ... and your edge anti-aliasing accuracy is determined by the accuracy of the coverage masks.
The situations where you need to track the largest numbers of visible surfaces for a pixel (and, by extension, where you would discard data for lossy-MSAA schemes) are usually pixels that contain mesh vertices. These are often not anywhere close to a visible edge in the final picture, but throwing away ANY visibility/color information for such a pixel is very likely to cause the pixel to be rendered incorrectly in a very visible manner - I have described earlier in this thread how discarding data in two different ways cause two classes of visible rendering errors.

MSAA-with-compression-schemes can give very good antialiasing of otherwise horribly aliased edges, but they all too often do so at the cost of rendering errors for pixels that are NOT near obvious visible edges.

Performance-wise, any lossy MSAA compression scheme also has moderate serialization problems beyond those of lossless schemes: every time you add a fragment to a pixel, you need to decompress the pixel, modify it with the new fragment data, then re-compress it - and you CANNOT cache the decompressed data (if you try, you fail invariance: caching behaviour is nondeterministic, so rendering two frames with identical render states+geometry would cause the rendering result to change randomly depending on cache behaviour). If you use no compression at all, then when two polygons cover different samples of the same pixel, you can obviously process them in parallel or even out of rasterization order. If you use lossless compression, you can cache decompressed data and still work on the data in the cache in parallel.
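
To make that concrete, here is a minimal structural sketch in C++ (all types and helper names are made-up placeholders, not any real hardware's scheme) of the per-fragment round trip a lossy scheme forces:

#include <cstdint>

struct CompressedPixel { uint8_t bytes[16]; };        // fixed-budget lossy record in memory
struct DecodedPixel    { uint8_t scratch[64]; };      // placeholder for the expanded form
struct Fragment        { uint32_t color; float z; uint16_t coverage; };

DecodedPixel    decompress(const CompressedPixel&)    { return {}; }   // placeholder codec
CompressedPixel compress(const DecodedPixel&)         { return {}; }   // placeholder; may discard data
void            merge(DecodedPixel&, const Fragment&) {}               // apply the new fragment

void writeFragment(CompressedPixel& inMemory, const Fragment& frag)
{
    DecodedPixel p = decompress(inMemory);   // must re-read and decode for every fragment
    merge(p, frag);
    inMemory = compress(p);                  // must re-apply the lossy step immediately;
                                             // keeping p in a cache would make the loss
                                             // depend on nondeterministic cache state
}

With a lossless scheme, by contrast, the decoded form can live in a cache and be updated there, since flushing it later cannot change the stored result.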

Lossy MSAA compression schemes also often have serious trouble maintaining exact correct Z values per sample, which normally causes problems of its own.
  • Plain coverage-bitmap based methods cannot antialias the edge between two intersecting polygons.
  • Z3, with its slightly-inaccurate per-sample Z values, fails miserably with geometry that is submitted twice unless you are extremely careful (for each sample that you wish to Z-test, adjust the source Z value used for the test so that it exactly matches what it would be if you packed it with Z3 and then unpacked it again - only then can you do a somewhat-safe test; see the sketch just after this list)
  • slot-limited methods usually get a somewhat-wrong Z value for some samples when you run out of slots - this is the main reason why geometry-submitted-twice fails so often with these methods.
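
A toy illustration of the workaround described in the Z3 item above, assuming a simple 16-bit quantisation as a stand-in for Z3's actual per-sample packing (which it is not): round-trip the source Z through the same pack/unpack step before an EQUAL-style comparison.

#include <cmath>
#include <cstdint>

uint16_t packZ(float z)       // stand-in for the lossy per-sample packing; z assumed in [0,1]
{
    return static_cast<uint16_t>(std::lround(z * 65535.0f));
}

float unpackZ(uint16_t q)
{
    return static_cast<float>(q) / 65535.0f;
}

bool equalZTest(float sourceZ, float storedZ)   // storedZ already went through pack/unpack
{
    // Quantise the source exactly the way the stored value was quantised before comparing;
    // comparing the raw sourceZ against storedZ would fail for most samples.
    return unpackZ(packZ(sourceZ)) == storedZ;
}
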
Toggling between multisampling and supersampling within a frame (which is necessary for reasonable-quality "Transparency AA" at reasonable performance) is also usually hard to integrate into a lossy-MSAA scheme.
 
arjan de lumens said:
I can't quite figure out what you are saying, but it appears to be one of:
  • If a fragment is kicked off the list, assume that the fragment doing the kicking covers the same area.
The smallest fragment which doesn't cover an MSAA sample centroid gets kicked off the list (you have just as many slots as MSAA samples).
  • If a fragment is kicked off the list, assume that its samples can be reassigned to the surface owning the pixel center. This has horrible invariance problems if you ever submit the same geometry twice (which AFAIK quite many games do).
Its coverage mask within the MSAA sample footprints gets ORed with the coverage masks of the fragments which own those MSAA samples (if present). The impact of fragments on the final color of the pixel might be larger than it should be, but the impact using only MSAA would be larger still (it has de facto full coverage of the MSAA sample footprint). The error is bounded; it cannot get larger than with only MSAA.
  • Use magic to determine which samples are covered by which polygon??
That's what Z-checks are for (you can store slope information per fragment for extra precision, but even with only a plain Z value you can guarantee it won't get worse than plain MSAA).

I wouldn't call it a lossy MSAA scheme; it's a losslessly compressed MSAA scheme which stores conservative fragment coverage information for each MSAA sample and which uses left-over space, when not all MSAA sample centroids are covered by different surfaces, to store extra fragments.
 
MfA said:
I wouldn't call it a lossy MSAA scheme; it's a losslessly compressed MSAA scheme which stores conservative fragment coverage information for each MSAA sample and which uses left-over space, when not all MSAA sample centroids are covered by different surfaces, to store extra fragments.
So, let me see if I can get this straight: you store 1 color/z per MSAA sample (or sample "centroid" as you call it) the way one would usually do, then compress the standard MSAA data for the pixel losslessly, then try to use the remaining per-pixel space freed by that compression to store "extra" coverage/edge/fragment information. That "extra" information would then seem to me to be a lossily-compressed add-on to the initial lossless MSAA data and thus subject to the usual problems with lossy-MSAA if you try to actually use it for anything.
 
arjan de lumens said:
So, let me see if I can get this straight: you store 1 color/z per MSAA sample (or sample "centroid" as you call it) the way one would usually do
The centroid is not a sample as such, it's just the sampling point which, if covered, guarantees that the surface covering it gets a fragment slot (i.e. the same as with MSAA: if the sampling point is covered it gets stored ... with plain MSAA you simply have implicit full coverage of the sample footprint).
then try to use the remaining per-pixel space freed by that compression to store "extra" coverage
I never said coverage information would only be stored if room was saved by compression; in fact, I stated the exact opposite: "which stores conservative fragment coverage information for each MSAA sample".

There is compression because each fragment can cover multiple subsamples, but all in all you still need to use a little more storage for the coverage information.

Some pseudo code ...

// newFragment arrives:

foreach fragment in fragments:
    HSR(fragment, newFragment)
AddFragment(fragments, newFragment)
CullFragments(fragments, culledFragments)

// Add the coverage of culled fragments within each subsample's footprint to the
// coverage of the fragment covering that subsample's centroid; we assume they both
// belong to the same continuously textured object ... might not be right, but we
// won't be more wrong than with MSAA.
foreach fragment in culledFragments:
    foreach subsample in SubsamplesCovered(fragment):
        AddCoverage(SubsampleOwner(subsample), Coverage(fragment) & SubsampleFootprint(subsample))

I guess you would need yet another extra flag per fragment, to indicate whether it's a pure fragment or a composite fragment ... when a subsample owned by a composite fragment gets covered, the part of the composite fragment within that subsample's footprint should be added to the new owner; no new fragments should be allowed to be created from applying HSR to composite fragments.
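
For concreteness, a rough C++ sketch of the kind of per-fragment record and AddCoverage step the pseudo code above implies (names, sizes and field layout are illustrative assumptions only):

#include <bitset>
#include <cstdint>

constexpr int kSubsamples = 16;                  // total sub-samples per pixel (assumed)

struct FragmentSlot {
    uint32_t                 color = 0;
    float                    z     = 1.0f;
    std::bitset<kSubsamples> coverage;           // sub-samples this fragment covers
    bool                     composite = false;  // set once culled coverage has been ORed in
};

// The AddCoverage step from the pseudo code: fold a culled fragment's coverage
// within one subsample's footprint into the fragment owning that subsample's centroid.
void addCoverage(FragmentSlot& owner,
                 const FragmentSlot& culled,
                 const std::bitset<kSubsamples>& subsampleFootprint)
{
    owner.coverage |= culled.coverage & subsampleFootprint;
    owner.composite = true;   // composite fragments must not spawn new fragments in HSR
}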
 
Hmm, I'm still trying to figure your scheme out - correct me if I am wrong:
  • You have N*M sub-samples in total for a pixel, divided into N groups, each of which has M sub-samples - one of which is designated as the "centroid" sample of the group.
  • You have N "slots" per pixel, each containing one fragment with a color/z value and an N*M-bit bitmap indicating which of the samples in the pixel are covered with that particular color/z value.
  • If you are adding a fragment to a pixel and this causes the total number of fragments in the pixel to exceed N, then you pick one fragment based on some criterion (coverage or whatever) and kick it out (with the caveat that a fragment owning a "centroid" sample is never kicked out).
Nope, still lossy.

The samples that belonged to the fragment that was kicked out still need to be either left in limbo (exclude them from all subsequent processing of the frame, including final sample averaging) or you must fill/extrapolate/guess values for the samples from other fragments covering the pixel. For both approaches, I have already described in this thread why they produce disturbing visible rendering errors.

Pegging the "centroid" samples like you seem to be doing will guarantee that the "centroid" samples have correct colors, but you are still failing to provide guarantees for any other sample.
 
arjan de lumens said:
Nope, still lossy.
All the information which would be present in a normal MSAA buffer is preserved losslessly though. The fragments which cover centroids have the exact same color as their MSAA buffer counterparts ... they just have a little extra information in the form of a coverage mask, and potentially there are some extra fragments (depending on the amount of visible surfaces covering MSAA sample centroids in the pixel).
 
arjan de lumens said:
Hmm, I'm still trying to figure your scheme out - correct me if I am wrong:
  • You have N*M sub-samples in total for a pixel, divided into N groups, each of which has M sub-samples - one of which is designated as the "centroid" sample of the group.
  • You have N "slots" per pixel, each containing one fragment with a color/z value and an N*M-bit bitmap indicating which of the samples in the pixel are covered with that particular color/z value.
  • If you are adding a fragment to a pixel and this causes the total number of fragments in the pixel to exceed N, then you pick one fragment based on some criterion (coverage or whatever) and kick it out (with the caveat that a fragment owning a "centroid" sample is never kicked out).
Nope, still lossy.

The samples that belonged to the fragment that was kicked out still need to be either left in limbo (exclude them from all subsequent processing of the frame, including final sample averaging) or you must fill/extrapolate/guess values for the samples from other fragments covering the pixel. For both approaches, I have already described in this thread why they produce disturbing visible rendering errors.

Pegging the "centroid" samples like you seem to be doing will guarantee that the "centroid" samples have correct colors, but you are still failing to provide guarantees for any other sample.

You answered your own question about how to make it work. See the bolded part. You still have more than N samples; you just have an irregular sample pattern. With too few samples this could cause artifacts, but there might be enough noise from the various fragments that it doesn't matter.
 
3dcgi said:
You answered your own question about how to make it work. See the bolded part. You still have more than N samples; you just have an irregular sample pattern. With too few samples this could cause artifacts, but there might be enough noise from the various fragments that it doesn't matter.
Not quite. The situation where it doesn't work is as follows: First, render a high-polygon object. This object will send some sample locations of some pixels to limbo. Then render another (opaque) object in front of the first object. The samples that were sent to limbo stay there. The errors that you introduce this way might not seem big: they are in general of such a nature that you won't be able to pick them out if you only render a single frame. BUT: If the first object is moving from frame to frame and the second object stands still, you are supposed to see a perfectly still image, but: since the pattern of samples that end up in limbo keeps changing from one frame to the next, there will be slight but noticeable flicker/movement in the area of the screen where the first object was initially rendered. In e.g. an FPS, this could be exploited for nefarious means: by standing still and staring intensely at a wall, you can find out whether there is an enemy moving around behind the wall.
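
A toy resolve step (illustrative representation only) makes the mechanism explicit: samples in limbo are simply dropped from the average, so the effective weights of the surviving samples change whenever the limbo pattern changes from one frame to the next.

#include <array>
#include <cstddef>

struct Sample {
    float r = 0, g = 0, b = 0;
    bool  inLimbo = false;     // true if this sample's data was discarded
};

template <std::size_t N>
std::array<float, 3> resolvePixel(const std::array<Sample, N>& samples)
{
    std::array<float, 3> sum{0.0f, 0.0f, 0.0f};
    std::size_t valid = 0;
    for (const Sample& s : samples) {
        if (s.inLimbo) continue;             // discarded samples carry no weight at all
        sum[0] += s.r; sum[1] += s.g; sum[2] += s.b;
        ++valid;
    }
    if (valid == 0) return {0.0f, 0.0f, 0.0f};
    for (float& c : sum) c /= static_cast<float>(valid);
    return sum;
}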

As I have been trying to explain to you guys, any MSAA/coverage-bitmap scheme that doesn't guarantee the integrity of ALL samples at all times will need to either assign wrong color/Z values for some samples (wrong color can to some extent be hidden for a while; wrong Z = pixel-sparkling-hell if you submit geometry twice) or exclude some samples from final downsampling, which has the problem I described above. You lose either way.

Trying to store more sub-samples per pixel than you can store full color/Z information for is alluring, but fundamentally a flawed idea - information-theoretically, it's basically the same problem as guaranteed lossless compression. Many intelligent people, including myself, have spent considerable time and energy pursuing such approaches, only to be slapped in the face with non-obvious but common test cases that fail miserably.
 
Simon F said:
And besides, when there was hardware in the PC space that did automatic, per-pixel translucency sorting in hardware, the ISVs didn't make use of it. (i.e. the applications still spent time sorting the polygons :rolleyes: )

off-topic:

I'm afraid you can't blame the PC ISVs for not falling for a 1%-coverage best-case scenario. And that's why I loathe the PC as a platform "promoting" graphics progress.
 
We're going to have to disagree on this one, arjan. Think about normal 4x MSAA. Objects that cover a single sample point will pop in and out of view from frame to frame. This is no different from a coverage-based method. If an opaque object covers another object, then the only place a coverage-based method should show any flickering is at the edges of the opaque front object. If something else happens, the reduction algorithm is indeed flawed.

I think I now see where you're coming from. Z3's merge criteria never struck me as robust, and it might show the problem you're talking about. However, that doesn't mean there are no robust criteria. I believe MfA was describing something different than Z3.

The problem with coverage-based methods isn't rendering consistency; it's data storage. Traditional MSAA has nice, consistent memory requirements with no need for masks and other elements. Since memory is fairly cheap and MSAA is simple, MSAA remains king of the hill for hardware AA.
 
3dcgi said:
We're going to have to disagree on this one, arjan. Think about normal 4x MSAA. Objects that cover a single sample point will pop in and out of view from frame to frame. This is no different from a coverage-based method. If an opaque object covers another object, then the only place a coverage-based method should show any flickering is at the edges of the opaque front object. If something else happens, the reduction algorithm is indeed flawed.
Removing samples from consideration like you and Jawed suggest will cause flickering on polygon edges *WITHIN* the opaque front object's rendered area, not just on its silhouette edges.
I think I now see where you're coming from. Z3's merge criteria never struck me as robust, and it might show the problem you're talking about. However, that doesn't mean there are no robust criteria.
If I have 2 fragments that cover parts of a pixel and wish to merge them into 1 fragment, then the resulting fragment CANNOT contain as much information as the original 2 fragments - it's basic information theory. Asserting that there exist merging criteria that are 100% robust is equivalent to asserting that guaranteed lossless compression is possible, and the mathematical proof that guaranteed lossless compression is NOT possible is really, really simple (pigeonhole: there are 2^n possible n-bit inputs but only 2^n - 1 outputs shorter than n bits, so some input cannot shrink).

The most obvious failure mode of ALL fragment merging schemes is that you - after merging - invariably get the Z value slightly wrong for at least some of the area covered by at least one of the original fragments. If you then pass in geometry twice (there are many valid reasons why you might want to do this), setting Z-compare-mode to EQUAL in the second pass, then the Z values will slightly mismatch in the second pass, causing Z-test errors, and the second pass fails to update parts of the pixel, causing a very visible glitch (most of the time in a place where there was no obvious edge in need of AA in the first place). There are a practically infinite number of other unavoidable but less common failure modes as well, some involving alpha-blend, some involving the stencil-buffer, some that cause see-through errors and so on and on.
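
For reference, a minimal two-pass setup with standard OpenGL depth state (drawScene() is a hypothetical stand-in for re-submitting the same geometry): any sample whose stored Z was perturbed by a lossy merge fails the EQUAL test in the second pass and is left un-updated.

#include <GL/gl.h>

extern void drawScene();       // assumed: submits exactly the same geometry both times

void renderTwoPass()
{
    // Pass 1: lay down depth (e.g. a Z-only or ambient pass).
    glEnable(GL_DEPTH_TEST);
    glDepthFunc(GL_LESS);
    glDepthMask(GL_TRUE);
    drawScene();

    // Pass 2: shade only where the depth matches exactly (e.g. additive lighting).
    glDepthFunc(GL_EQUAL);
    glDepthMask(GL_FALSE);
    drawScene();
}
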
I believe MfA was describing something different than Z3.
Yes. He has a different data representation and a different merging scheme - as far as correctness goes, that doesn't help one bit. His argument seems to be that his scheme ought to work because he is making guarantees about some designated "centroid" samples, and that therefore it should never be worse than plain MSAA. Well, if you screw up some non-"centroid" samples (unavoidable), then you can choose to either keep the screwed-up samples or discard some/all non-"centroid" samples from final downsampling. As I have been trying to demonstrate, NEITHER approach is safe.

MfA's scheme can likely guarantee results no worse than plain MSAA for *still images*, but not scenes with movement in them.
 
arjan de lumens said:
Not quite. The situation where it doesn't work is as follows: First, render a high-polygon object. This object will send some sample locations of some pixels to limbo. Then render another (opaque) object in front of the first object. The samples that were sent to limbo stay there. The errors that you introduce this way might not seem big: they are in general of such a nature that you won't be able to pick them out if you only render a single frame. BUT: If the first object is moving from frame to frame and the second object stands still, you are supposed to see a perfectly still image, but: since the pattern of samples that end up in limbo keeps changing from one frame to the next, there will be slight but noticeable flicker/movement in the area of the screen where the first object was initially rendered. In e.g. an FPS, this could be exploited for nefarious means: by standing still and staring intensely at a wall, you can find out whether there is an enemy moving around behind the wall.
I don't see how. Full coverage will always kick out partial coverage in any compression scheme that relies upon limiting the number of triangles per pixel to X. Yes, this does mean that you have to re-evaluate coverage every time a pixel is written to, but artifacts will be rather rare.

To really reduce the artifacts, though, you'd need to lump surfaces sent through the pipeline twice into one triangle in the compressed framebuffer. This shouldn't be that hard: if you just make sure that your z-buffer compression scheme behaves identically the second time through, it'll be easy to check whether this triangle was already rendered into this pixel.

The worst case for this algorithm would be a situation where you have lots of sub-pixel triangles. But for any normal mesh, 6 triangles per pixel should be enough.
 
arjan de lumens said:
The most obvious failure mode of ALL fragment merging schemes is that you - after merging - invariably get the Z value slightly wrong for at least some of the area covered by at least one of the original fragments. If you then pass in geometry twice (there are many valid reasons why you might want to do this), setting Z-compare-mode to EQUAL in the second pass, then the Z values will slightly mismatch in the second pass, causing Z-test errors, and the second pass fails to update parts of the pixel, causing a very visible glitch (most of the time in a place where there was no obvious edge in need of AA in the first place).
But if the program is sending all the same geometry, the compression scheme should return the same z-values the second time as it did the first.
 
Chalnoth said:
But if the program is sending all the same geometry, the compression scheme should return the same z-values the second time as it did the first.
For this to work, you have to make the hardware work on the geometry mesh as a single massive primitive - the procedure then becomes:
  • Pick a pixel you wish to render
  • Find every polygon that contributes to the pixel
  • Compute source Z for every sample of the pixel, using the data from all the polygons affecting it
  • Compress that set of source Z values
  • Now you have source Z values that can safely be compared against the destination Z values that the first mesh produced.
The 2nd step of this process sounds difficult.

If, on the other hand, you pass the mesh one polygon at a time, you are buggered. The compression of the destination Z will produce situations where the contents of one polygon have modified the contents of other polygons. When you are processing just a single polygon, you don't have the necessary data set to simulate such effects when computing the source Z value, so when you look up in the destination Z buffer, your values won't match.
 
Chalnoth said:
I don't see how. Full coverage will always kick out partial coverage in any compression scheme that relies upon limiting the number of triangles per pixel to X. Yes, this does mean that you have to re-evaluate coverage every time a pixel is written to, but artifacts will be rather rare.
If you have removed a sample point from your data structure, then adding it back (even if only because you have a full-coverage polygon) means that you must make a wild guess about which data used to be in the sample point before you removed it in the first place. There is no way to make such a guess 100% reliable, and if you guess wrong, you have now made yourself a nice see-through artifact.

Also, the sample-pattern artifact I tried to describe will mainly appear along polygon edges in the front object, where the constantly-changing pattern of denied samples will cause the relative weights of the polygons affecting the final pixel to slosh back and forth. "Full coverage" is kinda hard to apply when you have a polygon edge.
To really reduce the artifacts, though, you'd need to lump surfaces sent through the pipeline twice into one triangle in the compressed framebuffer. This shouldn't be that hard: if you just make sure that your z-buffer compression scheme behaves identically the second time through, it'll be easy to check whether this triangle was already rendered into this pixel.
If you have a compressed framebuffer with N slots per pixel, and you have N+1 polygons affecting the pixel, at least one of your polygons will be damaged rather than combined correctly. And as I explained in the other post, with 1 polygon-at-a-time, you cannot do source-Z compression that will match the contents of the destination Z buffer.
The worst case for this algorithm would be a situation where you have lots of sub-pixel triangles. But for any normal mesh, 6 triangles per pixel should be enough.
Unless you have some sort of advanced LOD or dynamic tessellation scheme in place, making assumptions about the absence of sub-pixel triangles doesn't strike me as very safe.
 
arjan de lumens said:
For this to work, you have to make the hardware work on the geometry mesh as a single massive primitive - the procedure then becomes:
  • Pick a pixel you wish to render
  • Find every polygon that contributes to the pixel
  • Compute source Z for every sample of the pixel, using the data from all the polygons affecting it
  • Compress that set of source Z values
  • Now you have source Z values that can safely be compared against the destination Z values that the first mesh produced.
The 2nd step of this process sounds difficult.
No, not really. Here's the way I envision the process:
  • Triangle 1 hits the pixel we're looking at. Since the z-test trivially passes, a coverage mask will be applied that indicates what portion of this pixel is covered by this triangle.
  • Repeat step 1 as above until the maximum number of triangles is reached.
  • The (N+1)th triangle is rendered to the current pixel. The z-comparison is done. The new coverage mask is examined, and the triangle that contributes to the smallest number of samples is discarded (a sketch of this eviction step follows at the end of this post).

Note that this hole will now be smaller than any other triangle in the list, so it can never be filled except if other triangles are occluded.

And I don't see why GL_EQUAL won't work just fine here, as you'd first compress the z-data for the samples, then uncompress it. That should give results equal to the values read from the compressed z-buffer.
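
A small self-contained C++ sketch of that eviction step (the coverage-mask width and slot count are illustrative assumptions): when an extra fragment overflows the slot budget, kick out the one covering the fewest sub-samples.

#include <algorithm>
#include <bit>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Fragment {
    uint16_t coverage;   // one bit per sub-sample (16 sub-samples assumed)
    float    z;
    uint32_t color;
};

void addWithEviction(std::vector<Fragment>& slots, const Fragment& incoming, std::size_t maxSlots)
{
    slots.push_back(incoming);
    if (slots.size() <= maxSlots)
        return;
    // Kick out the fragment covering the smallest number of sub-samples
    // (which may be the incoming fragment itself).
    auto victim = std::min_element(slots.begin(), slots.end(),
        [](const Fragment& a, const Fragment& b) {
            return std::popcount(a.coverage) < std::popcount(b.coverage);
        });
    slots.erase(victim);
}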
 
Regarding see-through artifacts: I never heard anyone complain about this with Parhelia, and it has merge criteria that are not ideal and only stores a single z per fragment. So it can't be a terrible issue for mainstream graphics. Professional graphics would be a little more discerning, which is why 3dlabs implemented an overflow buffer (my term) up to their maximum 16 samples. At least this is my impression. Regardless, I think the problems are solvable.
 
3dcgi said:
Regarding see-through artifacts: I never heard anyone complain about this with Parhelia
I'm sure there were plenty on this very board.
 
3dcgi said:
Regarding see-through artifacts: I never heard anyone complain about this with Parhelia, and it has merge criteria that are not ideal and only stores a single z per fragment. So it can't be a terrible issue for mainstream graphics. Professional graphics would be a little more discerning, which is why 3dlabs implemented an overflow buffer (my term) up to their maximum 16 samples. At least this is my impression. Regardless, I think the problems are solvable.
Not a good example. Digging through actual videocard reviews of the Parhelia, many of them do point out that there were - at the time - a number of rendering artifacts, un-AAed edges and just plain incompatible games with FAA - and I have seen no evidence that suggest that those issues were fixable.
 