Anti-Aliasing... so far

OK, I've read the patent - it seems I made one misunderstanding: when a polygon owning a sample is rejected, the sample is not filled by some other polygon. Instead it is removed from consideration entirely - no polygon, past, present or future, is allowed to own that sample location AT ALL (at least not until the next frame), and the sample is not included in the final downsampling summation.

In that case, 2-pass rendering won't exhibit the problem I described. That still doesn't mean I am going to greenlight it - and here is why not:

Imagine a scenario where you first render a high-polygon object, then render another object on top of it so that the first object is no longer visible. What happens now is that the first object removes some sample points from consideration. If the first object moves but the second object stands still, you will get subtle movement/flickering in the image where it should have been 100% steady, because the pattern of sample points denied to / made available to the second object keeps varying all the time. In a game, it could be possible to use this effect to see, on a wall, where something is moving behind it.
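
To make the effect concrete, here's a toy sketch of a single pixel (my own construction based on my reading above, not anything from the patent): a static edge of the second object covers half of the 8 samples, while the moving first object invalidates a different sample each frame. Because invalid samples are dropped from the downsampling sum, the resolved colour wobbles even though nothing visible has moved:

Code:
# Toy model of the flicker scenario described above (my own sketch, not the
# patent's exact rules). One pixel, 8 samples. A static edge of the second
# object covers samples 0-3 in white; the rest of the pixel shows black
# background. The high-polygon first object moves *behind* it and, per my
# reading, invalidates one (varying) sample per frame once the per-pixel
# triangle limit has been hit. Invalid samples are excluded from the
# downsampling sum, so the resolved colour changes from frame to frame.

NUM_SAMPLES = 8
FOREGROUND = 1.0   # white: samples 0-3, the second object's edge
BACKGROUND = 0.0   # black: samples 4-7

def resolve(invalidated):
    """Average the owning colours over the samples still under consideration."""
    valid = [s for s in range(NUM_SAMPLES) if s not in invalidated]
    colours = [FOREGROUND if s < 4 else BACKGROUND for s in valid]
    return sum(colours) / len(valid)

for frame in range(6):
    # The hidden object's motion changes which sample gets knocked out.
    invalidated = {frame % NUM_SAMPLES}
    print(f"frame {frame}: resolved = {resolve(invalidated):.3f}")

The resolved value jumps between 3/7 and 4/7 depending on which sample happens to be invalidated - i.e. visible flicker along an edge that never moved.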

AA algorithms are a bit like programs that need to be secure - you need to attack them relentlessly until they break, and only if they resist every imaginable avenue of attack do you declare them good to go.
 
Sadly the patent application doesn't address what happens when triangles are completely removed from a pixel's sample-set by occlusion, once the triangle limit has been reached (four in this case).

Arguably the triangle count "N" in the patent should be decremented correspondingly, which would enable the GPU to "re-open" all invalid samples to the possibility of accepting a triangle.

Jawed
 
Reopening invalid samples in that algorithm sounds a bit dangerous - you need some heuristics to determine when it is 'safe' to reopen them and even when it is 'safe' to NOT reopen them, and you need to make an estimate of the information that used to belong to them before they were invalidated. If you make bad heuristics/estimates, it will be correspondingly easy for me to attack your algorithm, whether it is by 2-pass rendering, see-through artifacts, or some other avenue.
 
Here's a worked example of a data structure based upon figures 6A-6E in the patent application (8xMSAA):



If there's one byte of coverage mask per visible triangle, this is how the coverage masks end up (triangles 0 to 3):
0. 00000001 - blue in 6A (started as 11111111)
1. 10000000 - blue in 6B
2. 01101000 - red in 6C
3. 00000110 - yellow in 6D


The green triangle in 6E can't be sampled, and it simultaneously knocks out the fifth sample originally owned by triangle 0. 1/7 is the per-sample fraction because the four-triangle-visible limit has been exceeded.


If a new orange triangle F covers samples 1-4, it actually knocks out two complete triangles, triangles 0 and 3 (blue 6A and yellow 6D). So the triangle coverage masks look like this:
0. 10000000 - blue in 6B
1. 01100000 - red in 6C
2. 00001111 - orange F
3. 00000000 - nothing


As you can see, sample 5, which had been "discarded" earlier, is still unset (no triangle has bit 5 set). With only 3 triangles visible, the per-sample fraction remains 1/7.



If by pure coincidence :smile: another green triangle G comes along covering sample 5, then the list will happily accept it, as there's one slot unused:
0. 10000000 - blue in 6B
1. 01100000 - red in 6C
2. 00001111 - orange F
3. 00010000 - green G


With all samples allocated to a triangle the fraction can return to 1/8.



If a final triangle H, covering sample 4, appears, it can't be rendered as the triangle list is full and the orange triangle F would remain visible - but sample 4 now becomes invalid:
0. 10000000 - blue in 6B
1. 01100000 - red in 6C
2. 00000111 - orange F
3. 00010000 - green G


And the fraction returns to 1/7.

Obviously this is all just a guess...
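
Here's a small Python sketch that re-runs the steps above under my guessed rules: one byte of coverage per slot, a four-triangle limit, sample N stored in bit N-1, a new triangle always winning the samples it covers (ignoring Z for brevity), and an invalid sample only being re-opened when a later triangle covers it while a slot is free. Since the figures only give end-state coverage, red 6C and yellow 6D are fed in with their final masks - all of this is my assumption, not the patent text.

Code:
TRIANGLE_LIMIT = 4
NUM_SAMPLES = 8

class PixelFragmentList:
    def __init__(self):
        self.slots = []      # [name, coverage_mask] per stored triangle
        self.invalid = 0     # bitmask of samples removed from consideration

    def add(self, name, mask):
        # A new triangle takes ownership of the samples it covers.
        for slot in self.slots:
            slot[1] &= ~mask
        # Triangles whose coverage drops to zero are knocked out of the list.
        self.slots = [s for s in self.slots if s[1] != 0]
        if len(self.slots) < TRIANGLE_LIMIT:
            # Room in the list: store it, re-opening any invalid samples it covers.
            self.invalid &= ~mask
            self.slots.append([name, mask])
        else:
            # List full: the samples it covers become invalid instead.
            self.invalid |= mask

    def dump(self):
        for name, m in self.slots:
            print(f"  {m:08b} - {name}")
        valid = NUM_SAMPLES - bin(self.invalid).count("1")
        print(f"  per-sample fraction 1/{valid}")

px = PixelFragmentList()
px.add("blue 6A",    0b11111111)   # started as 11111111
px.add("blue 6B",    0b10000000)
px.add("red 6C",     0b01101000)
px.add("yellow 6D",  0b00000110)
px.add("green 6E",   0b00010000)   # rejected, sample 5 invalidated -> 1/7
px.dump()
px.add("orange F",   0b00001111)   # knocks out blue 6A and yellow 6D entirely
px.add("green G",    0b00010000)   # slot free, sample 5 re-opened -> 1/8
px.add("triangle H", 0b00001000)   # rejected, sample 4 invalidated -> 1/7
px.dump()

The two dumps reproduce the two mask lists above, with the fraction going 1/7, then back to 1/8 once green G claims sample 5, then 1/7 again after triangle H is rejected.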

Jawed
 
OpenGL guy said:
What "passing resemblance" do you refer to? The fact that it's a rotated grid?
I suppose this technique has very little to do with the AA quality, compared to:

OpenGL guy said:
As you mentioned, VSA100 was using SSAA, I don't see anyone supporting SSAA as their standard AA technique because it's too slow or expensive to make fast (2 or 4 VSA100 chips if you recall). Also, ATI has been using gamma corrected MSAA, I don't believe 3dfx's technique was gamma corrected, but I may be mistaken.

Z3 was mentioned by arjan. It's a great technique. Why hasn't it been used so far?
 
SA used to have some interesting ideas about combining Z3 with fragment AA; but as I said, great ideas remain great until they fail in at least one instance. I doubt IHVs aren't researching the field - rather the contrary.

Here's a quote from Eric Demers that shows that they're not limiting research to just garden variety MSAA:

There's still not a single PC solution out there that gives AA quality like we do. Not only do we have a programmable pattern (which is currently set to sparse), we are also the only company offering gamma-corrected AA, which makes for a huge increase in quality. Honestly, we've looked at the best 8x and even "16x" sampling solutions, in the PC commercial market, and nobody comes close to our quality. But we are always looking at ways to improve things. One thing people do have to realize is that if you use a lossless algorithm such as ours (lossy ones can only degrade quality), the memory used by the back buffer can be quite large. At 1600x1200 with 6xAA, our buffer consumption is over 100 MBs of local memory space. Going to 8xAA would have blown past 128MB. Consequently, with a lossless algorithm, the increase in subsamples must be matched with algorithm changes or with larger local storage. We've looked at things such as randomly altering the programmable pattern, but the low frequency noise introduced was worse than the improvement in the sampling position. Unless you have 32~64 subsamples, introducing random variations is not good. So we are looking at other solutions and algorithms. Current and future users will be happy with our solutions. Stay tuned.

http://www.3dcenter.de/artikel/2003/11-06_english.php
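
For what it's worth, the buffer figures in that quote roughly add up if you assume 4 bytes of colour plus 4 bytes of depth/stencil per subsample and a couple of resolved 32-bit buffers on top (the per-sample byte counts and buffer counts are my assumption, not something Demers states):

Code:
# Back-of-the-envelope check of the figures in the quote. Assumptions (mine,
# not Demers'): 4 bytes colour + 4 bytes depth/stencil per subsample, plus
# two resolved 32-bit buffers outside the multisampled one.
def aa_buffer_mb(width, height, samples, bytes_per_subsample=8, resolved_buffers=2):
    pixels = width * height
    multisampled = pixels * samples * bytes_per_subsample
    resolved = pixels * 4 * resolved_buffers
    return (multisampled + resolved) / (1024 * 1024)

print(f"1600x1200 6xAA: {aa_buffer_mb(1600, 1200, 6):.0f} MB")   # ~103 MB, "over 100 MBs"
print(f"1600x1200 8xAA: {aa_buffer_mb(1600, 1200, 8):.0f} MB")   # ~132 MB, past 128 MB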
 
Reverend said:
Z3 was mentioned by arjan. It's a great technique. Why hasn't it been used so far?
Much like the algorithms I discussed with Jawed above, you generally get almost-correct rather than exactly-correct behaviour with this class of algorithms (unless, of course, you allow an unbounded number of polygons per pixel, in which case both Z3 and the algorithm Jawed found just break down to ordinary Multisampling in terms of worst-case memory usage). It might look perfectly fine and indeed jaw-droppingly good 80, 90, 99 or even 99.999% of the time, but even if it fails 0.001% of the time, it's going to trigger full-scale paranoia in the minds of IHVs like Nvidia/ATI.

It's a bit psychological: while the issue of aliasing is well understood and a known problem, an anti-aliasing method that fails in an obscure corner case represents an unknown risk. And people ABHOR unknown risks, no matter how small they turn out to actually be.

As for the problem of the large memory footprint of Multisampling, I would suggest using lossless compression schemes to try to reduce the memory footprint of the framebuffer as much as possible, and keeping a reservoir for the case where the initial framebuffer overflows. For the reservoir implementation to be robust you need full memory allocation plus a complete virtual memory system (which will require considerable OS cooperation), but you can make statistical arguments or empirical tests to determine how big the reservoir needs to be to handle 'common' cases, and then just extend it if you turn out to be wrong or if an evil hacker launches a full-scale attack on it.
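
A minimal sketch of what that could look like at the bookkeeping level (everything here - tile granularity, block sizes, the grow policy - is made up for illustration; the real thing would live in hardware plus the OS memory manager):

Code:
# Each tile gets a small guaranteed allocation for its compressed data; tiles
# that compress badly spill extra blocks into a shared reservoir, which can be
# grown (with OS help) if the statistical sizing estimate turns out wrong.
class TiledFramebuffer:
    def __init__(self, base_blocks_per_tile, reservoir_blocks):
        self.base_blocks = base_blocks_per_tile
        self.reservoir_free = reservoir_blocks
        self.overflow = {}                       # tile -> extra blocks in use

    def store_tile(self, tile, compressed_blocks):
        extra_needed = max(0, compressed_blocks - self.base_blocks)
        delta = extra_needed - self.overflow.get(tile, 0)
        if delta > self.reservoir_free:
            # Reservoir exhausted: extend it via the virtual memory system
            # (or fall back to uncompressed multisample storage).
            self.grow_reservoir(delta - self.reservoir_free)
        self.reservoir_free -= delta
        self.overflow[tile] = extra_needed

    def grow_reservoir(self, at_least):
        self.reservoir_free += max(at_least, 256)   # "extend it if you turn out to be wrong"

fb = TiledFramebuffer(base_blocks_per_tile=2, reservoir_blocks=64)
fb.store_tile(tile=0, compressed_blocks=1)   # fits in the base allocation
fb.store_tile(tile=1, compressed_blocks=5)   # spills 3 blocks into the reservoir
print(fb.reservoir_free)                     # 61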
 
Ailuros said:
SA used to have some interesting ideas about combining Z3 with fragment AA; but as I said, great ideas remain great until they fail in at least one instance. I doubt IHVs aren't researching the field - rather the contrary.

Here's a quote from Eric Demers that shows that they're not limiting research to just garden variety MSAA:

That was two years ago. They still have something in their pocket, or is that "just" gamma-corrected and temporal AA and we've already seen those bolts shot?
 
geo said:
That was two years ago. They still have something in their pocket, or is that "just" gamma-corrected and temporal AA and we've already seen those bolts shot?

Pffffff to that so-called "temporal AA" thing. There's also adaptive AA now available, which for some mysterious reason hasn't been available on R300 since 2002, albeit I'd think that it would have been theoretically possible.

While I believe we'll still see MSAA in the near future, I'm also expecting to see entirely new AA ideas at some point, yet I wouldn't expect anything outside the multisampling realm. As an example, I'd think that a form of stochastic MSAA would still save a lot of fillrate and bandwidth compared to stochastic SSAA (or semi-stochastic, if purely stochastic isn't ideal).
 
With stochastic patterns, you are rather efficiently getting rid of aliasing, but you are trading the aliasing for a bit of noise. If you are not careful, you can get e.g. noisy/grainy polygon edges, which is a bit disturbing, especially at low sample counts.

One way to mitigate the particular problem of noisy polygon edges could be to e.g. instead of picking every sample location stochastically, set up a bunch of N-queen patterns and then use a stochastic algorithm to select which one to use for each pixel. (For those not familiar with N-queen, imagine an NxN chessboard - an N-queen pattern is then basically a placement of N queens on the chessboard so that no queen can attack any other queen).
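
For illustration, here's a small sketch of that idea (the brute-force pattern search and the per-pixel selection scheme are just stand-ins of mine, not a claim about how any real hardware would do it):

Code:
import random

# Pre-build a handful of N-queen sample patterns (brute-force search, fine for
# small N), then pick one pseudo-randomly per pixel. A pattern is one column
# index per row such that no two queens share a column or diagonal.
def n_queen_patterns(n, want):
    found = []
    def place(cols):
        if len(found) >= want:
            return
        row = len(cols)
        if row == n:
            found.append(tuple(cols))
            return
        for c in range(n):
            if all(c != pc and abs(c - pc) != row - pr
                   for pr, pc in enumerate(cols)):
                place(cols + [c])
    place([])
    return found

def sample_positions(pattern):
    """Map a pattern onto sample positions (cell centres) inside a unit pixel."""
    n = len(pattern)
    return [((col + 0.5) / n, (row + 0.5) / n) for row, col in enumerate(pattern)]

patterns = n_queen_patterns(8, want=16)
rng = random.Random(42)
for pixel in [(0, 0), (1, 0), (0, 1)]:
    chosen = rng.choice(patterns)          # stochastic choice of pattern per pixel
    print(pixel, sample_positions(chosen)[:2], "...")

Every pixel then gets a well-distributed pattern, while the pattern still varies from pixel to pixel - which should break up structured edge aliasing without the raw graininess of fully random sample positions.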
 
A lot of the discussion I hear - not just in this thread, but certainly here as well - is that there is only perfection and crap.

And I just don't buy that. It makes me think a paradigm shift is needed, and makes me wonder if we are beginning to see it with AAA and NV's TrAA. I like the ATI name better precisely because it can be expanded over time.

When I work on a project at home, if I need a hammer I go get a hammer. If I need a screwdriver I go get a screwdriver. What I don't do is go to the hardware store, stand in front of the hammers and go "Jaysus, what an inelegant solution for screws --worthless!" then go stand in front of the screwdrivers and say "Keerist, that'll never drive a nail --worthless!".

It seems to me that rather than searching for the perfect algo, we should be aiming at chips and drivers that are smart and brawny enuf to apply a range of very good algos to the situations that are perfect for them.
 
The perfection vs crap dichotomy is valid in some cases, but not all. A screwdriver that breaks 1 out of 1 million screws is generally still rather useful; a CPU that executes 1 out of 1 million instructions wrong is utterly useless; a server that lets through 1 out of 1 million hack attempts will make you feel uneasy.

If you are going for the 'collection-of-reasonably-good-algorithms' approach instead of the '1-ultimate-algorithm', then developers will likely need to be made aware of what the various algorithms are, what their strengths, pitfalls and interactions are, and how they can be managed together for the best possible result. If you have the hardware try to pick algorithms, you will end up making a lot of assumptions about how 3d applications/games and their content are structured, and these assumptions can very well be violated by a big game title that you weren't aware of at hardware design time; if you have the drivers pick algorithms, you will get an endless number of game profiles being stuffed into the drivers.
 
ATI has gone a certain way with CrossFire in addressing how you deal with that issue, it seems to me. A slower-but-reliable algo as the default, hardlock for those apps you've had the time to check yourself, and "user play around and see what you like" for the rest.

Of course I think it would be fair to say there hasn't been enuf real-world usage of it yet to see how that model is received.

Certainly the more transparent "tools" on the table, the better. Transparency AA/current ATI AAA, would be examples --even tho I changed context on "transparent" in the middle of the thot. :LOL:
 
arjan de lumens said:
With stochastic patterns, you are rather efficiently getting rid of aliasing, but you are trading the aliasing for a bit of noise. If you are not careful, you can get e.g. noisy/grainy polygon edges, which is a bit disturbing, especially at low sample counts.

One way to mitigate the particular problem of noisy polygon edges could be to e.g. instead of picking every sample location stochastically, set up a bunch of N-queen patterns and then use a stochastic algorithm to select which one to use for each pixel. (For those not familiar with N-queen, imagine an NxN chessboard - an N-queen pattern is then basically a placement of N queens on the chessboard so that no queen can attack any other queen).

The necessity of high sample densities for any kind of stochastic method to make sense - that's also pretty clear to me.

I'm not sure anymore if it was Simon or someone else who said that he got quite satisfying results with some sort of semi-stochastic 16x sample method (albeit I think it was supersampling). Memory is vague so I might be entirely wrong.

Speaking of, does Mali's 16xMSAA use a 16 by 16 grid?
 
geo said:
Certainly the more transparent "tools" on the table, the better. Transparency AA/current ATI AAA, would be examples --even tho I changed context on "transparent" in the middle of the thot. :LOL:

Frankly, there are cases where alpha-test textures cannot be avoided; in any other case developers really should go through the extra trouble and use alpha blending instead, or just skip alpha tests entirely. My usual example is HL2; I don't know about anyone else, but if someone told me that the tons of fences in it couldn't have been done with alpha blends, then I can easily say that I could have enjoyed the game just as much without all those fences around. Since I actually played the game when no transparency AA was available, I almost got eye cancer from the endlessly flickering fences....

While transparency antialiasing is definitely a great thing to have, it's not an absolute panacea at all times either. As Wavey said, if the amount of alpha-test data is extremely high, you'll lose almost as much performance as with full-scene supersampling.

I'll say it again: can I please have an antialiasing algorithm that has an effect on polygon interior data equivalent to, for example, 16x sample SSAA, but with minimal performance penalty? That means that, for my own preferences, I don't necessarily need higher polygon edge AA sample densities right now (albeit more is frankly always better), but rather far more sophisticated filtering algorithms - and that without a ton of optimisations, if they don't mind ;)
 
Yeah I've been mystified why it is that the fences couldn't have been alpha blended. They'd appear soft-edged when close-up (sort of like out of focus due to depth of field) but at distances they'd look really good.

What is the cost of alpha-blended "transparent" textures? Judging by the scarcity of alpha-blended grass in UT2K4, I imagine it's an expensive technique.

Presumably lower-end GPUs would need to revert to non-blended textures.

Jawed
 
Alpha blends take more time and you need to render back to front. Seeing how small the performance difference between ordinary MSAA and MSAA + transparency AA is in games like HL2, I doubt it would have had any serious performance impact. Of course, if alpha blends weren't possible it's a totally different chapter.
 
Presumably you could render the entire scene except for the fences, and then render the fences last, with only the fences rendered back-to-front?

Seems simple, what am I missing?

Jawed
 
Jawed said:
Presumably you could render the entire scene except for the fences, and then render the fences last, with only the fences rendered back-to-front?

Seems simple, what am I missing?
You have to depth sort all alpha blended rendering. So if you have explosions, glass, etc. and then have to sort fences as well, then you've added complexity.
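
As a rough sketch of what that sorting looks like on the app side (hypothetical scene/draw structures of my own, not any particular engine's API):

Code:
from dataclasses import dataclass

# All alpha-blended geometry - fences, glass, explosions - goes into one list
# that is depth-sorted back-to-front after the opaque pass. The structures
# here are made up, just to show where the sort (and its cost) comes in.
@dataclass
class Draw:
    name: str
    view_depth: float        # distance from the camera along the view axis
    blended: bool

def submit(draw):
    print("draw", draw.name)

def render_frame(draws):
    opaque  = [d for d in draws if not d.blended]
    blended = [d for d in draws if d.blended]
    for d in opaque:          # order barely matters; the Z-buffer sorts these
        submit(d)
    # Farthest first, so each blend composites over everything behind it.
    for d in sorted(blended, key=lambda d: d.view_depth, reverse=True):
        submit(d)

render_frame([
    Draw("wall",         10.0, blended=False),
    Draw("fence",        12.0, blended=True),
    Draw("glass window",  6.0, blended=True),
    Draw("explosion",     9.0, blended=True),
])
# Blended order: fence (12.0), explosion (9.0), glass window (6.0)

Once explosions, glass and fences all share that one sorted list, adding the fences isn't free: every extra transparent surface participates in the per-frame sort and can't be batched as freely as opaque geometry.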
 