View Full Version : 8x OGAA - what's the point?
BoddoZerg
19-Nov-2002, 14:10
From Beyond3d's Geoff Ballew interview (http://www.beyond3d.com/previews/nvidia/nv30launch/index.php?p=3):
The 6XS is a skewed grid, the 8X mode is an ordered grid.
I can't help but wonder - what's the point of an ordered grid 8x mode? If the sample pattern is 4x2, then for horizontal lines you'll only get 2 gradients. In terms of visual quality, nVidia's 8x may not be any better than ATi's 6x jittered grid!
Seems to me that brute-force increasing the number of samples you take is a dumb way to improve FSAA quality.
Tagrineth
19-Nov-2002, 14:16
From Beyond3d's Geoff Ballew interview (http://www.beyond3d.com/previews/nvidia/nv30launch/index.php?p=3):
The 6XS is a skewed grid, the 8X mode is an ordered grid.
I can't help but wonder - what's the point of an ordered grid 8x mode? If the sample pattern is 4x2, then for horizontal lines you'll only get 2 gradients. In terms of visual quality, nVidia's 8x may not be any better than ATi's 6x jittered grid!
Seems to me that brute-force increasing the number of samples you take is a dumb way to improve FSAA quality.
It isn't jittered-grid, it's rotated... ATi doesn't jitter subsamples. So at equal levels, 3dfx's FSAA is still a microstep above ATi's. By the way, yes, I have now seen a Radeon 9700 Pro in action, and the 4x FSAA looks fantastic, but still just a tiny bit less than 3dfx's... the jittered sampling really works wonders, even with the same sample 'pattern'.
Thowllly
19-Nov-2002, 14:25
Actually, you will only get 1 gradient (plus maybe some extra single pixel gradient colors). Several months ago I decided that my next gfx card should support at least 4xRGMS and 4x anisotropic filtering. So my next card will not be a GFFX. Simple as that.
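The gradient count can be checked by brute force. A rough sketch, assuming the 4x2 ordered grid has evenly spaced sample centres (the real NV30 positions are not given anywhere in this thread) and using a made-up 4-sample rotated pattern purely for comparison:

```python
from fractions import Fraction

def ordered_grid(cols, rows):
    """Sample centres of a cols x rows ordered grid inside the unit pixel."""
    return [(Fraction(2 * i + 1, 2 * cols), Fraction(2 * j + 1, 2 * rows))
            for j in range(rows) for i in range(cols)]

def intermediate_shades(samples, axis):
    """Count partial-coverage levels as an axis-aligned edge sweeps the pixel.

    axis=1 sweeps a horizontal edge in y, axis=0 sweeps a vertical edge in x.
    """
    n = len(samples)
    # Coverage can only change when the edge passes a distinct sample coordinate.
    thresholds = sorted({s[axis] for s in samples}) + [Fraction(1)]
    levels = {sum(1 for s in samples if s[axis] < t) for t in thresholds}
    return len(levels - {0, n})  # drop "fully outside" and "fully covered"

# Hypothetical 4-sample rotated pattern (assumption, for comparison only).
ROTATED_4X = [(Fraction(3, 8), Fraction(1, 8)), (Fraction(7, 8), Fraction(3, 8)),
              (Fraction(1, 8), Fraction(5, 8)), (Fraction(5, 8), Fraction(7, 8))]

og_4x2 = ordered_grid(4, 2)
print("4x2 OG, horizontal edge:", intermediate_shades(og_4x2, axis=1))
print("4x2 OG, vertical edge:  ", intermediate_shades(og_4x2, axis=0))
print("4x RG,  horizontal edge:", intermediate_shades(ROTATED_4X, axis=1))
```

Under those assumptions the eight ordered samples sit on only two distinct rows, so a horizontal edge gets a single in-between shade, while the four rotated samples give three.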
Joe DeFuria
19-Nov-2002, 14:30
It isn't jittered-grid, it's rotated... ATi doesn't jitter subsamples...
Actually, ATI doesn't just "rotate" the samples. ATI won't fully disclose what they do exactly, but it's not just a rotated grid. Jury is still out if ATI uses a consistent sample pattern for all pixels, or if they actively choose a sample pattern on a pixel by pixel (or per group of pixels) basis based on some criteria.
http://www.beyond3d.com/reviews/ati/radeon9700pro/index.php?page=page15.inc
Joe DeFuria
19-Nov-2002, 14:41
The 8X mode might actually prove to be quite interesting, though we'd need more specifics. Looking at these two statements:
On GeForce4 there was a total of 4 Z units per pipeline, which enabled you to output 4 MSAA samples per pipe per clock, is this still the same for GeForce FX?
Yes it is.
Given that, with the 6XS and 8X modes (which are both combinations of super and multisampling according to the same interview), I'm inclined to think that nVidia may limit the multisampling aspect to a maximum 4X.
So I'll postulate:
1) So 6XS might be 4X ordered grid supersampling combined with 2X rotated grid multisampling, (Since GeForce4 can do 2X rotated grid multisampling, I assume GeForceFX can as well).
2) 8X might be 4X ordered grid supersampling combined with 4X ordered grid multisampling. Hmmm....that doesn't make much sense though. ;)
In any case, the supersampling aspect will help with those bloody alpha textures. On the down side, it should come with a significant performance hit: The color compression won't be as effective with the supersampling aspect of these AA modes.
Reverend
19-Nov-2002, 15:22
The "point" may well be a couple of things :
1) "No one else has our 8x AA... that's the max on the market"
2) It may simply be "cheaper" transistor-wise than 6X actually
Of course, since everyone likes to do shootouts in a review, there cannot be any other modes by any other IHV to be compared to the GeForce FX's 8X.
Joe DeFuria
19-Nov-2002, 15:29
Of course, since everyone likes to do shootouts in a review, there cannot be any other modes by any other IHV to be compared to the GeForce FX's 8X.
Well that's where you are quite wrong. Quite a few sites have benchmarked "maximum quality" settings in the past and directly compared performance.
Would only be prudent to benchmark 8X on NV30 against 6X on R300 for an "apples to apples" comparison, no? ;)
And for consistency's sake, I hope those same sites use Nv-30's "conservative" setting for aniso, because that is its maximum quality setting, against R-300's aniso...
Would only be prudent to benchmark 8X on NV30 against 6X on R300 for an "apples to apples" comparison, no? ;)
Great idea, try suggesting that to Anand please, I'm sure he'd be thrilled... ;)
BoddoZerg
19-Nov-2002, 15:54
The "point" may well be a couple of things :
1) "No one else has our 8x AA... that's the max on the market"
2) It may simply be "cheaper" transistor-wise than 6X actually
Of course, since everyone likes to do shootouts in a review, there cannot be any other modes by any other IHV to be compared to the GeForce FX's 8X.
Tell me again where these 125 million transistors are going to if everything in NV30 is cheaper.
Althornin
19-Nov-2002, 16:08
The "point" may well be a couple of things :
1) "No one else has our 8x AA... that's the max on the market"
2) It may simply be "cheaper" transistor-wise than 6X actually
Of course, since everyone likes to do shootouts in a review, there cannot be any other modes by any other IHV to be compared to the GeForce FX's 8X.
Tell me again where these 125 million transistors are going to if everything in NV30 is cheaper.
The shader pipelines?
They are more advanced....
It isn't jittered-grid, it's rotated... ATi doesn't jitter subsamples. So at equal levels, 3dfx's FSAA is still a microstep above ATi's. By the way, yes, I have now seen a Radeon 9700 Pro in action, and the 4x FSAA looks fantastic, but still just a tiny bit less than 3dfx's... the jittered sampling really works wonders, even with the same sample 'pattern'.
With the same sample pattern, you get the same coverage result. There's no difference in saying the samples are 'jittered' or 'rotated' if it comes down to the same pattern.
But 3dfx used supersampling, and ATI's AA is gamma-corrected.
Reverend
19-Nov-2002, 16:32
Of course, since everyone likes to do shootouts in a review, there cannot be any other modes by any other IHV to be compared to the GeForce FX's 8X.
Well that's where you are quite wrong. Quite a few sites have benchmarked "maximum quality" settings in the past and directly compared performance.
Would only be prudent to benchmark 8X on NV30 against 6X on R300 for an "apples to apples" comparison, no? ;)
And for consistency's sake, I hope those same sites use Nv-30's "conservative" setting for aniso, because that is its maximum quality setting, against R-300's aniso...
I have my own standards for a review and I do/can not speak for others.
[edit]I think it is time to discard shootouts masquerading as reviews, especially with the availability of the R300 and NV30. As much as everyone likes to compare (and as much as various websites know shootouts get them higher views), shootouts lead to much confusion to most folks. Take a product, review it for what it's worth and leave it to folks to decide. Folks should take the time to read various reviews, look at all the screenshots and decide for themselves the IQ differences and trade-offs. Much more to rant about this but I'm off to bed!
Joe DeFuria
19-Nov-2002, 17:42
....shootouts lead to much confusion to most folks.
I'll have to respectfully disagree. Poorly done shootouts lead to much confusion and spread of misinformation.
If properly done, a shoot-out would provide a valuable complement to a stand-alone review.
Tagrineth
19-Nov-2002, 19:53
2) 8X might be 4X ordered grid supersampling combined with 4X ordered grid multisampling. Hmmm....that doesn't make much sense though. ;)
One set of samples might be "AccuView"-shifted while the other one is straight-up ordered grid? Meh, still, it'll look like crap compared to ATi's properly set up 6x Rotated.
MistaPi
19-Nov-2002, 19:54
I really like the 6x (4xS) FSAA on the GF4Ti, the super-sampling helps a great deal on texture shimmering.
Do you guys really think 4xS would take a huge performance hit over 4x? I mean, the GF FX has a lot of fillrate to draw from.
Bambers
19-Nov-2002, 20:14
4xS is not 6x; it's a skewed 4x grid.
As for the performance hit, I'd have thought you'd only get a noticeable hit in situations with lots of alpha transparencies etc. and in situations where pixel fillrate is limited due to use of pixel shaders.
How would nvidia actually get 6xS?
It sounds like they only have 2x1, 1x2 and 2x2 SSAA and 2xRG and 4xOG MSAA to work with. I'm struggling to get 6 samples out of that lot :-?.
The only thing I can think of is that 6xS uses a Quincunx or 9-tap filter.
The only problem is that it's mathematically possible to define image quality.
Ideally you'd want to equalize that scalar quantity between cards, and then run benchmarks.
Of course, in practice it's extremely complicated, and you'll probably see different indexes for various components of the image (at Fourier frequency X we see this card with a great advantage compared to the other).
The flamewars over which index to use would be .. hilarious =) (kinda like error analysts when they can't figure out which 'quantity' more clearly shows their points).
So in the end it will realistically have to be subjective. I.e. pick out modes which both 'look' good and similar when run side by side (say, using blind testing to double-check yourself), then run your benchmarks.
Laa-Yosh
19-Nov-2002, 21:30
How would nvidia actually get 6xS?
It sounds like they only have 2x1, 1x2 and 2x2 SSAA and 2xRG and 4xOG MSAA to work with. I'm struggling to get 6 samples out of that lot :-?.
The only thing I can think of is that 6xS uses a Quincunx or 9-tap filter.
Well maybe they render 3 differently shifted versions of the scene with 2X rotated grid multisampling. Sounds like good quality, but with a considerable performance hit...
Assuming NVIDIA didn't change anything on the 2x and 4x multisampling pattern, these could be the sample patterns for 6xS and 8x:
6xS:
http://www.3dconcept.ch/news/6xssmall.gif
8X:
http://www.3dconcept.ch/news/8xsmall.gif
Any other ideas? (Besides the opposite way of supersampling of course...)
The obvious variations on the above would be using a 3x3 filter in 8x case.
And of course you could scale the buffers in both directions, rather than the one you propose, and sample down. It might not be possible to get exactly 6 or 8 samples per pixel this way, but it would probably look better.
Oh, yes, 8x would then be 2x2 supersampling combined with 2x RGMS, which would result in:
http://www.3dconcept.ch/news/8x2.gif
by supersampling 1.75 x 1.75 you would get something like this for the 6xS case:
http://www.3dconcept.ch/news/15x15xsmall.gif
or by using 2x1.5 you would get:
http://www.3dconcept.ch/news/15x2x.gif
Depending on the downfilter, some pixels will be reused of course. A 2x2 downfilter would reuse 4 of the 8 subpixels and 2 of the 4 texel samples, for example.
Bambers
19-Nov-2002, 23:29
8x is only ordered grid according to the interview here...
8x is only ordered grid according to the interview here...
Then we should call the second 8x one 8xS, one of the secret modes you can enable in an upcoming rivatuner? ;)
Simon F
20-Nov-2002, 09:19
RAM:
In those patterns there's a much bigger correlation in the samples along a 135 degree line (compared to other directions). Surely that would rather defeat the purpose?
Dave Baumann
20-Nov-2002, 10:23
One thing to throw into this discussion: if 4xS, 8xS and 8X are all mixed Super/Multi sample modes then what does the x symbolise for the xS modes?
I'm wondering if the texture sample positions for the two pixel pipelines in 8X mode are actually the same.
Remember, there must be something different between 8X and the xS modes because 8X works in OpenGL meaning that as far as the application / extension knows this is still straight Multisampling.
EasyRaider
20-Nov-2002, 10:31
No, I don't buy that 8x mode (edit: ram's suggestion). 2x2 supersampling would probably be too slow, it wouldn't look good in benchmarks at all. And they'd surely have called it 8xS.
Either way, I'm really disappointed. An ordered grid can't give smooth enough near-horizontal and near-vertical edges. From the looks of this, my next card will have an ATI chip.
EasyRaider
20-Nov-2002, 10:38
One thing to throw into this discussion: if 4xS, 8xS and 8X are all mixed Super/Multi sample modes then what does the x symbolise for the xS modes?
I'm wondering if the texture sample positions for the two pixel pipelines in 8X mode are actually the same.
Remember, there must be something different between 8X and the xS modes because 8X works in OpenGL meaning that as far as the application / extension knows this is still straight Multisampling.
AFAIK, it's 4xS, 6xS and 8x. Good points, though, and I think you may well be right.
Dave Baumann
20-Nov-2002, 10:52
AFAIK, it's 4xS, 6xS and 8x. Good points, though, and I think you may well be right.
Early morning Typo.
8x is only ordered grid according to the interview here...
Then we should call the second 8x one 8xS, one of the secret modes you can enable in an upcoming rivatuner? ;)
I am quite sure that 8x is the 8xS mode, whose subpixel mask has been in our 3dcenter FSAA article for 9 months. (I'm afraid nVidia will not pay us a license fee :))
I have also been telling Unwinder for months that he should make a patch script to overwrite "4x 9 tap" with the 8xS mode, but until now he has been occupied with his SoftQuadro software and article (very good work) and other stuff.
Well, I don't know which mode will be the 8x one, but the Nvidia statement that it is ordered grid and the higher performance cost of the 2x/2x2 solution speaks for the 4x / 2x1 mode as '8x'.
Well, I don't know which mode will be the 8x one, but the Nvidia statement that it is ordered grid and the higher performance cost of the 2x/2x2 solution speaks for the 4x / 2x1 mode as '8x'.
In a few minutes our new 3dcenter article should be online (in German). I hope my suggestions are correct; at least there is no real contradiction with any official information :)
nggalai
20-Nov-2002, 16:38
Well, I don't know which mode will be the 8x one, but the Nvidia statement that it is ordered grid and the higher performance cost of the 2x/2x2 solution speaks for the 4x / 2x1 mode as '8x'.
In a few minutes our new 3dcenter article should be online (in German). I hope my suggestions are correct; at least there is no real contradiction with any official information :)
OK, is online:
http://www.3dcenter.org/artikel/2002/11-20_a.php
Thanks to aths for that one. and sorry for having been such a pain in the ass. ;)
ta,
-Sascha.rb
In a few minutes our new 3dcenter article should be online (in German). I hope my suggestions are correct; at least there is no real contradiction with any official information :)
Yes, but I still don't see why 8x must be 2x / 2x2 and not 4x / 2x1. It's a trade off between higher quality and higher speed. Kirk said (http://www.extremetech.com/print_article/0,3998,a=33705,00.asp), that with the NV30 "everything has been built around full-speed operation of 4X FSAA." So the conclusion would be that they take advantage of this high performance 4xAA in the 8x operation mode. If "S" stands for "skewed", as you say in your article, why don't they call 8x 8xS then?
In a few minutes our new 3dcenter article should be online (in German). I hope my suggestions are correct; at least there is no real contradiction with any official information :)
Yes, but I still don't see why 8x must be 2x / 2x2 and not 4x / 2x1. It's a trade off between higher quality and higher speed. Kirk said (http://www.extremetech.com/print_article/0,3998,a=33705,00.asp), that with the NV30 "everything has been built around full-speed operation of 4X FSAA." So the conclusion would be that they take advantage of this high performance 4xAA in the 8x operation mode. If "S" stands for "skewed", as you say in your article, why don't they call 8x 8xS then?
Actually, I don't know the background of nVidia's nomenclature. Or maybe it is because 6xS is still D3D only, like 4xS.
But 8xS *is* an ordered grid - an "rotated ordered grid" :D, not a "rectangular ordered grid".
An 8x mode built as 2x4 (4x + 1x2) is, in EER terms, only very slightly better than 4xS. I don't believe that nVidia would add an 8x mode with nearly no advantage over 4xS.
But 8xS *is* an ordered grid - an "rotated ordered grid" :D, not a "rectangular ordered grid".
This definition doesn't add up. 3dfx' 4x pattern is also a "rotated ordered grid" like you say, but nobody calls this ordered grid.
An 8x mode built as 2x4 (4x + 1x2) is, in EER terms, only very slightly better than 4xS. I don't believe that nVidia would add an 8x mode with nearly no advantage over 4xS.
History speaks against it. Nvidia preferred speed over quality with the GF3. Nothing stopped Nvidia from adding 4x AA instead of 4xS, although 4x is not much better in terms of edge anti aliasing over their 2x AA.
I think we stop arguing at this point about this issue, as I don't believe we can convince each other based on the available facts.
RAM:
In those patterns there's a much bigger correlation in the samples along a 135 degree line (compared to other directions). Surely that would rather defeat the purpose?
I'm not quite sure what you mean. In what way would that defeat the purpose?
But 8xS *is* an ordered grid - an "rotated ordered grid" :D, not a "rectangular ordered grid".
This definition doesn't add up. 3dfx' 4x pattern is also a "rotated ordered grid" like you say, but nobody calls this ordered grid.
There is a difference. 3dfx' RG-Solution is an "rotated ordered grid" for just one pixel. nVidias 8xS-Solution is an "rotated OG" for the complete grid.
An 8x mode built as 2x4 (4x + 1x2) is, in EER terms, only very slightly better than 4xS. I don't believe that nVidia would add an 8x mode with nearly no advantage over 4xS.
History speaks against it. Nvidia preferred speed over quality with the GF3. Nothing stopped Nvidia from adding 4x AA instead of 4xS, although 4x is not much better in terms of edge anti aliasing over their 2x AA.
I think we stop arguing at this point about this issue, as I don't believe we can convince each other based on the available facts.
To realise 2x RG, the triangle setup has to split a framebuffer line into two. It is not a big expense to realise 4x OG as well.
There is a difference. 3dfx' RG-Solution is an "rotated ordered grid" for just one pixel. nVidias 8xS-Solution is an "rotated OG" for the complete grid.
?
Why should that matter for the definition of a grid? What counts is how the samples are distributed in the final pixel.
There is a difference. 3dfx' RG-Solution is an "rotated ordered grid" for just one pixel. nVidias 8xS-Solution is an "rotated OG" for the complete grid.
?
Why should that matter for the definition of a grid? What counts is how the samples are distributed in the final pixel.
An edge takes more room than just a single pixel. The sample grid of 8xS is ordered (but rotated), the sample grid of 4x RG is not ordered. The Voodoo5 solution produces some "holes" in the overall sampling mask; at certain locations it's more likely to detect an edge than at others.
If (only) supersampling is applied, the "overall" ordered grid also produces (a little bit) better texture quality than the Voodoo5 solution.
But on the other hand, the EER of the 4x mask à la 3dfx is much more efficient.
So 4xS and 6xS are "skewed", with different EER on each axis, and 8x is an ordered grid with equal EER in both directions.
Actually I don't know why nVidia doesn't simply call its 8x 8xS :>
An edge takes more room than just a single pixel. The sample grid of 8xS is ordered (but rotated), the sample grid of 4x RG is not ordered (if you look at the mask covering the complete framebuffer). The Voodoo5 solution produces some "holes" in the overall sampling mask; at certain locations it's more likely to detect an edge than at others. If (only) supersampling is applied, the "overall" ordered grid also produces (a little bit) better texture quality than the Voodoo5 solution. But on the other hand, the EER of the 4x mask à la 3dfx is much more efficient.
Today's 2x RG solutions aren't called ordered although it is a 45° angle and therefore "ordered" over the complete framebuffer according to your definition.
Bambers
20-Nov-2002, 19:07
Whenever I've seen "ordered", it has always referred to the pattern you get with simple supersampling: render X times as large and merge back to the display size.
Today's 2x RG solutions aren't called ordered although it is a 45° angle and therefore "ordered" over the complete framebuffer according to your definition.
Yep.
My suggestion is that "an ordered grid" may not be "the (common) ordered grid".
In my opinion it is highly likely that 6xS is 2x RGMS + 1.5x2 OGSS. (Of course it's possible that 6xS is just 2x RGMS + 1x3 SS.) "My" (in fact, it was Xmas' suggestion) sample grid results in an EER of 3x4 (even 3x"efficiently 6"). An 8x mode built from 4x OGMS + 1x2 SS has an EER of only 2x4, which means the edges are less smooth.
IIRC, nVidia also says that the new modes are "non-rectangular". 8x built from 4x and 2 SS is rectangular. Here is the argument: "Your" 6xS delivers an EER of 2x6, and "your" 8x an EER of only 2x4. I don't believe that nVidia would implement a poorer 8x than 6x mode.
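The EER figures quoted above can be reproduced by tiling a multisample pattern into a supersampling grid and counting distinct positions. A rough sketch, reading EER as "distinct horizontal positions by distinct vertical positions" (my reading of how the term is used in this thread) and using an assumed diagonal pair for 2x RG, since nobody here has the measured NVIDIA positions:

```python
from fractions import Fraction as F

# Multisample patterns inside one unit cell. OG_4X is a plain 2x2 grid;
# RG_2X is an assumed diagonal pair, not a measured NVIDIA pattern.
OG_4X = [(F(1, 4), F(1, 4)), (F(3, 4), F(1, 4)),
         (F(1, 4), F(3, 4)), (F(3, 4), F(3, 4))]
RG_2X = [(F(1, 4), F(1, 4)), (F(3, 4), F(3, 4))]

def combine(ms_pattern, ss_x, ss_y):
    """Tile ms_pattern into an ss_x by ss_y supersampling grid within one pixel."""
    return [((i + x) / ss_x, (j + y) / ss_y)
            for j in range(ss_y) for i in range(ss_x)
            for (x, y) in ms_pattern]

def eer(samples):
    """Edge equivalent resolution: (distinct x positions, distinct y positions)."""
    return (len({x for x, _ in samples}), len({y for _, y in samples}))

print("4xS = 2x RGMS + 1x2 SS:", eer(combine(RG_2X, 1, 2)))
print("6xS = 2x RGMS + 1x3 SS:", eer(combine(RG_2X, 1, 3)))
print("8x  = 4x OGMS + 1x2 SS:", eer(combine(OG_4X, 1, 2)))
```

With these assumed positions the 4x OGMS + 1x2 SS candidate comes out at 2x4, the same as 4xS, and the 1x3 variant of 6xS at 2x6, which matches the numbers argued over above. The 1.5x2 variant is not covered because `combine` only handles whole-number supersampling factors.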
Colourless
20-Nov-2002, 19:54
I think you are clutching at straws. Nvidia will not have all of a sudden decided to not use the commonly used AA types just to confuse everyone.
Testiculus Giganticus
20-Nov-2002, 20:02
And why is the R9700 in another league when it comes to AA quality?
Joe DeFuria
20-Nov-2002, 20:02
Bottom line....
We need some follow-up questions to B3D's follow-up questions. 8)
BTW....can someone enlighten me as to exactly what nVidia's current 4XS mode does in terms of number and type of sample positions? I was always under the impression that the "S" implied the use of supersampling, not "skewed". (IE, 4XS is a combination of supersampling and multisampling...)
can someone enlighten me as to exactly what nVidia's current 4XS mode does in terms of number and type of sample positions? I was always under the impression that the "S" implied the use of supersampling, not "skewed". (IE, 4XS is a combination of supersampling and multisampling...)
4xS is a combination of 2x RGMS and 1x2 supersampling:
http://www.3dconcept.ch/news/4xs.gif
http://www.3dconcept.ch/news/4xvs4xs.gif
Joe DeFuria
20-Nov-2002, 20:11
Thanks, ram!
Tagrineth
20-Nov-2002, 20:19
OUCH, does anyone else think the lower 4xS gradient looks kinda... badly divided?
3 pale, one medium, then two dark pixels?
Should be 2-2-2...
OUCH, does anyone else think the lower 4xS gradient looks kinda... badly divided?
3 pale, one medium, then two dark pixels?
Should be 2-2-2...
This depends on the bias. Also, due to the "skew" in the grid, there is no guarantee of completely "isotropic" smoothing.
Bambers
20-Nov-2002, 20:54
Even with a rotated grid pattern like the v5 and 9700 you won't always get perfectly even divisions as it depends on the angle of the edge.
For vertical lines it's obvious that 4xS is no better than normal 4xOGMS.
OUCH, does anyone else think the lower 4xS gradient looks kinda... badly divided?
3 pale, one medium, then two dark pixels?
Should be 2-2-2...
Try drawing out a grid that gives the optimal gradient in the above case, then try it on a few different angles and you'll quickly discover that no fixed arrangement of subpixels gives optimal gradients.
Personally I'm stunned how well brute force super and multisampling solutions have scaled. Considering the amount of wasted framebuffer space and bandwidth they require, I would have expected one of the major players to at least attempt a more intelligent solution.
I suspect though that raw performance numbers will still be the primary factors by which a card is judged, and committing to a significantly different antialiasing solution might require more rendering logic, potentially compromising the base performance.
Tagrineth
21-Nov-2002, 01:03
Personally I'm stunned how well brute force super and multisampling solutions have scaled. Considering the amount of wasted framebuffer space and bandwidth they require, I would have expected one of the major players to at least attempt a more intelligent solution.
Matrox. :roll:
Matrox. :roll:
Oh, please, 4x4 OGMS for edges isn't really a very intelligent solution. And FAA doesn't even work for all edges. I consider ATI's 6x MS a much better AA solution.
Tagrineth
21-Nov-2002, 01:44
Matrox. :roll:
Oh, please, 4x4 OGMS for edges isn't really a very intelligent solution. And FAA doesn't even work for all edges. I consider ATI's 6x MS a much better AA solution.
Yeah, but see, you were talking about using intelligent AA to reduce bandwidth AND memory footprint. Matrox VERILY addressed both, what more are you asking for?
ATi's 6x has a good sample pattern, but it doesn't differentiate anything and burns RAM like crazy. That's hardly 'intelligent'. :-?
ATi's 6x has a good sample pattern, but it doesn't differentiate anything and burns RAM like crazy. That's hardly 'intelligent'. :-?
Still, considering what's known about NV30, it's the best solution available.
Joe DeFuria
21-Nov-2002, 01:52
Yeah, but see, you were talking about using intelligent AA to reduce bandwidth AND memory footprint. Matrox VERILY addressed both, what more are you asking for?
Personally, I'd ask for something that doesn't fail in as many cases as Parhelia's implementation does...like with stencil use, or intersecting polygons.
That's the trouble with "intelligent" solutions. They usually are smart about addressing some problem (like bandwidth or memory footprint), but at the same time are typically not as robust as "dumb" or brute force solutions.
Tagrineth
21-Nov-2002, 01:54
ATi's 6x has a good sample pattern, but it doesn't differentiate anything and burns RAM like crazy. That's hardly 'intelligent'. :-?
Still, considering what's known about NV30, it's the best solution available.
Well, of course.
All I'm saying about Matrox is that they are a major player, and they DID implement an intelligent AA. :-?
Personally I'm stunned how well brute force super and multisampling solutions have scaled. Considering the amount of wasted framebuffer space and bandwidth they require, I would have expected one of the major players to at least attempt a more intelligent solution.
Matrox. :roll:
If the Matrox solution actually worked in all cases, as opposed to what appears to be a complete hack, then I might agree.
The 3DLabs solution looks intriguing, but I'm unclear on its details.
But neither card is really competitive in the marketplace.
Typedef Enum
21-Nov-2002, 02:12
Personally, I'd ask for something that doesn't fail in as many cases as Parhelia's implementation does...like with stencil use, or intersecting polygons.
The "word" is that Matrox has addressed all of the issues with FAA...but it will take a followup part in order to realize the fixes.
If the "word" pans out, then I think Matrox will have the best edge implementation available. When it does work, it's really amazing.
I think the best example I ran into was NOLF....the very beginning when Cate is standing in front of that window...
Joe DeFuria
21-Nov-2002, 02:17
The "word" is that Matrox has addressed all of the issues with FAA...but it will take a followup part in order to realize the fixes.
The "word" also has it that by the time Matrox gets another part out, it'll be just in time to get crushed by some other competitor part. ;)
Better than an ordered grid or rotated grid is a sparsely sampled grid. For best results you use different patterns for different pixels. The patterns should be based on the pixel screen location and shouldn't vary from frame to frame for a given pixel (to eliminate pixel flashing). Sparsely sampled grids are easiest to implement using a lookup table for the patterns and a small library of programmable patterns. Note that the sparse sampling guarantees that near horizontal and near vertical edges have N gradations in intensity where N is the number of samples (this is easiest to see by noticing there is one sample on every row of the grid, the same is true for columns).
There are two reasons why sparsely sampled grids are better than rotated grids. First, they allow different patterns across nearby pixels. This breaks up patterns across pixels that would otherwise result in aliasing. The second is that it more effectively guarantees that you get N gradations for near horizontal and near vertical edges for larger sample sizes.
Some sample patterns:
For 4x
--------x----
x------------
------------x
----x--------
For 6x
--------x------------
----------------x----
x--------------------
------------x--------
----x----------------
--------------------x
For 8x
------------------------x----
--------x--------------------
----------------x------------
x----------------------------
----------------------------x
------------x----------------
--------------------x--------
----x------------------------
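The "one sample on every row and every column" property of the three patterns above can be checked mechanically. A small sketch; the function names are made up for illustration, and it assumes the columns in the ASCII art are four characters apart, which is how they are drawn:

```python
PATTERNS = {
    4: ["--------x----",
        "x------------",
        "------------x",
        "----x--------"],
    6: ["--------x------------",
        "----------------x----",
        "x--------------------",
        "------------x--------",
        "----x----------------",
        "--------------------x"],
    8: ["------------------------x----",
        "--------x--------------------",
        "----------------x------------",
        "x----------------------------",
        "----------------------------x",
        "------------x----------------",
        "--------------------x--------",
        "----x------------------------"],
}

def column_per_row(rows, spacing=4):
    """Column index of the single 'x' in each row of an ASCII pattern."""
    columns = []
    for row in rows:
        if row.count("x") != 1:
            raise ValueError("expected exactly one sample per row")
        columns.append(row.index("x") // spacing)
    return columns

def is_sparse(columns):
    """True if each of the N columns is used exactly once across the N rows."""
    return sorted(columns) == list(range(len(columns)))

for n, rows in PATTERNS.items():
    cols = column_per_row(rows)
    print(f"{n}x: columns per row {cols}, sparse: {is_sparse(cols)}")
```

All three patterns pass: every row and every column holds exactly one sample, which is where the N gradations for near-horizontal and near-vertical edges come from.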
Joe DeFuria
21-Nov-2002, 03:45
Hmmm...
That looks remarkably like the sample pattern analysis of the Radeon 9700 AA.
http://www.beyond3d.com/reviews/ati/radeon9700pro/index.php?page=page15.inc
4X is obvious:
http://www.beyond3d.com/reviews/ati/radeon9700pro/aa_imp_4x.gif
For the 6X shot, it appears that Dave's guesses for the "single set" of sample points (in red) might be wrong. Below, I have re-done them (in blue), and note the similarity to the "sparse" pattern as mentioned by SA:
6X:
http://defuria.com/misc/radeon6x.jpg
(I'll flip SA's pattern along the vertical axis:)
------------x--------
----x----------------
--------------------x
--------x------------
----------------x----
x--------------------
We also know that ATI claims to be able to adjust sample patterns per pixel, so ATI may very well have implemented more or less exactly what SA is suggesting. From the same page linked above:
Radeon 9700 PRO's FSAA has a programmable lookup table of sample positions. A set number of predefined sample patterns can be programmed in and the best sample layout for the pixel being rendered can be chosen from that list based on various criteria.
The "word" is that Matrox has addressed all of the issues with FAA...but it will take a followup part in order to realize the fixes.
If the "word" pans out, then I think Matrox will have the best edge implementation available. When it does work, it's really amazing.
I think the best example I ran into was NOLF....the very beginning when Cate is standing in front of that window...
The "word" also has it that ATI's gamma-corrected 6x MS looks better than 16xFAA. And it will be quite complicated for matrox to find intersection edges without a higher-res Z buffer. It's not impossible, but hard to do.
EasyRaider
21-Nov-2002, 19:50
For the 6X shot, it appears that Dave's guesses for the "single set" of sample points (in red) might be wrong. Below, I have re-done them (in blue), and note the similarity to the "sparse" pattern as mentioned by SA:
What's your point? Both suggestions are sparse patterns.
On a different note, it's funny how this one feature suddenly changes my preference from NVidia to ATI.
Joe DeFuria
21-Nov-2002, 19:54
What's your point? Both suggestions are sparse patterns.
Yes, I just thought it was kinda creepy that they (SA's and my R-300 example) were pretty much the same sparse pattern. ;)
Althornin
22-Nov-2002, 02:28
What's your point? Both suggestions are sparse patterns.
Yes, I just thought it was kinda creepy that they (SA's and my R-300 example) were pretty much the same sparse pattern. ;)
Well, it IS likely that some research has been done on what the "best" (overall) sparse sample pattern is. So I find the coincidence less than amazing.
I might add that another advantage of sparsely sampled grids is a very simple 1D lookup table with simple edge crossing evaluation for setting a sample mask. For example, in the cases above you have three 1D lookup tables as follows (assuming the leftmost position starts at 0):
For 4x
2
0
3
1
For 6x
2
4
0
3
1
5
For 8x
6
2
4
0
7
3
5
1
To evaluate an edge crossing the pixel at any angle you calculate the x position of the edge at each sample row and compare it to the value in the lookup table. If it is less than the lookup value then the sample is on the right of the edge, otherwise it is on the left, and you set your mask bit accordingly (this is all done in parallel of course). When the edge enters or leaves through the sides of the pixel, you handle it by choosing the rows at which you start and end your evaluation of the LUT.
The nice thing is that it can be completely programmable, just by changing the values in the 1d lut.
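As a rough illustration of the lookup-table idea above, here is a minimal Python sketch. It uses the three LUTs exactly as posted; everything else (the function names, treating each sample as sitting at the centre of its column, describing the edge by its x position at the top and bottom of the pixel, and the serial loop in place of parallel hardware comparators) is an assumption made for the example, not a description of any real chip.

```python
# LUTs as posted: entry r is the column of the single sample on row r.
LUT_4X = [2, 0, 3, 1]
LUT_6X = [2, 4, 0, 3, 1, 5]
LUT_8X = [6, 2, 4, 0, 7, 3, 5, 1]


def is_sparse(lut):
    """True if the pattern has exactly one sample per row and per column."""
    return sorted(lut) == list(range(len(lut)))


def edge_mask(lut, x_top, x_bottom):
    """Return a bit mask; bit r is set when the sample on row r is right of the edge.

    x_top and x_bottom (hypothetical parameters) are the edge's x positions,
    measured in sub-pixel columns from 0 to n, at the top and bottom of the pixel.
    """
    n = len(lut)
    mask = 0
    for row, col in enumerate(lut):
        y = (row + 0.5) / n                       # vertical centre of this sample row
        edge_x = x_top + (x_bottom - x_top) * y   # where the edge crosses this row
        if edge_x < col + 0.5:                    # edge lies left of the sample
            mask |= 1 << row
    return mask


def coverage(mask):
    """Number of covered samples in a mask."""
    return bin(mask).count("1")
```

With the 8x table, a vertical edge through the middle of the pixel (`edge_mask(LUT_8X, 4, 4)`) sets every other bit, i.e. half coverage, and sweeping that vertical edge across the pixel steps through all nine coverage levels from 0 to 8.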
Joe DeFuria
22-Nov-2002, 04:13
Well, it IS likely that some research has been done on what the "best" (overall) sparse sample pattern is. So I find the coincidence less than amazing.
Well excuuuuuuussseeee me. ;)
Let me change my statement then...
I find it amazing that with all this research having been done, and all the knowledgeable types on this board, no one had put 2 + 2 together and drawn the conclusion (that appears rather obvious after SA's posts) that Radeon 9700's AA is based on "programmable sparse patterns."
Up to this point, no one had explained with any reasoning, what ATI was doing with their AA.
arjan de lumens
22-Nov-2002, 05:32
To evaluate an edge crossing the pixel at any angle you calculate the x position of the edge at each sample row and compare it to the value in the lookup table.
AFAIK, the common way to rasterize a triangle is to use three edge functions of the form a*x+b*y+c. If all the edge functions for a sample give a positive value, then the sample lies within the triangle. If any of the edge functions gives a negative value, then the sample lies outside. If some of the edge functions return zero, we look at the signs of a and b to determine if the point is inside or outside.
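A minimal sketch of that edge-function test, in Python with integer coordinates so that nothing is rounded. The helper names are made up for the example, the triangle is assumed to be wound counter-clockwise in a y-up coordinate system, and the particular tie-break (own the sample when a > 0, or when a == 0 and b > 0) is just one possible convention for "looking at the signs of a and b", not necessarily what any given hardware does.

```python
def edge_function(p0, p1):
    """Return (a, b, c) so that a*x + b*y + c is positive to the left of p0 -> p1."""
    (x0, y0), (x1, y1) = p0, p1
    a = y0 - y1
    b = x1 - x0
    c = x0 * y1 - x1 * y0
    return a, b, c


def inside(tri, x, y):
    """Test an integer sample position against a counter-clockwise triangle."""
    for i in range(3):
        a, b, c = edge_function(tri[i], tri[(i + 1) % 3])
        e = a * x + b * y + c
        if e < 0:
            return False          # strictly outside this edge
        if e == 0 and not (a > 0 or (a == 0 and b > 0)):
            return False          # on the edge, but the tie-break gives it to the neighbour
    return True


# Two triangles sharing the diagonal from (0, 0) to (4, 4).
LOWER = [(0, 0), (4, 0), (4, 4)]
UPPER = [(0, 0), (4, 4), (0, 4)]
```

The point of the tie-break is that a sample lying exactly on the shared diagonal, such as (2, 2), is claimed by exactly one of the two triangles, so it is neither drawn twice nor dropped.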
The point? Edge functions are the only way to perform accurate rasterization (they don't introduce roundoff errors). Of course, this leads to the problem: how do we find the x position at the edge of a sample row? If we do division anywhere at all, like x = -(b*y+c)/a, we introduce a rounding error, which makes the rasterization imprecise (=bad, bad thing). So how do we avoid the division?
One possible way: for each scanline, scan left and right; at each possible x position in each line of samples, check whether we are still inside the polygon, and mark the sample line ends once we run off them. Do this for both the left and right ends of the line. This could be sped up by e.g. doing the division method to estimate endpoint position and then scanning left and right from that position to find the actual endpoints (bad: nondeterministic search time), or by doing some sort of binary search, homing in on each edge/sample row crossing in logarithmic time (better: deterministic time), etc. The per-scanline workload for this kind of solution will be the same for rotated and sparse grids, and much smaller for ordered grids.
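The binary-search variant can be sketched as follows. This is only an illustration under stated assumptions: coordinates are integers on the sub-sample grid, the edge has a > 0 so the edge function grows with x along a sample row, and `first_inside_x` is a hypothetical helper name. It evaluates the edge function directly and never divides.

```python
def first_inside_x(a, b, c, y, lo, hi):
    """Smallest integer x in [lo, hi] with a*x + b*y + c >= 0, or hi + 1 if none.

    Assumes a > 0, so the edge function is monotonically increasing in x.
    """
    if a <= 0:
        raise ValueError("this sketch only handles edges with a > 0")
    while lo <= hi:
        mid = (lo + hi) // 2
        if a * mid + b * y + c >= 0:
            hi = mid - 1      # mid is inside; look for an earlier crossing
        else:
            lo = mid + 1      # mid is outside; the crossing is further right
    return lo


def first_inside_x_linear(a, b, c, y, lo, hi):
    """Reference version: plain left-to-right scan, used only for comparison."""
    for x in range(lo, hi + 1):
        if a * x + b * y + c >= 0:
            return x
    return hi + 1
```

The search always takes about log2 of the span in steps, which is the "deterministic time" property mentioned above; edges with a < 0 would need the mirrored comparison.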
OK... once the x endpoints are computed correctly (which is the hard part of rasterization), then you can easily test actual sample points from a LUT or whatever method you use to store the sample locations.
You may want to do tile-based rasterization instead of scanline-based, which of course adds a few problems of its own ...
In addition to the sample pattern, the AA technique is probably even more important. Fragment AA methods with coverage masks offer many advantages. Since most pixels only have one or two fragments, then storing and processing the data for just one or two fragments will allow the processing of the majority of pixels regardless of how many samples there are per pixel, without any loss of quality.
For two fragments (the most common AAed pixel) you can precisely compute the pixel coverage for an 8 bit pattern (8x AA) using only two colors, two zs, and an 8 bit mask. That is only 17 bytes for 32 bit color and 32 bit zs. If you stored a color and z at each sample it would require 64 bytes or about 4 times the data and memory bandwidth. For 16x AA a two fragment pixel requires 2 colors, 2 zs, and a 16 bit mask, or only one more byte than 8x AA with almost the same performance. In this case (18 bytes against 128) you compress the amount of memory and memory bandwidth by a factor of about 7.
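The byte counts above are easy to check. The small Python sketch below assumes, as the post does, 32-bit color and 32-bit z, and a single coverage mask per two-fragment pixel (the second fragment's coverage being the complement of the first); the function names are made up for the example.

```python
def fragment_bytes(samples, fragments=2, color_bytes=4, z_bytes=4):
    """Storage for a fragment-AA pixel: per-fragment color and z plus one coverage mask."""
    mask_bytes = (samples + 7) // 8
    return fragments * (color_bytes + z_bytes) + mask_bytes


def per_sample_bytes(samples, color_bytes=4, z_bytes=4):
    """Storage when every sample keeps its own color and z."""
    return samples * (color_bytes + z_bytes)


for n in (8, 16):
    ratio = per_sample_bytes(n) / fragment_bytes(n)
    print(f"{n}x: {fragment_bytes(n)} bytes vs {per_sample_bytes(n)} bytes, ratio {ratio:.1f}")
```

This gives 17 against 64 bytes for 8x (a ratio of roughly 3.8) and 18 against 128 bytes for 16x (roughly 7.1).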
An interesting problem is what to do when 2 fragments are not enough.
1. The A buffer technique simply adds more fragments to the pixel dynamically as needed. This method has a large memory requirement that is also highly variable and unpredictable.
2. You can also fall back to one color and z per sample. It is probably simplest to preallocate all this memory up front, even though only a small portion of it will likely be used.
3. Lastly you can merge the fragments, keeping the total number per pixel fixed. This technique minimizes the amount of buffer storage needed and the storage is fixed.
Solutions 1 and 2 are lossless, since they always compute the precise coverage down to the sample. Solution 3 is potentially lossy, since it can discard coverage information in the pixel. Although solution 3 is potentially lossy, the differences will most likely only be noticeable in areas of the scene with highly complex, finely detailed geometry such as dynamic 3d geometric hair and fur etc.
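To make solution 3 a little more concrete, here is a toy Python sketch of a fixed-size fragment list that merges when it overflows. The merge policy is purely an assumption for illustration (merge the two fragments closest in z, OR their masks, blend their colors by coverage, keep the nearer z); the post does not specify one, and real hardware may do something quite different. Masks are assumed to be non-empty.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    color: tuple   # (r, g, b)
    z: float       # a single depth for the whole fragment
    mask: int      # bit i set means the fragment covers sample i


def popcount(mask):
    return bin(mask).count("1")


def merge_closest_z(fragments, max_fragments):
    """Merge the pair of fragments nearest in z until only max_fragments remain."""
    frags = sorted(fragments, key=lambda f: f.z)
    while len(frags) > max_fragments:
        i = min(range(len(frags) - 1), key=lambda k: frags[k + 1].z - frags[k].z)
        a, b = frags[i], frags[i + 1]
        wa, wb = popcount(a.mask), popcount(b.mask)
        color = tuple((ca * wa + cb * wb) / (wa + wb) for ca, cb in zip(a.color, b.color))
        frags[i:i + 2] = [Fragment(color, min(a.z, b.z), a.mask | b.mask)]
    return frags


def resolve(fragments, samples):
    """Box-filter resolve: each fragment contributes in proportion to its coverage."""
    return tuple(
        sum(f.color[c] * popcount(f.mask) for f in fragments) / samples
        for c in range(3)
    )


frags = [
    Fragment((60, 0, 0), 0.50, 0b00001111),
    Fragment((0, 60, 0), 0.51, 0b00110000),
    Fragment((0, 0, 80), 0.90, 0b11000000),
]
merged = merge_closest_z(frags, 2)
```

In this toy case the resolved color is the same before and after merging. The loss only appears later, for example when a new fragment hides some of the samples of a merged fragment and it is no longer known which of the original colors was covered.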
Althornin
22-Nov-2002, 06:14
Well excuuuuuuussseeee me. ;)
Let me change my statement then...
I find it amazing that with all this research having been done, and all the knowledgeable types on this board, no one had put 2 + 2 together and drawn the conclusion (that appears rather obvious after SA's posts) that Radeon 9700's AA is based on "programmable sparse patterns."
Up to this point, no one had explained with any reasoning, what ATI was doing with their AA.
Heh. Didn't mean to come off in any negative light.
You are right, previously we all speculated about where ATI got the sample pattern for 6x from...
arjan de lumens
22-Nov-2002, 06:15
Actually, the A-buffer technique does sound like it could work as a compression method for Multisampling, which would reduce the amount of memory that would need to be read/written per pixel a great deal, although the worst-case memory usage is slightly worse than for plain Multisampling.
Could the R300 (and NV30?) multisample compression schemes be said to be a form of A-buffering?
Solution 3 requires a somewhat intelligent algorithm for merging fragments to work well .... wonder if we will see it in 3d hardware (or whether Parhelia is implementing just that scheme).
The last problem is the computation of z values at the samples. The problem with all three techniques above is that there is only one z value per fragment across the entire pixel. This assumes all the samples have the same z value for a fragment, sometimes a poor assumption.
There are several solutions to this problem.
1. Do nothing.
2. Revert to one color and z per sample if this problem is detected.
3. Keep z slope information for each fragment and compute the zs per sample.
Solution 1 will cause artifacts whenever edges join at steep angles (such as room corners, etc.) and for interpenetrating triangles.
Solution 2 works well with solution 2 of the previous list (falling back to one color and z per sample), since the storage and calculation method are already accounted for. However, it might cost more in performance.
Solution 3 works well with solutions 1 and 3 above and is the basis for the Z3 algorithm.
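The idea behind solution 3 can be sketched in a few lines of Python: each fragment stores a z at the pixel centre plus two slopes, and the depth at each sample is reconstructed from them before the depth comparison. This is only an illustration of the principle, not the Z3 algorithm itself; it uses full-precision floats rather than Z3's compact slopes, it reuses the 8x lookup table posted earlier in the thread, and the dictionary keys and function names are made up.

```python
LUT_8X = [6, 2, 4, 0, 7, 3, 5, 1]   # column of the sample on each row


def sample_offsets(lut):
    """Sample positions relative to the pixel centre, in pixel units (-0.5 to 0.5)."""
    n = len(lut)
    return [((col + 0.5) / n - 0.5, (row + 0.5) / n - 0.5) for row, col in enumerate(lut)]


def z_at(frag, dx, dy):
    """Depth of a fragment at an offset from the pixel centre, from its z slopes."""
    return frag["z"] + frag["dzdx"] * dx + frag["dzdy"] * dy


def visible_masks(frag_a, frag_b, lut):
    """Per-sample depth test between two fragments that both cover the whole pixel."""
    mask_a = 0
    for row, (dx, dy) in enumerate(sample_offsets(lut)):
        if z_at(frag_a, dx, dy) <= z_at(frag_b, dx, dy):
            mask_a |= 1 << row
    full = (1 << len(lut)) - 1
    return mask_a, full & ~mask_a


# Two surfaces meeting in a corner that runs vertically through the pixel centre.
wall_a = {"z": 0.5, "dzdx": 0.2, "dzdy": 0.0}
wall_b = {"z": 0.5, "dzdx": -0.2, "dzdy": 0.0}
mask_a, mask_b = visible_masks(wall_a, wall_b, LUT_8X)
```

With a single z per fragment the two walls would tie at 0.5 for every sample (the "do nothing" case); with slopes, the left-hand samples go to one wall and the right-hand samples to the other.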
Randell
22-Nov-2002, 08:52
Solution 3 works well with solutions 1 and 3 above and is the basis for the Z3 algorithm.
aah
/me quickly digs out the Z3 .pdf again
Joe:
While SA's posts are (as always) good compilations of knowledge, I wonder what part of them (in this thread) is something that "no one had put 2+2 together" and understood. Or even what part the majority of the "knowledgeable types" didn't already understand.
SA:
I hope you don't take my comment as something negative. You sum it up real good, and it's always interesting to read your posts. I just thought that Joe put a lot of other people in too bad a light.
Joe DeFuria
22-Nov-2002, 14:58
Basic,
While SA's posts are (as always) good compilations of knowledge, I wonder what part of them (in this thread) is something that "no one had put 2+2 together" and understood.
I think you're misunderstanding what I'm saying. I'm really not meaning to put ANYONE in a bad light.
It's just that from what I can tell, until SA's posts and my follow-up, no one had publicly made the speculation that Radeon 9700's AA was "programmable sparse sample AA." If someone already had, then my bad.
It's quite possible for all I know that everyone else who's much more knowledgeable about this stuff than me had already made that association, but hasn't said anything about it.
Again, this isn't some world-beating revelation or anything like that of course, but my point is, I haven't seen anyone come out and say it, that's all! I know I did not make the association until SA made his first post in this thread with the sample patterns...
Reverend
22-Nov-2002, 16:02
Joe, the people who visit this forum and can put "2 and 2 together" post much less than, say, you for example. Sometimes, there isn't even a need to.
Joe DeFuria
22-Nov-2002, 16:08
Um, Rev...that's exactly what I was saying.
But you know, for the rest of us who don't put 2+2 together as fast as others, maybe they could "share the wealth" a bit more often when it comes to "divulging the obviousness" of Radeon 9700's methods of AA....seeing that there's an article on this very site that doesn't profess to know.
So excuse me again for stating the obvious for us poor "intellectually challenged yet gifted in verbosity" souls.
Laa-Yosh
22-Nov-2002, 17:01
Quick AA-related question... I understand that the GeForce4 has 4 Z-sample units per pixel pipe, so that it can do 4x MSAA without a loss of fill rate.
How many such units does the Radeon 9700 have? Based on its 6X MSAA mode, I'd say 6, am I right?
Bambers
22-Nov-2002, 18:30
Yes, the r300 has 6 z units/pipe.
(ATI's quoted max AA sample fillrate gives 15.6G/325M/8 = 6. GeForce FX has 16G/500M/8 = 4)
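Spelled out, that arithmetic is the quoted AA sample fill rate divided by the core clock and by the number of pixel pipes. The short Python sketch below just reproduces the division; reading the "/8" as eight pixel pipes is an assumption, and the function name is invented for the example.

```python
def z_units_per_pipe(samples_per_second, core_clock_hz, pixel_pipes):
    """Samples per clock per pipe implied by a quoted AA sample fill rate."""
    return samples_per_second // (core_clock_hz * pixel_pipes)


r300 = z_units_per_pipe(15_600_000_000, 325_000_000, 8)   # Radeon 9700 PRO figures from the post
nv30 = z_units_per_pipe(16_000_000_000, 500_000_000, 8)   # GeForce FX figures from the post
print(r300, nv30)
```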
Z3's 8-bit slopes are really a form of lossy z compression. It is possible they could result in artifacts, but they likely wouldn't be very noticeable, if at all.
The one byte slopes are lossy in Z3. An error analysis indicates the amount of error is roughly equivalent to the undersampling caused by using 8 samples per pixel.
The fragment merging can be more problematic, however, depending on the maximum number of fragments and the detailed complexity of the geometry.
If you really want very high quality AA then reverting to one color - one z per sample when fragments are insufficient is a good solution. Another solution is to increase the number of fragments per pixel and use higher precision slopes.
Joe:
Maybe I was a bit touchy. It's just that now and then there is someone writing something insinuating "if you smart people at this site haven't mentioned this, you might not be so smart at all". Which makes me think "huh, I thought that was implicit in this, this and that discussion".
But in this case it must have just been me being touchy. Sorry 'bout that.
Back on topic and to your observation.
Yes, it is interesting that the pattern SA gave is in the actual measured pattern from a R300. But I would say that it's interesting for a different reason than you said. :)
When designing the sample pattern, what do you want to optimize?
First, you'd want it to be good for the angles that have the most visible aliasing errors - almost horizontal and vertical edges. To make it good for horizontal edges, you'd distribute the samples evenly in the vertical direction. And vice versa for vertical edges.
This results in positions in a sparse grid. N samples, one on each row, and one on each column in an NxN grid. And this is the optimal case for near vert/horiz edges no matter how your hardware is designed. Even with VSA100 where you can give x/y coordinates for each sample.
But those rules are of course not enough. You'd want it to be good for all other angles too. It can be a bit hard to find a formula for how bad the aliasing in edges at different angles is perceived with different sampling patterns. But a good pattern should have the samples evenly distributed over the pixel area. One way to measure how well distributed the samples are is the minimum distance between subsamples.
So if you maximize the minimum distance between subsamples, there's a good chance that you'd get a good pattern.
Doing this optimization results in six different patterns with rotated and mirrored versions, a total of 36 patterns.
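The search described here is small enough to brute-force. The Python sketch below enumerates every one-sample-per-row, one-sample-per-column pattern and keeps those with the largest minimum distance between subsamples. It measures distances inside a single pixel only and applies no further tie-breaking; both are assumptions, since the post does not say whether neighbouring pixels were taken into account, so the number of patterns it returns should not be expected to match the 36 quoted above.

```python
from itertools import combinations, permutations


def min_dist_sq(pattern):
    """Smallest squared distance between any two samples; pattern[row] is the column."""
    return min(
        (r1 - r2) ** 2 + (pattern[r1] - pattern[r2]) ** 2
        for r1, r2 in combinations(range(len(pattern)), 2)
    )


def best_patterns(n):
    """Return (best squared distance, all n-rooks patterns that achieve it)."""
    best, winners = -1, []
    for perm in permutations(range(n)):
        d = min_dist_sq(perm)
        if d > best:
            best, winners = d, [list(perm)]
        elif d == best:
            winners.append(list(perm))
    return best, winners


best4, winners4 = best_patterns(4)
best6, winners6 = best_patterns(6)
```

For 4 samples this leaves only the two mirrored rotated-grid layouts, one of which is the 4x table posted earlier in the thread; the 6x table from that post is also among the winners for 6 samples under this particular metric.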
Now what I thought was interesting.
All 36 of them can be found in the sample pattern images you posted above. The "red" and "blue" patterns are two of these 36 patterns. Btw, the pattern SA gave can be found in non-mirrored form: the three lowest blue dots are the three highest in that pattern.
So the "what's new compared to what we knew about R300" comment was about:
We already knew a (possible) sample pattern, and it fitted the description "sparse sampled grid". Sample patterns have changed with driver revisions for R200, so they are at least programmable in some sense. I don't remember if I've heard anything about R300, but it's not that far-fetched to think they left it in. And finally, even though the sampling pattern fits what SA said, it doesn't mean that the implementation is done as he described. (At least in R200 they seemed to have a more flexible method.)
Ooops, that sounded a lot like "you're wrong, wrong and wrong". Didn't mean it like that. :) Just meant that while your observation was interesting, it didn't reveal new information.
So given the various AA implementation possibilities, my own preference is one that does not sacrifice any quality while maintaining high performance. While this may sound a bit like a paradox, very high quality AA like very high quality pixel shaders is local and typically only needed for a small percentage of the screen area. Nevertheless, it has a global impact on the viewer, just like the shaders.
Rather than use a lower quality AA over the whole screen, the right choice of methods that I mentioned previously with enough samples in the bit mask, the right patterns, the right fall back, and the right types of filters can provide a means of achieving overall CGI quality AA without sacrificing performance. There is a ways to go yet.
arjan de lumens
23-Nov-2002, 22:18
A couple of comments/ideas on the Z3 method:
The method could conceivably be improved by representing polygon-pixel coverage with polygon edge functions rather than a coverage bitmap sampled at a bunch of grid points. This should give results arbitrarily close to multisampling with an infinite number of sampling points. This method, of course, both produces a storage problem when you need to store multiple edge functions for a polygon, and a new opportunity for compression as edge functions are very similar between adjacent pixels and polygons that share edges.
It is possible, although probably uncommon, to get artifacts with interpenetrating polygons even with Z3. Consider two fragments that interpenetrate in a pixel. Presumably, color/texture value is sampled once for each fragment, at the center of the pixel for at least the first fragment. Now, the second fragment can cover the pixel center position, so that the remaining visible part of the first fragment has a color sampled outside its visible part. This effect could produce visible jaggies if the intersection line between the interpenetrating polygons hits a sharp enough color transition in one of the polygons' textures. One partial solution to this problem would be to store RGBA color slopes in addition to just Z slopes, but this gets expensive quickly.
No AA method is perfect, unless you have the needed resources to actually take the integral of "appropriate" color across the pixel (with, say, a sinc filter; ultra-expensive)
The problem you describe is a problem with multi-sampling in general. Under the majority of conditions the combination of high quality multi-sampling with high quality anisotropic-mipmap filtering is adequate since the texture samples cover most of the pixel. There is still a potential problem, however. The filtering might average an intense color for one fragment that should be hidden by another fragment, however the pixel will still display as intense. Supersampling the colors does not have this problem.
It is possible to supersample the colors using an adaptation of Z3 or similar coverage mask AA method. It is rarely worth the performance tradeoff at this point. Using a high quality pattern, more samples in the bit mask, a better shaped filter, a high quality fall back, etc. are areas that still need the most focus.
Bigus Dickus
24-Nov-2002, 08:01
Out of curiosity, what method do off-line rendering programs use? Supersampling?
Laa-Yosh
24-Nov-2002, 12:57
Out of curiosity, what method do off-line rendering programs use? Supersampling?
Varies wildly between renderers... Some can sample geometry and shaders separately (like 3ds max), some sample them together (like Mental Ray). Sample positions also vary between ordered grid and jittered samples, star patterns, etc. Then you can add in raytracing, GI, photon maps and so on...
Most AA algorithms are also adaptive and can increase sampling rates (and can sometimes undersample as well). Heavy use of LOD with 4-5 or even more levels and mip-mapping also helps to reduce high frequency details. Tolerance level for the final image quality is pretty low though, even a small amount of noise, crawling or such is usually unacceptable - so you usually end up upping the sampling rate higher and higher... :)
Multi-sampling with anisotropic filtering is a performance optimization that will continue to grow in importance as pixel shaders come into more general use. Chips are just getting to the point where they can perform some very interesting pixel shader effects at acceptable frame rates for one set of calculations per pixel. It is simply impractical at this point to run pixel shaders for each of 8 or 16 samples per pixel for supersampled AA at every pixel, especially for the small improvement.
In the future, it is likely that developers will apply supersampling to selective areas under control of the shader. This is especially true of shaders that actually generate the local image such as procedural textures and selective ray tracing.
Colourless
25-Nov-2002, 20:34
SA's 8x Sparse sample pattern produces some really nice looking FSAA.
I made a small program that would allow me to easily set the sample positions for my V5 6000, and I entered SA's samples. There was a very noticeable improvement over the default Rotated Grid sample pattern that is normally used. Trying to put a number on the difference, the Sparse pattern was probably 2x as good as the Rotated Grid that was used.
Here is the default 8x Sample Pattern set by 3dfx in Glide and OpenGL: http://www.users.on.net/triforce/glide8xpos.png Even just looking at it, you can tell it's not a very good pattern. There are a number of columns and rows with 2 samples, and a few with none at all.
Oddly enough, the default 8x Sample Pattern in Direct3D is actually different. Here it is http://www.users.on.net/triforce/d3d8xpos.png It is actually a much better pattern, and produces really nice results. However, it's got some nasty critical angle problems.
Althornin
25-Nov-2002, 22:07
Colourless - could you be so kind as to take some comparative screenshots of what those modes look like?
Dam you v5 6000 owners :)
Yea thanks for the info Colourless and I am sure a lot of people would like to see any FSAA before/after images you can get with that pattern.
Colourless
26-Nov-2002, 13:53
Sure I'd make some screenshots. My limitation is I can only get them from OpenGL (and Glide) apps. Screenshots from D3D programs do not work. Also, I don't exactly have many apps OpenGL/Glide that I can run.
I 'would' run Basic's apps, but they are useless for a V5 since FSAA only works in Fullscreen.
Althornin
26-Nov-2002, 19:17
Sure I'd make some screenshots. My limitation is I can only get them from OpenGL (and Glide) apps. Screenshots from D3D programs do not work. Also, I don't exactly have many apps OpenGL/Glide that I can run.
I 'would' run Basic's apps, but they are useless for a V5 since FSAA only works in Fullscreen.
That's fine, I'd just like to be able to look at some side by side shots and KNOW that the ONLY difference between them IQ wise is because of different sample positions. That would let me see how much of an impact various sample positions make. Plus, it would be cool and interesting :)
I 'would' run Basic's apps, but they are useless...
Hrmpfff... :evil:
:D
They are useless in another way too.
I don't think anyone doubts that you can put the subsamples in the positions you say. And that's exactly what the program shows. What would be interesting is a normal in-game scene, to convert the theory into a practical example.
But isn't it ironic that the reason I came up with the idea for that program was some intensive discussions about the sampling positions of a V5? :D I didn't write the program until a long time after those discussions had faded though.
Colourless
26-Nov-2002, 20:28
Althornin (and others):
Here are 2 initial screenshots from Quake 3. They show some minor differences in the step (mostly noticeable if you zoom in). Q3 of course is a shocking game to use to tell the difference with 'high quality' FSAA since it doesn't have high enough detail in the world and does have high enough contrast.
Anyway here they are:
http://www.users.on.net/triforce/fsaa/default.jpg
http://www.users.on.net/triforce/fsaa/sparse.jpg
I'll probably re-install DeusEx and see if I can find some parts of the game with particularly horrid aliasing.
Basic:
Way to go making my quote now look out of context (by only quoting part of it) :-)
Althornin
27-Nov-2002, 01:00
Thanks, Colourless. That's a decent difference. Man, that 8x image (sparse) looks very clean! Bet it looks great in motion.
Here is an example of a 16x sparsely sampled pattern.
------------------------------------------------x------------
----------------x--------------------------------------------
--------------------------------x----------------------------
x------------------------------------------------------------
--------------------------------------------------------x----
------------------------x------------------------------------
----------------------------------------x--------------------
--------x----------------------------------------------------
----------------------------------------------------x--------
--------------------x----------------------------------------
------------------------------------x------------------------
----x--------------------------------------------------------
------------------------------------------------------------x
----------------------------x--------------------------------
--------------------------------------------x----------------
------------x------------------------------------------------
Since we're giving out patterns here how about this one for 8x. I believe it will be slightly better than SA's pattern for short edges that are nearly horizontal or vertical, but it might be a little worse for steep angles. The pattern looks more ordered, but I've never really bought into the random is good idea for AA.
-------------------x---------
------------x----------------
-x---------------------------
-----------------------x-----
-----x-----------------------
---------------------------x-
----------------x------------
---------x-------------------
Although once a decent sparse pattern is chosen, refining it more might be splitting hairs.
arjan de lumens
27-Nov-2002, 04:46
That pattern looks like a 9x rotated grid pattern with the center sample missing and the other samples slightly skewed, and will presumably have noticeable worst-case behaviour when polygon edges are at same angles as the grid - unlike SA's grids, which seem to not have any particular class of worst-case behavior at all. And SA's grids, at least for 8x and 16x, look decidedly non-random to me at least.
Bigus Dickus
27-Nov-2002, 04:50
And SA's grids, at least for 8x and 16x, look decidedly non-random to me at least.
I agree, that 16x grid is anything but random.
Althornin
27-Nov-2002, 05:53
I agree, that 16x grid is anything but random.
It's not supposed to be, it's supposed to be "sparse"
Once you have enough samples and a good sample pattern, the next source of AA improvement is a better filter.
Using a simple Bartlett filter instead of box filter will offer a good amount of improvement. You get two benefits, more gradations for the same number of samples, and a more correct filter shape. Of course as soon as you go to a wider weighted filter (involving surrounding pixels) you might as well make the weights programmable.
One mistake that many make when they create weighted filter shapes such as Gaussian filters is that they make the weights circularly symmetric around the pixel. As a result, the weights for any particular sample point do not sum to 1/n. This creates uneven sampling which results in such artifacts as wavy edges. To solve the problem, Gaussian filters, windowed sinc filters, etc. should not use circularly symmetric weights but should use weights that sum to 1/n for all samples. This results in a 4 sided tent shaped filter with horizontal and vertical silhouettes that are the desired Gaussian or windowed sinc shape. Note that box filters and Bartlett filters do not have this problem, all sample weights sum to 1/n since they are not circularly symmetric but are 4 sided.
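A small numeric check of that claim, as a Python sketch. It compares a Bartlett (tent) filter built as the product of a horizontal and a vertical tent against a circularly symmetric cone of the same radius, and adds up the weights that a single sample contributes to all nearby pixel centres. Building the 4 sided filter as a separable product is an assumption about how it is constructed, the function names are invented, and the totals are normalised to 1 rather than 1/n (divide by the number of samples per pixel to get the figure used in the post).

```python
import math


def tent(d):
    """1D Bartlett weight with a radius of one pixel."""
    return max(0.0, 1.0 - abs(d))


def separable_weight(dx, dy):
    """4 sided filter: product of a horizontal and a vertical tent."""
    return tent(dx) * tent(dy)


def circular_weight(dx, dy):
    """Circularly symmetric cone with the same radius."""
    return tent(math.hypot(dx, dy))


def total_weight(weight, sx, sy):
    """Sum of the weights one sample at (sx, sy) contributes to nearby pixel centres."""
    return sum(
        weight(sx - px, sy - py)
        for px in range(-2, 3)
        for py in range(-2, 3)
    )


for sx, sy in [(0.0, 0.0), (0.5, 0.5), (0.3, 0.7)]:
    print(sx, sy, total_weight(separable_weight, sx, sy), total_weight(circular_weight, sx, sy))
```

The separable filter gives every sample position the same total, while the circular cone gives a sample at a pixel corner noticeably more total weight than one at a pixel centre, which is the uneven sampling described above.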
Ordered grids and staggered grids are easy to use weighted filters with. Programmable sparsely sampled grids are also fairly easy to weight using programmable weights. In this case you have 9 weights per sample in the grid lookup table (1 weight for the center pixel and 8 weights for the adjacent pixels).
When using a broad area filter, it's best to apply the filter after the frame buffer is complete so that all the final colors are in the frame buffer. Some of the filter-as-you-go techniques are harder to implement.
Assuming the use of fragment AA with coverage masks to achieve compression, it is important to use a good fall back when there are more fragments than a pixel can handle.
For current scenes two fragments per pixel is usually good enough since most AAed pixels consist of a foreground edge and a background surface. This is because the triangles are still large and the complexity is low for most scenes.
For current scenes therefore edge quality (jaggies) is the major AA concern. For future scenes, however, subpixel triangle quality becomes more of a concern.
One fall back mechanism for fragment AA is to merge the fragments. This works well as long as there are only 2 surfaces in the pixel and merging is based on both the z and the color. Using only the z value for merging as Z3 does can produce artifacts if the polygons on the surface are different colors (such as a beach ball).
As scene complexity increases a larger portion of the pixels will have more than 2 surfaces, especially for outdoor scenes. In the long run, therefore, a fall back to uncompressed pixels with one color and z per sample works best.