Matrox 16x "FAA"
I was a bit "worried" about Matrox's FSAA implementation. Many of us right now use FSAA not only to remove jaggies, but also to reduce shimmering textures due to aniso and high LOD. (And in ATI's case it actually improves texture clarity and also removes those horrible mip map borders due to aniso + bilinear.)
Seems like Matrox FSAA will only remove jaggies, no more, no less.
I just hope their texture filtering methods are better than those of current video cards.
Kristof
14-May-2002, 12:14
It won't even remove all "jaggies"... jaggies created by 2 polygons intersecting will not be AAed as far as I understand the technique. So if a wall and floor intersect then that edge between floor and wall will be jagged. The technique is also potentially sensitive to small math errors, resulting in gaps as seen in the 3dmark dragon screenshot.
Ascended Saiyan
14-May-2002, 12:44
Since we are on the topic of anti-aliasing, what sort of anti-aliasing specifically does it resort to when their Fragment Anti-Aliasing (FAA) implementation fails?
Hopefully it resorts to J-RGSS.
If developers lay off the T-junctions and intersections like they should there need be no problem :)
Kristof
14-May-2002, 13:28
If developers lay off the T-junctions and intersections like they should there need be no problem :)
Why should they avoid intersections ? Do you prefer gaps ?
With intersections I mean when the triangles actually intersect, not just meet without sharing texture coordinates. That wall and floor can still share vertex coordinates; then the question becomes whether it can merge fragments. In that case there need be no problems (there might be for their specific implementation, but it's hard to say with the info we have unless I missed something).
LeStoffer
14-May-2002, 15:31
From the Anandtech preview:
There is no way to predict under what games FAA will result in artifacts but it does happen. Matrox mentioned to us that out of approximately 40 games they tested, around 5 - 7 exhibited artifacts with FAA enabled.
Okay then, but how bad are these artifacts? Are we "just" talking about jaggies not getting AA'ed - or are we talking about FAA changing pixel colours to something that is way off?
Regarding their texture filtering: It seems that Parhelia can do 16-tap anisotropic filtering without a fillrate performance drop (because of the quad texturing unit) so that's one step forward.
This baby should rock with FAA and AF on. 8)
Joe DeFuria
14-May-2002, 15:53
Yes, when this card is benched on "today's games" using AA and Aniso filtering, it should be a clear victor over GeForce4.
The only thing that really "bothers" me with the FAA technique is that it apparently will not work properly when the stencil buffer is used. (At least, that was stated in one of the previews I've read.) I was under the impression that stencil use is becoming more and more common in games (particularly Doom3), to the point that it is almost standard?
So unfortunately, FAA might see declining usability over the life of the card, for games anyway.
EDIT: Does anyone know if the fragment technique itself and stencil buffering are generally mutually exclusive, or would it be a limitation of Matrox's current implementation?
The problem with stencil is the same as with alpha textures. Alpha testing as well as stencil testing discards fragments internally in polygons, and this will remain undetected by multisampling techniques. The same problem would occur on GF3/4 too, AFAIK.
LeStoffer
14-May-2002, 16:06
EDIT: Does anyone know if the fragment technique itself and stencil buffering is generally mutually exclusive, or would it be a limitation of Matrox's current implementation?
Very good question. And here's another question: Do you need stencil buffering to do shadows? Or is it just the smartest way of doing it (I remember that John Carmack/Mark Kilgard [nVidia dev] has hyped this).
Kristof
14-May-2002, 16:19
Hmm, did not think about stencil tests before. Since with FAA you have 2 depth values (the background and the fragment Z), it becomes non-trivial to decide which of these 2 to use for the stencil compare test. Do you compare the background or do you compare the fragment? Since both determine the color of the final pixel, it's not possible to make a right choice... so with stencil it does seem to fail. Stencil should not stop MSAA implementations AFAIK since they still do the full Z test per subpixel. SSAA should have no problem with alpha test or stencil since you do all the work no matter what.
K-
Joe DeFuria
14-May-2002, 16:20
Please forgive my ignorance on the subject. ;) Just trying to get a handle on the real-world implications of this.
So then as far as we know, you can enable FAA on a scene that utilizes the stencil buffer, and it should not introduce new artifacts....it's just that any pixel that "utilizes" stencil information will have the effect of no AA applied?
For a game like Doom3, where stencil is apparently used globally, would this mean that FAA would potentially have little or no effect? Generally, when stencil is used to test for shadows, etc, does that mean that "every" pixel won't have AA applied? Or is it more like alpha...where only specific pixels in the scene are affected, such as those that have a shadow applied...
MfA...I'll try and find the link....
edit:
Found it: http://www.tomshardware.com/graphic/02q2/020514/parhelia-06.html
Matrox claims that this process allows for an image quality that corresponds to 16x FSAA quality, while minimizing the loss in performance at the same time. However, there are cases where fragment anti-aliasing doesn't work, for example, in scenes where the stencil buffer is used.
Kristof
14-May-2002, 16:31
So then as far as we know, you can enable FAA on a scene that utilizes the stencil buffer, and it should not introduce new artifacts....it's just that any pixel that "utilizes" stencil information will have the effect of no AA applied?
For a game like Doom3, where stencil is apprently used globally, would this mean that FAA would potentially have little or no effect? Generally, when stencil is used to test for shadows, etc, does that mean that "every" pixel won't have AA applied? Or is more like alpha...where only specific pixels are effected.
Stencil testing is enabled for each pixel of a polygon, but you only have a difficult case if a pixel is effectively a fragment pixel (where you have multiple, I assume 2, Z values). Ignoring the edge cases will probably create weird effects, so they probably disable the effect when stencils are used. You could execute the stencil test but you would also need to have 2 stencil buffers... hmmm... maybe it should work with stencils but it does get quite complicated. I think they just decided it was too complex for the hardware to handle... come to think of it, it's probably possible to handle stencil: you'd just have to execute it for both background and fragment buffer and influence the color of each respective buffer... it does sound complicated, so maybe there are some combinations that would not work at all. Or maybe they compressed the Z buffer of the fragment buffer:
32 bit color + 28 bit Z + 4 bit Area Mask
or
32 bit color + 24 bit Z + 4 bit stencil + 4 bit Area Mask
K-
Kristof
14-May-2002, 16:35
AARGH... gotta love Tomshardware :
It's also interesting to compare quality. For example, Parhelia's fragment antialiasing (FAA) does not recognize lines that are rendered in the form of textures. Supersampling and multisampling, by comparison, smooth out these edges as well.
I don't think so... :roll:
Joe DeFuria
14-May-2002, 16:41
Yeah...I saw that quote too... ;)
In any case, it looks like it's about time for a Beyond3D Technology Article to explore Matrox's FAA technique....pros, cons, etc. ;)
Dave B(TotalVR)
14-May-2002, 16:54
Lol, he thinks multi-sampling is an optimised version of supersampling.
Please forgive my ignorance on the subject. ;) Just trying to get a handle on the real-world implications of this.
So then as far as we know, you can enable FAA on a scene that utilizes the stencil buffer, and it should not introduce new artifacts....it's just that any pixel that "utilizes" stencil information will have the effect of no AA applied?
For a game like Doom3, where stencil is apprently used globally, would this mean that FAA would potentially have little or no effect? Generally, when stencil is used to test for shadows, etc, does that mean that "every" pixel won't have AA applied? Or is more like alpha...where only specific pixels in the scene are effected, such as those that have a shadow applied...
In Doom3 you'll likely get nice AA on all polygon edges, but you won't get any AA on shadow edges.
Entropy
14-May-2002, 22:07
[quote="Humus"]
In Doom3 you'll likely get nice AA on all polygon edges, but you won't get any AA on shadow edges.[/quote]
Hmm. Not perfect, but it wouldn't be too bad either. It is not as if other forms of AA are completely perfect.
Speculation is fun and all, but until cards are in the hands of testers, there will be many questions unanswered.
Entropy
Joe DeFuria
14-May-2002, 22:12
Are the Doom3 Shadows "soft" or "hard"? The edges of "Soft" shadows wouldn't have as big (any?) need for AA as hard edges would.
Is there a consensus on whether normal multi-sampling AA would impact shadow edges?
Hmm. Not perfect, but it wouldn't be too bad either. It is not as if other forms of AA are completely perfect.
I believe that super sampling AA is in fact perfect in the sense of quality, because it just uses a larger buffer and therefore has no problems with stencil or alpha testing. Of course, that's also why it is quite slow compared to the other algorithms.
Kristof
14-May-2002, 22:25
Is there a consensus on whether normal multi-sampling AA would impact shadow edges?
Stencils are a per pixel effect, not a texture effect, so both MS and SSAA will see soft edges. With MSAA the backend is quadrupled (for NVIDIA), meaning that the Z test and everything linked to it is done in parallel; this includes, I assume, the stencil operations. So the stencil operations are done at high res and downsampled later in the process, thus anti-aliasing them.
FAA does not increase the number of pixels (resolution of the buffers stays low) and thus does not impact the stencil effects.
Alpha testing is a texture sampling effect and thus would not work in MSAA, since MSAA uses the same texture data (and thus the same alpha test result) for all subsamples. "Not work" means it behaves just like rendering at normal resolution, except when you use larger filter kernels, in which case you get some bleeding that might soften the alpha tested pixels a bit; it would be blur rather than anything else, but then again AA is usually blur. SSAA samples the texture data for all subsamples, so the alpha test would work. FAA has no impact on alpha tests; it should behave like non-AAed rendering (no gain, no loss).
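To make the alpha test distinction concrete, here is a toy sketch. It assumes a single pixel spanning [0, 1), four subsample positions, and a made-up "texture" whose alpha is a hard step; the function names and the step texture are hypothetical and do not model any particular chip.

```python
# Toy model: one pixel spanning [0, 1) with 4 subsample positions.
SUBSAMPLES = [0.125, 0.375, 0.625, 0.875]
ALPHA_REF = 0.5  # alpha test passes when alpha >= ALPHA_REF


def texture_alpha(u, cutoff=0.6):
    """Hypothetical alpha-tested texture: opaque left of cutoff, empty right."""
    return 1.0 if u < cutoff else 0.0


def msaa_coverage(cutoff=0.6):
    """MSAA: the texture (and the alpha test) is evaluated once at the pixel
    centre and that single result is shared by every subsample."""
    passed = texture_alpha(0.5, cutoff) >= ALPHA_REF
    return 1.0 if passed else 0.0


def ssaa_coverage(cutoff=0.6):
    """SSAA: the texture is sampled and alpha-tested per subsample."""
    hits = sum(texture_alpha(u, cutoff) >= ALPHA_REF for u in SUBSAMPLES)
    return hits / len(SUBSAMPLES)


print(msaa_coverage())  # all-or-nothing result for the pixel
print(ssaa_coverage())  # fractional coverage along the alpha edge
```

In this model MSAA can only report the pixel as fully kept or fully discarded, while SSAA reports a fraction, which is the softening on alpha tested edges that the post describes.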
Maybe we should have explained what Alpha Testing and Stencil really is so people understand where it fits in :)
K-
Maybe we should have explained what Alpha Testing and Stencil really is so people understand where it fits in :)
I think I had some flash animations about this somewhere, but I don't seem to be able to find them in the pile of disorganized folders, zips and files that is my harddisk.
I don't know what FAA does, but in principle a fragment when combined with a pixel could just create a new fragment ... no reason why a fragment buffer could not store destination alpha and stencil values.
No, but because Parhelia only oversamples polygon edges, edges created by stencil or alpha testing cannot be antialiased.
Ugh, wasn't thinking for a moment.
Geeforcer
15-May-2002, 00:21
http://ixbt.com/video2/parhelia512/faa16x/3dmark_faa16x.jpg
On the FAA-16 shot above, the searchlight on the aircraft is not AA-ed. Is it done with alpha texture?
You could keep the advantages of FAA without its problems by using a modification of Z3.
In this case you would segregate pixels that had a partially occupied front sample mask as AAed pixels rather than use fragment edges as the determining factor. You would save all AAed pixels to the AA store and non-AA pixels to the frame buffer much the same as FAA. To improve performance further you would also modify Z3 by not waiting for the masks in a pixel to fill before combining fragments into a common mask. Instead you would combine the fragments whenever the AND of their masks is null (meaning no overlap) and (nearly) matching Zs which indicate the fragments are most likely part of the same surface.
This should have better performance than FAA and should easily extend to 32x or better AA (simply more bits in the mask) without much additional performance hit. It would also have significantly better performance and lower memory requirements than standard Z3.
As a side benefit you could get order-independent antialiased transparency in the same manner as standard Z3. Stenciling would also not pose a problem.
BTW, I would definitely keep the sparse matrix sampling pattern that Z3 uses as well.
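The merge rule in the post above (combine fragments when the AND of their masks is null and their Zs nearly match) can be sketched roughly as follows. The `Fragment` type, the `Z_EPSILON` tolerance, and the way colour and Z are averaged by sample count are all assumptions made for illustration; the post does not specify them, and fragments are assumed to have non-empty masks.

```python
from dataclasses import dataclass

Z_EPSILON = 1e-3  # assumed tolerance for "(nearly) matching" depth


@dataclass
class Fragment:
    mask: int     # coverage bitmask, one bit per sample (16 bits for 16x)
    z: float
    color: tuple  # (r, g, b)


def can_merge(a, b, eps=Z_EPSILON):
    """Merge test from the post: no overlapping samples and nearly equal Z."""
    return (a.mask & b.mask) == 0 and abs(a.z - b.z) <= eps


def merge(a, b):
    """Combine two fragments; colour and Z are weighted by sample count
    (an illustrative choice, not something the post prescribes)."""
    na, nb = bin(a.mask).count("1"), bin(b.mask).count("1")
    color = tuple((ca * na + cb * nb) / (na + nb)
                  for ca, cb in zip(a.color, b.color))
    z = (a.z * na + b.z * nb) / (na + nb)
    return Fragment(a.mask | b.mask, z, color)


left = Fragment(0x00FF, 0.5000, (1.0, 0.0, 0.0))
right = Fragment(0xFF00, 0.5004, (1.0, 0.0, 0.0))
if can_merge(left, right):
    whole = merge(left, right)
    print(hex(whole.mask))  # 0xffff: all 16 samples covered by one surface
```

Once the combined mask is full, the pixel no longer needs to live in the AA store, which is where the claimed saving over standard Z3 would come from.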
My memory isn't what it used to be, so I forget ... did Z3 store the Z slope in some way for fragments?
MfA, yes i believe it stores dz/dx and dz/dy along with z, color and the coverage mask.
SA, what are your thoughts on the R-buffer?
paper:
R-Buffer: A Pointerless A-Buffer Hardware Architecture
Craig M. Wittenbrink
I can't find a link to the pdf unfortunately and my hard drive recently died so i no longer have a copy :(.
You could keep the advantages of FAA without its problems by using a modification of Z3.
In this case you would segregate pixels that had a partially occupied front sample mask as AAed pixels rather than use fragment edges as the determining factor. You would save all AAed pixels to the AA store and non-AA pixels to the frame buffer much the same as FAA. To improve performance further you would also modify Z3 by not waiting for the masks in a pixel to fill before combining fragments into a common mask. Instead you would combine the fragments whenever the AND of their masks is null (meaning no overlap) and (nearly) matching Zs which indicate the fragments are most likely part of the same surface.
This should have better performance than FAA and should easily extend to 32x or better AA (simply more bits in the mask) without much additional performance hit. It would also have significantly better performance and lower memory requirements than standard Z3.
As a side benefit you could get order-independent antialiased transparency in the same manner as standard Z3. Stenciling would also not pose a problem.
BTW, I would definitely keep the sparse matrix sampling pattern that Z3 uses as well.
What makes you think this would be faster or different than FAA?
Also, someone asked if stencil shadows would be antialiased. It sounds like the answer is no, because stencil shadows are really intersecting objects. They are made by the intersection of the shadow volume and scene geometry.
I was a bit "worried" about Matrox FSAA implementation, many of us right now use FSAA not only to remove jaggies. But also to reduce shimmering textures due to Aniso and high LOD. (And in ATIs case it actually improves texture clarity and also removes those horrible mip map borders due to Aniso + Bilinear.)
Seems like Matrox FSAA will only remove jaggies, no more, no less.
I just hope their texture filtering methods are better than current videocards..
Nvidia's multisampling doesn't improve texture clarity either. That's the tradeoff for better performance. Matrox did say they would support 4x supersampling as well as 16x FAA.
A presentation on the R-buffer is available in the ACM digital library if you have access (here: http://www.graphicshardware.org/previous/www_2001/presentations/wittenbrink.pdf).
It would be interesting to be able to compare this to IMG's implementation of sort independent transparency.
Z3 indeed stores the dz/dy and dz/dx slopes. They do this to handle implicit edge intersections for interpenetrating triangles properly (one of the major problems with FAA as Kristof pointed out). Thus it AAs stenciled shadows properly as well.
They also store the stencil values for each mask level to handle stenciling correctly, apparently another problem with FAA.
In addition it performs order independent transparency correctly, which immediate mode renderers using techniques like FAA don't handle properly.
The reason that it is faster than FAA is because Z3 only stores a 2 or 4 byte mask, 1 byte stencil, 3 byte (compressed z and z slopes) and 4 byte color for each level for 3 levels. For various sampling rates this gives:
For modified Z3:
(2+1+3+4)*3 = 30 bytes per AAed pixel ---- 16x AA
(4+1+3+4)*3 = 36 bytes per AAed pixel ---- 32x AA
FAA (I am assuming from the description) stores the (4 byte) z value and (4 byte) color for each sample (and it seems without z compression). Thus it requires:
For FAA:
(4+4)*16 = 128 bytes per AAed pixel ---- 16x AA
(4+4)*32 = 256 bytes per AAed pixel ---- 32x AA
which is about 4 times the memory bandwidth and memory size for 16x AA and about 7 times the bandwidth and memory size for 32x AA compared to using the modified Z3.
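The arithmetic above can be reproduced with a short script. The byte sizes are the ones given in the post; the per-sample layout is the post's own assumption about FAA, and the function names are made up here.

```python
LEVELS = 3  # mask levels kept per AA pixel in Z3, per the post


def z3_bytes(mask_bytes, stencil=1, z_and_slopes=3, color=4, levels=LEVELS):
    """Bytes per AA pixel for the modified Z3 layout described in the post."""
    return (mask_bytes + stencil + z_and_slopes + color) * levels


def per_sample_bytes(samples, z=4, color=4):
    """Bytes per AA pixel when every sample stores its own Z and colour."""
    return (z + color) * samples


for samples, mask_bytes in ((16, 2), (32, 4)):
    z3 = z3_bytes(mask_bytes)
    ps = per_sample_bytes(samples)
    print(f"{samples}x: Z3 {z3} B, per-sample {ps} B, ratio {ps / z3:.1f}")
# 16x: Z3 30 B, per-sample 128 B, ratio 4.3
# 32x: Z3 36 B, per-sample 256 B, ratio 7.1
```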
The modified Z3 is faster and more memory efficient than the standard Z3 for the same reason that FAA is faster and more memory efficient than standard AA: you only store and do AA where necessary. Its major advantage here is that it correctly handles all the cases where AA is necessary, unlike FAA, while requiring much less memory and memory bandwidth.
One last difference is the sampling pattern. Z3 uses a sparse matrix pattern. I don't know the FAA pattern, but it seems to be an ordered grid. Sparse matrix sampling gives you N levels of color for Nx AA at all edge angles, including near horizontal and near vertical. Ordered grid and rotated grid patterns only give you sqrt(N) levels of color at certain angles, greatly reducing the effectiveness of the antialiasing at those angles. For example, when using 16x ordered grid AA you only get 4 levels of color gradation for near horizontal or near vertical edges with FAA. With Z3 you would get a constant 16 gradations of color regardless of the edge angles.
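The gradation count can be checked with a small experiment: sweep a horizontal edge up through one pixel and count how many distinct non-zero coverage values each pattern can produce. The sparse pattern below is an arbitrary "every sample has its own row and column" layout picked for illustration; it is not the actual Z3 pattern, and the function names are hypothetical.

```python
def ordered_grid(n_side):
    """n_side x n_side samples at the cell centres of a regular grid."""
    step = 1.0 / n_side
    return [((i + 0.5) * step, (j + 0.5) * step)
            for j in range(n_side) for i in range(n_side)]


def sparse_pattern(n):
    """Hypothetical sparse pattern: each sample has a unique x and a unique y.
    The permutation (i * 7) % n is an arbitrary illustrative choice."""
    step = 1.0 / n
    return [((i + 0.5) * step, (((i * 7) % n) + 0.5) * step) for i in range(n)]


def gradations_horizontal(samples):
    """Distinct non-zero coverage counts as a horizontal edge sweeps upward
    through the pixel (samples at or below the edge count as covered)."""
    heights = sorted({y for _, y in samples})
    counts = {sum(1 for _, y in samples if y <= h) for h in heights}
    return len(counts)


print(gradations_horizontal(ordered_grid(4)))     # 4
print(gradations_horizontal(sparse_pattern(16)))  # 16
```

With the ordered grid, four samples share each row, so a near horizontal edge gains or loses coverage four samples at a time; the sparse layout changes one sample at a time.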
The R buffer seems a bit expensive with an average of 8 to 10 passes being recirculated. I would need to run some experiments and compare it to Z3.
Testing with Z3 indicates that 3 to 4 levels are sufficient: because of the effective insertion criteria it uses for new fragments, adding further levels makes an imperceptible difference. So the results should closely match those of the R buffer without Z3 having the recirculation overhead.
Thanks MfA :)
I was thinking about these AA methods with coverage masks and their impact on occlusion culling...
How would you do something like an early z-reject or HZ under these conditions? If each pixel potentially has multiple z-values, and maybe z slopes, a depth compare operation is going to be expensive.
I suppose you could store a z-max per pixel at frame-buffer resolution (which changes from the z clear value only when a pixel is completely covered by primitives), and do early z using that...
SA, the method you describe sounds good.
I have one question however which applies to both FAA, your method, 3dlabs superscene, but not Z3...
How are you addressing the AAed fragments? All samples for a pixel are stored together, but the pixels (set of 3 Z3 samples or whatever) in the fragment buffer are no longer laid out in a grid. In fact it seems like they would have to be stored in somewhat random order since a pixel in the frame-buffer can transition to the fragment buffer at any time during rendering.
AFAICS (but I'm not an expert at all), aren't you paying some extra bandwidth for transferring pixels between frame and fragment buffers, and extra compute time (or the storage cost of a pointer) to find a pixel's fragments?
Kristof
15-May-2002, 09:20
FAA (I am assuming from the description) stores the (4 byte) z value and (4 byte) color for each sample (and it seems without z compression). Thus it requires:
For FAA:
(4+4)*16 = 128 bytes per AAed pixel ---- 16x AA
(4+4)*32 = 256 bytes per AAed pixel ---- 32x AA
which is 4 times the memory bandwidth and memory size for 16x AA and about 7 times the bandwidth and memory size for 32 AA compared to using the modified Z3.
WHooowww... I am confused here. Do you actually assume that FAA stores out 16 color and Z values per pixel ? Or did you mean to compare with normal SuperSampling AA Rendering ?
I was assuming that FAA stored (1 + x) buffers: the framebuffer (or background buffer), which is 4 bytes color and 4 bytes Z, and the fragment buffer(s) (I assumed they only have a single one), which is 4 bytes color and 4 bytes Z and 4 bits for the coverage mask (they might even reduce color/Z depth to fit in the 4 bits so that they have a nice 4-byte number that aligns). So a total storage of only 16.5 bytes per pixel. Additional fragment buffers would cost 8.5 bytes each. The whole problem is with merging the masks from pixels that are covered by multiple connected triangles, and I think this is actually the biggest risk of the technique failing (e.g. strips: when they do not match up properly the system might fail to combine the mask and gaps appear).
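The storage estimate above works out as follows. This only restates the post's guess at the layout (one background buffer plus x fragment buffers with a 4-bit mask); the function name and parameters are made up for the sketch.

```python
def faa_bytes_per_pixel(fragment_buffers=1, color=4, z=4, mask_bits=4):
    """Per-pixel storage under the guessed layout: a background buffer
    (colour + Z) plus N fragment buffers (colour + Z + coverage mask)."""
    background = color + z
    fragment = color + z + mask_bits / 8
    return background + fragment_buffers * fragment


print(faa_bytes_per_pixel(1))  # 16.5
print(faa_bytes_per_pixel(2))  # 25.0, i.e. 8.5 more per extra fragment buffer
```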
Sigh... I guess Beyond3D needs to get up an article with one way the system might work and then discuss from there on how to improve the technique or how to handle it differently :)
http://ixbt.com/video2/parhelia512/faa16x/3dmark_faa16x.jpg
On the FAA-16 shot above, the searchlight on the aircraft is not AA-ed. Is it done with alpha texture?
I don't think so, you don't even need a texture to create that effect, you can do it with primary color alone. Further, it's alphablended, not alphatested, so there shouldn't be a problem. Plus that it's the edges that aren't AA'd, not internally in the polygon.
Sorry about the confusion, just read the white paper on FAA. They use coverage masks rather than individual Z's and colors for each sample. In this case there are two possibilities. Either the number of mask levels is left unbounded as in the A buffer or the number of levels is bounded and they merge levels as in Z3.
If the number of levels is unbounded as you mentioned then they must link additional levels as they are needed for a pixel which requires a link pointer for each level in a pixel. If you assumed 4 bytes for each pointer and 2 bytes for each mask, the amount of storage required would be:
(4+4+2+4)*L = 14 bytes per mask level * L unbounded levels
This is not only unpredictable in the maximum storage as you point out but in some scenarios can even exceed the amount of storage needed by directly storing color and z for each sample (this is one of the major problems with the A buffer). In addition, managing the dynamic list of levels is almost as much work as Z3 without any of the benefits.
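As a rough check on when the linked list overtakes direct per-sample storage, using the byte sizes assumed in the post (4 colour, 4 Z, 2 mask, 4 pointer per level, and 4 + 4 bytes per sample for direct storage); the helper names are hypothetical.

```python
BYTES_PER_LEVEL = 4 + 4 + 2 + 4  # colour + Z + mask + link pointer


def linked_bytes(levels):
    """Storage for one pixel holding `levels` linked mask levels."""
    return BYTES_PER_LEVEL * levels


def per_sample_bytes(samples, color=4, z=4):
    """Storage for one pixel with colour and Z stored for every sample."""
    return (color + z) * samples


def break_even_levels(samples):
    """Smallest level count whose linked storage exceeds per-sample storage."""
    return per_sample_bytes(samples) // BYTES_PER_LEVEL + 1


print(break_even_levels(16))  # 10 levels: 140 bytes vs 128 bytes
print(break_even_levels(32))  # 19 levels: 266 bytes vs 256 bytes
```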
If the number of levels is bounded, say at 3 as in Z3, then they must merge mask levels. In this case they again are doing as much work as Z3 without any of the benefits.
As to the addressability of the pixels: the frame buffer contains a pointer to the pixel in AA memory. Since AAed pixels require something on the order of 30 bytes or more each, and since they tend to be allocated per primitive along edges, the memory accesses should cluster reasonably well, so a simple array of AAed pixels would do. In the case of Z3, the bounded number of levels would allow you to preallocate enough memory to handle the worst case where every pixel was AAed. If you used the unbounded technique instead, you would need to stop the AA when you ran out of memory, which will result in additional artifacts.
Overall, I think that the FAA approach to anti-aliasing of providing AA only where it is needed is a good one for immediate mode renderers in the near term. With some additional work it should be able to provide high sample AA without the artifacting it now has.
Bambers
15-May-2002, 13:08
http://ixbt.com/video2/parhelia512/faa16x/3dmark_faa16x.jpg
On the FAA-16 shot above, the searchlight on the aircraft is not AA-ed. Is it done with alpha texture?
There are a few other parts of that shot which are not AA'ed. There's the intersections on the wheel hub and where the black bumper goes through the wheel. However the whole of the slope on the right hand side has not been AA'ed or, rather, it seems to have been AA'ed wrongly as it has a black outline (most noticeable on the top of the road in front of the walker's feet).
Jerry Cornelius
16-May-2002, 01:00
Kristof,
Where do you get 4 bits? I counted 20 different kinds of pixel coverage easily. As far as I can see they would at least require the fifth bit to do it properly.
They probably use simple linked lists where they can vary the node type to include a couple of fragments with low precision information.
Gunhead
16-May-2002, 01:55
Maybe we should have explained what Alpha Testing and Stencil really is so people understand where it fits in :)
It's not too late to do that, you know :wink:
I understand (I guess?) the purpose of the alpha channel (to get partially transparent or translucent surfaces) in textures, but the actual alpha testing and blending methods -- what really goes down in the hardware -- yeah, what are they like? I don't understand stencil except that it's some kind of mask (channel? in textures?) which you can use to make pretty aeroplane cockpits and shadows for people with rocket launchers :wink:
Hey Kristof, BTW, your TBR paper years ago is still a true gem: it had the sufficient LOD. Whereas many attempts fall just a wee bit too abstract to make the reader go "Ah yes, now I get it"... But others are welcome to explain too, of course :)
Hellbinder
16-May-2002, 01:59
slightly, off topic question...
I know the above pics were taken under very "beta" conditions. However this FAA is supposed to be Matrox's high speed AA solution, right? What kind of speed do you think the final product will kick out? What kind of hit on performance?
Also, being that they fall back to supersample 4x mode, do you guys see their "normal" 4x FSAA being faster or slower than the current GF4? And what kind of hit do you see here...
Thank you..
Kristof
16-May-2002, 09:17
Kristof,
Where do you get 4 bits? I counted 20 different kinds of pixel coverage easily. As far as I can see they would at least require the fifth bit to do it properly.
They probably use simple linked lists where they can vary the node type to include a couple of fragments with low precision information.
I assume they only keep track of the percentage number, not the actual pattern of coverage. Keeping track of what % of the pixel is covered is enough. To merge you just check the current fragment Z and color; if they are the same it's probably a shared edge and you merge the percentages until you get 100%, in which case you move it to the background buffer. I cannot see any need to keep track of the actual coverage pattern :-?
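A minimal sketch of the percentage-only scheme guessed at above, assuming one fragment slot per pixel and coverage counted in sixteenths. The dict layout, `FULL`, the Z tolerance, and the "new fragment replaces a non-matching one" behaviour are all assumptions for illustration, and the sketch skips the Z test against the background.

```python
FULL = 16  # coverage counted in sixteenths of a pixel (assumed unit)


def add_fragment(pixel, color, z, coverage, z_eps=1e-3):
    """Merge an incoming edge fragment into the pixel's single fragment slot."""
    frag = pixel["fragment"]
    same_surface = (frag is not None and frag["color"] == color
                    and abs(frag["z"] - z) <= z_eps)
    if same_surface:
        frag["coverage"] += coverage      # probably a shared edge: add up
    else:
        frag = {"color": color, "z": z, "coverage": coverage}
        pixel["fragment"] = frag          # assumed: replace the old fragment
    if frag["coverage"] >= FULL:
        # Fully covered: promote to the background buffer, free the slot.
        pixel["background"] = {"color": frag["color"], "z": frag["z"]}
        pixel["fragment"] = None


pixel = {"background": {"color": (255, 255, 255), "z": 1.0}, "fragment": None}
add_fragment(pixel, (255, 0, 0), 0.5, 6)   # one triangle covers 6/16
add_fragment(pixel, (255, 0, 0), 0.5, 10)  # its neighbour covers the other 10/16
print(pixel)  # fragment slot is empty again; the background is now red
```

Note that nothing here can tell which part of the pixel is covered, so two fragments that overlap the same samples would be summed just like two that tile the pixel.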
Linked lists... you mean dynamic storage... could this not create very ugly memory access patterns for the final merging of the fragment and background data ? You want to avoid scatter reads as much as possible especially if your memory interface is 256bits DDR (512 bits in a long line).
K-
darkblu
16-May-2002, 10:22
I assume they only keep track of the percentage number, not the actual pattern of coverage. Keep tracking of how % of the pixel is covered is enough. To merge you just check the current fragment Z and color, if the same its probably a shared edge and you merge the percentages until you get 100% in which case you move it to the background buffer. I can not see any need to keep track of the actual coverage pattern :-?
for what it's worth, the number of occurrences of the word 'mask' in the matrox faa16 paper (as PR-level as it is) is zero. OTOH, on figure 4 of the same doc one can clearly see a 4*4 pixel subdivision.
BTW, one thing i'm curious re this FAA thing (which i have to admit i like a lot as i have a soft spot for edge antialiasing) is does it get applied in a deferred manner (i.e. once all the fragment data for the scene get captured) or is it applied immediately?
Kristof
16-May-2002, 10:45
Fragment to fragment merging can be done immediately, assuming the fragments are actually part of the same larger surface (e.g. a strip, or a plane formed out of many smaller triangles).
Fragment to final pixel merging has to be done at the very end, since a different background color can be inserted behind the current edge fragment (draw white poly, draw red edge poly in front, insert blue poly between white and red poly... if you recombine too early your red fragment would have been merged with white while it should merge with blue).
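The white/red/blue example can be worked through numerically. This is a simplified sketch assuming a 50% coverage edge fragment and a plain linear blend; the `blend` helper is made up, and in the "early" case it is assumed the already blended pixel is simply left alone when the blue poly arrives.

```python
def blend(fragment_color, coverage, background_color):
    """Linear blend of an edge fragment over whatever lies behind it."""
    return tuple(coverage * f + (1 - coverage) * b
                 for f, b in zip(fragment_color, background_color))


WHITE, RED, BLUE = (1.0, 1.0, 1.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
COVERAGE = 0.5  # the red edge fragment covers half the pixel

# Draw order from the post: white poly, red edge poly in front, then a blue
# poly that lands between them in depth.

# Early resolve: red is blended with white as soon as it arrives, so the
# pixel keeps a stale white contribution after blue is drawn.
early = blend(RED, COVERAGE, WHITE)

# Deferred resolve: the fragment is kept separate, blue replaces white in
# the background buffer, and the blend happens once at the end of the frame.
background = WHITE
background = BLUE
deferred = blend(RED, COVERAGE, background)

print(early)     # (1.0, 0.5, 0.5), pink: red mixed with white
print(deferred)  # (0.5, 0.0, 0.5), purple: red mixed with blue
```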
K-
darkblu
16-May-2002, 12:11
Fragment to Fragment mergingin can be done immediatly assuming the fragments are actually part of the same larger surface (e.g. a strip, or a plane formed out of many smaller triangles).
well, i see no reason why fragment-fragment merging would not be done immediately regardless of which surface the fragments belong to, i.e. same surface or not.
Fragment to final pixel mergin has to be done at the very end, since a different background color can be inserted behind the current edge fragment (draw white poly, draw red edge poly in front, insert blue poly between white and red poly... if you recombine too early your red fragment would have been merged with white while it should merge with blue).
or, alternatively, fragments may get re-blended each time they happen to fall within the footprint of a new poly: once you insert the blue poly whose footprint covers the fragments which used to blend w/ white before, re-blend them and you'd get blue-blended fragment pixels.
my question of whether fragment blending gets deferred or not stems from the fact that though it could be nicely deferred, parhelia is an IMR after all, and it would be more consistent if it did not wait for a flip() in order to carry out its AAing.
Bambers
16-May-2002, 14:57
slightly, off topic question...
I know the above pics where taken under very "beta" conditions. However this FAA is supposed to be Matrox high speed AA solution, right? What kind of speed do you think the final product will kick out. What kind of hit on performance?
Also, Being that they fall back to Supersample 4x mode, Do you guys see their "normal" 4x FSAA being faster or slower than the current GF4? and what kind of hit do you see here...
Thank you..
According to the matrox white paper, they think it will be a 20-30% hit.
Kristof
16-May-2002, 16:24
Fragment to Fragment mergingin can be done immediatly assuming the fragments are actually part of the same larger surface (e.g. a strip, or a plane formed out of many smaller triangles).
well, i see no reason why fragment-fragment merging would not be done immedialtely regardless of fragments surface belonging, i.e. same surface or not.
To handle multiple fragments per pixel you either need multiple fragment buffers (Z, color and coverage) or else things "can" go wrong. As said the order of things is very important, if something is inserted between the 2 fragments you still need all the detail information to be able to blend things correctly. You could just blend things and hope for the best... I think they might actually be doing this and hoping it won't be too bad in most games.
Fragment to final pixel mergin has to be done at the very end, since a different background color can be inserted behind the current edge fragment (draw white poly, draw red edge poly in front, insert blue poly between white and red poly... if you recombine too early your red fragment would have been merged with white while it should merge with blue).
or, alternatively, fragments may get re-blended each time they happen to fall withing the footprint of a new poly: once you insert the blue poly whose footprint covers the fragments which used to blend w/ white before, re-blend them and you'd get blue-blended fragment pixels.
my question of whether fragment blending gets deferred or not stems from the fact that though it could be nicely deferred, parhelia is an IMR after all, and it would be more consistent if it did not wait for a flip() in order to carry out its AAing.
To be able to re-blend you still need to have all the subelements stored. So where do you store all this ? The moment you blend things you destroy the data, not sure if you can unblend things :)
K-
darkblu
16-May-2002, 17:12
To handle multiple fragments per pixel you either need multiple fragment buffers (Z, color and coverage) or else things "can" go wrong. As said the order of things is very important, if something is inserted between the 2 fragments you still need all the detail information to be able to blend things correctly. You could just blend things and hope for the best... I think they might actually be doing this and hoping it won't be too bad in most games.
ok, the faa doc explicitly mentions that they keep data for all fragments per pixel. i was wrong on this account. btw, the case when fragments belong to the same surface does not allow their immediate merging either - it requires that they specifically be on a shared edge.
To be able to re-blend you still need to have all the subelements stored. So where do you store all this ? The moment you blend things you destroy the data, not sure if you can unblend things :)
i believe you missed my point - there's no need for unblending whatsoever.
let's say we have a red fragment @(x, y). at the time of its birth, the framebuffer pixel @(x, y) was white. so the hw blends red with white and updates the frame. then, at a later moment, @(x, y) goes a blue primitive which stands between the original white primitive and the red fragment. so the hw checks its fragment buffer @(x, y), finds the old red fragment data and carries out the blending again, this time between red of the fragment and blue of the new primitive. again, framebuffer gets updated with the newly-produced pixel.
edit: wrong example and a couple of ambiguities
edit': doh, there's a serious problem with what z value goes to the fragmented pixel. guess that leaves no options but deferring the whole process.
Colourless
16-May-2002, 17:45
You could just do a cheap hack. You store 2 fragments (colour and z) and the coverage.
If a new fragment is between the 2 older fragments, you replace the 'far' fragment with the new one and keep the old coverage.
Or if the new fragment is infront of both, you replace the old far with the old near, and make the new fragment the near, and update the coverage with the coverage of the new near.
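The two-fragment hack above could look something like this. The `Frag` and `Pixel` types, the "smaller z is nearer" convention, and the handling of a fragment that falls behind both stored ones (dropped) are assumptions; the post does not cover that last case.

```python
from dataclasses import dataclass


@dataclass
class Frag:
    color: tuple
    z: float  # smaller z = closer to the viewer (assumed convention)


@dataclass
class Pixel:
    near: Frag
    far: Frag
    coverage: float  # fraction of the pixel covered by `near`


def insert(pixel, new, new_coverage):
    """Apply the replacement rules described in the post."""
    if new.z < pixel.near.z:
        # In front of both: old near becomes far, new becomes near,
        # and the coverage is taken from the new near fragment.
        pixel.far = pixel.near
        pixel.near = new
        pixel.coverage = new_coverage
    elif new.z < pixel.far.z:
        # Between the two: replace far, keep the old coverage.
        pixel.far = new
    # Otherwise the fragment is behind both and is dropped (assumed).


def resolve(pixel):
    """Final colour: near weighted by coverage, far by the remainder."""
    c = pixel.coverage
    return tuple(c * n + (1 - c) * f
                 for n, f in zip(pixel.near.color, pixel.far.color))


p = Pixel(Frag((1.0, 0.0, 0.0), 0.2), Frag((1.0, 1.0, 1.0), 0.9), 0.5)
insert(p, Frag((0.0, 0.0, 1.0), 0.5), 1.0)  # blue lands between red and white
print(resolve(p))  # (0.5, 0.0, 0.5)
```

It is cheap because a pixel never holds more than two colours and one coverage value, at the cost of discarding information whenever a third surface touches the same pixel.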
Jerry Cornelius
17-May-2002, 00:08
I assume they only keep track of the percentage number
Ack, I hope not. I doubt they would go to all the trouble of edge AA and not do it properly. Without some kind of specific coverage information there's no way to properly manage the z values (nothing new I know).
Linked lists... you mean dynamic storage... could this not create very ugly memory access patterns
Sure, in a very ugly scene where it probably wouldn't really matter. Thinking about it more, as long as the first node could hold two fragments, that should cover 90% of the fragmented pixels. For complex pixels, if you've already textured it three times or more with at least two textures per pass, a bit of memory latency probably won't factor in too much. Besides, if the combining is deferred to the end of the scene rendering, the cache will do a pretty decent job of sorting things out (shouldn't it?) and if the scene is too complex, again it's game over anyhow.
To do it properly, I don't think you're going to be able to discard any colour or Z information for each fragment, so all you can do is revert to a 4 by 4 buffer if the list gets too big.