AA/AF enhancements

BTW, I agree that FAA is one of the best AA algorithms currently available, setting aside its current shortcomings of course.

What is missing is sparse grid sampling, intersections using z slopes, and a fixed number of levels per pixel with fragment merging, which would put it directly in the realm of Z3. By allocating a separate buffer for the AAed pixels the way FAA does, you could substantially reduce the storage requirements needed by Z3 while increasing the maximum number of levels per pixel for better AA. By using fragment merging the way Z3 does, you correctly handle order-independent transparency and put a cap on the memory requirements for worst-case scenarios. Of course, using z slopes provides high-quality AA at implicit intersections.
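For what it's worth, here's a minimal sketch of what such a fixed-budget fragment buffer with merging might look like. The structure, sizes, names, and merge heuristic are all my own invention for illustration, not Z3's or Matrox's actual design:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

constexpr int kMaxFragments = 4; // the fixed number of "levels" per pixel

struct Fragment {
    float         z;          // depth at the pixel center
    float         dzdx, dzdy; // z slopes, for intersection-aware comparisons
    std::uint32_t color;      // RGBA8
    std::uint32_t coverage;   // one bit per subsample of the sparse grid
};

struct PixelFragments {
    Fragment frags[kMaxFragments];
    int      count = 0;

    void insert(const Fragment& f) {
        if (count < kMaxFragments) {
            frags[count++] = f;
            return;
        }
        // Budget exhausted: merge the new fragment into the existing one
        // closest in z, which caps worst-case memory at kMaxFragments.
        int best = 0;
        for (int i = 1; i < kMaxFragments; ++i)
            if (std::fabs(frags[i].z - f.z) < std::fabs(frags[best].z - f.z))
                best = i;
        frags[best].coverage |= f.coverage;           // union the masks
        frags[best].z = std::min(frags[best].z, f.z); // keep the nearer depth
        // A real merge would also blend the colors, weighted by coverage.
    }
};
```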
 
Chalnoth said:
But the sample patterns indicate that there is one sample taken (at 6x, as an example) at each row and column in a 6x6 grid, for a total of 6 samples.
That may be a SAMPLE CHOICE pattern, not some deep relation to underlying architecture.
Sparse sampling is simply the most efficient means of sampling, AFAIK.
 
Chalnoth said:
Well, as far as I know, the only other way to do it (aside from minor deviations from the method I laid out above) would be the much less efficient method of having totally programmable sample positions. With the above method, it's still relatively easy to use simplistic calculations on the linear interpolations for each pixel subsample.

Of course, since it is MSAA, I suppose that much efficiency may not be necessary: the only thing that needs to be interpolated for each subsample is the z value. So, it may be possible to have totally arbitrary pixel allocation.
I'm happy to hear that! I was beginning to think you were going to disprove the R300's existence! ;)
But the sample patterns indicate that there is one sample taken (at 6x, as an example) at each row and column in a 6x6 grid, for a total of 6 samples.
Well, that's one possibility, but not the only (or correct) one.
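On Chalnoth's point that only z needs to be interpolated for each subsample: since z varies linearly across a triangle, evaluating it at a fully arbitrary subsample position is just two multiply-adds against the plane equation produced by triangle setup. A minimal sketch (my illustration, not any chip's actual datapath):

```cpp
// Triangle setup produces a plane equation z(x, y) = z0 + x*dzdx + y*dzdy.
struct ZPlane {
    float z0;   // z at the pixel center (or any reference point)
    float dzdx; // z slope in x
    float dzdy; // z slope in y
};

// (sx, sy) is the subsample offset from the reference point, e.g. in
// [-0.5, 0.5) pixel units. Nothing here constrains where the samples go,
// which is why fully programmable positions are cheap for z alone.
float z_at_subsample(const ZPlane& p, float sx, float sy) {
    return p.z0 + sx * p.dzdx + sy * p.dzdy;
}
```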
 
My big question is how we get non-edge spatial antialiasing in a world where more and more often color values are calculated by shader programs rather than by slapping on textures.

(note: all of the following is as I understand it, which may be wrong)

Up 'til now the combination of MSAA for edge values and AF for non-edge values has served us very well; but AF only filters textures, not fragment shader outputs. Of course AF can filter texture inputs to fragment shaders, but as fragment shader programs get used less as low pass-count replacements for multitexturing and more to generate actual procedural effects, using AF on the inputs becomes not only ineffective but often quite wrong.

Of the tools we have now, only supersampling can correctly antialias the outputs of procedural shaders, but obviously at a ridiculous performance cost. It seems to me that the "correct" solution can come in one of two forms:

1) in-driver support for a method to do adaptive anisotropic calculation and blending of fragment shader outputs

2) explicit in-shader support for antialiasing the shader output, presumably incorporating some overcalculation in an anisotropic direction; and presumably at various levels of quality, selectable via in-game settings

Or possibly some combination of the two: supported in the API (like multisampling and anisotropic texture filtering now), but called from the game only for those shaders that need it. (After all, when AFing the texture inputs will produce the proper results, it should be used instead as it's much more efficient.)

In any case, oversampling the shaders in a brute-force isotropic fashion, à la supersampling, is inefficient (for the realized IQ) compared to oversampling on an adaptive anisotropic basis; and given that we're talking about running complex shaders here, it would obviously be pretty important to get the most bang for the buck possible.
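To make the cost argument concrete, here's a toy sketch, entirely my own (`shade` is a made-up stand-in for an expensive procedural shader): isotropic supersampling burns n*n shader evaluations per pixel, while anisotropic oversampling spends only n of them along the axis where the pixel footprint is longest:

```cpp
#include <cmath>

// Made-up stand-in for an expensive procedural shader.
float shade(float u, float v) {
    return 0.5f + 0.5f * std::sin(100.0f * u + 3.0f * v);
}

// Brute-force isotropic supersampling: n*n shader evaluations per pixel,
// where (du, dv) is the pixel footprint in shader parameter space.
float shade_isotropic(float u, float v, float du, float dv, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            sum += shade(u + du * ((i + 0.5f) / n - 0.5f),
                         v + dv * ((j + 0.5f) / n - 0.5f));
    return sum / (n * n);
}

// Anisotropic oversampling: n evaluations spread only along the major
// axis of the footprint (assumed here to be u, for simplicity).
float shade_anisotropic(float u, float v, float du, int n) {
    float sum = 0.0f;
    for (int i = 0; i < n; ++i)
        sum += shade(u + du * ((i + 0.5f) / n - 0.5f), v);
    return sum / n;
}
```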

Well at least that's the way I understand it. Am I missing something? If not, when will we see this sort of functionality?
 
OpenGL guy said:
But the sample patterns indicate that there is one sample taken (at 6x, as an example) at each row and column in a 6x6 grid, for a total of 6 samples.
Well, that's one possibility, but not the only (or correct) one.
Then perhaps you just don't understand what I'm attempting to say. I shortened the definition as much as possible, and did leave quite a lot out. But the end result of the Radeon 9700's FSAA coincides with sparse sampling (whose definition I shortened, perhaps too much, in the above).

Btw, here's an edited pic that I pulled off of this post:

[image: sparse.jpg]


The picture is from a test of the sampling pattern of ATI's 6x FSAA. Any selection of six samples would have given the same picture: six columns, six rows, with one pixel sample in each column, one in each row (separated as much as possible). The pattern would have been different with a different selection of six samples, of course, but the idea is what's important.
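For what it's worth, the property being described is easy to state in code. This little checker is my own illustration, and the example coordinates are made up; they are not a claim about ATI's actual pattern:

```cpp
#include <array>

// One sample's position as (column, row) on a 6x6 subpixel grid.
struct Sample { int col, row; };

// The "sparse" property under discussion: six samples on a 6x6 grid such
// that every column and every row is used exactly once (like six
// non-attacking rooks on a 6x6 board).
bool is_sparse(const std::array<Sample, 6>& pattern) {
    bool col_used[6] = {}, row_used[6] = {};
    for (const Sample& s : pattern) {
        if (col_used[s.col] || row_used[s.row])
            return false;
        col_used[s.col] = true;
        row_used[s.row] = true;
    }
    return true;
}

// Example (made-up coordinates):
// is_sparse({{ {0,2}, {1,5}, {2,0}, {3,3}, {4,1}, {5,4} }}) returns true.
```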
 
Mariner said:
You need to upgrade your memory Simon - it's very cheap these days, don't you know. ;)
What's the point when I'm downgrading it with the occasional Cab-Sauv from Hardy's?

Ostsol said:
Is there any performance advantage to using certain sample patterns over others, such as ordered grid rather than rotated grid? It doesn't seem to make sense that NVidia would keep on using OG for 4x, which looks only slightly better than 2x RG (most of the time), unless there was a clear benefit. If there's no performance benefit, is it simply easier to implement?
The hardware to perform the maths for an ordered grid is likely to be slightly cheaper than a fixed sparse/rotated grid as more of the calculations could be shared.

RussSchultz said:
Walt, how many times are you going to say that?

It just isn't true. Saying it over and over again won't make it true either. Please stop spreading your misinformation.

Russ, don't you know "The Hunting of the Snark"?
"The proof is complete, if only I've stated it thrice."
:)
 
DemoCoder said:
It will be done the same way it is done in RenderMan today.

Aha. I did a quick search on shader antialiasing in RenderMan, and found (along with a lot of chapter titles in tables of contents for offline textbooks) an interesting writeup based on a SIGGRAPH presentation. The solution presented there is to take care of the antialiasing in the shader itself; not, however, by doing extra point sampling of the shader output, as I proposed, but by a couple of approaches to removing sources of aliasing from the shader inputs.

The first is essentially to analytically compute the integral of the input function (actually, the convolution with a filter kernel) and use that instead of point sampled values. This method works well as long as the function is well-behaved enough to easily take the integral.
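The classic concrete instance of this (my example, not taken from that writeup) is box-filtering a hard step: the convolution of a step function with a box filter is just a linear ramp across the filter footprint:

```cpp
#include <algorithm>

// Point-sampled step: aliases badly, because it is all edge.
float step_sampled(float edge, float x) {
    return x < edge ? 0.0f : 1.0f;
}

// Analytically filtered step: convolving step() with a box filter of
// width w centered at x yields the fraction of the footprint past the
// edge, i.e. a linear ramp instead of a hard transition.
float step_filtered(float edge, float x, float w) {
    return std::clamp((x - edge) / w + 0.5f, 0.0f, 1.0f);
}
```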

Unfortunately, many functions are not. However, it is often easier to compute the average value of the function, and also compute the "featuresize" of the function at a given point. As aliasing will only occur when the featuresize is too small compared to the sample granularity (i.e. the feature frequency is above the Nyquist limit), they recommend that you figure out when that occurs and simply replace the function with its average value in those spots. (Actually, they recommend blending in the average value as the feature frequency approaches the Nyquist limit.)
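Here's a minimal sketch of that fade-to-average trick as I understand it; the signal, its average, and the fade thresholds are all invented for illustration:

```cpp
#include <algorithm>
#include <cmath>

// Toy procedural signal with a known feature size and average value.
float signal(float x)  { return std::sin(50.0f * x); } // feature size ~ 2*pi/50
float signal_average() { return 0.0f; }

float smoothstep(float e0, float e1, float x) {
    float t = std::clamp((x - e0) / (e1 - e0), 0.0f, 1.0f);
    return t * t * (3.0f - 2.0f * t);
}

// Fade the signal toward its average as the filter width (the footprint
// of one sample) grows toward the feature size, i.e. as the feature
// frequency approaches the Nyquist limit for the current sample rate.
float filtered_signal(float x, float filter_width) {
    const float feature_size = 2.0f * 3.14159265f / 50.0f;
    float fade = smoothstep(0.5f, 1.0f, filter_width / feature_size);
    return signal(x) * (1.0f - fade) + signal_average() * fade;
}
```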

The problems with that are somewhat obvious: if the featuresize of your shader function is too often out of whack with your sample density, then you've just replaced your shader with solid gray (or whatever). Of course, if that were the case in the first place then you would have got horrible aliasing otherwise; obviously the shader was poorly chosen.

Still, I have to say these approaches present some problems when moving from the non-realtime world of RenderMan shaders--where output resolution and rendering performance are known ahead of time, and where the conditions under which a shaded object will be viewed (particularly, the distances from which it will be viewed) are controllable and probably known ahead of time--to realtime games run on hardware of widely varying performance. That is, I wonder how well these methods scale both in terms of screen resolution (and hence sample density of the shader output) and in terms of allowing various levels of trade-off between performance and antialiasing quality.

Although, just thinking about it, it seems that these problems shouldn't be terribly difficult to solve. Thanks for the tip, DC.

And a question: are these sorts of techniques in use in current/near-future games?
 
Chalnoth said:
OpenGL guy said:
But the sample patterns indicate that there is one sample taken (at 6x, as an example) at each row and column in a 6x6 grid, for a total of 6 samples.
Well, that's one possibility, but not the only (or correct) one.
Then perhaps you just don't understand what I'm attempting to say.
I know exactly what you're saying. But you are incorrect.
I shortened the definition as much as possible, and did leave quite a lot out. But the end result of the Radeon 9700's FSAA coincides with sparse sampling (whose definition I shortened, perhaps too much, in the above).
I know what a sparse grid is and I know that we are using one.
Btw, here's an edited pic that I pulled off of this post:

[image: sparse.jpg]
And your picture is incorrect.

Why are you arguing with me?
 
OpenGL guy said:
Why are you arguing with me?

Maybe we want to make you confused until you say everything? ;) j/k

More seriously though, let me summarize this...
ATI *is* using a sparse grid
That picture is incorrect.
ATI's goal was not to have one sample at each row and column in a 6x6 grid for a total of 6 samples, though that may still be the case (is it, actually? Hmm...)


Uttar
 
My guess:
1. The per-pixel "grid" being used is n x n, where n is a power of two.
2. n is bigger than 4 or even 8. Using a larger grid (say 64 x 64) gives you more flexibility to position the AA samples.
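If that guess is right, each sample position is just a pair of small fixed-point offsets. A sketch, assuming the 64 x 64 example above (the sizes are my assumption, not a known R300 detail):

```cpp
#include <cstdint>

// On a 64 x 64 per-pixel grid, each coordinate is a 6-bit fixed-point
// value, so a full sample position fits in 12 bits.
struct SamplePos {
    std::uint8_t x; // 0..63, in 64ths of a pixel
    std::uint8_t y; // 0..63
};

// Convert a grid coordinate to an offset in [0, 1) within the pixel.
inline float to_pixel_offset(std::uint8_t v) {
    return (v + 0.5f) / 64.0f;
}
```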
 
keegdsb said:
It's possible that Matrox has improved upon FAA in the new P-LX (P6/750 cards); still waiting for confirmation (or something to disprove it, for that matter).
FAA has been improved, but it still isn't perfect. Basically there should be fewer artifacts than before, but it still doesn't antialias intersecting objects, etc.
 
SA said:
BTW, I agree that FAA is one of the best AA algorithms currently available, setting aside its current shortcomings of course.

What is missing is sparse grid sampling, intersections using z slopes, and a fixed number of levels per pixel with fragment merging, which would put it directly in the realm of Z3. By allocating a separate buffer for the AAed pixels the way FAA does, you could substantially reduce the storage requirements needed by Z3 while increasing the maximum number of levels per pixel for better AA. By using fragment merging the way Z3 does, you correctly handle order-independent transparency and put a cap on the memory requirements for worst-case scenarios. Of course, using z slopes provides high-quality AA at implicit intersections.
I'm surprised most companies are taking the brute force approach to AA and just using a lot of memory. At some point it must make sense to spend some transistors on finesse in order to save money on memory. Memory might be cheap, but saving 64 MB or more per board will add up.
 
I guess I'm easy to satisfy, because I'm quite happy with the 4xFSAA on the R9700 Pro. Overall, 8xAF looks good too, as I haven't noticed any problems with [massive] blur at odd angles.

That probably makes me an Image Quality Hillbilly (I.Q.H.), so just give me some speed and I'm all happy. :eek:
 
I actually like the maskable multisampling approach more than Matrox's FAA. However, the number of samples is still a bit restricting, and it probably won't be until a TBR with multisampling comes up that we'll have a useful number of samples.


It probably would be hard for R300 to support evenly-spaced sparse 4x multisampling if it used a 6x6 grid internally ;)
 
Ostsol said:
Is there any performance advantage to using certain sample patterns over others, such as ordered grid rather than rotated grid? It doesn't seem to make sense that NVidia would keep on using OG for 4x, which looks only slightly better than 2x RG (most of the time), unless there was a clear benefit. If there's no performance benefit, is it simply easier to implement?
Ordered grid has the advantage of fewer transistors for calculations, but possibly more important, there are fewer bits of precision to pipe from the setup engine. Depending on the number of clock stages this could add up. Of course this all depends on the specific implementation.

Also, maybe OG works better with Nvidia's z compression algorithm.
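To illustrate the sharing argument with a toy example (mine, not a description of any actual setup engine): with an ordered k x k grid the subsample offsets repeat across rows and columns, so walking a plane equation over the samples needs only two fixed increments, rather than a full multiply-add (and more position bits) per sample:

```cpp
// Interpolate v(x, y) = v0 + x*dvdx + y*dvdy over an ordered k x k grid
// of subsamples. Only two per-step increments are needed, because the
// offsets repeat in every row and column; arbitrary (sparse) positions
// would instead need a multiply-add per sample.
void interpolate_ordered_grid(float v0, float dvdx, float dvdy,
                              int k, float* out /* k*k values */) {
    const float step_x = dvdx / k; // constant increment between columns
    const float step_y = dvdy / k; // constant increment between rows
    float row_v = v0;              // value at the first sample of each row
    for (int j = 0; j < k; ++j) {
        float v = row_v;
        for (int i = 0; i < k; ++i) {
            out[j * k + i] = v;
            v += step_x;           // shared adder, no multiplier
        }
        row_v += step_y;
    }
}
```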
 