8x OGAA - what's the point?

Discussion in 'General 3D Technology' started by BoddoZerg, Nov 19, 2002.

  1. SA

    SA
    Newcomer

    Joined:
    Feb 9, 2002
    Messages:
    100
    Likes Received:
    2
    Better than an ordered grid or a rotated grid is a sparsely sampled grid. For best results you use different patterns for different pixels. The patterns should be based on the pixel's screen location and shouldn't vary from frame to frame for a given pixel (to eliminate pixel flashing). Sparsely sampled grids are easiest to implement using a lookup table for the patterns and a small library of programmable patterns. Note that sparse sampling guarantees that near-horizontal and near-vertical edges have N gradations in intensity, where N is the number of samples (this is easiest to see by noticing that there is one sample on every row of the grid; the same is true for columns).

    There are two reasons why sparsely sampled grids are better than rotated grids. First, they allow different patterns across nearby pixels, which breaks up patterns across pixels that would otherwise result in aliasing. Second, they more effectively guarantee N gradations for near-horizontal and near-vertical edges at larger sample counts.
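
    A minimal sketch of the per-pixel pattern selection described above. The library contents and the checkerboard selection rule are assumptions made for illustration (hypothetical names BASE_4X, PATTERN_LIBRARY, pattern_for_pixel); each pattern is stored as a 1D table giving the sample column for each row. The only property taken from the post is that the choice depends on screen position alone, so it is stable from frame to frame.

```python
# Hypothetical pattern library: entry r of each table is the column (0..3)
# of the single sample on row r of a 4x sparse grid.
BASE_4X = [2, 0, 3, 1]


def flip_horizontal(lut):
    """Mirror a pattern left-right: column c becomes (n - 1) - c."""
    n = len(lut)
    return [n - 1 - c for c in lut]


PATTERN_LIBRARY = [BASE_4X, flip_horizontal(BASE_4X)]


def pattern_for_pixel(x, y):
    """Choose a pattern from the pixel's screen position alone.

    Because the choice depends only on (x, y), a given pixel uses the same
    pattern in every frame (no flashing), while horizontally and vertically
    adjacent pixels use different patterns (a checkerboard).
    """
    return PATTERN_LIBRARY[(x + y) & 1]


print(pattern_for_pixel(0, 0))  # [2, 0, 3, 1]
print(pattern_for_pixel(1, 0))  # [1, 3, 0, 2]
```

    A real library would likely hold more than two patterns. This particular 4x pattern happens to be symmetric under a 180-degree rotation, so its mirror images only yield one distinct variant.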

    Some sample patterns:


    For 4x

    --------x----
    x------------
    ------------x
    ----x--------


    For 6x

    --------x------------
    ----------------x----
    x--------------------
    ------------x--------
    ----x----------------
    --------------------x


    For 8x

    ------------------------x----
    --------x--------------------
    ----------------x------------
    x----------------------------
    ----------------------------x
    ------------x----------------
    --------------------x--------
    ----x------------------------
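
    The claim that a sparse grid gives N intensity steps for near-vertical edges can be checked numerically. This sketch sweeps a vertical edge across one pixel and collects the distinct numbers of samples lying to its right. Sample x positions are expressed on an N-column sub-grid. Placing the 2x2 ordered grid at columns 1 and 3 is an assumption made for comparison. Near-horizontal edges behave the same way, with rows in place of columns.

```python
def vertical_edge_levels(sample_xs):
    """Return the distinct counts of samples lying right of a vertical edge.

    Sweeping the edge from left to right, the count only changes when the
    edge passes a sample column, so testing one edge position just left of
    each distinct column (plus one past the last) covers every case.
    """
    columns = sorted(set(sample_xs))
    edge_positions = [c - 0.5 for c in columns] + [columns[-1] + 0.5]
    return sorted({sum(1 for sx in sample_xs if sx > e) for e in edge_positions})


# Sample x positions on a 4- or 8-column sub-grid (one entry per sample).
ordered_4x = [1, 3, 1, 3]              # 2x2 ordered grid: only two columns used
sparse_4x = [2, 0, 3, 1]               # the 4x pattern above
sparse_8x = [6, 2, 4, 0, 7, 3, 5, 1]   # the 8x pattern above

print(vertical_edge_levels(ordered_4x))      # [0, 2, 4]
print(vertical_edge_levels(sparse_4x))       # [0, 1, 2, 3, 4]
print(len(vertical_edge_levels(sparse_8x)))  # 9
```

    With N samples the sparse pattern produces N + 1 coverage levels (N steps). The 4x ordered grid produces only three.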
     
  2. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    70
    Hmmm...

    That looks remarkably like the sample pattern analysis of the Radeon 9700 AA.

    http://www.beyond3d.com/reviews/ati/radeon9700pro/index.php?page=page15.inc

    4X is obvious:
    [image: Radeon 9700 4X sample pattern]


    For the 6X shot, it appears that Dave's guesses for the "single set" of sample points (in red) might be wrong. Below, I have redone them (in blue); note the similarity to the "sparse" pattern mentioned by SA:

    6X:
    [image: Radeon 9700 6X sample pattern; Dave's guesses in red, revised guesses in blue]

    (I'll flip SA's pattern along the vertical axis:)

    ------------x--------
    ----x----------------
    --------------------x
    --------x------------
    ----------------x----
    x--------------------
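
    A quick check that the flipped diagram is SA's 6x pattern mirrored left-right. This check reads the sample columns off both diagrams at a 4-character spacing, one column per row. A left-right mirror of a sparse pattern is still sparse, since it only relabels the columns.

```python
sa_6x = [2, 4, 0, 3, 1, 5]    # SA's 6x pattern: sample column per row
joe_6x = [3, 1, 5, 2, 4, 0]   # columns read off the flipped diagram above


def flip_horizontal(lut):
    """Mirror a pattern left-right: column c becomes (n - 1) - c."""
    n = len(lut)
    return [n - 1 - c for c in lut]


print(flip_horizontal(sa_6x) == joe_6x)     # True
print(sorted(joe_6x) == list(range(6)))     # True: still one sample per column
```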

    We also know that ATI claims to be able to adjust sample patterns per pixel, so ATI may very well have implemented more or less exactly what SA is suggesting. From the same page linked above:

     
  3. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,298
    Likes Received:
    137
    Location:
    On the path to wisdom
    The "word" also has it that ATI's gamma-corrected 6x MS looks better than 16x FAA. And it will be quite complicated for Matrox to find intersection edges without a higher-res Z buffer. It's not impossible, but hard to do.
     
  4. EasyRaider

    Regular

    Joined:
    Oct 1, 2002
    Messages:
    431
    Likes Received:
    2
    Location:
    Norway
    What's your point? Both suggestions are sparse patterns.

    On a different note, it's funny how this one feature suddenly changes my preference from NVidia to ATI.
     
  5. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    70
    Yes, I just thought it was kinda creepy that they (SA's and my R-300 example) were pretty much the same sparse pattern. ;)
     
  6. Althornin

    Althornin Senior Lurker
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,326
    Likes Received:
    5
    Well, it IS likely that some research has been done on what the "best" (overall) sparse sample pattern is. So I find the coincidence less than amazing.
     
  7. SA

    SA
    Newcomer

    Joined:
    Feb 9, 2002
    Messages:
    100
    Likes Received:
    2
    I might add that another advantage of sparsely sampled grids is a very simple 1D lookup table, with simple edge-crossing evaluation, for setting a sample mask. For example, for the cases above you have three 1D lookup tables as follows (assuming the leftmost position starts at 0):


    For 4x

    2
    0
    3
    1


    For 6x

    2
    4
    0
    3
    1
    5


    For 8x

    6
    2
    4
    0
    7
    3
    5
    1

    To evaluate an edge crossing the pixel at any angle, you calculate the x position of the edge at each sample row and compare it to the value in the lookup table. If the edge's x is less than the lookup value, the sample is on the right of the edge; otherwise it is on the left, and you set your mask bit accordingly (this is all done in parallel, of course). The case where the edge enters or leaves through the sides of the pixel is handled by choosing the rows at which you start and end your evaluation of the LUT.
    The nice thing is that it can be completely programmable, just by changing the values in the 1D LUT.
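
    A minimal sketch of the LUT-based mask evaluation described above, written as a loop where hardware would evaluate all rows in parallel. Assumptions: the edge is given as x = x0 + dx_dy * row in sample-column units (the hypothetical parameters x0 and dx_dy), it is not horizontal, and it crosses every sample row. The side-entry case, where only a subset of rows is evaluated, is not handled. Swapping the table changes the pattern without touching the logic.

```python
LUT_4X = [2, 0, 3, 1]
LUT_8X = [6, 2, 4, 0, 7, 3, 5, 1]


def coverage_mask(lut, x0, dx_dy, covered_side="right"):
    """Build an N-bit sample mask for one edge crossing one pixel.

    The edge is x = x0 + dx_dy * row, with x in sample-column units
    (0 .. N-1) and rows numbered top to bottom. Bit r of the mask refers to
    the single sample on row r, whose column is lut[r].
    """
    mask = 0
    for row, sample_x in enumerate(lut):
        edge_x = x0 + dx_dy * row              # edge position on this sample row
        sample_is_right = edge_x < sample_x    # "less than the lookup value"
        if sample_is_right == (covered_side == "right"):
            mask |= 1 << row
    return mask


print(bin(coverage_mask(LUT_4X, 1.5, 0.0)))  # 0b101: rows 0 and 2 are right of x = 1.5
print(bin(coverage_mask(LUT_8X, 0.0, 1.0)))  # 0b10111: a 45-degree edge
```

    Sweeping a vertical edge across the 4x pixel yields 4, 3, 2, 1 and 0 covered samples, which are the N gradations mentioned at the start of the thread.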
     
  8. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    70
    Well excuuuuuuussseeee me. ;)

    Let me change my statement then...

    I find it amazing that with all the research that has been done, and all the knowledgeable types on this board, no one had put 2 + 2 together and drawn the conclusion (which appears rather obvious after SA's posts) that Radeon 9700's AA is based on "programmable sparse patterns."

    Up to this point, no one had offered any reasoned explanation of what ATI was doing with their AA.
     
  9. arjan de lumens

    Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,274
    Likes Received:
    50
    Location:
    gjethus, Norway
    AFAIK, the common way to rasterize a triangle is to use three edge functions of the form a*x+b*y+c. If all the edge functions give a positive value for a sample, then the sample lies within the triangle. If any of the edge functions gives a negative value, then the sample lies outside. If some of the edge functions return zero, we look at the signs of a and b to determine whether the point is inside or outside.
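
    A sketch of the edge-function inside test with integer (fixed-point) coordinates, so every evaluation is exact. The tie-break used when an edge function is zero (inside if a > 0, or a == 0 and b > 0) is one common fill-rule-style convention assumed here; the post does not say which rule is used. Its useful property is that two triangles sharing an edge traverse it in opposite directions, so a sample on that edge belongs to exactly one of them. Function names are hypothetical.

```python
def edge_coeffs(p0, p1):
    """Coefficients (a, b, c) of E(x, y) = a*x + b*y + c for the edge p0 -> p1.

    For counter-clockwise triangles (y pointing up), E > 0 inside.
    """
    (x0, y0), (x1, y1) = p0, p1
    return (y0 - y1, x1 - x0, x0 * y1 - x1 * y0)


def sample_inside(tri, x, y):
    """Exact inside test for a sample at integer coordinates (x, y)."""
    for i in range(3):
        a, b, c = edge_coeffs(tri[i], tri[(i + 1) % 3])
        e = a * x + b * y + c
        if e < 0:
            return False
        if e == 0 and not (a > 0 or (a == 0 and b > 0)):
            return False  # tie-break: this sample belongs to the neighbouring triangle
    return True


tri_a = [(0, 0), (4, 0), (0, 4)]   # counter-clockwise
tri_b = [(4, 0), (4, 4), (0, 4)]   # shares the edge (4, 0)-(0, 4) with tri_a
print(sample_inside(tri_a, 1, 1))                              # True
print(sample_inside(tri_a, 2, 2), sample_inside(tri_b, 2, 2))  # False True
```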

    The point? Edge functions are the only way to perform accurate rasterization (they don't introduce roundoff errors). Of course, this leads to a problem: how do we find the x position where the edge crosses a sample row? If we do a division anywhere at all, like x = -(b*y+c)/a, we introduce a rounding error, which makes the rasterization imprecise (= bad, bad thing). So how do we avoid the division?

    One possible way: for each scanline, scan left and right. At each possible x position in each line of samples, check whether we are still inside the polygon, and mark the ends of the sample line once we run off them. Do this for both the left and right ends of the line. This could be sped up by, e.g., using the division method to estimate the endpoint position and then scanning left and right from that position to find the actual endpoints (bad: nondeterministic search time), or by doing some sort of binary search, homing in on each edge/sample-row crossing in logarithmic time (better: deterministic time), etc. The per-scanline workload for this kind of solution will be the same for rotated and sparse grids, and much smaller for ordered grids.
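
    A sketch of the binary-search variant, which uses only multiplies, adds and sign tests (no division). It assumes integer sample x positions along the row and an edge oriented so that a > 0. Under those assumptions, the inside region on the row is everything at or right of the crossing. The right-hand end (a < 0) is the mirror case. The function name and parameters are hypothetical.

```python
def leftmost_inside_x(a, b, c, y, lo, hi):
    """Smallest integer x in [lo, hi] with a*x + b*y + c >= 0, or None.

    Assumes a > 0, so the edge function grows with x. The loop always
    finishes in about log2(hi - lo + 1) steps, regardless of the edge.
    """
    assert a > 0
    if a * hi + b * y + c < 0:
        return None                 # the whole span is outside this edge
    while lo < hi:
        mid = (lo + hi) // 2
        if a * mid + b * y + c >= 0:
            hi = mid                # mid is inside: crossing is at or left of mid
        else:
            lo = mid + 1            # mid is outside: crossing is right of mid
    return lo


# Edge 3x - 2y - 7 on sample row y = 5: inside where 3x >= 17, i.e. x >= 6.
print(leftmost_inside_x(3, -2, -7, 5, 0, 63))  # 6
```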

    OK... once the x endpoints are computed correctly (which is the hard part of rasterization), then you can easily test actual sample points from a LUT or whatever method you use to store the sample locations.

    You may want to do tile-based rasterization instead of scanline-based, which of course adds a few problems of its own ...
     
  10. SA

    SA
    Newcomer

    Joined:
    Feb 9, 2002
    Messages:
    100
    Likes Received:
    2
    Even more important than the sample pattern is probably the AA technique itself. Fragment AA methods with coverage masks offer many advantages. Since most pixels have only one or two fragments, storing and processing the data for just one or two fragments handles the majority of pixels, regardless of how many samples there are per pixel, without any loss of quality.

    For two fragments (the most common AA'd pixel), you can precisely compute the pixel coverage for an 8-sample pattern (8x AA) using only two colors, two zs, and an 8-bit mask. That is only 17 bytes with 32-bit color and 32-bit z. If you stored a color and z at each sample, it would require 64 bytes, or about 4 times the data and memory bandwidth. For 16x AA, a two-fragment pixel requires 2 colors, 2 zs, and a 16-bit mask, or only one more byte than 8x AA, with almost the same performance. In this case you reduce the memory and memory bandwidth by a factor of about 7 (18 bytes instead of 128).
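
    The byte counts above can be reproduced with a few lines of arithmetic. The sketch assumes 4-byte colors and 4-byte zs, with the coverage mask rounded up to whole bytes.

```python
def two_fragment_bytes(samples, color_bytes=4, z_bytes=4):
    """Two colors, two zs and one coverage bit per sample (rounded up to bytes)."""
    mask_bytes = (samples + 7) // 8
    return 2 * color_bytes + 2 * z_bytes + mask_bytes


def per_sample_bytes(samples, color_bytes=4, z_bytes=4):
    """One color and one z stored for every sample."""
    return samples * (color_bytes + z_bytes)


for n in (8, 16):
    frag = two_fragment_bytes(n)
    full = per_sample_bytes(n)
    print(n, frag, full, round(full / frag, 1))
# 8 17 64 3.8
# 16 18 128 7.1
```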

    An interesting problem is what to do when 2 fragments are not enough.

    1. The A buffer technique simply adds more fragments to the pixel dynamically as needed. This method has a large memory requirement that is also highly variable and unpredictable.

    2. You can also fall back to one color and z per sample. It is probably simplest to preallocate all this memory up front, even though only a small portion of it will likely be used.

    3. Lastly you can merge the fragments, keeping the total number per pixel fixed. This technique minimizes the amount of buffer storage needed and the storage is fixed.

    Solutions 1 and 2 are lossless, since they always compute the precise coverage down to the sample. Solution 3 is potentially lossy, since it can discard coverage information in the pixel. Even so, the differences will most likely only be noticeable in areas of the scene with highly complex, finely detailed geometry, such as dynamic 3D geometric hair and fur.
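
    A sketch of solution 3 (a fixed number of fragments per pixel, merged on overflow), with one z per fragment as in the post. The post does not say how fragments should be merged. The heuristic here is an assumption chosen for illustration. It merges the two fragments closest in z, weights their colors by coverage, and keeps the nearer z. The class and method names are hypothetical. The pixel starts with a background fragment at infinite depth covering every sample.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    color: tuple   # (r, g, b)
    z: float       # one z for the whole pixel
    mask: int      # bit i set -> this fragment owns sample i


def popcount(mask):
    return bin(mask).count("1")


class MergingPixel:
    """Fixed-size fragment list per pixel (solution 3: merge on overflow)."""

    def __init__(self, n_samples, background, max_fragments=2):
        self.n_samples = n_samples
        self.max_fragments = max_fragments
        full = (1 << n_samples) - 1
        self.fragments = [Fragment(background, float("inf"), full)]

    def add(self, color, z, mask):
        new_mask = mask
        for f in self.fragments:
            overlap = f.mask & new_mask
            if overlap:
                if z < f.z:
                    f.mask &= ~overlap      # new fragment is in front on these samples
                else:
                    new_mask &= ~overlap    # existing fragment hides the new one here
        self.fragments = [f for f in self.fragments if f.mask]
        if new_mask:
            self.fragments.append(Fragment(color, z, new_mask))
        while len(self.fragments) > self.max_fragments:
            self._merge_closest_pair()

    def _merge_closest_pair(self):
        # Hypothetical heuristic: merge the pair closest in z. The merged
        # fragment keeps one color and one z, so per-sample detail is lost.
        frags = sorted(self.fragments, key=lambda f: f.z)
        i = min(range(len(frags) - 1), key=lambda k: frags[k + 1].z - frags[k].z)
        a, b = frags[i], frags[i + 1]
        wa, wb = popcount(a.mask), popcount(b.mask)
        color = tuple((ca * wa + cb * wb) / (wa + wb) for ca, cb in zip(a.color, b.color))
        merged = Fragment(color, min(a.z, b.z), a.mask | b.mask)
        self.fragments = frags[:i] + [merged] + frags[i + 2:]

    def resolve(self):
        """Coverage-weighted average color of the pixel."""
        totals = [0.0, 0.0, 0.0]
        for f in self.fragments:
            w = popcount(f.mask)
            for ch in range(3):
                totals[ch] += f.color[ch] * w
        return tuple(t / self.n_samples for t in totals)


p = MergingPixel(4, (0.0, 0.0, 0.0))
p.add((1.0, 0.0, 0.0), 0.5, 0b0011)   # red edge: two fragments, stored exactly
p.add((0.0, 1.0, 0.0), 0.2, 0b0110)   # third fragment forces a merge
print(len(p.fragments), p.resolve())
```

    Right after a merge, the resolved color is unchanged. The loss appears later, when a new fragment's depth test runs against the merged fragment's single color and z instead of the originals.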
     
  11. Althornin

    Althornin Senior Lurker
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,326
    Likes Received:
    5
    Heh. didnt mean to come off in any negative light.
    You are right, previously we all speculated where ATI got the smaple pattern for 6x from...
     
  12. arjan de lumens

    Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,274
    Likes Received:
    50
    Location:
    gjethus, Norway
    Actually, the A-buffer technique does sound like it could work as a compression method for Multisampling, which would reduce the amount of memory that would need to be read/written per pixel a great deal, although the worst-case memory usage is slightly worse than for plain Multisampling.

    Could the R300 (and NV30?) multisample compression schemes be said to be a form of A-buffering?

    Solution 3 requires a somewhat intelligent algorithm for merging fragments to work well. I wonder whether we will see it in 3D hardware (or whether Parhelia is implementing just that scheme).
     
  13. SA

    SA
    Newcomer

    Joined:
    Feb 9, 2002
    Messages:
    100
    Likes Received:
    2
    The last problem is the computation of z values at the samples. The problem with all three techniques above is that there is only one z value per fragment across the entire pixel. This assumes all the samples have the same z value for a fragment, sometimes a poor assumption.

    There are several solutions to this problem.

    1. Do nothing.
    2. Revert to one color and z per sample if this problem is detected.
    3. Keep z slope information for each fragment and compute the zs per sample.

    Solution 1 will cause artifacts whenever edges join at steep angles (such as room corners, etc.) and for interpenetrating triangles.

    Solution 2 works well with solution 2 from my previous post (one color and z per sample), since the storage and calculation method are already accounted for. However, it might cost more in performance.

    Solution 3 works well with solutions 1 and 3 from my previous post and is the basis for the Z3 algorithm.
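
    A sketch of the idea behind solution 3 (z slopes per fragment), not an implementation of Z3 itself. Each fragment is assumed to carry a centre z plus dz/dx and dz/dy. Sample positions are derived from a sparse LUT under the assumption that samples sit at the centres of the N x N sub-grid cells. For two interpenetrating surfaces with the same centre z, a single z per fragment gives one fragment all of the samples or none of them. Per-sample z splits the pixel along the intersection.

```python
def sample_positions(lut):
    """Sample offsets from the pixel centre, in pixels, for a sparse pattern."""
    n = len(lut)
    return [((col + 0.5) / n - 0.5, (row + 0.5) / n - 0.5) for row, col in enumerate(lut)]


def sample_z(z_center, dz_dx, dz_dy, sx, sy):
    """Depth of a planar fragment at the sample offset (sx, sy)."""
    return z_center + dz_dx * sx + dz_dy * sy


def visibility_mask(lut, frag_a, frag_b):
    """Bit i set where fragment A is nearer than fragment B at sample i.

    Each fragment is (z_center, dz_dx, dz_dy); both are assumed to cover
    the whole pixel.
    """
    mask = 0
    for i, (sx, sy) in enumerate(sample_positions(lut)):
        if sample_z(*frag_a, sx, sy) < sample_z(*frag_b, sx, sy):
            mask |= 1 << i
    return mask


flat = (0.5, 0.0, 0.0)     # z = 0.5 everywhere
tilted = (0.5, 0.1, 0.0)   # same centre z, depth increases to the right
print(bin(visibility_mask([2, 0, 3, 1], flat, tilted)))  # 0b101: flat wins on the right half
```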
     
  14. Randell

    Randell Senior Daddy
    Veteran

    Joined:
    Feb 14, 2002
    Messages:
    1,869
    Likes Received:
    3
    Location:
    London
    aah

    /me quickly digs out Z3 .pdf again
     
  15. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    Joe:
    While SA's posts are (as always) good compilations of knowledge, I wonder which part of them (in this thread) is something that "no one had put 2+2 together" and understood, or even which part the majority of the "knowledgeable types" didn't already understand.

    SA:
    I hope you don't take my comment as something negative. You sum it up really well, and it's always interesting to read your posts. I just thought that Joe put a lot of other people in too bad a light.
     
  16. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    70
    Basic,

    I think you're misunderstanding what I'm saying. I'm really not meaning to put ANYONE in a bad light.

    It's just that from what I can tell, until SA's posts and my follow-up, no one had publicly made the speculation that Radeon 9700's AA was "programmable sparse sample AA." If someone already had, then my bad.

    It's quite possible, for all I know, that everyone else who's much more knowledgeable about this stuff than me had already made that association, but hasn't said anything about it.

    Again, this isn't some world-beating revelation or anything like that, of course, but my point is, I haven't seen anyone come out and say it, that's all! I know I did not make the association until SA made his first post in this thread with the sample patterns...
     
  17. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    Likes Received:
    24
    Joe, the people who visit this forum and can put "2 and 2 together" post much less than, say, you. Sometimes there isn't even a need to.
     
  18. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    70
    Um, Rev...that's exactly what I was saying.

    But you know, for the rest of us who don't put 2+2 together as fast as others, maybe they could "share the wealth" a bit more often when it comes to "divulging the obviousness" of Radeon 9700's methods of AA....seeing that there's an article on this very site that doesn't profess to know.

    So excuse me again for stating the obvious for us poor "intellectually challenged yet gifted in verbosity" souls.
     
  19. Laa-Yosh

    Laa-Yosh I can has custom title?
    Legend Subscriber

    Joined:
    Feb 12, 2002
    Messages:
    9,568
    Likes Received:
    1,452
    Location:
    Budapest, Hungary
    Quick AA-related question... I understand that the GeForce4 has 4 Z-sample units per pixel pipe, so that it can do 4x MSAA without a loss of fill rate.
    How many such units does the Radeon 9700 have? Based on its 6X MSAA mode, I'd say 6; am I right?
     
  20. Bambers

    Regular

    Joined:
    Mar 25, 2002
    Messages:
    781
    Likes Received:
    0
    Location:
    Bristol, UK
    Yes, the R300 has 6 Z units per pipe.

    (ATI's quoted maximum AA sample fill rate: 15.6G / 325 MHz / 8 pipes = 6. GeForce FX: 16G / 500 MHz / 8 pipes = 4.)
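
    The arithmetic behind that estimate is shown below. It takes the figures as quoted above (15.6 G samples/s at 325 MHz for R300, 16 G at 500 MHz for GeForce FX) and assumes 8 pixel pipelines for both chips. The result is an inference from the quoted peak rates, not a published unit count.

```python
def z_units_per_pipe(sample_rate, core_clock_hz, pipes):
    """Peak AA sample fill rate divided by (clock x pixel pipelines)."""
    return sample_rate / (core_clock_hz * pipes)


print(z_units_per_pipe(15.6e9, 325e6, 8))  # 6.0  (R300)
print(z_units_per_pipe(16e9, 500e6, 8))    # 4.0  (GeForce FX)
```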
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.