Parhelia - FAA & OGSS Performance hit

Discussion in 'Architecture and Products' started by Randell, Jun 21, 2002.

  1. Randell

    Randell Senior Daddy
    Veteran

    Joined:
    Feb 14, 2002
    Messages:
    1,869
    Likes Received:
    3
    Location:
    London
    Those screenies being posted at the MURC forum show impressive edge anti-aliasing, and 16xFAA is only meant to have a 25% performance hit. In games that are incompatible there is 4xOGSS to fall back on, which is meant to have a bigger hit than FAA. Can anyone speculate what the hits will be? I assume it will be nowhere near the same as what the 8500 takes for 4xSmoothvision in modern games.
     
  2. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,435
    Likes Received:
    263
    I'd guess that it will have a similar performance hit to ATI's method. The only way I can see it being faster is if Matrox can use the extra texture pipes to accelerate supersampling when only dual texturing is being done. Although maybe the extra memory bandwidth of Parhelia will help. I've been curious about this too. Hopefully someone will benchmark supersampling.
     
  3. Randell

    Randell Senior Daddy
    Veteran

    Joined:
    Feb 14, 2002
    Messages:
    1,869
    Likes Received:
    3
    Location:
    London
    Well, Matrox have hinted at several memory controllers within the 256-bit bus, so I hoped the performance drop-off at, say, 1024x768x32bit wouldn't be as great as with the 8500.
     
  4. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    I don't see how more effective memory bandwidth is going to help the Parhelia any with supersampling performance...most of its performance hit comes from a loss in fillrate.

    Anyway, the big question about FAA is whether or not it fully-filters all samples. Unfortunately, I have a suspicion that it does.

    That is, 16x FAA could simply take 16 point samples, instead of 16 fully-filtered samples. The result would be a reduction in required fillrate of at least 4x.

    However, I would suspect that if Matrox did this, they could probably also do multisampling as a fallback.

    So, if Matrox does fully-filter all 16 samples, then we're talking about halving the performance with around 6% fragment coverage. I personally don't really like this as it would probably put a very noticeably larger performance hit on very complex scenes (trees, grass, etc.).

    If Matrox uses point sampling, and each pixel pipeline can put out four point samples per clock, then we're talking about only a 12.5% performance hit at the same coverage, and I doubt it would get much higher.

    Of course, this doesn't take into account the added overhead of sorting and storing all of the fragments, but we can hope that that overhead will be relatively small.
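
    A rough sketch of the arithmetic above, assuming a simple fill-limited cost model: a normal pixel costs one pipeline clock and a fragment (edge) pixel costs a fixed number of clocks. The function name and the model are illustrative assumptions, not anything Matrox has published, and the sorting and storage overhead is ignored.

```python
def framerate_hit(edge_coverage, clocks_per_edge_pixel):
    """Fractional framerate loss under a simple fill-limited model.

    Assumption: a normal pixel costs 1 clock, an edge (fragment) pixel
    costs `clocks_per_edge_pixel` clocks, and nothing else limits speed.
    """
    relative_frame_time = (1.0 - edge_coverage) + edge_coverage * clocks_per_edge_pixel
    return 1.0 - 1.0 / relative_frame_time


coverage = 0.06  # ~6% fragment coverage, as in the post

# 16 fully filtered samples: one clock per sample, so 16 clocks per edge pixel.
filtered = framerate_hit(coverage, 16)

# 16 point samples at four per clock: 4 clocks per edge pixel.
point_sampled = framerate_hit(coverage, 16 / 4)

print(f"fully filtered: {filtered:.1%}")
print(f"point sampled:  {point_sampled:.1%}")
```

    Under these assumptions the fully filtered case comes out close to halving the framerate (about 47%), while the point-sampled case lands around 15%, in the same ballpark as the 12.5% estimate.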
     
  5. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,435
    Likes Received:
    263
    Chalnoth, I'm not sure I follow you. Can you define what you mean by point sample vs fully filtered?
     
  6. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    Point sample = one texture sample.

    Fully-filtered = at least four texture samples (8 for trilinear, possibly more for anisotropic).

    If any pixel pipeline is used to its maximum efficiency, the performance limitation lies in the number of texture samples the pixel pipeline can generate and average each clock.

    For example, all GeForce cards can be considered 32-texel pipelines (at least, to my knowledge...I do know the GF1/2 follow this...benchmarks seem to indicate the GF3/4 are the same...). That is, they can filter 32 "texture pixels" per clock, or four trilinear-filtered textures per clock, and, at least in the case of the GeForce2, eight bilinear-filtered textures per clock.
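
    To make the texel budget concrete, here is a small sketch assuming the 32 texels-per-clock figure above and the per-filter sample counts from the definitions (1 for point, 4 for bilinear, 8 for trilinear). The names are made up for illustration.

```python
# Texture samples needed per filtered result, per the definitions above.
SAMPLES_PER_FETCH = {"point": 1, "bilinear": 4, "trilinear": 8}


def fetches_per_clock(texels_per_clock, filter_mode):
    """How many filtered texture results fit in a per-clock texel budget."""
    return texels_per_clock // SAMPLES_PER_FETCH[filter_mode]


budget = 32  # assumed texel throughput per clock for the whole chip

for mode in ("trilinear", "bilinear", "point"):
    print(mode, fetches_per_clock(budget, mode))
```

    The point-sample figure is only what the budget would allow if the hardware could split its filter units into individual samples, which is exactly the open question here.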

    So, if, when FSAA is enabled, the hardware doesn't bother to do the texture filtering, but instead uses the same bilinear pixel pipelines to handle four individual pixel outputs instead of one filtered pixel, very good performance could be achieved. The image quality difference shouldn't be noticeable...

    But, it also stands to reason that if the Parhelia could do this, then it could also do multisampling, which it cannot. So, this whole line of reasoning may be pretty shaky, and we may, unfortunately, be forced to expect a 25%-50% performance hit, depending on the scene.
     
  7. Freon

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    38
    Likes Received:
    0
    Does it only take a 25% performance hit, or does it just start off "slow" in the first place?

    I think it is a bit of both.

    Ya know, back when it was first released, people lambasted GeForce 3 cards for taking up to a 40% hit for anisotropic. But they started off so bloody fast that a 40% hit wasn't such a big deal. And when newer games came out, it was easy to just back off on anisotropic by a notch or two and regain 20-30% performance and still be getting 60+fps.
     
  8. K.I.L.E.R

    K.I.L.E.R Retarded moron
    Veteran

    Joined:
    Jun 17, 2002
    Messages:
    2,952
    Likes Received:
    50
    Location:
    Australia, Melbourne
    FAA = Full Anti Aliasing?
     
  9. RussSchultz

    RussSchultz Professional Malcontent
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,855
    Likes Received:
    55
    Location:
    HTTP 404
    fragment, I think
     
  10. Foodman

    Newcomer

    Joined:
    Feb 7, 2002
    Messages:
    51
    Likes Received:
    0
    Location:
    I'm Lost
    I'm actually pretty happy with how the card is performing. I don't even see the point of playing a game without FSAA anymore.
     
  11. Randell

    Randell Senior Daddy
    Veteran

    Joined:
    Feb 14, 2002
    Messages:
    1,869
    Likes Received:
    3
    Location:
    London
    Well, now I know. I had wrongly assumed extra bandwidth would help; I didn't appreciate it was purely fillrate.

    FAA looks great, but where it doesn't work, the 4xOGSS looks unusable, worse than the 8500 because the raw numbers are worse.
     
  12. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    Well, actually the FSAA method being "completely fillrate" is somewhat misleading.

    The truth is that it has the exact same hit on both fillrate and memory bandwidth, meaning that increasing only one of them will not reduce the performance hit.
     
  13. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    I'm quite sure that Parhelia 16xFAA uses something like multisampling with a coverage mask. (Like a stripped down version of 3Dlabs SuperScene.)
     
  14. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    In the way it's performed, FAA is nothing like multisampling. It's basically edge anti-aliasing in hardware (previously, edge AA has always been largely software-based). But yes, in its results, it is somewhat like multisampling, as in both, only edges are anti-aliased.
     
  15. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    At http://www.matrox.com/mga/products/tech_info/pdfs/parhelia/faa_16x.pdf page 5
    I didn't mean multisampling like in GF4. I read the above quote as a coverage mask and one color per edge. It should also have a z-value, probably one per edge.

    There is little point in supersampling the texture at the edges. Yes, supersampling the texture can increase the texture quality. But if they have detected that the pixel needs AA since it's at an edge, then the color difference between the surfaces is likely much greater than the variance from the texture SS. So why spend time and memory on colors for individual sub-pixels?
     
  16. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    The performance results seem to coincide more closely with using pure super-sampling on the edges.

    And yes, I certainly agree that there is no need for using supersampling on edge pixels with FAA in use.

    Additionally, the only significant thing that multisampling FSAA needs in order to work is multiple z-checks per pixel pipeline. Since we know that the Parhelia needs to do 16 z-checks for each AA'd pixel with FAA, if the Parhelia does do something similar to multisampling, then they'd need extra z-check pipelines. If the Parhelia could do multiple z-checks per pixel, then the Parhelia could almost certainly also do multisampling FSAA.
     
  17. Randell

    Randell Senior Daddy
    Veteran

    Joined:
    Feb 14, 2002
    Messages:
    1,869
    Likes Received:
    3
    Location:
    London
    OK thanks for the clarification :)
     
  18. Basic

    Regular

    Joined:
    Feb 8, 2002
    Messages:
    846
    Likes Received:
    13
    Location:
    Linköping, Sweden
    In the same document
    If 5%-10% of a scene would take 16 times longer, then the total frame would take 1.75-2.5 times longer. Or in other words 43%-60% framerate hit. And then I'm not counting the extra work from the irregular memory access and list management. And for the pixels that don't get AAed, they still have to keep track that they don't.

    I don't think that matches the 17%-35% performance hit I've seen in reviews.
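
    The estimate above can be checked with a few lines, assuming the same simple model: edge pixels take 16 times as long and everything else is unchanged. The function names are illustrative only, and the extra memory-access and list-management work is left out, as in the post.

```python
def relative_frame_time(edge_fraction, edge_cost=16):
    """Frame time relative to no AA, if edge pixels cost `edge_cost` times more."""
    return (1.0 - edge_fraction) + edge_fraction * edge_cost


def framerate_hit(edge_fraction, edge_cost=16):
    """Fraction of framerate lost for the given share of edge pixels."""
    return 1.0 - 1.0 / relative_frame_time(edge_fraction, edge_cost)


for edge_fraction in (0.05, 0.10):
    t = relative_frame_time(edge_fraction)
    hit = framerate_hit(edge_fraction)
    print(f"{edge_fraction:.0%} edges: {t:.2f}x frame time, {hit:.0%} framerate hit")
```

    This reproduces the 1.75-2.5 times frame time and the 43%-60% framerate hit quoted above.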


    One thing I said in the last post was one z-value per edge and pixel. If they want to detect intersections, then they could add slopes. If they take that approach, then it's understandable that it's difficult to reuse it for GF4-style multisampling.
     
  19. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,435
    Likes Received:
    263
    Ahh. I didn't realize you were talking about textures. I believe that Basic is correct in his assumption that FAA stores one color per edge. This assumption is based on the fact that Matrox said they don't mess with textures.
     
  20. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    But I don't believe it is close to 10% of the scene in current games, let alone 5%.

    Don't forget that Matrox' FAA attempts to only AA object edges, not poly edges (So that internal model tris that wouldn't make any difference when AA'd aren't touched). I believe I heard them state that current games have around 3%-5% edge coverage. That would coincide well with the performance hits we've been seeing, if the FAA uses super-sampling for the fragment pixels. Additionally, given that most games today aren't all that detailed, a 3% number would be pretty good for most games today...
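
    As a sanity check on the coverage figures, here is a sketch that inverts the simple model used earlier in the thread (edge pixels cost 16 times a normal pixel, nothing else changes). The function name is hypothetical and the 16x cost is the assumption under test, not a known fact about the hardware.

```python
def edge_coverage_for_hit(framerate_hit, edge_cost=16):
    """Edge coverage that would produce a given framerate hit.

    Inverts: frame_time = (1 - c) + c * edge_cost, hit = 1 - 1 / frame_time.
    Assumes every edge pixel costs `edge_cost` times a normal pixel.
    """
    frame_time = 1.0 / (1.0 - framerate_hit)
    return (frame_time - 1.0) / (edge_cost - 1.0)


# Hits of 17%-35% are the range quoted earlier in the thread from reviews.
for hit in (0.17, 0.35):
    print(f"{hit:.0%} hit -> {edge_coverage_for_hit(hit):.1%} edge coverage")

# Forward direction for the 3%-5% coverage mentioned above.
for coverage in (0.03, 0.05):
    frame_time = (1.0 - coverage) + coverage * 16
    print(f"{coverage:.0%} coverage -> {1.0 - 1.0 / frame_time:.0%} hit")
```

    Under this model, hits of 17%-35% correspond to roughly 1.4%-3.6% edge coverage, and 3% coverage gives about a 31% hit, while 5% gives about 43%.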

    A good way to test would be to find a benchmark that has the largest performance hit from enabling FAA, and compare the performance hit for enabling FAA when aniso is enabled and disabled. If supersampling were to be used for the fragment pixels, the performance hit should be similar or identical in both cases. If only one color is contributed per triangle, then there should be a noticeably smaller performance hit from FAA when aniso is enabled (as seen in the GF3/4).
     