Parhelia - FAA & OGSS Performance hit

Randell

Those screenies being posted at the MURC forum show impressive edge anti-aliasing, and 16xFAA is only meant to have a 25% performance hit. On games that are incompatible there is 4xOGSS to fall back on, which is meant to have a bigger hit than FAA. Can anyone speculate what the hits will be? I assume they will be nowhere near what the 8500 takes for 4xSmoothvision in modern games.
 
I'd guess that it will have a similar performance hit to ATI's method. The only way I can see it being faster is if Matrox can use the extra texture pipes to accelerate supersampling when only dual texturing is being done. Although maybe the extra memory bandwidth of Parhelia will help. I've been curious about this too. Hopefully someone will benchmark supersampling.
 
Well, Matrox have hinted at several memory controllers within the 256-bit bus, so I hoped the performance drop-off at, say, 1024x768x32bit wouldn't be as great as with the 8500.
 
I don't see how more effective memory bandwidth is going to help the Parhelia any with supersampling performance...most of its performance hit comes from a loss in fillrate.

Anyway, the big question about FAA is whether or not it fully-filters all samples. Unfortunately, I have a suspicion that it does.

That is, 16x FAA could simply take 16 point samples, instead of 16 fully-filtered samples. The result would be a reduction in required fillrate of at least 4x.

However, I would suspect that if Matrox did this, they could probably also do multisampling as a fallback.

So, if Matrox does fully-filter all 16 samples, then we're talking about halving the performance with around 6% fragment coverage. I personally don't really like this as it would probably put a very noticeably larger performance hit on very complex scenes (trees, grass, etc.).

If Matrox uses point sampling, and each pixel pipeline can put out four point samples per clock, then we're talking about only a 12.5% performance hit at the same coverage, and I doubt it would get much higher.

Of course, this doesn't take into account the added overhead of sorting and storing all of the fragments, but we can hope that that overhead will be relatively small.
 
Chalnoth, I'm not sure I follow you. Can you define what you mean by point sample vs fully filtered?
 
Point sample = one texture sample.

Fully-filtered = at least four texture samples (8 for trilinear, possibly more for anisotropic).

If any pixel pipeline is used to its maximum efficiency, the performance limitation lies in the number of texture samples the pixel pipeline can generate and average each clock.

For example, all GeForce cards can be considered 32-texel pipelines (at least, to my knowledge...I do know the GF1/2 follow this...benchmarks seem to indicate the GF3/4 are the same...). That is, they can filter 32 "texture pixels" per clock, or four trilinear-filtered textures per clock, and, at least in the case of the GeForce2, eight bilinear-filtered textures per clock.
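To make the texel budget above concrete, here is a minimal Python sketch. It simply divides the 32-texels-per-clock figure quoted in this post by the number of texture samples each filtering mode needs. The names `TEXELS_PER_CLOCK`, `SAMPLES_PER_TEXTURE` and `textures_per_clock` are made up for illustration, and the 32-texel budget is the post's own estimate, not a vendor specification.

```python
# Assumed budget: 32 texture samples (texels) per clock, as estimated in the post.
TEXELS_PER_CLOCK = 32

# Texture samples needed to produce one filtered texture lookup.
SAMPLES_PER_TEXTURE = {
    "point": 1,      # one texture sample
    "bilinear": 4,   # 2x2 texels from one mip level
    "trilinear": 8,  # 2x2 texels from each of two mip levels
}

def textures_per_clock(mode, budget=TEXELS_PER_CLOCK):
    """Filtered texture lookups per clock for a given filtering mode."""
    return budget // SAMPLES_PER_TEXTURE[mode]

for mode in ("point", "bilinear", "trilinear"):
    print(mode, textures_per_clock(mode))
```

Under these assumptions the same budget yields 4 trilinear, 8 bilinear, or 32 point-sampled lookups per clock, which is why swapping filtered samples for point samples could leave room for extra sub-pixel outputs.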

So, if, when FSAA is enabled, the hardware doesn't bother to do the texture filtering, but instead uses the same bilinear pixel pipelines to handle four individual pixel outputs instead of one filtered pixel, very good performance could be achieved. The image quality difference shouldn't be noticeable...

But, it also stands to reason that if the Parhelia could do this, then it could also do multisampling, which it cannot. So, this whole line of reasoning may be pretty shaky, and we may, unfortunately, be forced to expect a 25%-50% performance hit, depending on the scene.
 
Does it only take a 25% performance hit, or does it just start off "slow" in the first place?

I think it is a bit of both.

Ya know, back when it was first released, people lambasted GeForce 3 cards for taking up to a 40% hit for anisotropic. But they started off so bloody fast that a 40% hit wasn't such a big deal. And when newer games came out, it was easy to just back off on anisotropic by a notch or two and regain 20-30% performance and still be getting 60+fps.
 
I'm actually pretty happy with how the card is performing. I don't even see the point of playing a game without FSAA anymore.
 
Well, now I know. I had wrongly assumed extra bandwidth would help; I didn't appreciate it was purely fillrate.

FAA looks great, but where it doesn't work, the 4xOGSS looks unusable, worse than the 8500 because the raw numbers are worse.
 
Well, actually the FSAA method being "completely fillrate" is somewhat misleading.

The truth is that it has the exact same hit on both fillrate and memory bandwidth, meaning that increasing one will not improve the performance hit.
 
I'm quite sure that Parhelia 16xFAA uses something like multisampling with a coverage mask. (Like a stripped down version of 3Dlabs SuperScene.)
 
Basic said:
I'm quite sure that Parhelia 16xFAA uses something like multisampling with a coverage mask. (Like a stripped down version of 3Dlabs SuperScene.)

In the way it's performed, FAA is nothing like multisampling. It's basically edge anti-aliasing in hardware (previously, edge AA has always been largely software-based). But yes, in its results, it is somewhat like multisampling, as in both, only edges are anti-aliased.
 
At http://www.matrox.com/mga/products/tech_info/pdfs/parhelia/faa_16x.pdf page 5
Matrox said:
...The fragment buffer maintains fragment lists, which contain information about a particular fragment pixel. Specifically, the fragment list stores sub-pixel coverage and color information for each of the edges that intersect the pixel. ...

I didn't mean multisampling like in GF4. I read the above quote as a coverage mask and one color per edge. It should also have a z-value, probably one per edge.

There is little use in supersampling the texture at the edges. Yes, supersampling the texture can increase the texture quality. But if they have detected that the pixel needs AA since it's at an edge, then the color difference between the surfaces is likely much greater than the variance from the texture SS. So why spend time and memory on colors for individual sub-pixels?
 
The performance results seem to coincide more closely with using pure super-sampling on the edges.

And yes, I certainly agree that there is no need for using supersampling on edge pixels with FAA in use.

Additionally, the only significant thing that multisampling FSAA needs in order to work is multiple z-checks per pixel pipeline. Since we know that the Parhelia needs to do 16 z-checks for each AA'd pixel with FAA, if the Parhelia does do something similar to multisampling, then they'd need extra z-check pipelines. If the Parhelia could do multiple z-checks per pixel, then the Parhelia could almost certainly also do multisampling FSAA.
 
Chalnoth said:
Well, actually the FSAA method being "completely fillrate" is somewhat misleading.

The truth is that it has the exact same hit on both fillrate and memory bandwidth, meaning that increasing one will not improve the performance hit.

OK thanks for the clarification :)
 
In the same document
Matrox said:
Fragment pixels typically account for less than five to ten percent of the total number of pixels in a scene.

If 5%-10% of a scene would take 16 times longer, then the total frame would take 1.75-2.5 times longer. Or in other words 43%-60% framerate hit. And then I'm not counting the extra work from the irregular memory access and list management. And for the pixels that don't get AAed, they still have to keep track that they don't.
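The arithmetic above can be checked with a minimal Python sketch. It assumes a deliberately simple cost model: non-edge pixels cost one unit, fragment (edge) pixels cost 16 units, and nothing else changes. The function names are made up for illustration, and the model ignores the list-management and memory-access overhead mentioned above.

```python
def frame_time_multiplier(edge_fraction, samples=16):
    """Relative frame time when `edge_fraction` of pixels cost `samples` times more."""
    return (1.0 - edge_fraction) + edge_fraction * samples

def framerate_hit(edge_fraction, samples=16):
    """Fraction of framerate lost under the same simplified model."""
    return 1.0 - 1.0 / frame_time_multiplier(edge_fraction, samples)

for edge_fraction in (0.05, 0.10):
    multiplier = frame_time_multiplier(edge_fraction)
    hit = framerate_hit(edge_fraction)
    print(f"{edge_fraction:.0%} edges: {multiplier:.2f}x frame time, {hit:.0%} hit")
```

With 5% and 10% fragment coverage this gives 1.75x and 2.5x frame time, i.e. roughly a 43% and a 60% framerate hit, matching the figures in the post.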

I don't think that matches the 17%-35% performance hit I've seen in reviews.


One thing I said in the last post was one z-value per edge and pixel. If they want to detect intersections, then they could add slopes. If they take that approach, then it's understandable that it's difficult to reuse it for GF4-style multisampling.
 
Chalnoth said:
Point sample = one texture sample.

Fully-filtered = at least four texture samples (8 for trilinear, possibly more for anisotropic).

Ahh. I didn't realize you were talking about textures. I believe that Basic is correct in his assumption that FAA stores one color per edge. This assumption is based on the fact that Matrox said they don't mess with textures.
 
Basic said:
If 5%-10% of a scene would take 16 times longer, then the total frame would take 1.75-2.5 times longer.

But I don't believe it is close to 10% of the scene in current games, let alone 5%.

Don't forget that Matrox' FAA attempts to only AA object edges, not poly edges (So that internal model tris that wouldn't make any difference when AA'd aren't touched). I believe I heard them state that current games have around 3%-5% edge coverage. That would coincide well with the performance hits we've been seeing, if the FAA uses super-sampling for the fragment pixels. Additionally, given that most games today aren't all that detailed, a 3% number would be pretty good for most games today...

A good way to test would be to find a benchmark that has the largest performance hit from enabling FAA, and compare the performance hit for enabling FAA when aniso is enabled and disabled. If supersampling were to be used for the fragment pixels, the performance hit should be similar or identical. If only one color per triangle is contributed, then there should be a noticeably smaller performance hit from FAA with aniso enabled (as seen in the GF3/4).
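The comparison described above could be scripted roughly as follows. This is only a sketch: `faa_hit` and `compare_faa_hit` are hypothetical helper names, the four framerates would have to come from real benchmark runs, and the 5-percentage-point threshold for calling two hits "similar" is an arbitrary assumption.

```python
def faa_hit(fps_without_faa, fps_with_faa):
    """Fraction of framerate lost when FAA is turned on."""
    return 1.0 - fps_with_faa / fps_without_faa

def compare_faa_hit(fps_plain, fps_faa, fps_aniso, fps_aniso_faa, tolerance=0.05):
    """Compare the FAA hit with aniso off vs. on.

    Returns both hits and whether they are within `tolerance` of each other
    (an arbitrary threshold chosen for this sketch).
    """
    hit_no_aniso = faa_hit(fps_plain, fps_faa)
    hit_aniso = faa_hit(fps_aniso, fps_aniso_faa)
    similar = abs(hit_no_aniso - hit_aniso) <= tolerance
    return hit_no_aniso, hit_aniso, similar
```

Feeding in measured framerates, a result where the two hits are similar would point toward supersampling on the fragment pixels, while a clearly smaller hit with aniso enabled would point toward one color per triangle.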
 