SSAA vs. MSAA debate

poly-gone

Newcomer
Moderator: This, and many of the following posts in this thread, have been moved from the 3DMark06 discussion. N00b's original comment was made here:

http://www.beyond3d.com/forum/showpost.php?p=679234&postcount=563

N00b said:
Adding MSAA with a pixel shader on the other hand is not so simple. I wonder if it can be done and how? You could probably do SSAA, which would not really be comparable and performance would suck.
SSAA is superior to MSAA and gives much better IQ at a lower "x". And it's not exactly slow(er) if done right. Of course, it might not be viable to get 8x (or even 4x) SSAA on current-generation cards, but 2x SSAA is certainly possible, and you'd get better quality than with 4x MSAA.
 
poly-gone said:
SSAA is superior to MSAA and gives much better IQ at a lower "x". And it's not exactly slow(er) if done right. Of course, it might not be viable to get 8x (or even 4x) SSAA on current-generation cards, but 2x SSAA is certainly possible, and you'd get better quality than with 4x MSAA.
Er, supersampling is always going to be much slower than multisampling, because supersampling requires full-pixel computation for all subsamples. And I would definitely beg to differ that 2x supersampling would be better quality than 4x multisampling. Yes, you get slightly better texture quality, but worse edge anti-aliasing. And we've got transparency AA to deal with alpha test surfaces these days.
 
Chalnoth said:
Er, supersampling is always going to be much slower than multisampling, because supersampling requires full-pixel computation for all subsamples. And I would definitely beg to differ that 2x supersampling would be better quality than 4x multisampling. Yes, you get slightly better texture quality, but worse edge anti-aliasing. And we've got transparency AA to deal with alpha test surfaces these days.
Er, my personal tests differ from your opinion ;).
 
poly-gone said:
Er, my personal tests differ from your opinion ;).
Then your personal tests aren't taking something into account. I mean, it's a simple fact that more computation is required for supersampling. It's just an inherently expensive operation. So it's going to be slower if you're limited by processing power, which today's GPUs usually are (and I don't see this changing in the future, either).
 
Chalnoth said:
Then your personal tests aren't taking something into account. I mean, it's a simple fact that more computation is required for supersampling. It's just an inherently expensive operation. So it's going to be slower if you're limited by processing power, which today's GPUs usually are (and I don't see this changing in the future, either).
Nope, my code runs about 5 fps slower with 2x SSAA than it does with 4x MSAA, with the former giving better quality.
 
poly-gone said:
Nope, my code runs about 5 fps slower with 2x SSAA than it does with 4x MSAA, with the former giving better quality.
I hate to be rude, but your "personal tests" are irrelevant because they make neither theoretical nor practical sense. You most obviously tested a scenario that has completely unrealistic performance characteristics and does not represent average GPU utilization. Furthermore, saying that 2x SSAA systematically gives better quality than 4x MSAA is ridiculous, as is presenting that as an objective fact anyone would agree with. Using an FPS difference, and not a performance drop percentage, as a discussion basis is even more laughable. Please, go hide yourself.

Uttar
 
Chalnoth said:
Then your personal tests aren't taking something into account. I mean, it's a simple fact that more computation is required for supersampling. It's just an inherently expensive operation. So it's going to be slower if you're limited by processing power, which today's GPUs usually are (and I don't see this changing in the future, either).

In a bandwidth-limited situation, shouldn't SSAA and MSAA perform about the same?
And does a TBDR give free SSAA?
 
Fox5 said:
In a bandwidth-limited situation, shouldn't SSAA and MSAA perform about the same?
And does a TBDR give free SSAA?
No on both counts.

The output of multisampling is highly compressible. The output of supersampling is not. So the bandwidth required for multisampling can be much lower, given framebuffer compression.

A TBDR only has unlimited bandwidth for framebuffer usage. This makes multisampling AA mostly free (not completely: you still need to calculate more samples in pixels covered by two or more triangles). Supersampling will approach its theoretical limit (1/2 the performance for 2x AA, for example) much more closely on a TBDR than on a traditional renderer: that is, you may notice the performance hit even more.
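
The compressibility argument can be illustrated with a toy model. This is only a sketch under stated assumptions: the function name `stored_colours` and the "store one colour when all samples match" rule are hypothetical stand-ins, not the scheme of any particular GPU, and the colour labels are made up for the example.

```python
def stored_colours(pixel_samples):
    """Count colour values written under a toy compression rule (assumed):
    a pixel whose samples are all identical stores one colour,
    any other pixel stores every sample."""
    total = 0
    for samples in pixel_samples:
        total += 1 if len(set(samples)) == 1 else len(samples)
    return total

# Ten pixels at 4 samples each. With multisampling, the nine interior
# pixels replicate one shaded colour; only the edge pixel mixes two.
msaa_frame = [("a", "a", "a", "a")] * 9 + [("a", "a", "b", "b")]

# With supersampling every sample is shaded on its own, so in textured
# areas the four values in a pixel generally all differ.
ssaa_frame = [("a1", "a2", "a3", "a4")] * 10

print(stored_colours(msaa_frame), stored_colours(ssaa_frame))
```

In this made-up frame the multisampled buffer stores far fewer values than the supersampled one, which is the effect described above; real savings depend on how many pixels contain edges.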
 
Fox5 said:
In a bandwidth-limited situation, shouldn't SSAA and MSAA perform about the same?
Not likely. Since MSAA only requires one pixel shader loop per pixel, it will use less bandwidth than SSAA, which effectively has to do n loops per pixel. Comparing the performance of 2x SSAA vs. 4x MSAA, I expect that MSAA would be faster on current chips in the majority of cases. If your tessellation is very high (say, subpixel-sized triangles) then MSAA can go very slowly, as there would be a lot of edges, but that's an unlikely scenario.
And does a TBDR give free SSAA?
Not likely. Quite possible to get "free" MSAA however, but SSAA costs fillrate no matter what architecture you have. If you design your chip to give "free" SSAA it would have to come at the expense of a lot of silicon and would mean that all of that power would go to waste when not doing SSAA.
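
A rough way to see the shading-cost difference is to count pixel shader runs. The sketch below is a simplified model, not a description of any real chip: the function names, the `edge_fraction` parameter, and the assumption that an edge pixel is shaded once per covering triangle (capped at the sample count) are all hypothetical choices made for illustration.

```python
def ssaa_shader_runs(pixels, samples):
    """Supersampling shades every sample of every pixel."""
    return pixels * samples

def msaa_shader_runs(pixels, samples, edge_fraction, triangles_per_edge_pixel=2):
    """Multisampling shades once per pixel per covering triangle (assumed model).

    edge_fraction is the share of pixels that contain a triangle edge.
    """
    runs_per_edge_pixel = min(triangles_per_edge_pixel, samples)
    interior = pixels * (1.0 - edge_fraction)
    edge = pixels * edge_fraction * runs_per_edge_pixel
    return interior + edge

pixels = 1024 * 768
print(ssaa_shader_runs(pixels, 2))              # 2x SSAA
print(msaa_shader_runs(pixels, 4, 0.05))        # 4x MSAA, 5% edge pixels
print(msaa_shader_runs(pixels, 4, 1.0, 4))      # 4x MSAA, subpixel triangles
```

With few edge pixels the 4x MSAA count stays close to one run per pixel, while the subpixel-triangle case climbs to the same count as 4x SSAA, matching the "unlikely scenario" mentioned above.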
 
poly-gone said:
Nope, my code runs about 5 fps slower with 2x SSAA than it does with 4x MSAA, with the former giving better quality.
Absolute performance drops don't mean much. Also, doesn't this show that 2x SSAA is slower than 4x MSAA? Edge quality in the MSAA case will be much improved, assuming that you are using a rotated grid.
 
Chalnoth said:
No on both counts.

The output of multisampling is highly compressible. The output of supersampling is not. So the bandwidth required for multisampling can be much lower, given framebuffer compression.

A TBDR only has unlimited bandwidth for framebuffer usage. This makes multisampling AA mostly free (not completely: you still need to calculate more samples in pixels covered by two or more triangles). Supersampling will approach its theoretical limit (1/2 the performance for 2x AA, for example) much more closely on a TBDR than on a traditional renderer: that is, you may notice the performance hit even more.

I remember 3dfx's antialiasing usually cut performance to exactly half for 2x AA, and to a quarter for 4x. I remember that was a bullet point for the Voodoo5 over the GeForce 2.
 
Fox5 said:
I remember 3dfx's antialiasing usually had an exactly half performance hit for 2xAA, and a quarter for 4x. I remember that was a bullet point for the voodoo5 over the geforce 2.
Both used supersampling, though the Voodoo5 was more efficient because it was designed with AA in mind.
 
Fox5 said:
I remember 3dfx's antialiasing usually had an exactly half performance hit for 2xAA, and a quarter for 4x. I remember that was a bullet point for the voodoo5 over the geforce 2.
Xmas said:
Both used supersampling, though the Voodoo5 was more efficient because it was designed with AA in mind.
I seem to remember that both the GF2 and the Voodoo5 had similar performance hits with AA, but the Voodoo5 had to decrease texture LOD to retain its performance (relative to a higher-resolution image with the same number of samples but no AA).

The primary benefit of the V5's AA over the GF2 was in quality, not performance.
 
Chalnoth said:
I seem to remember that both the GF2 and the Voodoo5 had similar performance hits with AA, but the Voodoo5 had to decrease texture LOD to retain its performance (relative to a higher-resolution image with the same number of samples but no AA).

The primary benefit of the V5's AA over the GF2 was in quality, not performance.
Lower LOD doesn't imply higher quality. However, if you avoid going to a higher LOD then you are assured of getting rid of nearly all texture aliasing. Of course, the result can be overfiltered, but that's the price you pay for not going to the higher LOD.

From my recollection, each VSA-100 chip rendered the scene with a slightly different offset. Thus, since each chip was only rendering in normal, non-AA mode, they used the default LOD. When combined, you got the same amount of detail as with normal rendering, but very good filtering results.

With GF2's SSAA, since everything was being rendered at a higher resolution, LOD was naturally increased. This gave more detail, but didn't remove aliasing as effectively in all cases. Also, GF2 used an ordered grid and 3dfx's parts were using a rotated grid.

Weren't there two types of SSAA on the GF2: One with higher LOD and one not? That's what I recall at least.
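
The ordered-grid versus rotated-grid point can be made concrete by counting how many distinct coverage values a near-vertical edge can produce in one pixel. This is a minimal sketch: the sample positions below are illustrative assumptions, not the actual GF2 or VSA-100 patterns, and `vertical_edge_levels` is a hypothetical helper.

```python
# Illustrative sample positions inside a unit pixel (assumed, not taken
# from any real GPU): a 2x2 ordered grid and a 4-sample rotated grid.
ORDERED_GRID = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
ROTATED_GRID = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]

def vertical_edge_levels(samples, steps=16):
    """Distinct coverage fractions seen as a vertical edge sweeps a pixel."""
    levels = set()
    for i in range(steps + 1):
        edge_x = i / steps
        # Samples to the left of the edge count as covered.
        covered = sum(1 for x, _ in samples if x < edge_x)
        levels.add(covered / len(samples))
    return sorted(levels)

print(vertical_edge_levels(ORDERED_GRID))
print(vertical_edge_levels(ROTATED_GRID))
```

With these positions the ordered grid only yields three coverage levels for such an edge, because its samples share two x coordinates, while the rotated grid yields five from the same four samples.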
 
Chalnoth said:
I seem to remember that both the GF2 and the Voodoo5 had similar performance hits with AA, but the Voodoo5 had to decrease texture LOD to retain its performance (relative to a higher-resolution image with the same number of samples but no AA).

The primary benefit of the V5's AA over the GF2 was in quality, not performance.
The V5 didn't waste fillrate on downsampling, IIRC. In the benches I've seen it generally didn't drop as much in performance as the GF2. The V5 kept the LOD at the same level as without AA because it rendered separate images and a negative LOD bias decreases the texture cache hit rate.


OpenGL guy said:
With GF2's SSAA, since everything was being rendered at a higher resolution, LOD was naturally increased. This gave more detail, but didn't remove aliasing as effectively in all cases. Also, GF2 used an ordered grid and 3dfx's parts were using a rotated grid.
Weren't there two types of SSAA on the GF2: One with higher LOD and one not? That's what I recall at least.
Well, suppose the LOD is "correct"; then using the next mip level with 2x2 SSAA should be correct as well. Yes, the early Detonators had such a mode. IIRC there were even three different 2x2 modes (one called "special"), 1x2, 1.5x1.5 (OpenGL), and even 3x3 and 4x4, though the latter two obviously only worked at 512x384 and lower.
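
The "next mip level" remark follows from how the rendering resolution changes the texel footprint of each sample. As a hedged sketch, assuming a uniform scale factor per axis and standard log2-based mip selection (the helper name `lod_shift` is hypothetical), the shift in selected mip level is:

```python
import math

def lod_shift(axis_scale):
    """Mip-level shift when rendering at axis_scale times the resolution
    per axis (uniform scaling assumed). Negative means a more detailed mip."""
    return -math.log2(axis_scale)

for scale in (1.5, 2.0, 3.0, 4.0):
    print(f"{scale}x{scale}: LOD shift {lod_shift(scale):+.3f}")
```

Under this assumption 2x2 supersampling moves texture selection exactly one mip level toward more detail, so a mode that keeps the original level would apply the opposite bias. Non-uniform modes such as 1x2 are not covered by this one-parameter model.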
 
Xmas said:
The V5 didn't waste fillrate on downsampling, IIRC. In the benches I've seen it generally didn't drop as much in performance as the GF2. The V5 kept the LOD at the same level as without AA because it rendered separate images and a negative LOD bias decreases the texture cache hit rate.
Right, the DAC did the downsampling. I agree that this performance difference makes lots of sense. And I still claim that, because it made so much sense, I looked for evidence of it and didn't find any.

Granted, there may be more evidence of such a performance drop at higher framerates. But I think most programs at the time couldn't run the FSAA settings at high enough framerates for there to be much of a difference.
 
Chalnoth said:
Right, the DAC did the downsampling. I agree that this performance difference makes lots of sense. And I still claim that, because it made so much sense, I looked for evidence of it and didn't find any.

Granted, there may be more evidence of such a performance drop at higher framerates. But I think most programs at the time couldn't run the FSAA settings at high enough framerates for there to be much of a difference.

Drop to 16-bit color (when most games still weren't even using primarily 16-bit textures... many games still go with primarily 8-bit textures, like the GTA series), add 2x AA, and get slightly better dithering than a stock Voodoo, which was already equal to Matrox and way better than ATI and NVIDIA. (I don't think NVIDIA even implemented any form of dithering; not sure about ATI.) Back in the day most games were still memory limited (maybe due to the lack of anything like HyperZ and LMA?), so the performance hits from 2x AA and from going to 32-bit color were about the same.
 
Uttar said:
I hate to be rude, but your "personal tests" are irrelevant because they make neither theoretical nor practical sense. You most obviously tested a scenario that has completely unrealistic performance characteristics and does not represent average GPU utilization. Furthermore, saying that 2x SSAA systematically gives better quality than 4x MSAA is ridiculous, as is presenting that as an objective fact anyone would agree with. Using an FPS difference, and not a performance drop percentage, as a discussion basis is even more laughable. Please, go hide yourself.

Uttar
Ooooooooh, calm down :LOL:! You have no idea how my code works, so the last thing you should be talking about is whether it makes sense (theoretical or practical). OK, I must admit that I should've added "5 fps, on a base of 48 fps", but that's no reason for you to burst a vein in the head :LOL:.
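
For reference, the "5 fps, on a base of 48 fps" figure can be turned into a relative drop and a frame-time difference. The sketch below assumes the 48 fps base is the 4x MSAA result, which the post does not state explicitly; the helper names are hypothetical.

```python
def relative_drop(base_fps, new_fps):
    """Performance drop as a fraction of the base frame rate."""
    return (base_fps - new_fps) / base_fps

def frame_time_ms(fps):
    """Time spent per frame, in milliseconds."""
    return 1000.0 / fps

base_fps = 48.0            # assumed: frame rate with 4x MSAA
ssaa_fps = base_fps - 5.0  # 5 fps slower with 2x SSAA

print(f"relative drop: {relative_drop(base_fps, ssaa_fps):.1%}")
print(f"frame time: {frame_time_ms(base_fps):.2f} ms -> {frame_time_ms(ssaa_fps):.2f} ms")

# The same 5 fps gap is a much smaller relative drop from a higher base.
print(f"from 200 fps: {relative_drop(200.0, 195.0):.1%}")
```

The same absolute gap means very different things at different base rates, which is why a percentage (or frame time) is the more useful basis for comparison.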
 
poly-gone said:
Ooooooooh, calm down :LOL:! You have no idea how my code works, so the last thing you should be talking about is whether it makes sense (theoretical or practical). OK, I must admit that I should've added "5 fps, on a base of 48 fps", but that's no reason for you to burst a vein in the head :LOL:.
You probably have no idea about it either, considering you fail to properly explain what your code is limited by and why that implies SSAA behaves as it does (hint: your code is not fillrate limited). I can make a test proggy having a 50% SSAA performance loss in 5 minutes if need be. Does that mean a 50% drop is the norm? No, it's the theoretical maximum, with the average in real-world scenarios being much nearer to that than your ridiculous numbers.
I'm not saying you're intentionally giving random BS numbers, but I do believe you don't understand how the graphics pipeline works, and why your numbers are irrelevant for real-world scenarios.

Uttar
 
Uttar said:
You probably have no idea about it either, considering you fail to properly explain what your code is limited by and why that implies SSAA behaves as it does (hint: your code is not fillrate limited). I can make a test proggy having a 50% SSAA performance loss in 5 minutes if need be. Does that mean a 50% drop is the norm? No, it's the theoretical maximum, with the average in real-world scenarios being much nearer to that than your ridiculous numbers.
I'm not saying you're intentionally giving random BS numbers, but I do believe you don't understand how the graphics pipeline works, and why your numbers are irrelevant for real-world scenarios.

Uttar
On the contrary, I wrote the code, so I know perfectly well what's going on and what it's limited by. And in fact, IT IS FILLRATE LIMITED because I'm using both HDR rendering and soft-edged shadow mapping (1024x1024 SM).
 