Alternative AA methods and their comparison with traditional MSAA

Ported FXAA2 to our engine this morning. I did some Xbox 360 microcode ASM optimizations on the tfetches (to remove some ALU instructions), but nothing else. It looks really good, actually. Way better than 2xMSAA or our hacky temporal AA (I'd say it's very much comparable to 4xMSAA in quality). Textures stay sharp, and it properly antialiases all our foliage/trees (we have a lot of vegetation) and noisy specular highlights.

At 1280x672 (93% of 720p pixels) it runs at 1.2 ms. It's tfetch bound, so I will likely integrate it into our huge post process shader (which is currently ALU bound). It would balance the load nicely. The total cost for AA would be around 1 ms then :)

Another way to make it run faster is to read more than one luminance value per tfetch. Sadly, the gather instruction is not available on the consoles, so the straightforward R8 luminance sampling (four neighbour texels into RGBA with one instruction) is not possible. That would bring it under 1.0 ms.
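For reference, on hardware that does support gather (D3D 10.1+ / SM 4.1+), the idea described above could look roughly like the sketch below. The pass structure, resource names and the Rec.601 luma weights are illustrative assumptions, not code from the actual FXAA2 source.

```hlsl
Texture2D        sceneColor  : register(t0); // tone mapped LDR colour
Texture2D<float> lumaTex     : register(t1); // R8 luma written by the pre-pass
SamplerState     linearClamp : register(s0);

// Pre-pass: write per-pixel luma into an R8 target so the AA pass can fetch it cheaply.
float PS_WriteLuma(float2 uv : TEXCOORD0) : SV_Target
{
    float3 c = sceneColor.Sample(linearClamp, uv).rgb;
    return dot(c, float3(0.299, 0.587, 0.114)); // Rec.601-style luma
}

// AA pass: one Gather returns the four lumas of the 2x2 footprint around uv,
// i.e. the "4 neighbour texels to RGBA in one instruction" case that the
// consoles cannot do.
float4 GatherLuma(float2 uv)
{
    return lumaTex.Gather(linearClamp, uv);
}
```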
 
Great news Sebbi! Could you tell how much 2xMSAA would cost in comparison? Does it soften the image a bit? Oh, and do any of you developer guys know if there will be MLAA code in the 360 XDK? I guess they could release the code in the coming months.
 
Great news Sebbi! Could you tell how much 2xMSAA would cost in comparison? Does it soften the image a bit?
We have a fully deferred renderer, so 2xMSAA would cost a lot. A naive implementation would double the lighting cost (currently our lighting is around 25% of our frame time). There are some clever tiled deferred renderers (for example in Black Rock's Split Second) that do sample-frequency lighting only for tiles that have MSAA edges, but even with really small 4x4 tiles a huge number of pixels requires double lighting (one edge pixel in a tile requires the whole tile to be lit at sample precision). With DX11 (compute shaders) you can do even more fine-grained lighting, but that's quite hard to do on consoles. And in addition to the double lighting cost, you need memory space (and bandwidth) to resolve the 2xMSAA deferred buffers, and naturally (on Xbox) you need to multipass some geometry, as 2xMSAA deferred buffers take a lot of EDRAM space. Post process antialiasing doesn't require any additional memory, as you already have your g-buffers in memory, and you can reuse those memory locations after lighting is complete.
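To make the tiled idea concrete, here is a rough DX11-style pixel shader sketch of the per-tile branch (the console implementations differ considerably); the resource names, the 2x sample count and the toy lighting function are all placeholder assumptions, not code from Split Second or any shipping engine.

```hlsl
Texture2DMS<float4, 2> gbufAlbedo   : register(t0);
Texture2DMS<float4, 2> gbufNormal   : register(t1);
Texture2D<uint>        edgeTileMask : register(t2); // 1 if the 4x4 tile contains an MSAA edge

// Toy lighting function so the sketch compiles; the real one would evaluate
// the deferred light list for this pixel.
float3 ShadeSample(float3 albedo, float3 normal)
{
    const float3 L = normalize(float3(0.4, 0.8, 0.4)); // placeholder directional light
    return albedo * saturate(dot(normalize(normal), L));
}

float4 PS_DeferredLight(float4 pos : SV_Position) : SV_Target
{
    int2 p = int2(pos.xy);

    // Pixel-frequency path: tile has no MSAA edge, so light sample 0 only.
    if (edgeTileMask.Load(int3(p >> 2, 0)) == 0)
        return float4(ShadeSample(gbufAlbedo.Load(p, 0).rgb,
                                  gbufNormal.Load(p, 0).xyz), 1.0);

    // Sample-frequency path: light every MSAA sample and average.
    float3 sum = 0.0;
    [unroll]
    for (int s = 0; s < 2; ++s)
        sum += ShadeSample(gbufAlbedo.Load(p, s).rgb,
                           gbufNormal.Load(p, s).xyz);
    return float4(sum * 0.5, 1.0);
}
```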

So, basically, a 60 fps deferred renderer with 2xMSAA is pretty much impossible to do on current consoles. Post process AA is pretty much the only choice. Fortunately FXAA2 looks better than 4xMSAA in the average case, it doesn't blur the image as much as other techniques such as DLAA do, and it still does a very good job of removing all the jaggies. And unlike MSAA, it also antialiases all transparencies and can be easily done after tone mapping (antialiasing before tone mapping is just wrong). Surprisingly it looks pretty good even on one pixel wide geometry (such as power lines and fences), but, like other post process antialiasing filters such as MLAA, it doesn't help with subpixel geometry.
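For illustration only, here is a minimal sketch of the kind of luma-based contrast test a post process AA filter runs on the tone mapped LDR image. This is not NVIDIA's FXAA2 source; the thresholds, names and the stand-in blur are made up, and the real filter replaces the blur with a directional edge search and blend.

```hlsl
Texture2D    ldrColor   : register(t0);
SamplerState pointClamp : register(s0);

cbuffer AAParams : register(b0)
{
    float2 rcpFrame; // 1 / render target resolution
};

float Luma(float2 uv)
{
    return dot(ldrColor.SampleLevel(pointClamp, uv, 0).rgb,
               float3(0.299, 0.587, 0.114));
}

float4 PS_PostAA(float2 uv : TEXCOORD0) : SV_Target
{
    // Luma of the centre pixel and its four axis neighbours.
    float lM = Luma(uv);
    float lN = Luma(uv + float2(0.0, -rcpFrame.y));
    float lS = Luma(uv + float2(0.0,  rcpFrame.y));
    float lW = Luma(uv + float2(-rcpFrame.x, 0.0));
    float lE = Luma(uv + float2( rcpFrame.x, 0.0));

    // Local contrast: below the threshold there is nothing visible to antialias.
    float  range  = max(max(lN, lS), max(lW, lE)) - min(min(lN, lS), min(lW, lE));
    float3 center = ldrColor.SampleLevel(pointClamp, uv, 0).rgb;
    if (range < max(0.05, lM * 0.125))
        return float4(center, 1.0);

    // High-contrast pixel: a simple cross-neighbourhood average stands in for
    // the real edge search and directional blend.
    float3 blurred = 0.25 * (ldrColor.SampleLevel(pointClamp, uv + float2(0.0, -rcpFrame.y), 0).rgb
                           + ldrColor.SampleLevel(pointClamp, uv + float2(0.0,  rcpFrame.y), 0).rgb
                           + ldrColor.SampleLevel(pointClamp, uv + float2(-rcpFrame.x, 0.0), 0).rgb
                           + ldrColor.SampleLevel(pointClamp, uv + float2( rcpFrame.x, 0.0), 0).rgb);
    return float4(lerp(center, blurred, 0.5), 1.0);
}
```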
 
Preliminary support for FXAA is included in the latest beta drivers (Rel275) from NV.

 
Thank you Sebbi for the big explanation. It seems like this is the way you'll go with AA in your games, right? I'm quite surprised it is half the cost of DLAA while still giving arguably better results.
 
So it appears they are planning on rolling out FXAA to fight MLAA. I was expecting them to use SRAA there, but the real question is which cards will be able to use it.
 
Apparently forced FXAA in the driver works only for OpenGL applications. Since it's a post-process filter, screenshots are not possible, similar to MLAA. GUI elements in applications get blurred too, just like with MLAA.
 
You should be able to capture screenshots with the Snipping Tool in Vista/7. Also, do we have any idea which series of cards it is available for yet? Obviously the 5xx/4xx at the very least.
 
You should be able to capture screenshots with the Snipping Tool in Vista/7. Also, do we have any idea which series of cards it is available for yet? Obviously the 5xx/4xx at the very least.
Thanks for the tip!

Here is the result: NoAA - 4xMSAA - FXAA

P.S.: The filter works in Photoshop too, with GPU acceleration enabled. ;)
 
Last edited by a moderator:
So it appears they are planning on rolling out FXAA to fight MLAA. I was expecting them to use SRAA there, but the real question is which cards will be able to use it.
You can't really force SRAA in the drivers without per-application cleverness. It requires rendering additional sub-samples for at least position and usually normal as well.

Here is the result: NoAA - 4xMSAA - FXAA
FXAA also makes it brighter! :p
 
We have a fully deferred renderer, so 2xMSAA would cost a lot. A naive implementation would double the lighting cost (currently our lighting is around 25% of our frame time). There are some clever tiled deferred renderers (for example in Black Rock's Split Second) that do sample-frequency lighting only for tiles that have MSAA edges, but even with really small 4x4 tiles a huge number of pixels requires double lighting (one edge pixel in a tile requires the whole tile to be lit at sample precision).

A stencil mask can eliminate much of the overhead. Both consoles have very effective Hi-Stencil. The main bottleneck is generating the mask.
 
A stencil mask can eliminate much of the overhead. Both consoles have very effective Hi-Stencil. The main bottleneck is generating the mask.
That's true. The most efficient mask generation I know of uses the centroid sampling trick (subtract the centroid-interpolated value from the center-interpolated one to detect an edge). However, this only works properly for 2xMSAA on current hardware, as 4xMSAA returns the center value for all 3/4 subsample patterns, and it doesn't detect transparency clip edges (or shader specular aliasing). Of course you need extra space in your g-buffer to store the edge bit (the extra 2 bits of a 10-10-10-2 RT are good for this). On consoles you can write directly from the pixel shader to the stencil buffer (set your color RT memory address to point to the DS buffer), so you can copy the mask bits to the stencil buffer later pretty painlessly.

However, Hi-Stencil works on 4x4 or larger blocks, so this is no more efficient than 4x4-based tiled deferred: one edge pixel causes 16 pixels to be lit at sample precision. You also need to resolve (copy) all the g-buffers at sample precision from the EDRAM (with 4xMSAA that alone is over 1 ms extra... you could do the whole FXAA2 pass in the time it takes to copy the samples to main memory). And tiling adds cost as well, since some of the geometry needs to be drawn twice (or three times for 4xMSAA). So 2xMSAA is already behind before the lighting step even begins. How many 4x4 blocks have edges and require sample-precision lighting also depends entirely on the scene contents (a field of grass can be really bad, for example), so the extra lighting cost varies from frame to frame. The FXAA2 perf hit is always the same. I tend to prefer techniques with a constant performance hit (to achieve a good minimum frame rate).
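A rough sketch of the centroid trick described above, assuming a D3D10-style pixel shader (the Xbox 360 syntax differs); the vertex shader is assumed to write the same value to both interpolants, and the comparison threshold is purely illustrative.

```hlsl
// The vertex shader is assumed to copy the same texture coordinate into both
// outputs; only the interpolation mode differs.
struct VSOut
{
    float4          pos        : SV_Position;
    float2          uvCenter   : TEXCOORD0; // default (pixel centre) interpolation
    centroid float2 uvCentroid : TEXCOORD1; // centroid interpolation of the same value
};

// Returns 1.0 for MSAA edge pixels (pixel centre lies outside the triangle),
// 0.0 otherwise. The result can be packed into the spare 2 bits of a
// 10-10-10-2 G-buffer target, as described above.
float DetectMsaaEdge(VSOut i)
{
    float2 diff = abs(i.uvCenter - i.uvCentroid);
    return (diff.x + diff.y) > 1e-6 ? 1.0 : 0.0;
}
```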

And then there is the tone mapping (and gamma) issue as well. You have to keep the edge blocks at sample frequency until you do tone mapping and gamma, so you basically need to do your post processing with stencil-masked sample frequency too (the bloom combine, the low-res particle buffer combine, etc. should be done before tone mapping). Doing this properly adds extra cost. So basically you cannot do 2xMSAA in under 3 ms (3 ms is 18% of your frame time if you aim at 60 fps), and 2xMSAA antialiasing quality isn't worth that big a sacrifice. And it doesn't do anything for transparency edges or specular aliasing.
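As a small illustration of keeping edge pixels at sample frequency through tone mapping, the sketch below tone maps each HDR sample before resolving, instead of averaging HDR samples first; the simple Reinhard operator and the 2x sample count are stand-in assumptions.

```hlsl
Texture2DMS<float4, 2> hdrScene : register(t0);

float3 ToneMap(float3 hdr)
{
    return hdr / (hdr + 1.0); // simple Reinhard, stand-in for the real operator
}

float4 PS_ResolveEdge(float4 pos : SV_Position) : SV_Target
{
    int2 p = int2(pos.xy);
    // Correct order for edge pixels: tone map per sample, then resolve.
    float3 ldr = 0.5 * (ToneMap(hdrScene.Load(p, 0).rgb)
                      + ToneMap(hdrScene.Load(p, 1).rgb));
    return float4(ldr, 1.0);
}
```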

Here is the result: NoAA - 4xMSAA - FXAA
I don't like how PC drivers apply post process AA after UI rendering. It makes text look blurry. Post AA should be applied before UI rendering.
 