Screen-space AA on GPUs

PeterT

Since the recent use of screen-space AA techniques in Saboteur has generated a lot of interest (see the original console technology thread), I thought it would make sense to create a thread dedicated to these techniques, and particularly to implementing them on GPUs (e.g. using OpenCL). I've actually been toying with ideas in this space for a long time, with the goal of getting the best filtering of 720p images possible in less than 10 ms.

First, here are the phases I'd consider in such an approach:
  • edge detection - this can be done based on RGB, luminance, Z, or any combination of those. This step only consists of finding areas of large gradients; these aren't necessarily ones we want to filter (see the sketch after this list)
  • "jaggy" detection - based on information from the previous step, find the start and end of staircase patterns (aliasing)
  • slope calculation - determine the length of the line segments identified by the start and end points (this can also depend on the surroundings if you want high quality)
  • weight calculation - based on the slope and start/end points, calculate blending weights for in-between pixels
  • blend - actually perform the blending
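
To make the first step concrete, here's a rough sketch of luminance-based edge detection written as a CUDA kernel (an OpenCL kernel or a pixel shader would look nearly identical). The kernel name, the bit encoding and the single threshold are just placeholders, not a fixed design:

    // Rough sketch: luminance-based edge detection, one thread per pixel.
    // edgeDetectKernel, the bit flags and lumaThreshold are illustrative only.
    __global__ void edgeDetectKernel(const float* luma, unsigned char* edges,
                                     int width, int height, float lumaThreshold)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        float c = luma[y * width + x];
        float l = luma[y * width + max(x - 1, 0)];   // left neighbour
        float t = luma[max(y - 1, 0) * width + x];   // top neighbour

        unsigned char e = 0;
        if (fabsf(c - l) > lumaThreshold) e |= 1;    // discontinuity to the left -> vertical edge
        if (fabsf(c - t) > lumaThreshold) e |= 2;    // discontinuity above       -> horizontal edge
        edges[y * width + x] = e;
    }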

Of course, these can be integrated or done separately. Interestingly, all of the steps except for slope and weight calculation can be done quite effectively with traditional pixel shaders, no fancy GPGPU-specific stuff needed. (The problem with weight calculation is that each detected jaggy produces an arbitrary number of output values, which doesn't map well onto one-output-per-pixel shading.)
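
To illustrate the final step, the blend pass could be as simple as the sketch below (again CUDA for consistency). The weights/dir buffers and their layout are assumed to come out of the weight calculation and are purely illustrative:

    // Rough sketch of the blend (step 5): mix each pixel with the neighbour across the
    // detected edge, using a precomputed per-pixel weight. Buffer layout is assumed.
    __global__ void blendKernel(const float4* src, float4* dst,
                                const float* weights, const unsigned char* dir,
                                int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        int idx = y * width + x;
        float w = weights[idx];                          // blend factor in [0,1]
        int nx = x, ny = y;
        if (dir[idx] == 1) nx = min(x + 1, width - 1);   // blend across a vertical edge
        else               ny = min(y + 1, height - 1);  // blend across a horizontal edge

        float4 a = src[idx];
        float4 b = src[ny * width + nx];
        float4 out;
        out.x = a.x + w * (b.x - a.x);
        out.y = a.y + w * (b.y - a.y);
        out.z = a.z + w * (b.z - a.z);
        out.w = a.w;
        dst[idx] = out;
    }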

Here's a high-level view of this:
[Image: highlevel.png - high-level view of the pipeline]

Of course, it might make sense to fold the blending into the weight calculation step and never explicitly store the weights, or to start from a luminance buffer for the edge detection instead of the colour image if you already have one for some other purpose, but those are implementation details.
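
If a luminance buffer isn't already available, producing one is cheap; a throwaway sketch (the Rec. 601 luma weights are just one arbitrary choice):

    // Rough sketch: build a luminance buffer from the colour image as input to edge detection.
    __global__ void lumaKernel(const float4* color, float* luma, int width, int height)
    {
        int x = blockIdx.x * blockDim.x + threadIdx.x;
        int y = blockIdx.y * blockDim.y + threadIdx.y;
        if (x >= width || y >= height) return;

        float4 c = color[y * width + x];
        luma[y * width + x] = 0.299f * c.x + 0.587f * c.y + 0.114f * c.z;
    }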

The interesting stuff clearly happens in the jaggy detection and slope calculation steps. Here, I know of one particular method that seems like it could be very successful on GPUs:
"Double line scanning" as proposed here. This would require two passes: one for vertical edges and one for horizontal ones. The drawbacks I can see on GPUs are that you "only" get a degree of parallelism equal to half the number of lines/columns, and that quite a bit of dynamic branching is involved.

That's it for now, any comments are welcome (particularly if you actually implemented any method like this or similar, or are planning to!).
 
Until people start actually developing DX10.1 games it's all a bit arbitrary. IMO it's not worth giving up MSAA for, and to combine it with MSAA you need 10.1/11.
 
Well, what I'm actually planning to do is post-process AA for console games that don't have any AA, so in that case it's not a matter of giving up anything. The problem with that use case is that you only get the final output image, so it's important to take care not to blur or deform the UI. And if the game has (even just bad) built-in AA, or scales the image in any way, it's very likely to fail.
 
Until people start actually developing DX10.1 games it's all a bit arbitrary. IMO it's not worth giving up MSAA for, and to combine it with MSAA you need 10.1/11.
Deferred shading with MSAA has been doable on gaming consoles for a while (you can directly access the MSAA samples by aliasing a bigger non-MSAA render target over the old MSAA'd target). However, only a few developers have chosen to use MSAA in their deferred renderers, because the performance hit is much larger than the MSAA hit on forward renderers. DX10.1 offers the same possibility on PC, but so far MSAA + deferred rendering hasn't caught on there either. The reasons are likely both performance and the low adoption of DX10.1-compatible hardware.
 