Alternative AA methods and their comparison with traditional MSAA

Think about deferred renderers. In any case, the algorithm used in GoW is not something easily ported to a GPU. I've been giving it some thought over the past few months and I don't see how you would do that efficiently, especially on Xenos. Of course, someone might surprise me. :)

What exactly in AAA made it possible on Xenos, if you think it can't do MLAA (within the window)?
[size=-2]Sounds like the MLAA edge detection is taking up too much time.[/size]

EDIT: Thanks nightshade.
 
MLAA sounds like a pain for the 360 CPU, and I don't know whether it would be feasible on Xenos (Joker and somebody else mentioned it could be). I feel like AAA is the way to go for the 360. 4A gained a performance boost after dropping 2xMSAA for AAA, and it looked very comparable. On PC it was AAA ~ 4xMSAA, am I right? So maybe with some improvements you could get similar results on Xenos, plus you won't have to tile, and performance will be better, as will IQ.
 
What exactly in AAA made it possible on Xenos, if you think it can't do MLAA (within the window)?
[size=-2]Sounds like the MLAA edge detection is taking up too much time.[/size]

I assume by 'AAA' you mean the technique used by Metro 2033 and not actual analytical anti-aliasing (which is something not even remotely related). My understanding from looking at the images is that for every pixel, they look at a small pixel neighborhood, find edges in there, and blur (or maybe even cleverly blend) those to give you the final pixel colour.

So every GPU thread processes a single pixel in this approach. In the MLAA algorithm, however, pixels are not independent, but have a rather strict order in which they need to be processed. In other words, MLAA is not embarrassingly parallel and thus hard to implement on a GPU. Edge detection is not the issue.
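
To make the contrast concrete, here is a minimal C sketch of that per-pixel idea (my own illustration of the general approach, not 4A's actual code): every output pixel reads only a small, read-only neighbourhood, so each pixel maps to an independent GPU thread.

[code]
#include <math.h>

/* Illustrative per-pixel edge blend in the spirit of the Metro 2033
 * filter (a hypothetical reconstruction, not 4A's shader). Each output
 * pixel depends only on its four direct neighbours in the source
 * image, so all pixels can be processed in parallel, one per thread.
 * Borders are skipped for brevity; a caller would copy them through. */
static float luma(const float *rgb) {
    return 0.299f * rgb[0] + 0.587f * rgb[1] + 0.114f * rgb[2];
}

void edge_blend_aa(const float *src, float *dst, int w, int h) {
    for (int y = 1; y < h - 1; ++y) {
        for (int x = 1; x < w - 1; ++x) {
            const float *c = &src[(y * w + x) * 3];
            const float *l = &src[(y * w + x - 1) * 3];
            const float *r = &src[(y * w + x + 1) * 3];
            const float *u = &src[((y - 1) * w + x) * 3];
            const float *d = &src[((y + 1) * w + x) * 3];

            /* Simple luma-gradient edge test (threshold is arbitrary). */
            float gx = luma(r) - luma(l);
            float gy = luma(d) - luma(u);
            float t  = sqrtf(gx * gx + gy * gy) > 0.1f ? 0.5f : 0.0f;

            /* Blend towards the 4-neighbour average where an edge is found. */
            for (int k = 0; k < 3; ++k) {
                float avg = 0.25f * (l[k] + r[k] + u[k] + d[k]);
                dst[(y * w + x) * 3 + k] = c[k] + t * (avg - c[k]);
            }
        }
    }
}
[/code]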

I'll go back into my cage now. ;)
 
So every GPU thread processes a single pixel in this approach. In the MLAA algorithm, however, pixels are not independent, but have a rather strict order in which they need to be processed. In other words, MLAA is not embarrassingly parallel and thus hard to implement on a GPU.
That's what I was thinking, and it'd thus need a completely different GPU core architecture to be able to apply MLAA. Unless, as you say, the process can be reengineered for a GPU's structure.
 
MLAA sounds like a pain for the 360 CPU, and I don't know whether it would be feasible on Xenos (Joker and somebody else mentioned it could be). I feel like AAA is the way to go for the 360. 4A gained a performance boost after dropping 2xMSAA for AAA, and it looked very comparable. On PC it was AAA ~ 4xMSAA, am I right? So maybe with some improvements you could get similar results on Xenos, plus you won't have to tile, and performance will be better, as will IQ.

So what exactly is this AAA you feel is the way to go?
 
So what exactly is this AAA you feel is the way to go?

Well, I was reading the interview with 4A Games about their engine and tech, and it seemed to me that they got rather good results and performance with AAA in comparison with 2xMSAA (I was wrong, it was not MSAA; I checked again and it was actually running deferred rotated-grid super-sampling). Then again, I was totally wrong anyway: since they are doing deferred rendering, they already had to tile, which kind of contradicts what I said in my previous post. I guess I thought that AAA could help a lot since you would then bypass tiling. I also got the impression that maybe MSAA is not really the way to go, since you have to tile and thus take additional geometry overhead.
 
That's what I was thinking, and it'd thus need a completely different GPU core architecture to be able to apply MLAA. Unless, as you say, the process can be reengineered for a GPU's structure.

It has been a common mantra that for parallel computing, and especially the kind required by the SPEs, developers need to think outside the box and rethink solutions to known problems. Maybe it is time some publishers began cracking the Xbox developer whip and expecting the same. Maybe we would see more ingenious solutions. Or maybe MSAA is "good enough" for most of them.
 
Ah! What are you doing here?!

You should be chained to an R* desk or a GG desk by now, [size=-2] and cancel all vacations.[/size]


See? That's Sony fanboys for you. You toil day and night, and they cancel your vacations ;-)
 
MLAA sounds like a pain for the 360 CPU, and I don't know whether it would be feasible on Xenos (Joker and somebody else mentioned it could be). I feel like AAA is the way to go for the 360. 4A gained a performance boost after dropping 2xMSAA for AAA, and it looked very comparable. On PC it was AAA ~ 4xMSAA, am I right? So maybe with some improvements you could get similar results on Xenos, plus you won't have to tile, and performance will be better, as will IQ.

I'm going to have to disagree with this. Metro's AAA really wasn't a very good solution at all. Even in those still pics you can see that anything approaching a straight vertical or horizontal line gains no edge smoothing, and the lack of any sub-pixel rendering was a serious issue. I'd have taken bog-standard 2xMSAA over it; it really didn't help the overall image quality much at all, and may have introduced some side effects as well.
 
Ah! What are you doing here?!

You should be chained to an R* desk or a GG desk by now, [size=-2] and cancel all vacations.[/size]

See? That's Sony fanboys for you. You toil day and night, and they cancel your vacations ;-)

:LOL: That's what's happening to me now and for the rest of my 2010.
Not related to fanboyism at all. It's called "Sh*t Happens". ^_^


EDIT:
I assume by 'AAA' you mean the technique used by Metro 2033 and not actual analytical anti-aliasing (which is something not even remotely related). My understanding from looking at the images is that for every pixel, they look at a small pixel neighborhood, find edges in there, and blur (or maybe even cleverly blend) those to give you the final pixel colour.

So every GPU thread processes a single pixel in this approach. In the MLAA algorithm, however, pixels are not independent, but have a rather strict order in which they need to be processed. In other words, MLAA is not embarrassingly parallel and thus hard to implement on a GPU. Edge detection is not the issue.

Okie, I understand the issue at hand better now. Thanks for the info. What's a good name for Metro's AA scheme if it's not technically AAA?

I'll go back into my cage now. ;)

*hugs* T.B.
 
So every GPU thread processes a single pixel in this approach. In the MLAA algorithm, however, pixels are not independent, but have a rather strict order in which they need to be processed. In other words, MLAA is not embarrassingly parallel and thus hard to implement on a GPU. Edge detection is not the issue.

I'll go back into my cage now. ;)

Reading the MLAA paper, it notes that the first step is to find the "edges", of which only the longest are considered (the primary edges); these are then split up into L-shaped structures to apply the color averaging, using a connecting triangle (or rather its area)!
Now, suppose you want to make a parallel version of this algorithm to fire up all SPUs, for instance with a domain decomposition technique:

- Considering 4 SPUs, one should split the image into at least four equal pieces, so that each piece can be processed independently.

- If you run the pattern detection independently on each piece of the image... the number of patterns, and especially their shapes ('longest primary edge'), could change, right?
- Especially at the 'artificial' boundaries between the individual sub-domains...

- If the number and shape of the patterns change, the triangle you use to determine the new color of the pixels differs compared to the single-SPU case; thus the resulting color differs, and thus the anti-aliasing of the image differs.

- Typically, if you want good load balancing, you should split the image into more than four pieces, which exacerbates these problems.

- The load-balancing problem I see is that in theory one SPU could well detect no edges in its sub-domain, and thus sit around while the others do their hard averaging work, if no special care is taken in such situations (i.e. dynamic load balancing!).

What interests me:
- Can one generally say that the shorter primary edges caused by the domain decomposition yield worse IQ when using the triangles to average, compared to the single-SPU case?

If this is right, this could be a major drawback of the algorithm... because the only alternative I see for a parallel version of this algorithm is to somehow communicate with neighbor domains to find the unique patterns - this smells like a difficult "quality versus SPU time" quest!
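
To make the boundary issue concrete, here is a small C sketch (my own toy example, not from the paper): split the image into equal strips, one per SPU, and watch what happens to an edge crossing a strip boundary.

[code]
#include <stdio.h>

/* Toy illustration of the decomposition problem described above. An
 * image of `height` rows is split into equal horizontal strips, one
 * per SPU; each SPU only "sees" edges inside its own strip. */
enum { NUM_SPUS = 4 };

typedef struct { int y0, y1; } Strip;  /* rows [y0, y1) */

static void decompose(int height, Strip s[NUM_SPUS]) {
    int rows = height / NUM_SPUS;
    for (int i = 0; i < NUM_SPUS; ++i) {
        s[i].y0 = i * rows;
        s[i].y1 = (i == NUM_SPUS - 1) ? height : (i + 1) * rows;
    }
}

int main(void) {
    Strip s[NUM_SPUS];
    decompose(720, s);

    /* A vertical silhouette edge spanning rows 170..190 crosses the
     * strip boundary at row 180: SPU 0 reports a 10-row edge and SPU 1
     * another 10-row edge, where a single SPU would have found one
     * 20-row primary edge -- so the L-shapes, and thus the averaging
     * triangles, come out different. */
    int e0 = 170, e1 = 190;
    for (int i = 0; i < NUM_SPUS; ++i) {
        int lo = e0 > s[i].y0 ? e0 : s[i].y0;
        int hi = e1 < s[i].y1 ? e1 : s[i].y1;
        if (hi > lo)
            printf("SPU %d sees edge rows %d..%d (length %d)\n",
                   i, lo, hi, hi - lo);
    }
    return 0;
}
[/code]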
 
The more general question is: besides MLAA, are there alternative algorithms/subsystems in the entire graphics pipeline that are not embarrassingly parallelizable?

EDIT:
If this is right, this could be a major drawback of the algorithm... because the only alternative I see for a parallel version of this algorithm is to somehow communicate with neighbor domains to find the unique patterns - this smells like a difficult "quality versus SPU time" quest!

In some problems, you may overlap the problem space and have the SPUs recalculate the results in the overlapped areas. That way you reduce the amount of communication between the SPUs.
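
Something like this, perhaps (a generic halo sketch of my own, not from any shipping title): each strip reads a fixed number of extra rows from its neighbours and redundantly recomputes them, but only writes back its own core rows.

[code]
/* Hypothetical overlapped decomposition: every strip is padded by HALO
 * rows across each internal boundary. Each SPU recomputes the halo
 * instead of communicating; edges reaching less than HALO rows past a
 * seam are then found intact. Edges extending further are still
 * truncated, so the halo softens, not removes, the boundary problem. */
enum { NUM_SPUS = 4, HALO = 8 };

typedef struct {
    int core_y0, core_y1;   /* rows this SPU owns and writes back */
    int read_y0, read_y1;   /* rows it reads: core plus halo      */
} OverlapStrip;

static void decompose_overlapped(int height, OverlapStrip s[NUM_SPUS]) {
    int rows = height / NUM_SPUS;
    for (int i = 0; i < NUM_SPUS; ++i) {
        s[i].core_y0 = i * rows;
        s[i].core_y1 = (i == NUM_SPUS - 1) ? height : (i + 1) * rows;
        /* Extend the read window into the neighbours, clamped to the image. */
        s[i].read_y0 = (s[i].core_y0 - HALO < 0)      ? 0      : s[i].core_y0 - HALO;
        s[i].read_y1 = (s[i].core_y1 + HALO > height) ? height : s[i].core_y1 + HALO;
    }
}
[/code]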

Another common trick is to rearrange the data (e.g., stash the intermediate results somewhere convenient/shared), so that the SPUs fetch them together with the input data.

Not sure if these tricks will work in MLAA since I have not studied it. :p
 
The more general question is: besides MLAA, are there alternative algorithms/subsystems in the entire graphics pipeline that are not embarrassingly parallelizable?

EDIT:


In some problems, you may overlap the problem space and have the SPUs recalculate the results in the overlapped areas. That way you reduce the amount of communication between the SPUs.

Another common trick is to rearrange the data (e.g., stash the intermediate results somewhere convenient/shared), so that the SPUs fetch them together with the input data.

Not sure if these tricks will work in MLAA since I have not studied it. :p

Overlapping of domains is a valid option, although it decreases parallel efficiency the more domains (i.e. the more domain boundary) you have... but more domains could be desirable for performance/load-balancing reasons... here we typically try to come up with an algorithm which (at least in theory) scales ideally :smile:
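
A rough back-of-the-envelope (my numbers, just to illustrate the scaling): split an image of H rows into N strips and pad each internal boundary with h halo rows, and you redundantly process about 2*h*(N-1) extra rows, for a parallel efficiency of roughly H / (H + 2*h*(N-1)). With H = 720, h = 8 and N = 4 that is 720/768 ≈ 94%, but at N = 16 it already drops to 720/960 = 75%.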
Another problem I see is that you don't know a priori how much domain overlap you need (as the patterns, and hence their extension into the neighbor domains, are unknown when you decompose the image).

I don't understand what you mean by your data rearranging... maybe you could be more specific?
 
The more general question is: besides MLAA, are there alternative algorithms/subsystems in the entire graphics pipeline that are not embarrassingly parallelizable?

The thing is, we haven't put anything into the graphics pipeline that is not embarrassingly parallelizable; this doesn't mean there aren't other domains interesting for computer graphics which fall into that category.

Classical radiosity, for example, where every patch of the scene interacts with every other, is not embarrassingly parallelizable. I think the updating of the hierarchies needed for raytracing dynamic scenes isn't, either.
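
To illustrate the radiosity point, here is a minimal progressive-refinement sketch in C (a textbook-style toy with a fake uniform form factor, not production code): each shooting step has to pick the patch with the most unshot energy, and that choice depends on the updates of every previous step, so the outer loop is inherently serial even though the inner loop parallelizes.

[code]
#include <stddef.h>

/* Toy progressive radiosity. Every patch interacts with every other
 * patch, and the next shooter is chosen from global state updated by
 * the previous iteration -- the outer loop cannot simply be farmed
 * out to independent threads. */
typedef struct {
    float radiosity;    /* accumulated radiosity            */
    float unshot;       /* energy not yet distributed       */
    float reflectance;  /* fraction of received energy kept */
} Patch;

/* Stand-in form factor; a real solver computes visibility/geometry. */
static float form_factor(size_t i, size_t j, size_t n) {
    (void)i; (void)j;
    return 1.0f / (float)n;
}

void progressive_radiosity(Patch *p, size_t n, int iterations) {
    for (int it = 0; it < iterations; ++it) {
        /* Serial step: find the patch with the most unshot energy. */
        size_t shooter = 0;
        for (size_t i = 1; i < n; ++i)
            if (p[i].unshot > p[shooter].unshot)
                shooter = i;

        /* Shoot: every other patch receives energy. This inner loop is
         * parallel, but the next shooter choice depends on its results. */
        float e = p[shooter].unshot;
        p[shooter].unshot = 0.0f;
        for (size_t j = 0; j < n; ++j) {
            if (j == shooter) continue;
            float dB = p[j].reflectance * form_factor(shooter, j, n) * e;
            p[j].radiosity += dB;
            p[j].unshot    += dB;
        }
    }
}
[/code]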
 
I don't understand what you mean by your data rearranging... maybe you could be more specific?

I can't! (Not without knowing the MLAA calculations.)

The issue is: we need to partition the data, but the partitions depend on data in other partitions. In some problems, we can allocate the SPUs and organize the data in such a way that a worker SPU can get partial results from the other SPUs first, and then resolve the rest once the dependent data arrives.

The thing is, we haven't put anything into the graphics pipeline that is not embarrassingly parallelizable; this doesn't mean there aren't other domains interesting for computer graphics which fall into that category.

Classical radiosity, for example, where every patch of the scene interacts with every other, is not embarrassingly parallelizable. I think the updating of the hierarchies needed for raytracing dynamic scenes isn't, either.

Yeah, that's what I meant. Also, if we revisit some of the existing solutions, will we find new approaches? Embarrassingly parallel algorithms are low-hanging fruit, and computer graphics probably has a lot of them. In addition, were there mathematical approximations formulated to exploit the early SIMD GPU architectures, for instance?
 
Has anyone tried MLAA with cel-shaded graphics? I feel IQ is about the only area where cel-shaded graphics lose out to real drawn artwork. If the edges could be AA'd, they'd look spectacular. I'm thinking of DQVIII here on PS2. Lose the jaggies and it'd be close to cartoon quality. Drawing the edges to a separate edge buffer and applying MLAA to that is all it'd take.
 
Has anyone tried MLAA with cel-shaded graphics? I feel IQ is about the only area where cel-shaded graphics lose out to real drawn artwork. If the edges could be AA'd, they'd look spectacular. I'm thinking of DQVIII here on PS2. Lose the jaggies and it'd be close to cartoon quality. Drawing the edges to a separate edge buffer and applying MLAA to that is all it'd take.
MLAA should work great until the edge is too thin.

In cel shading you usually use fins and such for the black lines; it would be preferable to do the antialiasing while rendering those.
You would still have sub-pixel accuracy and the ability to tweak lines when they are smaller than a pixel.

Actually, fins might be quite a nice solution for getting 'cheap' antialiasing in most games.
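
For what it's worth, here is one generic way the "antialias the line while rendering it" idea can work (a distance-based sketch of my own, not from any particular engine): compute each pixel's distance to the line segment and fade the ink coverage over the last pixel of that distance, which keeps sub-pixel accuracy even for lines thinner than a pixel.

[code]
#include <math.h>

/* Analytic coverage for an ink line from A to B with a given
 * half-width, evaluated at a pixel centre (px, py). Coverage falls off
 * linearly over one extra pixel of distance, so a sub-pixel-thin line
 * still yields partial alpha rather than a jagged on/off edge. */
float line_coverage(float px, float py,
                    float ax, float ay, float bx, float by,
                    float half_width) {
    float dx = bx - ax, dy = by - ay;
    float len2 = dx * dx + dy * dy;

    /* Closest point on segment AB, with the parameter clamped to [0,1]. */
    float t = len2 > 0.0f ? ((px - ax) * dx + (py - ay) * dy) / len2 : 0.0f;
    t = t < 0.0f ? 0.0f : (t > 1.0f ? 1.0f : t);
    float cx = ax + t * dx, cy = ay + t * dy;
    float dist = sqrtf((px - cx) * (px - cx) + (py - cy) * (py - cy));

    /* Fully inked inside half_width, linear falloff over one more pixel. */
    if (dist <= half_width)        return 1.0f;
    if (dist >= half_width + 1.0f) return 0.0f;
    return (half_width + 1.0f) - dist;
}
[/code]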
 