Alternative AA methods and their comparison with traditional MSAA

Christer, our Director of Technology, tweeted this about MLAA; hope it answers your questions: There's a paper that describes the MLAA algorithm, and the Saboteur effect is probably only a subset of the technique described. AFAIK the version used on GoW3 goes beyond the original paper. The #gow3 AA technique saved 5ms from the GPU, costs ~20ms on 5 SPUs (~4ms latency), it's very pretty and only on #ps3.

http://forums.godofwar.com/t5/God-o...opers-on-Forums-Right-Here/td-p/30061/page/17 (post 5)
 
It's interesting that the "close to MLAA" custom AA/AAA (analytical anti-aliasing) used in Metro 2033 is done by Xenos (the Xbox 360 GPU), which backs up Joker454's earlier claims on the matter :)
 
This is from the Metro 2033 article:
The closest explanation of the technique I can imagine would be that the shader internally doubles the resolution of the picture using pattern/shape detection (similar to morphological AA) and then scales it back to original resolution producing the anti-aliased version.

It's a little hard to understand, but what I took away from this is that it only identifies regions where there could be edges that need AA and then renders those regions at double resolution before resizing.

I guess you could call it selective 2xSSAA.
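The "selective 2xSSAA" reading above can be sketched as: render once, flag pixels whose colour jumps relative to a neighbour, and re-shade only those pixels at double resolution per axis before averaging back down. This is a toy under stated assumptions: `scene` is a hypothetical analytic shader standing in for re-rendering, the names are illustrative, and whether Metro 2033 actually works this way is only the poster's interpretation of the article. Note that doubling the resolution per axis gives 4 samples per flagged pixel.

```python
# Selective supersampling sketch. `scene` is a hypothetical analytic "shader":
# 1.0 on one side of a hard diagonal edge, 0.0 on the other (pixel units).
def scene(x, y):
    return 1.0 if y < 0.4 * x + 1.3 else 0.0


def render(width, height, shade=scene):
    """One sample per pixel, taken at the pixel centre (the aliased image)."""
    return [[shade(x + 0.5, y + 0.5) for x in range(width)] for y in range(height)]


def edge_mask(img, threshold=0.0):
    """Flag pixels whose value differs from any 4-connected neighbour by more than threshold."""
    h, w = len(img), len(img[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                nx, ny = x + dx, y + dy
                if 0 <= nx < w and 0 <= ny < h and abs(img[ny][nx] - img[y][x]) > threshold:
                    mask[y][x] = True
                    break
    return mask


def selective_supersample(width, height, factor=2, shade=scene):
    """Re-shade only flagged pixels on a factor x factor sub-grid, then box-filter back down."""
    img = render(width, height, shade)
    mask = edge_mask(img)  # computed once, before any pixel is modified
    for y in range(height):
        for x in range(width):
            if mask[y][x]:
                total = 0.0
                for sy in range(factor):
                    for sx in range(factor):
                        total += shade(x + (sx + 0.5) / factor, y + (sy + 0.5) / factor)
                img[y][x] = total / (factor * factor)
    return img, mask


if __name__ == "__main__":
    image, flagged = selective_supersample(8, 6)
    for row in image:
        print(" ".join(f"{v:.2f}" for v in row))
```

Pixels away from the edge keep their single sample, so the extra shading cost scales with the number of edge pixels rather than with the whole frame.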
 
What interests me: it should also be possible to use the GPU to do this MLAA stuff, right?

Shouldn't a GPU potentially be faster than Cell at number crunching?

If CELL needs 20ms...and they save 5ms GPU time:
How many ms would MLAA cost on RSX in comparison...

In other words...is MLAA in general a faster algorithm compared to MSAA (with even better quality), or is its advantage solely that the CPU can be used to do it, freeing up GPU?!
 
I think it's as much a matter of balancing your resources as anything. RSX is likely already shader-bound, so moving the process to the CPU, where you have a bunch of idle SPUs, makes a lot of sense. Besides that, the RSX may simply not have the level of programmability needed for this technique: it's a DX9-class device with dedicated pixel and vertex shaders.
 
/To our dear mods
Don't you think it might be better to close this thread (and this one too) and start a more generic one about analytical anti-aliasing? (Merging both threads could prove difficult and time-consuming.)
AAA appears to be getting more and more popular among developers, and since implementations diverge, a single unified thread may keep information from scattering across per-game/per-article threads.
//
 
In other words...is MLAA in general a faster algorithm compared to MSAA (with even better quality), or is its advantage solely that the CPU can be used to do it, freeing up GPU?!
Developers have given their opinion on the matter some pages ago if my memory serves right.
Search this thread for Repi and Joker posts ;)
 
From what I understand from TDMoss's Twitter, they implemented MLAA successfully with the help of SCEE. So I guess Killzone 3 is going to have it instead of the outdated quincunx AA.
 
If CELL needs 20ms...and they save 5ms GPU time:
How many ms would MLAA cost on RSX in comparison...

You have your calculation wrong, buddy: Cell does not need 20ms. A single SPU needs 20ms; spread over 5 SPUs it takes just 4ms.
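The difference between total SPU time and wall-clock latency is simple arithmetic. A minimal sketch, assuming the work scales perfectly across SPUs and, hypothetically, that 6 SPUs are usable for a whole 60Hz frame; the function names are illustrative only.

```python
def frame_ms(hz):
    """Length of one frame in milliseconds."""
    return 1000.0 / hz


def spu_budget_ms(num_spus, hz):
    """Total '1 SPU time' available per frame if every SPU is usable for the whole frame."""
    return num_spus * frame_ms(hz)


def latency_ms(spu_ms, spus_used):
    """Wall-clock latency of a job costing spu_ms of '1 SPU time', assuming perfect scaling."""
    if spus_used < 1:
        raise ValueError("need at least one SPU")
    return spu_ms / spus_used


if __name__ == "__main__":
    budget = spu_budget_ms(6, 60)  # roughly 100 ms of SPU time per frame
    job = 20.0                     # the ~20 ms of SPU time quoted for GoW3's AA
    for n in (1, 2, 5):
        print(f"{n} SPU(s): {latency_ms(job, n):.1f} ms latency, "
              f"{job / budget:.0%} of the per-frame SPU budget")
```

The committed SPU time stays at 20ms whichever split is chosen; only the latency changes.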
 
Developers have given their opinion on the matter some pages ago if my memory serves right.
Search this thread for Repi and Joker posts ;)
----
;)
Thanks for the useless answer...
It is the first time we hear definitive numbers from the devs of a specific game (GOW3)... that is why I am interested in knowing the missing numbers... which I will not find some pages back.
----

Another remark:
It is also the first time one can directly compare the two different AA strategies for the same game and heck, even for the same scenes: the demo uses 2xMSAA and the retail game uses MLAA.
I hope that DF makes a cool comparison to show the difference between the AA methods!
 
You have your calculation wrong buddy, Cell does not need 20ms. A single SPU needs 20ms, 5 SPUs cost just 4ms.

I just followed the classification given by T.B. one page ago:

T.B. said:
That's just how we do it on Cell, TBH. You have 6 cores and while a GPU always runs the same program on all "cores", that's just not true for the SPUs. So if you have a properly parallelisable problem, it makes sense to measure performance in "1 SPU time".

"Example": I have 100ms of SPU time at 60Hz and I budget up to 20ms for a piece of code. Maybe I just run it on 2 SPUs and get 10ms latency. Or I put it on 5 and get 4ms. That decision will depend on scheduling needs, but I still know how much SPU time I've committed.
 
I'm not sure my answer is useless, but maybe the posts I was thinking about were not in this thread (information is split across multiple threads).
I don't know much either, but analytical AA is a software thing and thus has various implementations; Saboteur doesn't use MLAA, for example (MLAA is Intel's implementation, thus one implementation of AAA). So the performance gap between MSAA and AAA varies (ceteris paribus) depending on the type of AAA used. AAA looks highly parallelizable, so it should do well on the GPU. The point is that in the real world there are bottlenecks: who cares if processing takes a bit longer on the SPUs if you have cycles to spare AND you need to free GPU cycles for tasks that Cell can't handle at reasonable performance? My understanding is that for GoW3 on the PS3 it makes sense. Joker454 stated it's possible to do on the GPU, and so did Barbarian (here, here and here, all in this thread by the way).

We now also have some data from the 4A team (more may be coming; I don't know whether the actual article has been updated yet).
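The "highly parallelizable" point can be illustrated by splitting a frame into horizontal strips, one per worker, roughly the way one might spread a post-process pass over 5 SPUs. A sketch under assumptions: the function names are hypothetical, the per-pixel kernel is only a simple luminance-edge test (the first step of an MLAA-style filter, not the full algorithm), and Python threads stand in for SPUs, so this shows the partitioning, not a speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_edges_rows(img, y0, y1, threshold):
    """Edge flags for rows y0..y1-1. Reads one row below the strip (read-only overlap)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(y0, y1):
        row = []
        for x in range(w):
            v = img[y][x]
            right = x + 1 < w and abs(img[y][x + 1] - v) > threshold
            below = y + 1 < h and abs(img[y + 1][x] - v) > threshold
            row.append(right or below)
        out.append(row)
    return out


def detect_edges_parallel(img, workers=5, threshold=0.1):
    """Split the frame into `workers` horizontal strips and process them concurrently."""
    h = len(img)
    bounds = [(h * i // workers, h * (i + 1) // workers) for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        strips = list(pool.map(lambda b: detect_edges_rows(img, b[0], b[1], threshold), bounds))
    # Strips come back in submission order, so concatenating them rebuilds the full frame.
    return [row for strip in strips for row in strip]
```

Each strip only needs one extra row of input beyond its own rows; on hardware with small local memories, such as SPUs, that overlap row would presumably have to be fetched along with the strip.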
 
Joker454 stated it's possible to do on the GPU, and so did Barbarian (here, here and here, all in this thread by the way).

If you read my post again, you will find that my intended question was not whether it is possible on a GPU (I used the fact that it is possible on a GPU as the reason for my question!), but how well it works on a GPU compared to an SPU implementation.

Just to make it clear... if I use the available numbers, I get the following picture for GOW3:

2xMSAA on RSX = 5ms
MLAA on CELL = 20ms SPU time (or 4ms net time, if you can use 5 of them and the algorithm scales)

But when an SPU can do the work, a GPU usually can do it as well... so what is missing:

MLAA on RSX = ... > 5ms or < 5ms ?

That is my question, and I am not aware of an available answer in this or another thread, but I would appreciate a link if I missed the post!
 

Isn't the point that in a 60fps situation, removing 5ms of workload from the GPU frees roughly 30% of its 16.7ms frame (5/16.7 ≈ 0.30) for something else? It's thus using a different hardware resource to better balance the whole job?
 
I don't know much either, but analytical AA is a software thing and thus has various implementations; Saboteur doesn't use MLAA, for example (MLAA is Intel's implementation, thus one implementation of AAA)

I'm beginning to develop a pet peeve here.

Analytical AA has nothing to do with post process AA.

Nothing at all. AAA is a method where you find the true color contribution of each primitive touching a pixel. So in effect you treat the pixel as having an extent, clip all primitives against it, figure out occlusion, and then integrate over the resulting polygons. This is not something you can do in a rasterizer, to my knowledge.

Now back to our regular scheduled discussion.

/PSA
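To make the distinction from post-process AA concrete, here is a minimal sketch of the per-pixel coverage integral described in the post above, for a single pixel: each triangle is clipped against the pixel square (Sutherland-Hodgman clipping) and the clipped area is computed with the shoelace formula. Assumptions: the triangles do not overlap, so the occlusion step is skipped entirely, and the function names are illustrative.

```python
def clip_to_pixel(poly, x0, y0):
    """Sutherland-Hodgman: clip a polygon against the square [x0, x0+1] x [y0, y0+1]."""
    def clip(points, inside, intersect):
        out = []
        for i, cur in enumerate(points):
            prev = points[i - 1]  # wraps around to the last vertex when i == 0
            if inside(cur):
                if not inside(prev):
                    out.append(intersect(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(intersect(prev, cur))
        return out

    def at_x(xc):  # intersection of segment p-q with the vertical line x = xc
        return lambda p, q: (xc, p[1] + (q[1] - p[1]) * (xc - p[0]) / (q[0] - p[0]))

    def at_y(yc):  # intersection of segment p-q with the horizontal line y = yc
        return lambda p, q: (p[0] + (q[0] - p[0]) * (yc - p[1]) / (q[1] - p[1]), yc)

    x1, y1 = x0 + 1.0, y0 + 1.0
    for inside, intersect in (
        (lambda p: p[0] >= x0, at_x(x0)),
        (lambda p: p[0] <= x1, at_x(x1)),
        (lambda p: p[1] >= y0, at_y(y0)),
        (lambda p: p[1] <= y1, at_y(y1)),
    ):
        poly = clip(poly, inside, intersect)
        if not poly:
            break
    return poly


def area(poly):
    """Shoelace formula; absolute value so winding order does not matter."""
    s = 0.0
    for i in range(len(poly)):
        (xa, ya), (xb, yb) = poly[i - 1], poly[i]
        s += xa * yb - xb * ya
    return abs(s) / 2.0


def coverage(triangle, x0, y0):
    """Exact fraction of the unit pixel at (x0, y0) covered by the triangle."""
    return area(clip_to_pixel(list(triangle), x0, y0))


def shade_pixel(triangles, x0, y0, background=0.0):
    """Coverage-weighted colour; assumes the (triangle, colour) pairs never overlap."""
    color, covered = 0.0, 0.0
    for tri, c in triangles:
        a = coverage(tri, x0, y0)
        color += a * c
        covered += a
    return color + (1.0 - covered) * background
```

A real renderer would also have to resolve occlusion between overlapping primitives before integrating, which is the hard part the post refers to.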
 
I don't understand why other software houses like DICE prefer no AA to MLAA at this point... at least on the PS3, where I guess it would give 'some' results.
 
patsu said:
In this case, the RSX would take 9ms or so to complete.
I don't know where you guys are getting this stuff; all they said is:
GPU render scene with 2xMSAA = X ms
GPU render scene = (X-5) ms

There's a very obvious implication that they aren't CPU limited, so extra time spent on SPUs is "free" in terms of framerate.
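That implication can be made concrete with a toy throughput model. It assumes, hypothetically, that Cell and RSX work on different frames in a pipeline, so the slower side sets the frame time, and that the SPU AA pass lands on the Cell side's critical path; X = 20ms and the Cell loads are made-up values, and the extra display latency of a post-process is ignored.

```python
def frame_time_ms(gpu_ms, cell_ms):
    """Assumes CPU and GPU work are pipelined, so the slower side sets the frame time."""
    return max(gpu_ms, cell_ms)


def compare_aa(gpu_ms_with_msaa, cell_ms, gpu_saving_ms=5.0, spu_aa_ms=4.0):
    """Frame time with 2xMSAA on the GPU vs. no MSAA plus an SPU post-process.

    spu_aa_ms is added to the Cell side's critical path, a deliberately crude
    approximation of '20 ms of SPU time spread over 5 SPUs'.
    """
    with_msaa = frame_time_ms(gpu_ms_with_msaa, cell_ms)
    with_spu_aa = frame_time_ms(gpu_ms_with_msaa - gpu_saving_ms, cell_ms + spu_aa_ms)
    return with_msaa, with_spu_aa


if __name__ == "__main__":
    # Hypothetical numbers: X = 20 ms of GPU time with 2xMSAA.
    for cell in (8.0, 12.0, 18.0):
        before, after = compare_aa(20.0, cell)
        print(f"Cell side {cell:.0f} ms: {before:.0f} ms -> {after:.0f} ms per frame")
```

While the Cell side has slack, the SPU pass is effectively free and the frame gets the full 5ms; once Cell becomes the bottleneck, the trade stops paying off.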
 
Well, it appears that I may have misunderstood your post a bit :)
I would put my bet on faster.
While answering, I was actually more interested in this part of your post:
In other words...is MLAA in general a faster algorithm compared to MSAA (with even better quality), or is its advantage solely that the CPU can be used to do it, freeing up GPU?!
It's not only about a few ms, or about whether to do it on the GPU or the CPU.
Actually, that's why I asked the mods to consider a new thread, a merge, or a rename to something more generic about analytical AA (AAA).
Epic, for instance, is considering AAA for their next engine (but at a higher precision, 10/11 bits, so I guess they plan to work on the color buffer only). AAA has a big advantage over MSAA: memory footprint and bandwidth requirements. 4A does some form of AAA on the GPU, so AAA is useful no matter where you run it (GPU or CPU).

Either way, I give up on answering you; others will do better :)

EDIT
Forget the part about Epic; I was misled.
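The memory side of the footprint argument is simple arithmetic: a post-process AA pass works on an ordinary one-sample-per-pixel buffer, while MSAA stores several samples per pixel. A back-of-the-envelope sketch, assuming a 1280x720 target with 4 bytes of colour and 4 bytes of depth per sample and no sample compression (real GPUs may compress MSAA data, and a post-process pass may need its own extra buffer).

```python
def framebuffer_bytes(width, height, samples, color_bytes=4, depth_bytes=4):
    """Colour + depth storage when every pixel keeps `samples` samples."""
    return width * height * samples * (color_bytes + depth_bytes)


def mib(n):
    """Bytes to mebibytes."""
    return n / (1024 * 1024)


if __name__ == "__main__":
    w, h = 1280, 720  # assumed 720p target
    for label, samples in (("1 sample (post-process AA)", 1), ("2xMSAA", 2), ("4xMSAA", 4)):
        print(f"{label:>26}: {mib(framebuffer_bytes(w, h, samples)):.1f} MiB")
```

Bandwidth follows the same scaling, since every sample has to be written and later read back during the resolve.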
 