Alternative AA methods and their comparison with traditional MSAA*

Shifty Geezer · Apr 19, 2010

Billy Idol said:
Thanks for the link! Great read. But I also think it clearly states that it was not so easy to incorporate this tech into an existing engine...which kind of dampens the euphoric assumption that from now on every PS3 game will use this AA.

I'm not reading it that way. It was hard, but they were breaking new ground. How readily the finalised method could be added to a new title depends on the scheduling methods, I think. If the target game can accommodate the scheduling, or even the added latency of a frame wait to post-process the finalised framebuffer, it shouldn't be any harder to add than any other post-process effect. The question at this point is whether the final code is in a state where it could be modularised into a Sony library for distribution and inclusion.

AlStrong said:
Probably the most useful information there is that it's not a fixed cost...

Indeed, requiring some significant budgeting to maintain a stable framerate. Although 20ms of SPU time over 5 SPUs would be about a 2ms variation in overall time (you're tying up the SPUs for 1-3ms per frame).

Shifty Geezer · Apr 19, 2010

2real4tv said:
What if you implement it from the beginning? Maybe you can reduce latency even more?

The limiting factor in latency is going to be how quickly you can produce workable data from RSX. A TBDR GPU might offer much better options with per-tile processing, but then if we're talking future hardware, a GPU solution tailored for this AA technique would be the ideal solution!

2real4tv · Apr 19, 2010

But wouldn't it have to consist of multiple cores to take advantage of it? Stupid question, but do current GPUs have multiple cores?

AlNom · Apr 19, 2010

2real4tv said:
But wouldn't it have to consist of multiple cores to take advantage of it? Stupid question, but do current GPUs have multiple cores?

"shader cores"

Oh marketing...

Billy Idol · Apr 19, 2010

Shifty Geezer said:
I'm not reading it that way. It was hard, but they were breaking new ground. How readily the finalised method could be added to a new title depends on the scheduling methods, I think. If the target game can accommodate the scheduling, or even the added latency of a frame wait to post-process the finalised framebuffer, it shouldn't be any harder to add than any other post-process effect. The question at this point is whether the final code is in a state where it could be modularised into a Sony library for distribution and inclusion.

He stated that it is not a case of doing everything else and then running the AA on the SPUs at the end as a post-processing step!
He states that, due to the latencies, they had to incorporate the AA into the normal process...which seems to involve a lot of "hand made", engine-dependent stuff that could be easier or harder for another engine?!

Laa-Yosh · Apr 19, 2010

Nice to see a developer putting that much work behind antialiasing tech. I still prefer UC2's look to GOW3's, but Naughty Dog is just the kind of dev to give this approach a serious try.

I don't fully get what the tech does, though. It probably involves serious edge detection, also taking the Z-buffer into account; but after that, it's certainly not just a simple blur operation. More like a full reconstruction / re-drawing of the poly edges using an AA-ed line drawing algorithm and sampling the color from the original image?

Also, the cost is probably dependent on overall scene geometry complexity, but not just poly count - a 1 million triangle sphere would still be nothing compared to a thousand 1000-poly characters.
This means that next-gen consoles would most likely see an even higher cost, especially when using displacement mapping (think about the spiky dragon in that tech demo). In fact, there might be a point where MSAA becomes faster again?
Also, it is now definitely time to spend some serious effort on shader supersampling as well, to get rid of that kind of artifacting...

T.B. · Apr 19, 2010

Billy Idol said:
But I also think it clearly states that it was not so easy to incorporate this tech into an existing engine...which kind of dampens the euphoric assumption that from now on every PS3 game will use this AA.

Santa Monica did an awesome job on getting the integration as tight as possible. As I understand it, Cedric and Jim put a *lot* of time and effort into getting the final frame latency down as much as possible.
That said, if your SPUs are not that busy and latency is not that much of an issue, it's just a few function calls. I've integrated it into existing samples within minutes.

It all depends on how optimal you want everything to be, and the Santa Monica guys don't mess around.

AlNom · Apr 19, 2010

Well, up to 20ms of frame time is certainly a big consideration! I'm still rather curious whether it is something most games can use, all things considered.

Laa-Yosh · Apr 19, 2010

I'm rather curious whether the X360 has the processing power and the means to reduce latency enough to do it, and whether that would be on the GPU, the CPU, or some combination of both...

Shifty Geezer · Apr 19, 2010

We need to be clear on that. I understand them to mean 20ms of total SPU time out of an available 6x frame time. A 30fps title has 6x33ms, or about 200ms, of total SPU time available. 20ms of AA processing leaves 180ms of total SPU time for other jobs. Putting it another way, shared over 5 SPUs, 20ms of SPU time is 4ms on each SPU, i.e. 4ms of frame time.

upnorthsox · Apr 19, 2010

Shifty Geezer said:
We need to be clear on that. I understand them to mean 20ms of total SPU time out of an available 6x frame time. A 30fps title has 6x33ms, or about 200ms, of total SPU time available. 20ms of AA processing leaves 180ms of total SPU time for other jobs. Putting it another way, shared over 5 SPUs, 20ms of SPU time is 4ms on each SPU, i.e. 4ms of frame time.

Or under 7ms on 3 SPUs; the important thing is to get it running on more than one SPU.
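The arithmetic in the two posts above can be checked with a few lines of Python. This is only a sketch: the function name `spu_budget` and its parameters are made up for illustration, and it assumes the AA work splits perfectly evenly across SPUs with no scheduling overhead, which a real engine would not get for free.

```python
def spu_budget(fps, total_spus, aa_spu_ms, aa_spus):
    """Rough SPU time budget for one frame.

    fps        -- target framerate
    total_spus -- SPUs available to the game (6 in the post above)
    aa_spu_ms  -- total SPU time the AA pass consumes, summed over all SPUs
    aa_spus    -- number of SPUs the AA pass is spread across
    Assumes ideal, overhead-free parallel scaling (an illustrative assumption).
    """
    frame_ms = 1000.0 / fps
    total_spu_ms = frame_ms * total_spus
    return {
        "frame_ms": frame_ms,
        "total_spu_ms": total_spu_ms,
        "remaining_spu_ms": total_spu_ms - aa_spu_ms,
        # Wall-clock time each participating SPU is busy with AA.
        "aa_wall_ms": aa_spu_ms / aa_spus,
    }


five = spu_budget(fps=30, total_spus=6, aa_spu_ms=20.0, aa_spus=5)
three = spu_budget(fps=30, total_spus=6, aa_spu_ms=20.0, aa_spus=3)

print(f"total SPU time per frame: {five['total_spu_ms']:.0f} ms")
print(f"left after AA:            {five['remaining_spu_ms']:.0f} ms")
print(f"AA on 5 SPUs:             {five['aa_wall_ms']:.2f} ms each")
print(f"AA on 3 SPUs:             {three['aa_wall_ms']:.2f} ms each")
```

With the 20ms worst-case figure, spreading the work over 5 SPUs gives 4ms per SPU and over 3 SPUs just under 7ms, matching the numbers quoted above.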

Billy Idol · Apr 19, 2010

T.B. said:
Santa Monica did an awesome job on getting the integration as tight as possible. As I understand it, Cedric and Jim put a *lot* of time and effort into getting the final frame latency down as much as possible.
That said, if your SPUs are not that busy and latency is not that much of an issue, it's just a few function calls. I've integrated it into existing samples within minutes.

It all depends on how optimal you want everything to be, and the Santa Monica guys don't mess around.

Thanks T.B. for the clarification!!
So it really depends on the actual case... (I suppose that you are not aware of the situation over at Polyphony Digital? :mrgreen:)

patsu · Apr 19, 2010

T.B. said:
That said, if your SPUs are not that busy and latency is not that much of an issue, it's just a few function calls. I've integrated it into existing samples within minutes.

It all depends on how optimal you want everything to be, and the Santa Monica guys don't mess around.

It sounds like the key innovations are:

(A) How the workflow is scheduled/aligned and communicated back and forth (between RSX and Cell) to maximize the available time slot

(B) The filter algorithm itself, which can use up to 5 SPUs (or more?)

As long as you can figure out (A) and you have enough spare SPU time left for (B), the method can be used. Since Santa Monica Studio has probably tried a lot of different integration approaches, it'd be relatively simple for you guys to re-apply the system to another game (if applicable). Some sort of scenario-based training could be done to transfer the knowledge to another team to tackle (A). (B) can be optimized separately, or maybe doesn't require any changes?

Billy Idol · Apr 19, 2010

Can some of you guys clarify "latency" for me please?

The context in which I use "latency" is the following: when doing massively parallel (MPI) calculations, after you submit a message via MPI it takes some time before you can send a new one, i.e. the communication system has some latency, which degrades the overall efficiency of the parallelization. What I can do to reduce this drawback is to do some work that needs only local data during the communication latency, so the waiting time is used for something else!

Is this the same "latency" people here are talking about?

Or is there another "latency" aspect in the sense of "dependencies"?! Dependencies meaning: SPU_X has to wait for [...] de-alias an image before you even have one?

patsu · Apr 19, 2010

I suspect they are using "latency" casually to mean "enough time between frames" or "time before the frame is done rendering".

How does the final quality compare to 2x or 4x MSAA?

Billy Idol · Apr 19, 2010

Another question someone here raised: the interview states that the CPU time of the AA method varies from scene to scene.

Here are some of my assumptions:

-So if I assume that there is some adaptation going on, e.g. some form of edge detection:
you have the fixed cost of the detection algorithm, which must be the same for each image, scaling with the number of pixels

-I assume that you even have some parameters which you can tune to determine what actually is an edge (I don't believe that you guys found a purely automatic algorithm, as this is basically one of the tasks we try to do here all day long) - this parameter gives you the possibility to trade CPU time for higher quality or vice versa (more edges detected -> better quality -> more CPU time needed)

-you have an additional cost depending on how many edges got detected, because wherever you detect an edge, you do your magic AA stuff.
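The three assumptions above can be sketched in Python as a fixed per-pixel detection pass plus a variable per-edge cost. This is not the detection step actually used in GOW3, which the interview does not spell out; `detect_edges`, `estimated_cost_ms`, the luminance-difference test and the per-pixel and per-edge costs are all hypothetical.

```python
def detect_edges(luma, threshold):
    """Return (y, x, contrast) for every pixel whose luminance differs from
    its right or lower neighbour by more than `threshold`.

    `luma` is a list of rows of floats in [0, 1]. The whole image is always
    scanned, so this pass costs the same for every frame of a given size.
    """
    height, width = len(luma), len(luma[0])
    edges = []
    for y in range(height):
        for x in range(width):
            right = abs(luma[y][x] - luma[y][x + 1]) if x + 1 < width else 0.0
            down = abs(luma[y][x] - luma[y + 1][x]) if y + 1 < height else 0.0
            contrast = max(right, down)
            if contrast > threshold:
                edges.append((y, x, contrast))
    return edges


def estimated_cost_ms(num_pixels, num_edges, per_pixel_ms, per_edge_ms):
    """Fixed detection cost plus a cost that grows with the edges found."""
    return num_pixels * per_pixel_ms + num_edges * per_edge_ms


image = [
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.1, 1.0, 1.0],
    [0.0, 0.0, 0.9, 1.0],
    [0.0, 0.0, 0.0, 1.0],
]

strict = detect_edges(image, threshold=0.5)   # only strong edges
loose = detect_edges(image, threshold=0.05)   # also picks up faint ones

for name, found in (("strict", strict), ("loose", loose)):
    cost = estimated_cost_ms(16, len(found), per_pixel_ms=0.001, per_edge_ms=0.01)
    print(f"{name}: {len(found)} edge pixels, estimated {cost:.3f} ms")
```

Lowering the threshold finds more edge pixels on the same image, so the variable part of the cost goes up while the scan itself stays the same.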

Conclusions based on these wild assumptions:
-This implies that the stated CPU time heavily depends on the actual game: if you have a lot of edges to be de-aliased, you need more CPU time as more edges get detected...on the other hand, you can reduce the CPU time by setting a parameter which decides what is an edge and what is not!

-If a parameter is indeed used to tune the edge detection...one can again use this adaptively to set the quality of the AA depending on the available CPU time, i.e. depending on the actual complexity of the scene: "I have a lot of spare CPU time, so I can really turn up the edge detection sensitivity and improve IQ"

-A better way would be: use a rather low detection threshold (i.e. a high sensitivity), so that almost all edges get detected. Now sort them appropriately: "worst" edges first or something like this. Then you have a sorted list of detected edges. Now determine how much CPU time you have left. This automatically gives you the maximum number of edges you can de-alias. Then just process the list of edges from the worst onwards until you have no CPU time left...
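The "worst edges first" idea in the last point could look roughly like this in Python. It is a sketch under stated assumptions: `select_edges_within_budget`, the severity score and the flat per-edge cost are hypothetical, and whether the shipped implementation does anything like this is not known from the interview.

```python
import heapq


def select_edges_within_budget(edges, budget_ms, per_edge_ms):
    """Pick the most severe edges that fit into the remaining time budget.

    edges       -- iterable of (edge_id, severity); higher severity = worse aliasing
    budget_ms   -- CPU/SPU time left for de-aliasing this frame
    per_edge_ms -- assumed flat cost of de-aliasing one edge (hypothetical)
    Returns the selected edges, worst first.
    """
    if per_edge_ms <= 0:
        raise ValueError("per_edge_ms must be positive")
    max_edges = int(budget_ms // per_edge_ms) if budget_ms > 0 else 0
    # nlargest avoids fully sorting edges that will never be processed.
    return heapq.nlargest(max_edges, edges, key=lambda edge: edge[1])


detected = [("a", 0.2), ("b", 0.9), ("c", 0.5), ("d", 0.7), ("e", 0.1)]

# 0.75 ms left at 0.25 ms per edge: room for three edges.
chosen = select_edges_within_budget(detected, budget_ms=0.75, per_edge_ms=0.25)
print([edge_id for edge_id, _ in chosen])
```

One caveat with this approach is that ranking the edges costs time itself, so in practice that overhead would have to come out of the same budget.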

nightshade · Apr 19, 2010

patsu said:
How does the final quality compare to 2x or 4x MSAA?

From what I have observed reading the discussion regarding MLAA, the final quality should be comparable to something as high as 16xMSAA in still pictures, while in motion it's comparable to 4xMSAA or more.

Rolf N · Apr 19, 2010

Billy Idol said:
Can some of you guys clarify "latency" for me please?

The context in which I use "latency" is the following: when doing massively parallel (MPI) calculations, after you submit a message via MPI it takes some time before you can send a new one, i.e. the communication system has some latency, which degrades the overall efficiency of the parallelization. What I can do to reduce this drawback is to do some work that needs only local data during the communication latency, so the waiting time is used for something else!

Is this the same "latency" people here are talking about?

Or is there another "latency" aspect in the sense of "dependencies"?! Dependencies meaning: SPU_X has to wait for [...] de-alias an image before you even have one?

I thought he was using latency as a synonym for processing time. It would appear as latency to the player -- if it were excessive -- because it would lead to delays between input and response. That doesn't mean that the system is idling however.

TheWretched · Apr 19, 2010

If it indeed uses the depth buffer for analyzing the picture too, couldn't we then circumvent the subpixel problem with this approach too? For example, in Gran Turismo (I was told such an approach would look worse, as it destroys sub-pixel detail in stuff like power lines), if the depth buffer has a far-away line across the screen, couldn't the algorithm just not process it instead of destroying it? Or could one tune the software in such a way that it does filter the line correctly?
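As a purely illustrative Python sketch of the "just don't process it" idea: take an edge mask from some detector and drop every pixel whose depth is beyond a cutoff, so distant thin geometry is left untouched. The function `gate_edges_by_depth`, the normalized depth convention (0 = near, 1 = far) and the cutoff value are all assumptions, and whether this would actually look better is exactly the open question.

```python
def gate_edges_by_depth(edge_mask, depth, max_depth):
    """Keep an edge pixel only if its depth is at or nearer than `max_depth`.

    edge_mask -- rows of booleans from some edge detector (assumed given)
    depth     -- rows of floats, assumed normalized so 0 is near and 1 is far
    max_depth -- hypothetical cutoff; pixels farther than this are skipped
    """
    return [
        [bool(is_edge) and d <= max_depth for is_edge, d in zip(edge_row, depth_row)]
        for edge_row, depth_row in zip(edge_mask, depth)
    ]


edge_mask = [
    [True, True],
    [False, True],
]
depth = [
    [0.20, 0.95],  # the top-right edge pixel belongs to something far away
    [0.99, 0.50],
]

gated = gate_edges_by_depth(edge_mask, depth, max_depth=0.9)
print(gated)
```

The far-away edge pixel is removed from the mask, so the filter would leave it as rendered rather than blending it into its surroundings.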

Arwin · Apr 19, 2010

TheWretched said:
If it indeed uses the depth buffer for analyzing the picture too, couldn't we then circumvent the subpixel problem with this approach too? For example, in Gran Turismo (I was told such an approach would look worse, as it destroys sub-pixel detail in stuff like power lines), if the depth buffer has a far-away line across the screen, couldn't the algorithm just not process it instead of destroying it? Or could one tune the software in such a way that it does filter the line correctly?

After reading the Eurogamer.net 3D article that went up just now, it's clear that new kinds of AA related to this one are going to be coming to the forefront - the article mentions that AA is much more important in 3D than native resolution.
