Alternative AA methods and their comparison with traditional MSAA*

MJP · Jun 10, 2010

The recursive doubling isn't a bad idea, but it's probably too many passes to pull off on the Xbox 360 (which is the most useful place for a GPU-based MLAA implementation). I started looking through the shader code and some of it's an absolute mess, so I didn't bother to analyze it too much. However it's safe to say that it's not particularly optimized (which is standard for academic papers of this nature), so I wouldn't put too much stock in the performance numbers.

Either way I'd expect you could do better with a compute shader implementation...maybe one day I'll get around to trying it myself. :smile:

patsu · Jun 10, 2010

TheD said:
The E4500 is not a laptop CPU, it is a low end desktop part.

The paper only mentions Core2Duo (not E4500), which can be found in some laptops (one of my old ones).

Neb · Jun 10, 2010

TheD said:
The E4500 is not a laptop CPU, it is a low end desktop part.

Might be so, but it is widely used in laptops and is a stripped-down Core2Duo compared to the higher models.

patsu said:
The paper only mentions Core2Duo (not E4500), which can be found in some laptops (one of my old ones).

Core2Duo is the series; then there are different revisions and models within the series. A 2.2GHz C2D is a C2D E4500, or else it is an even slower one that is overclocked.

assen · Jun 10, 2010

T.B. said:
Their performance numbers are curious, however. The algorithm should scale worse than linear, because it has some O(n*log(sqrt(n))) components and a whole lot of O(n) components. Still their 295GTX version only takes 11% longer for 1.5x the pixels. So they are either massively bound by some large resolution-independent term, or their measurement failed. The 8600GT numbers scale much more reasonably and perfectly in line with their CPU implementation, which seems to rule out the resolution-independent term. So if someone with a 285/295 could verify the numbers and the methodology used to measure them, that would be interesting.

Last night they admitted via Twitter that the 0.49ms number in the paper is a mistake, and it should read 3-5 ms on the GTX295; also, they've made some changes to the algorithm that add 17 ms to the 8600 number.
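As a rough sanity check on the scaling argument quoted above, here is a minimal sketch. It assumes the superlinear term is n*log(sqrt(n)) and, for the second function, that frame time is a fixed cost plus a term linear in pixel count. The base resolution of 1280x720 is a hypothetical placeholder, since the thread only gives the 1.5x pixel ratio and the 11% time increase.

```python
import math


def linearithmic_cost(pixels: float) -> float:
    """Cost in arbitrary units under the assumed n * log2(sqrt(n)) model."""
    return pixels * math.log2(math.sqrt(pixels))


def fixed_fraction(time_ratio: float, pixel_ratio: float) -> float:
    """Share of base-resolution time that is resolution-independent,
    assuming t(n) = fixed + k * n (an assumption, not the paper's model)."""
    # fixed + pixel_ratio * k*n = time_ratio * (fixed + k*n)
    # => fixed / (k*n) = (pixel_ratio - time_ratio) / (time_ratio - 1)
    fixed_over_variable = (pixel_ratio - time_ratio) / (time_ratio - 1.0)
    return fixed_over_variable / (fixed_over_variable + 1.0)


BASE_PIXELS = 1280 * 720  # hypothetical base resolution, not from the paper
PIXEL_RATIO = 1.5         # "1.5x the pixels" from the quoted post
TIME_RATIO = 1.11         # "11% longer" from the quoted post

predicted_ratio = (
    linearithmic_cost(BASE_PIXELS * PIXEL_RATIO) / linearithmic_cost(BASE_PIXELS)
)
overhead_share = fixed_fraction(TIME_RATIO, PIXEL_RATIO)

print(f"time ratio predicted by n*log(sqrt(n)): {predicted_ratio:.2f}x")
print(f"fixed overhead needed to explain 1.11x: {overhead_share:.0%}")
```

Under these assumptions the model predicts a bit more than 1.5x the time for 1.5x the pixels, and a measured 1.11x would need roughly three quarters of the base frame time to be resolution-independent, which fits the "measurement failed" reading.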

TheD · Jun 10, 2010

The E4500 is an LGA 775 socket desktop CPU; it is not made to be used in laptops.

The T5900, T7500, T6600 and T6670 are the only 2.2 GHz laptop C2Ds.

Shifty Geezer · Jun 10, 2010

assen said:
Last night they admitted via Twitter that the 0.49ms number in the paper is a mistake, and it should read 3-5 ms on the GTX295.

Which shows even moreso that Cell is offering something special here, providing better quality (at the moment anyway) with better efficiency. And this in turn points to the advantages of fully programmable rendering hardware and software renderers.

assen · Jun 10, 2010

Shifty Geezer said:
Which shows even moreso that Cell is offering something special here, providing better quality (at the moment anyway) with better efficiency. And this in turn points to the advantages of fully programmable rendering hardware and software renderers.

Or, just as with exclusive titles, shows even moreso that months and months of optimization by experienced, high-level developers at Santa Monica and SCEA (vs a small team of researchers) are offering something special here, providing better quality with better efficiency; and this in turn points to the advantages of huge corporations with deep pockets for funding show-off projects?

Shifty Geezer · Jun 10, 2010

To a degree that's got to be true, and a proper comparison will have to wait until a GPU solution has been decently optimised.

Prophecy2k · Jun 10, 2010

assen said:
Or, just as with exclusive titles, shows even moreso that months and months of optimization by experienced, high-level developers at Santa Monica and SCEA (vs a small team of researchers) are offering something special here, providing better quality with better efficiency; and this in turn points to the advantages of huge corporations with deep pockets for funding show-off projects?

What the heck?!?! :-S

Last time I heard, wasn't the MLAA implementation in SSM's engine implemented by ONE guy and optimised over a relatively short space of time?...

I don't see how an army of coders, large corporations or deep pockets has any real-life relevance to how an MLAA algorithm is implemented on differing hardware.

Regardless, it does indeed show (or at least indicate) some positives of an MLAA implementation on CELL over GPUs....

I find your comment a little inane.

Shifty Geezer · Jun 10, 2010

The investment in GOW3's AA has been nicely documented, showing how perseverance got them from a lacklustre IQ taking a long time to render, to an extremely effective optimised solution achieved fairly close to the end of development. The sense of this very short paper is more that they have a GPU solution that runs, rather than an ideal solution, and it needs investment of brainpower to develop further. As you say though, it's not like SCEA spent millions on developing MLAA just for Cell's sake. It was a choice by Santa Monica to give it a try and the dedicated effort. Let's not get into politics here though, as it's an interesting thread about an interesting tech!

assen · Jun 10, 2010

Prophecy2k said:
What the heck?!?! :-S

Last time I heard, wasn't the MLAA implementation in SSM's engine implemented by ONE guy and optimised over a relatively short space of time?...

You heard wrong last time, then.

http://www.realtimerendering.com/blog/more-on-god-of-war-iii-antialiasing/

Cedric Perthuis said:
It was extremely expensive at first. The first not so naive SPU version, which was considered decent, was taking more than 120 ms, at which point, we had decided to pass on the technique. It quickly went down to 80 and then 60 ms when some kind of bottleneck was reached. Our worst scene remained at 60ms for a very long time, but simpler scenes got cheaper and cheaper. Finally, and after many breakthroughs and long hours from our technology teams, especially our technology team in Europe, we shipped with the cheapest scenes around 7 ms, the average Gow3 scene at 12 ms, and the most expensive scene at 20 ms.

In term of quality, the latest version is also significantly better than the initial 120+ ms version. It started with a quality way lower than your typical MSAA2x on more than half of the screen. It was equivalent on a good 25% and was already nicer on the rest. At that point we were only after speed, there could be a long post mortem, but it wasn’t immediately obvious that it would save us a lot of RSX time if any, so it would have been a no go if it hadn’t been optimized on the SPU. When it was clear that we were getting a nice RSX boost ( 2 to 3 ms at first, 6 or 7 ms in the shipped version ), we actually focused on evaluating if it was a valid option visually. Despite of any great performance gain, the team couldn’t compromise on quality, there was a pretty high level to reach to even consider the option. And as for the speed, the improvements on the quality front were dramatic. A few months before shipping, we finally reached a quality similar to MSAA2x on almost the entire screen, and a few weeks later, all the pixelated edges disappeared and the quality became significantly higher than MSAA2x or even MSAA4x on all our still shots, without any exception. In motion it became globally better too, few minor issues remained which just can’t be solved without sub-pixel sampling.

...
Integrating the technique without adding any latency was really a major task, it involved almost half of the team, and a lot of SPU optimization was required very late in the game.

Prophecy2k said:

I don't see how an army of coders, large corporations or deep pockets has any real-life relevance to how an MLAA algorithm is implemented on differing hardware.

You can't compare the merits of two architectures - as Shifty did - by two implementations, one of which took probably an order of magnitude more effort than the other.

patsu · Jun 10, 2010

Shifty Geezer said:
To a degree that's got to be true, and a proper comparison will have to wait until a GPU solution has been decently optimised.

I am more interested in their quality difference and "shortcuts", if any. It may come in handy even for Cell in tight situations.

We want depth as well as variety in MLAA implementations.

Shifty Geezer · Jun 10, 2010

assen said:
You can't compare the merits of two architectures - as Shifty did...

Not a direct comparison, no, but we've discussed the algorithm and the nature of the processing architectures, and GWAA as explained with the limited info we had isn't a good fit to GPUs. Then we get an implementation, and we have a chance to consider it. This isn't about coming to conclusions, but following the progression of development and, as an intellectual exercise, trying to predict what can and can't be done, while also reviewing the hardware choices.

Basically, either we get some info and start talking about it, or ignore every single bit of info because it's not final and things may change. If we're not going to compare implementations and hardware suitabilities, what's the point of the thread?!

TheD · Jun 10, 2010

It is very clear that you cannot compare the Cell 4ms number to the GTX295 5ms number, because they are not running the same algorithm and most likely the GPU paper's algorithm is not a very fast one.

Just look at how long the CPU takes to run the algorithm from the paper that claims 5ms on the GTX295, versus the Intel paper's claim of 5ms on a C2Q.

A C2D 2.20GHz is not so slow that it is over 10x slower than a C2Q 3GHz!

The only C2Qs at 3GHz are the Core 2 Quad Q9650, Core 2 Extreme QX6850 and the Core 2 Extreme QX9650.

The differences between the only desktop 2.2GHz Core 2 (the E4500) and the 3GHz quads are half the cores, 2 banks of 6MB L2 cache (2x 4MB for the QX6850) vs one bank of 2MB on the E4500, SSE4.1 on the Q9650 and QX9650 vs only SSSE3 on the E4500 (and on the QX6850), and a lack of VT on the E4500.

Nothing has ever shown a gap of even close to 10x between the CPUs!

It is not even close to fair to compare till we have an algorithm on each setup that looks about the same and is using the hardware in the best way you can (within reason).
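To put a rough number on that, here is a minimal sketch of the best-case gap between the two CPUs if performance scaled perfectly with cores times clock. That perfect scaling is an assumption for illustration only; it ignores the cache and SSE4.1 differences listed above, and the function name is made up for this example.

```python
def ideal_speedup(cores_slow: int, ghz_slow: float,
                  cores_fast: int, ghz_fast: float) -> float:
    """Best-case speedup assuming perfect scaling with cores * clock."""
    return (cores_fast * ghz_fast) / (cores_slow * ghz_slow)


# Core counts and clocks as given in the post above.
E4500 = (2, 2.2)       # dual core, 2.2 GHz
QUAD_3GHZ = (4, 3.0)   # any of the 3 GHz Core 2 quads

speedup = ideal_speedup(*E4500, *QUAD_3GHZ)

# How much of a 10x gap would still be unexplained by cores and clock.
CLAIMED_GAP = 10.0
unexplained = CLAIMED_GAP / speedup

print(f"ideal cores*clock speedup: {speedup:.2f}x")
print(f"left over from a 10x gap:  {unexplained:.2f}x")
```

Even with perfect scaling, cores and clock only account for a bit under 3x, so a gap of over 10x would need cache, SSE4.1 or (more likely) the algorithm itself to make up the remaining factor of more than 3.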

patsu · Jun 11, 2010

Do you know in what ways the algorithms are different? What is this auto-determination of continuity factor mentioned in their future work?

Trejser · Jun 12, 2010

Digital Foundry: The adoption of the MLAA form of anti-aliasing is a huge improvement for the image quality, and people are excited to see this technique appearing in more titles. So, how hard has it been to include? Did it slot nicely into LBP's post-processing step or like Santa Monica with GOW3, did you have to massage the engine to free up SPU cycles in the middle of each frame to keep latency down?

Alex Evans: We really got to ride on their shoulders there - when we got the MLAA code from Sony, it was already in a pretty usable and fast state. We dropped it in during an afternoon, I believe, and it did save us a little GPU time. As with any change, there are knock-ons, a bit of SPU rescheduling etc, but it's definitely a net win.

http://www.eurogamer.net/articles/digitalfoundry-lbp2-tech-interview?page=3

That's nice

Gitaroo · Jun 12, 2010

Sony better add that to their libraries soon for 3rd parties... Just a question: does MLAA work on shadows or low-res effects? Like those explosions from Infamous.

London Geezer · Jun 12, 2010

wow...

Alex Evans: It's early days but Sony has always been incredibly inclusive of Media Molecule. The fact that I can now regularly sit in a room with the tech directors of Uncharted 2, God of War III, MAG and Killzone, among others, and listen to them all debating every aspect of game development, is exhilarating.

Shifty Geezer · Jun 12, 2010

Trejser said:
That's nice

Indeed! It confirms once and for all that, if you have the SPU cycles spare, it's now basically a drop-in feature. It needs to be provided for all developers to give PS3 the best possible IQ for the machine. The only reason not to, I guess, is because they don't want the ATG/Santa Monica competitive advantage to become common knowledge, but that's not normal given the spirit of development and open papers.

Dregun · Jun 15, 2010

MLAA in its current state seems to be the final nail in the coffin when it comes to debating Sony's inclusion of the CELL over a more conventional and stronger GPU. As smart as these developers seem to be, they will probably find a way to get MLAA to work on the GPU, and at that point it's back to the debate again. For now though it seems consoles can still benefit greatly from an extremely strong CPU.

However for the time being I just want to relish in the fact that so many people said that this generation was diminishing returns, that we had already hit a wall and image quality wasn't going to be able to increase without prohibitive costs. Now we have AA that removes a vast amount of jaggies/imperfections allowing us to achieve levels of detail none of us could have predicted for this generation.
