Alternative AA methods and their comparison with traditional MSAA

That's just how we do it on Cell, TBH. You have 6 cores and while a GPU always runs the same program on all "cores", that's just not true for the SPUs. So if you have a properly parallelisable problem, it makes sense to measure performance in "1 SPU time".

"Example": I have 100ms of SPU time at 60Hz and I budget up to 20ms for a piece of code. Maybe I just run it on 2 SPUs and get 10ms latency. Or I put it on 5 and get 4ms. That decision will depend on scheduling needs, but I still know how much SPU time I've committed.

Still sounds dumb? ;)
5 stars, not only for this post but some others you wrote in this thread.

So, if my understanding of AAA is correct, it actually refers to another definition of morphological AA. I'd like to thank grandmaster and Alstrong for providing an explanation and information on this kind of antialiasing technique. Now I understand more about it, its advantages and its weaknesses. There are many methods developers can try, but I think my favourite has to be SSAA, or supersampling AA, even though it sadly isn't the predominant antialiasing method most games use. I think a moderator of this forum, -SG-, would agree with me on this.
 
Supersampling would be nice, but it's impossible given the processing and bandwidth requirements. 'Morphological' AA seems the future: identifying the edges that need filtering and applying dedicated filtering only there. Shader aliasing will need to be solved in the shaders themselves. Add procedural geometry distribution and, with some cleverness, we may have enough power to produce jaggie-free, smooth renderings next gen.
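To illustrate the "filter only the detected edges" idea, here is a minimal sketch, not GoW3's actual algorithm; the luminance threshold and the simple cross blend are arbitrary choices standing in for MLAA's real pattern-reconstruction step:

```cpp
#include <cmath>
#include <vector>

// Rough sketch of selective edge filtering: detect edges from luminance
// discontinuities, then blend only the flagged pixels.
struct Image {
    int w, h;
    std::vector<float> lum; // per-pixel luminance
    float at(int x, int y) const { return lum[y * w + x]; }
};

void selective_edge_blur(const Image& src, std::vector<float>& dst, float threshold = 0.1f) {
    dst = src.lum;
    for (int y = 1; y < src.h - 1; ++y) {
        for (int x = 1; x < src.w - 1; ++x) {
            // Flag an edge when the luminance step to a neighbour is large.
            float c = src.at(x, y);
            bool edge = std::fabs(c - src.at(x - 1, y)) > threshold ||
                        std::fabs(c - src.at(x, y - 1)) > threshold;
            if (!edge) continue; // untouched pixels keep full sharpness
            // Simple cross blend along the edge; real MLAA derives coverage
            // from the shapes the edges form instead.
            dst[y * src.w + x] = 0.25f * (src.at(x - 1, y) + src.at(x + 1, y) +
                                          src.at(x, y - 1) + src.at(x, y + 1));
        }
    }
}
```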
 
What about the issues MLAA has with sub-pixel geometry?
 
They'll need to be solved in other ways. One method may be to not render them with the traditional renderer, but instead to have a small-geometry rendering method, e.g. the powerlines of a racer could be drawn with a line-drawing algorithm instead of using geometry. We've maybe a couple of years to prepare next-gen hardware for these new methods, but the potential is definitely there for the most significant improvement in rendering quality we've ever had.
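As a rough sketch of what such a dedicated thin-geometry path could look like (purely illustrative, assuming wires are shaded from an analytic description rather than sub-pixel triangles; the one-pixel coverage falloff is a hypothetical choice):

```cpp
#include <algorithm>
#include <cmath>

// Per-pixel coverage comes from the distance between the pixel centre and the
// wire's centre line, so the wire stays smooth however thin it gets on screen.
float line_coverage(float px, float py,   // pixel centre
                    float x0, float y0,   // segment start
                    float x1, float y1,   // segment end
                    float half_width)     // wire half-width in pixels
{
    float dx = x1 - x0, dy = y1 - y0;
    float len2 = dx * dx + dy * dy;
    float t = (len2 > 0.0f) ? ((px - x0) * dx + (py - y0) * dy) / len2 : 0.0f;
    t = std::clamp(t, 0.0f, 1.0f);                // clamp to the segment
    float cx = x0 + t * dx, cy = y0 + t * dy;     // closest point on the segment
    float dist = std::sqrt((px - cx) * (px - cx) + (py - cy) * (py - cy));
    // Coverage falls off over roughly one pixel around the wire edge,
    // giving analytic anti-aliasing; blend the wire colour by this factor.
    return std::clamp(half_width + 0.5f - dist, 0.0f, 1.0f);
}
```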
 
From what I have observed reading all the discussion and material regarding MLAA, the final quality should be quite comparable to something as high as 16xMSAA in still pictures, while in motion it's comparable to 4xMSAA or more.

Well, to my eyes it's a lot better than 4xMSAA on the characters, really, really impressive (I have GoW 3). For the scenery it's pretty comparable to 4xMSAA, but some edges are missed in motion, or at very distant or very angled camera views (very rarely, however). Definitely the best AA of this generation when properly used. I just hope to see more first-party games with MLAA (my dream: the next Killzone or Uncharted :LOL: ). I believe in Sony's first parties.
 
So selective MSAA for sub-pixel detail, with MLAA for the rest, won't work out as a solution?
4xMSAA doesn't offer anything like the quality of a proper line-drawing algorithm. It offers a simple broad-spectrum solution, but alternative approaches for sub-pixel geometry would match the IQ of the MLAA and give that 'perfect' feel.
 
Could it not be used alongside 2xMSAA to circumvent the sub-pixel problem? You would lose the extra GPU time you gain from moving all AA to the CPU, but the gain in IQ could be well worth it if you can spare the SPU time.
 
More explanation of the MLAA from the SMS update (source: GAF):
“It was extremely expensive at first. The first not-so-naive SPU version, which was considered decent, was taking more than 120 ms, at which point we had decided to pass on the technique. It quickly went down to 80 and then 60 ms when some kind of bottleneck was reached. Our worst scene remained at 60 ms for a very long time, but simpler scenes got cheaper and cheaper. Finally, after many breakthroughs and long hours from our technology teams, especially our technology team in Europe, we shipped with the cheapest scenes around 7 ms, the average GoW3 scene at 12 ms, and the most expensive scene at 20 ms.

In terms of quality, the latest version is also significantly better than the initial 120+ ms version. It started with quality way lower than your typical MSAA2x on more than half of the screen; it was equivalent on a good 25% and was already nicer on the rest. At that point we were only after speed. There could be a long post-mortem, but it wasn’t immediately obvious that it would save us a lot of RSX time, if any, so it would have been a no-go if it hadn’t been optimized on the SPU. When it was clear that we were getting a nice RSX boost (2 to 3 ms at first, 6 or 7 ms in the shipped version), we actually focused on evaluating whether it was a valid option visually. Despite any great performance gain, the team couldn’t compromise on quality; there was a pretty high bar to reach to even consider the option. And as with the speed, the improvements on the quality front were dramatic. A few months before shipping, we finally reached quality similar to MSAA2x on almost the entire screen, and a few weeks later all the pixelated edges disappeared and the quality became significantly higher than MSAA2x or even MSAA4x on all our still shots, without any exception. In motion it became globally better too; a few minor issues remained which just can’t be solved without sub-pixel sampling.

There would be a lot to say about the integration of the technique in the engine and what we did to avoid adding any latency. Contrary to what I have read on a few forums, we are not firing the SPUs at the end of the frame and then waiting for the results the next frame. We couldn’t afford to add any significant latency. For this kind of game, gameplay is first, then quality, then framerate. We had the same issue with vsync; we had to come up with ways to use the existing latency. So instead of waiting for the results next frame, we are using the SPUs as parallel coprocessors of the RSX, and we use the time we would have spent on the RSX to start the next frame. With 3 ms or 4 ms of SPU latency at most, we are faster than the original 6 ms of RSX time we saved. In the end it’s probably a wash in terms of latency due to some SPU scheduling considerations. We had to make sure we could kick off the jobs as soon as the RSX was done with the frame, and likewise, when the SPUs are done, we need the RSX to pick up where it left off and finish the frame. Integrating the technique without adding any latency was really a major task; it involved almost half of the team, and a lot of SPU optimization was required very late in the game.”
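As a very rough sketch of the scheduling shape described in that quote (every name below is invented for illustration; this is not Sony's API, only the "SPUs as parallel coprocessors of the RSX" ordering):

```cpp
// Invented stand-in stubs -- the point is only the ordering: kick the SPU MLAA
// jobs the moment the RSX is done with the frame, spend the saved RSX time on
// the next frame, then let the RSX pick the filtered buffer back up.
struct JobHandle { int frame; };
static JobHandle spu_kick_mlaa(int frame)  { return {frame}; } // start the MLAA jobs on the SPUs
static void      spu_wait(JobHandle)       {}                  // block until the SPUs return the buffer
static void      rsx_render_scene(int)     {}                  // render frame N up to the point MLAA needs
static void      rsx_begin_next_frame(int) {}                  // use the saved RSX time on frame N+1
static void      rsx_finish_frame(int)     {}                  // resolve/flip frame N with the filtered buffer

void render_loop(int num_frames) {
    for (int frame = 0; frame < num_frames; ++frame) {
        rsx_render_scene(frame);
        JobHandle mlaa = spu_kick_mlaa(frame);  // kick as soon as the RSX is done with the frame
        rsx_begin_next_frame(frame + 1);        // the RSX starts the next frame instead of idling
        spu_wait(mlaa);                         // ~3-4 ms of SPU latency, per the quote
        rsx_finish_frame(frame);                // the RSX picks up where it left off
    }
}
```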

“For a long time we worked with reference code: algorithm changes were made in the reference code while, in parallel, the optimized code was being optimized further. The optimized version never deviated from the reference code. I assume that doing any kind of cheap approximation would have prevented any changes to the algorithm. There’s a point, though, where the team got such a good grip on the optimized version that the slow reference code wasn’t useful anymore and got removed. We tweaked some values, made a few major changes to the edge detection code and did a lot of testing. I can’t stress it enough: every iteration was carefully checked and evaluated.”
 
Sweet. Makes me want to buy the game just to see it live. It also confirms a big change between the demo and the final code (which didn't quite appeal to me).
 
So what is the latency a function of... code optimization... the speed of the SPUs... how many co-processors you have?

According to the article posted...

wouldn't latency be a matter of how long it takes for the RSX to send the image back to the Cell so that Cell can initiate the MLAA call to the SPUs,

and then how long it takes for Cell to send the processed image back to the RSX?

So code optimization? And possibly a matter of bandwidth?
 
The kind of access latency you refer to (sharing data between RSX and Cell) can be partially mitigated by the Cell's DMA to/from LocalStore. As long as they stagger the accesses, the developers should be able to hide the latency.

That's why I think when talking about "latency" here, they are actually talking about something higher level (specifically, the available "frame time").
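For reference, the classic SPU-side double-buffering pattern that staggers DMA against processing looks roughly like this (a sketch using the standard spu_mfcio.h intrinsics; the block size, tag assignment and the processing stub are illustrative, not anything from GoW3's code):

```cpp
#include <spu_mfcio.h>
#include <stdint.h>

// Double buffering in Local Store: while the SPU processes block N in one
// buffer, the MFC is already fetching block N+1 into the other, so the DMA
// latency hides behind the processing time.
enum { BLOCK_BYTES = 16 * 1024 };               // 16 KB = max single DMA transfer
static char buf[2][BLOCK_BYTES] __attribute__((aligned(128)));

static void process_block(char* block, unsigned bytes)  // placeholder for the real filtering
{
    (void)block; (void)bytes;
}

void run(uint64_t src_ea, uint64_t dst_ea, unsigned num_blocks)
{
    // Prime the pipeline: fetch block 0 into buffer 0 on tag 0.
    mfc_get(buf[0], src_ea, BLOCK_BYTES, 0, 0, 0);

    for (unsigned i = 0; i < num_blocks; ++i) {
        unsigned cur = i & 1, nxt = cur ^ 1;

        // Start fetching the next block before touching the current one.
        // The fenced form (mfc_getf) orders it after the earlier put that
        // used the same tag, so data still being written out isn't overwritten.
        if (i + 1 < num_blocks)
            mfc_getf(buf[nxt], src_ea + (uint64_t)(i + 1) * BLOCK_BYTES,
                     BLOCK_BYTES, nxt, 0, 0);

        mfc_write_tag_mask(1u << cur);          // wait only for the buffer needed now
        mfc_read_tag_status_all();

        process_block(buf[cur], BLOCK_BYTES);

        // Stream the result back on the same tag; the wait above in a later
        // iteration guarantees this put completes before the buffer is reused.
        mfc_put(buf[cur], dst_ea + (uint64_t)i * BLOCK_BYTES, BLOCK_BYTES, cur, 0, 0);
    }

    mfc_write_tag_mask(3u);                     // drain any outstanding transfers
    mfc_read_tag_status_all();
}
```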
 
I don't think latency in this case refers to periods of waiting -- one could easily imagine a blockwise pipelining process where the first piece of the MLAA'd buffer already comes back to GDDR before the original buffer has even been completely handed over to the SPUs; there'd be no waiting in this approach. Not much anyway.

I'm used to "stalls" being the term for unproductive waiting, while I read "latency" as a period busy with processing.
 
What's meant by latency is the time between the workload being kicked and finishing, i.e. the time during which RSX doesn't own the buffer.
So to answer 2rea4tv's question: d) all of the above. ;)
 


http://farm2.static.flickr.com/1297/4598143678_8a0a93394a_o.png
http://farm2.static.flickr.com/1211/4598145688_5f3417fa53_o.png

LBP2 >>>>

Mod: Can you please not spam tech threads with pics, but explain what is of interest in those images.
 
I was just wondering how they got the fur & texture to look so perfect.
Clever texturing, and maybe some integration of shaders with their lighting engine. No need for mesh AA as the fur strands aren't polygons, so MLAA will have no benefit.
 