[maven] said:
I think ERP hit the nail on the head, 2xAA is for free - except for the memory overhead in the EDRAM, which requires the developer to have gotten up to speed with the tiled rendering approach implemented by Xenos, and launch games may simply not have had that "luxury".
There's more to it than the cost of bandwidth in the eDRAM, which, let's face it, isn't really a cost since there's so much of it there (up to a point anyway).
The costs as far as I can see fall under a few different headings:
1) Computational cost on the daughter die, whatever it might be - intuitively, it seems to make sense that not doing any AA will be faster than doing 2xAA, and 2xAA will be faster than 4xAA.
2) Bandwidth cost on the daughter die - but again, there's so much of it there.
These are "normal" costs associated with any GPU's framebuffer. However, with the eDRAM, anything from a 720p 32-bit frame with 2xAA upwards will need tiling, which, as far as I can make out, brings further costs as per the previous conversation:
3) Some duplicate processing for polygons intersecting the division between one tile and another (those polys get processed to a point for both tiles)
4) Maintenance of display lists (large?), with the command processor having to check commands that it may not be able to run, etc.
5) GPU idle time during the eDRAM->memory copy of the colour buffer (?)
6) A little consumption of main memory bandwidth. I figure it ranges from 210 MB/s (about 1% of main memory bandwidth) on the low end, FP10 720p at 30fps, to 712 MB/s (about 3%) on the high end, FP16 1080i at 60fps. If you don't have to write out the z-buffer too, that drops to 105 MB/s (0.5%) on the low end and 475 MB/s (just over 2%) on the high end.
(The last point may apply whether you are tiling or not. Dave's article mentions that "in some cases" the buffer can be sent straight to the DACs; I'm guessing that may be possible if you aren't tiling.)
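For anyone who wants to check the numbers in 6), and the "720p with 2xAA upwards" threshold for tiling, here's a rough back-of-the-envelope sketch in Python. The function names are made up for illustration, and several inputs are my assumptions rather than anything stated above: 10 MiB of eDRAM, 22.4 GB/s of main memory bandwidth, a 32-bit z-buffer, "MB" read as MiB, and 1080i at 60fps read as 60 fields of 1920x540 per second. The tile count is just a ceiling on the buffer footprint and ignores any alignment constraints the hardware may impose.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # assumption: 10 MiB of eDRAM on the daughter die
MAIN_MEM_BW = 22.4e9            # assumption: 22.4 GB/s main memory bandwidth


def tiles_needed(width, height, aa_samples, colour_bytes=4, z_bytes=4):
    """Minimum tiles so colour + Z for every AA sample fits in eDRAM."""
    footprint = width * height * aa_samples * (colour_bytes + z_bytes)
    return math.ceil(footprint / EDRAM_BYTES)


def resolve_mib_per_s(width, height, bytes_per_pixel, rate_hz):
    """MiB/s written to main memory when the resolved buffer is copied out."""
    return width * height * bytes_per_pixel * rate_hz / 2**20


def share_of_main_memory(mib_per_s):
    """Percentage of the assumed main memory bandwidth that a copy rate uses."""
    return mib_per_s * 2**20 / MAIN_MEM_BW * 100


if __name__ == "__main__":
    # 720p, 32-bit colour + 32-bit Z: no AA fits in one tile, 2xAA does not.
    for samples in (1, 2, 4):
        print(f"720p {samples}xAA: {tiles_needed(1280, 720, samples)} tile(s)")

    cases = {
        "FP10 720p 30fps, colour only": (1280, 720, 4, 30),
        "FP10 720p 30fps, colour + Z": (1280, 720, 8, 30),
        "FP16 1080i 60 fields/s, colour only": (1920, 540, 8, 60),
        "FP16 1080i 60 fields/s, colour + Z": (1920, 540, 12, 60),
    }
    for name, args in cases.items():
        rate = resolve_mib_per_s(*args)
        print(f"{name}: {rate:.1f} MiB/s ({share_of_main_memory(rate):.1f}%)")
```

Under those assumptions this gives 1, 2 and 3 tiles for 720p at no AA, 2xAA and 4xAA, and roughly 105 / 211 / 475 / 712 MiB/s for the four copy cases, which lines up with the figures quoted in 6).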
My question is what the performance costs we've seen claimed account for - all of this, or just some? Some of the above things are going to be very variable, even frame-to-frame.
edit - regarding other "effects", is it not correct that things like DOF, motion blur etc. are "done" on the parent die, in shaders? If that's the case, then the daughter-die costs associated with those would be fillrate and bandwidth related, which suggests to me that the AA/tiling is more the issue. May well be wrong though.