Not confused. I assumed the downside of tiling was due to the nature of having to copy out parts of the screen from the EDRAM to normal RAM
No, not really.
The amount of data copied is always the size of the backbuffer in RAM. The main reason for tiling is that unresolved AA requires more memory when stored in EDRAM. The AA resolve happens on chip, so AA doesn't affect the bandwidth hit for copying back to main memory.
At minimum, the RSX needs to write out the same amount of data; the difference is that overdraw, blending, AA, etc. all add an extra graphics memory bandwidth hit. Its memory reads/writes also aren't as predictable, whereas the EDRAM resolve is a giant buffer copy for each tile (I believe). So in this sense, tiling will mean more memory copies, but the overall amount of data copied shouldn't drastically change.
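To put rough numbers on that, here is a small sketch. The 10 MiB EDRAM size, the 4 bytes of colour plus 4 bytes of depth per sample, and the helper names are my assumptions for illustration, and it ignores whatever alignment rules real tiles have to follow.

```python
import math

# Assumption: 10 MiB of EDRAM on the GPU.
EDRAM_BYTES = 10 * 1024 * 1024


def edram_footprint(width, height, samples, bytes_colour=4, bytes_depth=4):
    """Bytes needed in EDRAM for the unresolved colour + depth target (assumed layout)."""
    return width * height * samples * (bytes_colour + bytes_depth)


def tiles_needed(width, height, samples):
    """Minimum number of tiles so each tile fits in EDRAM (alignment ignored)."""
    return math.ceil(edram_footprint(width, height, samples) / EDRAM_BYTES)


def resolved_bytes(width, height, bytes_colour=4):
    """Bytes copied to main memory after the resolve; independent of the AA level."""
    return width * height * bytes_colour


for samples in (1, 2, 4):
    print(samples, tiles_needed(1280, 720, samples), resolved_bytes(1280, 720))
```

Under those assumptions the tile count grows with the AA level, while the amount written back to main memory stays the same.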
It's all rather unrelated to how many buffers are actually stored in memory.
Triple buffering doesn't add any bandwidth overhead - it just uses more memory.
It works like this (forewarning: I'm not 100% sure this is exactly how it works on the 360):
At startup, the app allocates enough space for three copies of the AA-resolved 32-bit colour buffer. Typically, for 1280x720, this would be 3 x 1280 x 720 x 4 bytes = 11,059,200 bytes, or about 10.55 MB (plus alignment padding).
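The arithmetic is easy to check. This is just a sketch with a made-up helper name, and it leaves out the alignment padding:

```python
def swap_chain_bytes(width, height, bytes_per_pixel=4, buffers=3):
    """Total bytes for `buffers` resolved colour buffers, without alignment padding."""
    return width * height * bytes_per_pixel * buffers


total = swap_chain_bytes(1280, 720)
print(total)                    # 11059200 bytes
print(total / (1024 * 1024))    # 10.546875 (in units of 1024 * 1024 bytes)
```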
Now, think of this as buffer A, B and C.
The console simply stores a pointer to the block of memory it is currently displaying. This is typically changed during a vblank (roughly, the point where the frame has finished being sent to the display) or, if you aren't using vsync, once a horizontal equivalent occurs (I forget the specific term). Think of it as the console being told which buffer it should be sending to the screen.
No copies to main memory occur during this - just a pointer changes (roughly).
During rendering, the GPU renders to EDRAM, and when finished, the rendered buffer gets AA-resolved and copied to a programmer specified place somewhere in main memory.
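A toy model of those two operations, to show where data actually moves. The class and method names are hypothetical, and this is not how any real console API looks; it only models the idea that a resolve is one bulk copy and a flip is a pointer change.

```python
class SwapChain:
    """Toy model: N resolved buffers in main memory plus a display 'pointer'."""

    def __init__(self, width, height, count=3, bytes_per_pixel=4):
        size = width * height * bytes_per_pixel
        self.buffers = [bytearray(size) for _ in range(count)]
        self.display_index = 0  # which buffer the video output reads from

    def resolve(self, target_index, edram_pixels):
        """Model the resolve: one bulk copy of finished pixels into main memory."""
        self.buffers[target_index][:] = edram_pixels

    def flip(self, new_index):
        """Model the flip at vblank: only the index changes, no pixel data moves."""
        self.display_index = new_index

    def front(self):
        """The buffer currently being sent to the screen."""
        return self.buffers[self.display_index]


chain = SwapChain(4, 4)                  # tiny 4x4 target for the example
chain.resolve(1, bytes([255]) * 64)      # the only copy: resolve into buffer 1
resolved = chain.buffers[1]
chain.flip(1)                            # now displaying buffer 1, nothing copied
print(chain.front() is resolved)         # True: same memory, just a new pointer
```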
So, take the following frame sequence:
Code:
Frame number: X X 0 1 2 3 4 5 6 7 8 9 ...
Resolved to:  A B C A B C A B C A B C
Display buf:  X X A B C A B C A B C A
(where X is a blank frame)
Hopefully that makes sense. So when resolving to buffer A, buffer B is being displayed. The vblank occurs, now C is being displayed, and rendering starts on buffer B. After the next vblank, buffer A is displayed while rendering occurs to buffer C.
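The rotation in the table can be written as a pair of modulo expressions. This is only a sketch of the ideal steady state (one frame per vblank); the function names are mine, and frames -2 and -1 stand in for the two blank "X" frames.

```python
BUFFERS = "ABC"


def resolve_target(frame):
    """Buffer the GPU resolves into during `frame` (frames -2 and -1 are warm-up)."""
    return BUFFERS[(frame + 2) % 3]


def displayed(frame):
    """Buffer being displayed during `frame`; 'X' means a blank frame."""
    return BUFFERS[frame % 3] if frame >= 0 else "X"


frames = range(-2, 10)
resolved_row = " ".join(resolve_target(f) for f in frames)
display_row = " ".join(displayed(f) for f in frames)
print("Resolved to: ", resolved_row)
print("Display buf: ", display_row)
```

The buffer being resolved to is never the one being displayed, which is the whole point of the scheme.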
If rendering takes too long and the vblank occurs before the frame is finished, it's OK because there is already a complete buffer waiting, but you can only do this for so many frames before you have to display a buffer twice in a row. The opposite is that if you render too quickly, you naturally must stall and wait.
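Here is a small tick-based simulation of both cases. It is a simplified model under my own assumptions: time is in arbitrary ticks, a vblank happens every `vblank_period` ticks, the GPU needs a free buffer to render into, and a displayed buffer is freed the moment the flip happens. The function name and parameters are hypothetical.

```python
from collections import deque


def simulate(render_ticks, vblank_period, buffers=3):
    """Return (frame shown after each vblank, number of ticks the renderer stalled).

    render_ticks[i] is the cost of frame i in ticks (each must be >= 1).
    """
    ready = deque()   # finished frames waiting to be displayed
    shown = None      # frame currently on screen (None = blank)
    history = []
    stalled = 0
    frame = 0
    remaining = render_ticks[0] if render_ticks else 0
    tick = 0
    while frame < len(render_ticks) or ready:
        tick += 1
        if frame < len(render_ticks):
            in_use = len(ready) + (1 if shown is not None else 0)
            if in_use + 1 <= buffers:      # a buffer is free to render into
                remaining -= 1
                if remaining == 0:         # frame finished: queue it for display
                    ready.append(frame)
                    frame += 1
                    if frame < len(render_ticks):
                        remaining = render_ticks[frame]
            else:
                stalled += 1               # rendering too quickly: must wait
        if tick % vblank_period == 0:
            if ready:
                shown = ready.popleft()    # flip: just change what is displayed
            history.append(shown)          # otherwise the old frame repeats
    return history, stalled


absorbed = simulate([2, 2, 6, 2, 2], 4)   # one slow frame, hidden by the queue
repeated = simulate([2, 2, 6, 7, 2], 4)   # two slow frames in a row
too_fast = simulate([1, 1, 1, 1], 4)      # renderer far ahead of the display
print(absorbed, repeated, too_fast)
```

In this model a single slow frame is absorbed by the buffer that was already waiting, two slow frames in a row force frame 2 to be shown twice, and the fast renderer spends most of its time stalled.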