sireric said:
The MC requires the clients to have lots of latency tolerance so that it can establish a huge number of outstanding requests and pick and choose the best ones to maximize memory bandwidth (massive simplification).
One feature that hasn't got much attention so far is the Color Buffer Cache (buffer and cache?).
Presumably the render back-end has its own scheduler built-in, to take a list of incoming colour/z/stencil values and render them into the back-buffer. I'm guessing that this scheduler will gather the writes into "blocks" and then ask the MC to retrieve the corresponding areas of frame-buffer into CBC, so that the RBE only directly accesses the CBC - it never directly accesses VRAM.
Whilst the RBE is waiting for the MC to deliver a requested block into CBC, it should already have received other blocks, so it can carry on performing colour-writes/AA compares etc. on those.
Similarly, the scheduler presumably also has task types associated with z and stencil queries, again requiring "blocks" of back-buffer to be read into the CBC. That said, the diagram for R520 implies that the z/stencil buffer caches sit outside of the RBE, while nevertheless being utilised by the RBE.
Finally, of course, the scheduler must deal with purging CBC back into VRAM to make way for other blocks.
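To make the guess above a bit more concrete, here's a rough Python sketch of the gather/fetch/purge cycle. Everything in it is my own assumption rather than anything ATI has described: the 8x8 block size, the least-recently-used purge policy, and the names `gather_by_block`, `ColourBufferCache`, `write_block` and `flush` are all made up for illustration, and VRAM is just a dictionary.

```python
from collections import OrderedDict

BLOCK = 8  # hypothetical block edge in pixels, not an R520 figure


def gather_by_block(writes):
    """Group (x, y, colour) writes by the block they land in."""
    grouped = {}
    for x, y, colour in writes:
        grouped.setdefault((x // BLOCK, y // BLOCK), []).append((x, y, colour))
    return grouped


class ColourBufferCache:
    """Toy CBC: holds a few blocks, purges the least recently used one."""

    def __init__(self, capacity, vram):
        self.capacity = capacity
        self.vram = vram              # block_id -> {(x, y): colour}
        self.blocks = OrderedDict()   # block_id -> [pixels, dirty]
        self.fetches = 0              # block reads requested from the "MC"
        self.writebacks = 0           # dirty blocks purged to "VRAM"

    def _load(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)  # mark as recently used
            return self.blocks[block_id]
        if len(self.blocks) >= self.capacity:
            # Purge the oldest block to make way, writing it back if dirty.
            victim, (pixels, dirty) = self.blocks.popitem(last=False)
            if dirty:
                self.vram[victim] = pixels
                self.writebacks += 1
        entry = [dict(self.vram.get(block_id, {})), False]
        self.blocks[block_id] = entry
        self.fetches += 1
        return entry

    def write_block(self, block_id, writes):
        """Apply a gathered batch of writes to one block inside the cache."""
        entry = self._load(block_id)
        for x, y, colour in writes:
            entry[0][(x, y)] = colour
        entry[1] = True

    def flush(self):
        """Write every dirty block back and empty the cache."""
        for block_id, (pixels, dirty) in self.blocks.items():
            if dirty:
                self.vram[block_id] = pixels
                self.writebacks += 1
        self.blocks.clear()
```

The point of gathering first is that each block gets fetched once per batch, however many pixel writes land in it, and the RBE side only ever touches `self.blocks`, never `vram` directly.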
Is it right to assume that each of R520's "pixel units" integrates the texture and shader engines with the RBE, so that there are four separate RBEs in X1800XT? Each with a localised CBC?
It seems to me that if the GPU splits the back-buffer into "screen tiles", e.g. of 16x16 pixels, then each RBE has guaranteed ownership of "blocks" in the back-buffer - so avoiding any risk of contention by multiple RBEs over individual pixels in the back-buffer.
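As a minimal sketch of what I mean by ownership, assuming 16x16 tiles and four RBEs: the diagonal interleave below is purely my own pick for illustration (I have no idea what pattern the hardware actually uses), and `owning_rbe` is a made-up name.

```python
TILE = 16      # tile edge in pixels, as in the 16x16 example
NUM_RBES = 4   # assumed: one RBE per pixel unit


def owning_rbe(x, y):
    """Map a pixel to the single RBE that owns its screen tile."""
    tx, ty = x // TILE, y // TILE
    # Hypothetical interleave: neighbouring tiles go to different RBEs.
    return (tx + ty) % NUM_RBES
```

Any fixed mapping would do. What matters is that every pixel in a tile resolves to the same RBE, so two RBEs never hold the same block in their CBCs.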
The only remaining problem is to ensure that colour-write operations that are dependent on write order are processed in write order - so the scheduler needs to be able to differentiate between un-ordered writes and in-order writes, when it schedules RBE tasks.
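Here's a hedged sketch of how a scheduler might tell the two kinds of write apart. The `WriteTask` fields and the `pick_next` rule are my own invention: `ready` is a stand-in for "everything this task needs has arrived", and the rule is simply that a task may not overtake an earlier task to the same pixel if either of the two is order-dependent.

```python
from dataclasses import dataclass


@dataclass
class WriteTask:
    seq: int        # submission order
    pixel: tuple    # (x, y) destination
    ordered: bool   # True if the result depends on write order
    ready: bool     # hypothetical: inputs for this task are available


def pick_next(pending):
    """Pop the first task that is ready and not held back by ordering."""
    for i, task in enumerate(pending):
        if not task.ready:
            continue
        # An earlier task to the same pixel blocks this one if either
        # of the pair is order-dependent.
        blocked = any(
            earlier.pixel == task.pixel and (earlier.ordered or task.ordered)
            for earlier in pending[:i]
        )
        if not blocked:
            return pending.pop(i)
    return None
```

So un-ordered writes can be issued as soon as they're ready, in whatever order suits the RBE, while in-order writes to a pixel queue up behind whatever was submitted to that pixel before them.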
Jawed