Jawed
Legend
"Jawed: your shader to resolve a 4X render target can probably be much simpler. Assuming bilinear filtering is enabled, it can be carried out in one clock cycle (with just one tex2D instruction)."
The other side of the coin: can the RBEs provide more than 16 pixels per clock of samples? Additionally, the colour fillrate is 16 pixels per clock, so an MSAA resolve shader that runs in only one cycle doesn't actually run any faster than the shader I've described.
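For anyone wondering why a single bilinear tex2D could stand in for a 4x resolve, here is a rough Python sketch. It assumes (my assumption, not something confirmed about the hardware) that the four samples of a pixel can be addressed as a 2x2 block of texels, and that the fetch lands exactly in the middle of that block. The function names are made up for illustration.

```python
def bilinear(texels, u, v):
    """Bilinear filter over a 2x2 block; u and v are fractional offsets in [0, 1]."""
    (t00, t10), (t01, t11) = texels
    top = t00 * (1 - u) + t10 * u        # blend along x on the top row
    bottom = t01 * (1 - u) + t11 * u     # blend along x on the bottom row
    return top * (1 - v) + bottom * v    # blend the two rows along y


def box_resolve(texels):
    """Plain 4-sample average, i.e. what a box-filter MSAA resolve computes."""
    return sum(sum(row) for row in texels) / 4.0


# Hypothetical sample values for one pixel (assumed 2x2 layout).
samples = [[0.2, 0.4], [0.6, 1.0]]

# Fetching at the centre weights each texel by 0.25, same as the box average.
centre = bilinear(samples, 0.5, 0.5)
average = box_resolve(samples)
```

With the fetch at the centre every texel gets a weight of 0.25, so the filtered result matches the four-sample average. Whether the real sample storage allows this addressing is a separate question.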
RV635 has one quad RBE (4 pixels per clock). A TU-based MSAA resolve would run at 8 pixels (32 samples) per clock. My MSAA resolve shader would run at 6 pixels per clock. Both techniques are bottlenecked by the single quad-RBE.
RV620 can perform 4 pixels (16 samples) per clock through the TUs, matching the RBEs. The MSAA resolve shader I wrote would run at 1 pixel every 2 clocks. So in this case TU-bilinear averaging would be fast enough while my shader would run at half-rate.
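The arithmetic above boils down to taking the slower of two stages. A minimal sketch, using only the per-clock figures quoted in this post and assuming the resolve is limited purely by whichever of the producer (TU or shader) and the RBE is slower; bandwidth is ignored, and the dictionary layout is just for illustration:

```python
def effective_rate(producer_px_per_clk, rbe_px_per_clk):
    """Resolve throughput in pixels per clock: the slower stage wins."""
    return min(producer_px_per_clk, rbe_px_per_clk)


# Figures taken from the post; 0.5 means 1 pixel every 2 clocks.
chips = {
    "RV635": {"rbe": 4, "tu": 8, "shader": 6},
    "RV620": {"rbe": 4, "tu": 4, "shader": 0.5},
}

results = {
    name: {
        "tu": effective_rate(c["tu"], c["rbe"]),
        "shader": effective_rate(c["shader"], c["rbe"]),
    }
    for name, c in chips.items()
}

for name, r in results.items():
    print(name, "TU resolve:", r["tu"], "px/clk,", "shader resolve:", r["shader"], "px/clk")
```

Under these assumptions both methods end up at the RBE rate on RV635, while on RV620 only the shader path falls below it.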
So, I wonder if there are any benchmarks of RV620 (RV610) with 4xMSAA. Its behaviour in comparison with higher-spec GPUs might provide some clues as to the basic form of MSAA resolve. If bandwidth doesn't get it first...
Jawed