Not only are the rates not the same
The AA sample creation rate is the same...
(R600 definitely has a big edge in real shader throughput, and R580 will be BW limited on occasion),
And R600 has double-rate Z with AA off ...
but R600 isn't always faster per clock with AA.
When this happens it's probably a driver bug. Yet people cite it as evidence of the failure of shader AA-resolve, or of a hardware bug.
However, I did just think of a reason that the bigger drop with R600 could be deceiving: it has twice the Z-only rate of R580 when AA is disabled, but the same rate when AA is enabled. Right? That throws a wrench into the comparison.
I guess I should have been more explicit about this fundamental problem, but then it's no different from the fact that R600 also has more bandwidth, fp16 texture filtering, better texture caches, better hierarchical-Z, independent hierarchical-stencil and more ALU throughput - "the no-AA case on R600 has a squillion fps"...
Yup, and a compressed framebuffer requires you to fetch a block as well. I don't know how big it is, but maybe a 4x4 block is stored together. A compressed tile needs 16 bytes read (for 32bpp), and an uncompressed tile needs 64.
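The arithmetic behind those numbers can be sketched quickly (note the 4x4 tile size and the 4:1 ratio are assumptions from the post above, not confirmed hardware details):

```python
# Back-of-envelope model of framebuffer tile fetch cost.
# Assumptions (not from any hardware doc): 4x4 pixel tiles, 32bpp,
# and a best-case 4:1 colour compression ratio.
TILE_W, TILE_H = 4, 4
BYTES_PER_PIXEL = 4  # 32bpp

uncompressed_bytes = TILE_W * TILE_H * BYTES_PER_PIXEL  # 4*4*4 = 64 bytes
compressed_bytes = uncompressed_bytes // 4              # 16 bytes at 4:1

print(uncompressed_bytes, compressed_bytes)  # 64 16
```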
I'm still struggling to understand the patent application in terms of an int8 formatted render target.
For example, in one embodiment the patent application describes how compression is based upon 2x2 pixel blocks, but that 2x2 blocks are aggregated into 4x4 blocks that are stored in memory. Here it seems that 2x2 blocks are a cache optimisation trick within the RBEs, so that when the RBEs are manipulating samples they do the minimum work. So the nature of compression for 4x4 blocks is different...
So, I'm still trying to get my head round it.
I don't think you'll ever have to keep track of multiple unresolved buffers floating around. You should be able to use the compression flags stored on chip.
http://forum.beyond3d.com/showpost.php?p=1021653&postcount=867
There are quite a few useful posts by OpenGL guy there, so make sure to check them.
Additionally, D3D10 allows the programmer to access MSAA'd render targets in the form of a "texture resource" where each texel corresponds to a sample - it is no longer a render target. The programmer might choose to access this "MSAA texture" ages after it was generated (in the next frame, for example). So there's no possibility of the compression tags inside the RBEs being kept for some indeterminate time, waiting for the programmer to access "MSAA samples".
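To illustrate what "each texel corresponds to a sample" buys you, here's a toy model of a custom shader-style resolve that averages a pixel's individual samples. It's a minimal sketch, not any real API; `resolve_pixel` and the data layout are invented for illustration:

```python
# Toy model of accessing individual MSAA samples of one pixel and
# resolving them with a simple box filter, the way a D3D10 shader
# could loop over samples of an "MSAA texture". Names are invented.
def resolve_pixel(samples):
    """Box-filter resolve: average all samples of one pixel, per channel."""
    n = len(samples)
    channels = len(samples[0])
    return tuple(sum(s[c] for s in samples) / n for c in range(channels))

# One 4xAA pixel: four RGB samples.
pixel_samples = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
                 (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
print(resolve_pixel(pixel_samples))  # (0.5, 0.5, 0.5)
```

The point is that nothing forces this loop to run when the render target is produced; it can run whenever the "MSAA texture" is eventually read.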
My understanding of the patent currently indicates that all the compression information corresponding to the block size is stored in the render target, in memory. The compression tags inside the RBEs give the GPU a fast path to the compression information (i.e. they're just a "copy" of what's in the render target), so that it doesn't have to read memory in order to find out the compression status of each block.
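A minimal sketch of that reading of the patent, with invented names: the flags stored with the render target are authoritative, and the on-chip tags just short-circuit the memory read:

```python
# Toy model: compression flags live with the render target in memory;
# on-chip tags are a small "copy" so the RBEs can usually skip the
# memory read. All names here are invented for illustration.
memory_flags = {}   # block index -> compression state, stored with the RT
onchip_tags = {}    # on-chip copy of recently used entries of memory_flags

def flag_for_block(block):
    if block in onchip_tags:                         # fast path: no memory traffic
        return onchip_tags[block]
    state = memory_flags.get(block, "uncompressed")  # slow path: read memory
    onchip_tags[block] = state                       # refill the on-chip copy
    return state

memory_flags[7] = "compressed"
print(flag_for_block(7))  # first access reads memory, then caches
print(flag_for_block(7))  # second access is served from the on-chip tags
```

On this model, flushing or repurposing the on-chip tags loses nothing, which fits the point above about not keeping them around indefinitely.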
Jawed