Reputator, those are pretty good points.
For the decoupling, I wasn't referring so much to physical layout as to the dependencies within the pipeline. In G7x the arithmetic units are used to (partially?) calculate the texture address, and if a texture load is stalled due to bandwidth or other reasons, the pipeline will stall irrespective of non-dependent math ops that could be done. At least that's the picture painted by GPUBench.
You could still have this type of dependency with strange TEX:ALU ratios for the GTS if the scheduler was up to it, and from the similarity between G80 and G71 in results across the shader tests (aside from that beefy 2-2.5x scale factor), I'm guessing that's indeed the case. More tests are needed, though, as it's not very solid proof.
Nonetheless, I definitely think that the TEX:ALU ratio is approximately the same as G71's. Otherwise we'd definitely see at least some differences from test to test in Shadermark. Just look at how varied the per-test improvements were from R520 to R580.
(EDIT: Wait, I made a mistake in looking at the Archmark numbers. The bilinear numbers aren't much higher on G80. But the texture-laden tests in Shadermark double with G80. Hmmm...)
I considered 64 dual issue vector shader pipes as well, but if they were running at 1350 MHz (which I admit isn't confirmed yet), G80 would be around 5x the speed of G71. But at 575 MHz it makes perfect sense. I'm skeptical about the fully unified architecture (i.e. VS/PS). It doesn't seem to mesh with their general philosophy. Another possibility is that there are 32 MAD+MUL vector shader pipes for the PS, 24 in the VS, and 8 in the GS. If all ran at 1350 MHz, you'd get similar performance to what we're seeing, and it'd be 128 total vector ALUs.
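For anyone who wants to check the arithmetic, here's a rough back-of-the-envelope sketch. It assumes shader throughput scales simply with (pipes x clock) and that each pipe does the same work per clock on both chips, which is a big simplification. The G71 baseline of 24 pixel pipes at 650 MHz (7900 GTX) is my assumption and isn't stated above, and the function name is just made up for illustration.

```python
def relative_throughput(pipes, clock_mhz, base_pipes, base_clock_mhz):
    """Ratio of (pipes * clock), assuming equal work per pipe per clock."""
    return (pipes * clock_mhz) / (base_pipes * base_clock_mhz)


# ASSUMPTION: G71 baseline is a 7900 GTX with 24 pixel pipes at 650 MHz.
G71_PIPES = 24
G71_CLOCK_MHZ = 650

# 64 dual-issue vector pipes at the rumored 1350 MHz shader clock.
fast = relative_throughput(64, 1350, G71_PIPES, G71_CLOCK_MHZ)

# The same 64 pipes at the 575 MHz core clock.
slow = relative_throughput(64, 575, G71_PIPES, G71_CLOCK_MHZ)

# Alternative split: only the 32 PS pipes count for pixel work, at 1350 MHz.
split_ps = relative_throughput(32, 1350, G71_PIPES, G71_CLOCK_MHZ)

# 32 PS + 24 VS + 8 GS pipes, each with a MAD and a MUL unit.
total_alus = (32 + 24 + 8) * 2

print(f"64 pipes @ 1350 MHz: {fast:.2f}x G71")
print(f"64 pipes @  575 MHz: {slow:.2f}x G71")
print(f"32 PS pipes @ 1350 MHz: {split_ps:.2f}x G71")
print(f"Total vector ALUs in the split design: {total_alus}")
```

Under those assumptions the 1350 MHz case lands at roughly 5.5x, the 575 MHz case at roughly 2.4x (inside the 2-2.5x range from the shader tests), and the 32-pipe PS split at roughly 2.8x, with 128 ALUs in total.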
We'll see soon enough...