True. I didn't mean to nitpick, just to point out that while GT200 has 2x the bandwidth of G92, it generally has less than 2x as much of everything else, so it may be more accurate to say that if (when) G92 is b/w limited, GT200 may also be, but to a lesser extent.
But I don't know how a GPU functions at a low level (specifically, texture caching), so I'm probably wrong or oversimplifying when I reason that if G92's TF units are bandwidth limited, and GT200 gains +25% TFs but +155% b/w, then GT200's TF units aren't as b/w limited. I'm probably forgetting/ignoring that ROPs are the main b/w consumers, and that GT200's increased ROP count tracks closer (than its TF count) to its increased b/w. So, if ROPs are the main b/w consumer, then GT200 is pretty close to ~2x G92, as you said.
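To put a number on that back-of-the-envelope reasoning, here's a quick sketch using only the two percentages quoted above (+25% TFs, +155% b/w). The helper name `relative_per_unit` is made up for illustration, and this says nothing about caching or what the ROPs are eating; it's just the ratio.

```python
def relative_per_unit(resource_gain_pct, unit_gain_pct):
    """Ratio of resource available per unit after both grow (1.0 = unchanged)."""
    return (1 + resource_gain_pct / 100) / (1 + unit_gain_pct / 100)

# Figures from the post: bandwidth +155%, texture filtering units +25%.
bw_per_tf = relative_per_unit(155, 25)
print(f"{bw_per_tf:.2f}x bandwidth per TF unit")
```

So on paper each TF unit has roughly twice the bandwidth behind it, which is why I'd guess the TFs are less b/w limited, assuming nothing else is competing for that bandwidth (which is the big "if").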
For instance, G92's texture power is indeed "overwhelming," but does it overwhelm available bandwidth or are you just saying it's wasted relative to the power/speed of the rest of the chip?
I was guessing that ALUs don't require too much bandwidth, and are therefore not bandwidth "limited," b/c something like 3DMark's Perlin Noise test shows the 8800GT is faster than the 9600GT by the exact percentage of its theoretical FLOPS advantage: 62%. And if GT200 is scoring 300 while G92GT's scoring 155, then we're also seeing an improvement that tracks about 1:1 with the increase in ALUs (if not with FLOPS, but I don't know if the GT200 is better at flogging that extra MUL, or if this specific test will even give it the chance).
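For anyone who wants to check the arithmetic, here's a rough sketch. The shader counts and clocks are my assumption (reference specs as I remember them: 8800GT at 112 SPs / 1500 MHz, 9600GT at 64 SPs / 1625 MHz), not something stated above; the 300 and 155 are the Perlin Noise scores quoted above.

```python
def pct_gain(new, old):
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1) * 100

def relative_flops(stream_processors, shader_mhz):
    # FLOPs per SP per clock is the same on both chips, so it cancels in a ratio.
    return stream_processors * shader_mhz

# Assumed reference specs (not from the post): (stream processors, shader clock in MHz)
GF_8800GT = (112, 1500)
GF_9600GT = (64, 1625)

flops_gain = pct_gain(relative_flops(*GF_8800GT), relative_flops(*GF_9600GT))
score_gain = pct_gain(300, 155)  # Perlin Noise scores quoted in the post

print(f"8800GT over 9600GT, theoretical FLOPS: +{flops_gain:.0f}%")
print(f"GT200 over G92 GT, Perlin Noise score: +{score_gain:.0f}%")
```

With those assumed clocks the FLOPS advantage comes out at about 62%, and the GT200 score is about 94% higher than the G92 GT's.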
Yeah, a GPU is the sum of its parts, but the word bottleneck exists for a reason, doesn't it?