leoneazzurro
True. I didn't mean to nitpick, just to point out that while GT200 has 2x the bandwidth of G92, it generally has less than 2x as much of everything else, so it may be more accurate to say that if (when) G92 is b/w limited, GT200 may also be, but to a lesser extent.
What I tried to say is that, if G92 is bandwidth limited, there's no chance of seeing more than 2x scaling with respect to G92 under normal conditions (i.e. where the framebuffer does not become the limit), as some seem to suggest.
But I don't know how a GPU functions at a low level (specifically, texture caching), so I'm probably wrong or oversimplifying in thinking that, since G92's TF units are bandwidth limited and GT200 gains +25% TFs but +155% b/w, GT200's TF units aren't as b/w limited. I'm probably forgetting/ignoring that ROPs are the main b/w consumers, and that GT200's increased ROP count tracks closer to its increased b/w than its TF count does. So, if ROPs are the main b/w consumer, then GT200 is pretty close to ~2x G92, as you said.
GT200 is not +155% in bandwidth, it's +112%. Yes, its TF units are not as bandwidth limited as G92's, but there are only 25% more of them (and a little less once you take the frequencies into account), so they cannot sustain double the throughput.
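To put rough numbers on that last point, here's a back-of-the-envelope Python sketch (the unit counts and core clocks below are my assumed reference specs for a 9800 GTX and a GTX 280, so adjust them if you're comparing different boards):

# Assumed reference specs, not measured values
g92_tmus, g92_core_mhz = 64, 675        # 9800 GTX (assumption)
gt200_tmus, gt200_core_mhz = 80, 602    # GTX 280 (assumption)

g92_texel_rate = g92_tmus * g92_core_mhz        # MTexels/s
gt200_texel_rate = gt200_tmus * gt200_core_mhz

gain = 100 * (gt200_texel_rate / g92_texel_rate - 1)
print(f"GT200 texel rate gain over G92: +{gain:.0f}%")
# about +11% with these assumed clocks, versus +25% on unit count alone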
For instance, G92's texture power is indeed "overwhelming," but does it overwhelm the available bandwidth, or are you just saying it's wasted relative to the power/speed of the rest of the chip? I was guessing that ALUs don't require too much bandwidth, and are therefore not bandwidth "limited," b/c something like 3DMark's Perlin Noise test shows the 8800GT is faster than the 9600GT by the exact percentage of its theoretical FLOPS advantage: 62%. And if GT200 is scoring 300 while a G92-based GT scores 155, then we're also seeing an improvement that tracks about 1:1 with the increase in ALUs (if not with FLOPS, but I don't know if GT200 is better at flogging that extra MUL, or if this specific test will even give it the chance).
Yeah, a GPU is the sum of its parts, but the word bottleneck exists for a reason, doesn't it?
I'm saying it's in a certain way wasted in G92 (and Nvidia seems to agree, because they designed a chip that should outperform G92-based cards by a factor of 2 with only 25% more TUs). Perlin Noise is a very ALU-intensive shader, but shaders in games are normally not that ALU intensive - otherwise the R600 architecture would have fared better in comparison to G8X/G9X. So Perlin Noise (not being BW limited) shows the improvement in shading power, whereas in the vast majority of real gaming cases the improvement is not 100% but much, much lower. Then I ask: why? And this comes back to the bottleneck argument, leading me (and not me alone) to believe that it's a bandwidth issue in these cases (except when it's more a framebuffer issue, as at high resolutions + AA).
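And for what it's worth, that 62% Perlin Noise delta between the 8800 GT and 9600 GT does line up with the raw ALU numbers - here's a quick Python check (the SP counts and shader clocks are my assumed reference specs, not measured values):

# Assumed reference specs: number of SPs x shader clock in MHz
alu_8800gt = 112 * 1500   # 8800 GT (assumption)
alu_9600gt = 64 * 1625    # 9600 GT (assumption)

advantage = 100 * (alu_8800gt / alu_9600gt - 1)
print(f"8800 GT theoretical ALU advantage: +{advantage:.0f}%")
# about +62%, matching the Perlin Noise gap, which is why the test looks purely ALU bound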