Voltron said:
This is ridiculous. You are saying Kirk is wrong because the extra transistors in the 6600 went to bandwidth-saving techniques? If that were the case, then why does it outperform the NV35? A simple look at benchmarks shows that the 6600 has massively improved shaders. Of course it is architecturally different. That's my point. The NV30 architecture sucked, but 128-bit is more than sufficient when the shader performance is there. Or perhaps the bandwidth-saving techniques were originally intended for NV30, but didn't make it there or were broken.
A statement like "128-bit is more than sufficient when the shader performance is there" makes little sense. The faster your shader throughput, the more likely you are to be bandwidth-limited, hence you need more bandwidth, not less. So you could more logically state "128-bit is more than enough when your shader performance is not there".
I think that NV35 versus 6600GT is a very apples-to-oranges comparison anyway - too many architectural differences to draw many conclusions.
Balances change over time - back in the R300 timeframe the effective length of shaders was low (think UT2003-style rendering), so the effective throughput in pixels per clock was pretty high. As such, a 256-bit bus on the slow memory of the time was by no means overkill for the predominant rendering techniques, as demonstrated by the large lead the 9700 Pro typically had over the 9500 Pro when antialiasing was applied.
Move forward to today and shaders become longer, typical pixels per clock for the same number of pipelines can decrease, and bandwidth requirements can therefore actually drop rather than increase.
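Here is a similarly hedged sketch of that shift, assuming a simple model where each pipeline retires one pixel every N shader cycles (the pipeline count, clock and per-pixel traffic are again made-up numbers):

```python
# Illustrative model only: longer shaders lower effective pixels per clock,
# which in turn lowers the bandwidth needed to keep the pipelines fed.

PIPELINES = 8             # assumed pixel pipelines
CORE_CLOCK_MHZ = 400      # assumed core clock
BYTES_PER_PIXEL = 12      # assumed colour + Z traffic per pixel

for shader_cycles in (1, 4, 16):   # cycles spent shading each pixel
    pixels_per_clock = PIPELINES / shader_cycles
    pixel_rate = pixels_per_clock * CORE_CLOCK_MHZ * 1e6   # pixels per second
    bandwidth_gbs = pixel_rate * BYTES_PER_PIXEL / 1e9
    print(f"{shader_cycles:2d}-cycle shader: {pixels_per_clock:4.1f} pix/clk, "
          f"~{bandwidth_gbs:.1f} GB/s demanded")
```

Same chip, same clock, but the longer the shader the less the memory bus is stressed.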
So, to get back to the point, 128-bit was/is enough by what measurement? 256-bit clearly gave sufficient advantage at the time to be worthwhile at the high end - the 9700 Pro scaled pretty well from the 9500 Pro, so given a like-for-like architecture there was opportunity there. Most particularly, the 9700 was aimed at performing well with AA, and it did, but the bandwidth of the 256-bit bus was important to attaining that level of performance.
A 6600GT with 128-bit memory does pretty well against a 9700 Pro with 256-bit memory in AA tests, but it should, since it actually has more available bandwidth. The only way to get this level of bandwidth at that time was to double the bus width - the faster memory simply wasn't there.
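The arithmetic behind that is simply peak bandwidth = bus width x effective data rate; the data rates below are placeholders chosen to show the trade-off rather than actual product specs:

```python
# Peak bandwidth = (bus width in bytes) x (effective data rate).
# The data rates are placeholder assumptions, not real card specs.

def peak_bandwidth_gbs(bus_bits, data_rate_mhz):
    """Theoretical peak bandwidth in GB/s."""
    return (bus_bits / 8) * data_rate_mhz * 1e6 / 1e9

print(peak_bandwidth_gbs(256, 600))    # 19.2 GB/s on a 256-bit bus
print(peak_bandwidth_gbs(128, 600))    # 9.6 GB/s on 128-bit at the same speed
print(peak_bandwidth_gbs(128, 1200))   # 19.2 GB/s, but needs memory twice as fast
```

Matching a 256-bit bus with a 128-bit one means finding memory that runs at twice the data rate, and back then that memory simply wasn't available.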
Could a 6600GT go significantly faster with 256-bit memory and AA? Maybe. Does it make sense for the price/performance point you are aiming to hit? Maybe not. In the high end at any given time the tradeoffs will be different. Saying "128-bit is enough" could just be viewed as a marketing way of saying "Uh-oh, we don't have 256-bit - My god! Look, over there, a three-headed monkey!".
Going back to when the original statement about 128-bit versus 256-bit was made: if you truly believe that 128-bit is enough and that you're not bandwidth-starved compared to the competition, then you wouldn't need to clock the 128-bit memory on your high-end part so much faster than the typically available memory of the time that you get appalling yields, would you? Of course, if other factors are contributing to poor yields as well (like needing to massively overclock the core to be competitive), then you might be able to scrape together enough fast memory to go with the few parts that will clock at those speeds.