andypski said:
#1 No - from the accumulated information R300 has nearly twice the bandwidth available per-pipe-per-clock when compared to NV30 - both have 8 pipes, with core and memory clocks closely matched, but R300's memory bus is twice as wide. Not a difficult calculation
310MHz memory on R300 PRO vs 500MHz memory on NV30. 256-bit vs 128-bit. 19.8GB/s vs 16GB/s, or only about 24% more bandwidth. As I said, roughly comparable. These parts are only "unbalanced" if you consider the pathological single-texturing, no-pixel-shader scenario, which isn't very interesting. There is no such thing as a truly balanced card. You are either fillrate limited or bandwidth limited or T&L limited or CPU limited. No one has shipped a system where all the limits line up and happen at the same time.
And please, let's not talk about a hypothetical ATI card using 1GHz memory yet. Let's compare currently announced products.
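For anyone who wants to check the arithmetic, here is a rough sketch of the peak-bandwidth calculation. It assumes both parts use DDR memory (two transfers per clock) and decimal gigabytes; the function name and the `transfers_per_clock` parameter are mine, just for illustration.

```python
def peak_bandwidth_gb_s(mem_clock_mhz, bus_width_bits, transfers_per_clock=2):
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes per second).

    transfers_per_clock=2 is an assumption: DDR memory on both cards.
    """
    bytes_per_transfer = bus_width_bits / 8
    return mem_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9


# Clock and bus-width figures as quoted in the post.
r300 = peak_bandwidth_gb_s(310, 256)
nv30 = peak_bandwidth_gb_s(500, 128)
advantage_pct = (r300 / nv30 - 1) * 100

print(f"R300 PRO: {r300:.2f} GB/s")
print(f"NV30:     {nv30:.2f} GB/s")
print(f"R300 advantage: {advantage_pct:.0f}%")
```

These are theoretical peaks only; real throughput depends on memory controller efficiency, compression and access patterns, none of which this models.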
#2 Depends on the situation, but generally speaking as shaders tend to get longer you are correct. Whether texture fetches dominate is largely down to the type of filtering applied, and the pattern of access. I agree that generally you would expect a 32-100 instruction shader to be largely calculation bound.
Point being, the "32-bits per pipe" is not the limiting factor. Yes, to write out the final fragment, you need 2 or more clocks potentially, and you also need some bandwidth up front for rejection, assuming no cache hits.
The shader itself, unless you are talking single textured pixels and no per pixel lighting of any sort, will execute in way more than 2 clocks. Even simple diffuse/specular shaders are going to eat 4-8 clocks, and so those memory accesses are going to be hidden by the shader's execution.
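To put rough numbers on the "hidden by the shader's execution" point, here is a back-of-the-envelope sketch. Everything in it is a simplification of my own: the 500MHz NV30 core clock is inferred from 8 pipes and the 4 gigapixel figure, the 12 bytes of framebuffer traffic per pixel (32-bit color write plus 32-bit Z read and write) is an assumed uncompressed worst case, texture traffic is ignored, and the helper names are hypothetical.

```python
def bytes_per_pipe_per_clock(bandwidth_gb_s, pipes, core_clock_mhz):
    """Memory bandwidth available to each pipe on each core clock, in bytes."""
    return bandwidth_gb_s * 1e9 / (pipes * core_clock_mhz * 1e6)


def memory_clocks_needed(bytes_per_pixel, budget_per_clock):
    """Core clocks' worth of per-pipe bandwidth needed to move one pixel's data."""
    return bytes_per_pixel / budget_per_clock


def is_shader_bound(shader_clocks, bytes_per_pixel, budget_per_clock):
    """True when the shader runs at least as long as the memory traffic takes."""
    return shader_clocks >= memory_clocks_needed(bytes_per_pixel, budget_per_clock)


# Assumed figures: 16 GB/s, 8 pipes, 500 MHz core (inferred, not an official spec).
NV30_BUDGET = bytes_per_pipe_per_clock(16.0, 8, 500)

# Assumed traffic: 4 bytes color write + 4 bytes Z read + 4 bytes Z write.
ASSUMED_BYTES_PER_PIXEL = 12

for clocks in (1, 2, 4, 8):
    bound = "shader" if is_shader_bound(clocks, ASSUMED_BYTES_PER_PIXEL, NV30_BUDGET) else "bandwidth"
    print(f"{clocks}-clock shader: {bound} bound")
```

Under those assumptions the budget works out to 4 bytes (32 bits) per pipe per clock, so the framebuffer traffic costs about 3 clocks and a 4-8 clock shader covers it. Change the bytes-per-pixel figure and the crossover moves accordingly.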
On old legacy games, in single/dual textured scenarios with no per-pixel anything, these cards already have such ridiculously high fillrates that even when they hit their bandwidth limits, they are well above 100fps at high resolutions and high AA, so it's a moot point.
So yes, Counter-Strike won't hit 4 gigapixels of fillrate on the NV30, but when the NV30 does hit its bandwidth wall, the game is already running at ridiculous rates. The pixel fillrate isn't the important thing anymore; it's the shader fillrate.
The NV30 may hit 4+ giga-shader-ops/s, which is the more important figure for DX9 cards. Unless you want to resort to the old red herring of "no DX9 games, so who cares about shader performance", but then there's not much point talking about programmability at all. That will be the resort of some: "well, I only care about old single textured and dual textured game performance!"
If you think programmability is the future, then obviously, the speed at which you can execute programs is now the important figure. The Pentium 4 and Athlon do not have enough bandwidth to write out one register to memory per cycle. Yet we do not speak about CPU bandwidth limits. As things become calculation bound, the external bandwidth will be less of the determining factor in overall performance, and memory latency or bandwidth problems can be handled by pipelining and prefetching.
I mean, utilizing the "32-bit per pipe" argument, these cards are both ridiculously bandwidth limited in the 128-bit FP texture/framebuffer scenario. But the fact is, if you are using 128-bit FP buffers, you are most likely running significant shaders.