DemoCoder said:
However, if I am not using pixel shaders, then 1600x1200x32 4XFSAA 16xANISO @ 60fps is currently possible even with large depth complexities, so why should I want more raw pixel fillrate? Unless I am doing multipass, I don't want it.
There is no way 1600x1200x32 4XFSAA 16xANISO @ 60fps is sustainable on the NV30 or the R9700. Even the averages are lower (we are talking sub-40 fps averages on the UT2003 Antalus flyby, which is much lighter than actual gameplay), and that is with the modest demands of today's games. Traditional fillrate is not a solved problem, particularly with the growth of LCD displays that you want to run at their native resolutions, and which in the future might even support decent framerates.
We want high minimum framerates here, mind you, at absolute numbers that ensure excellent responsiveness and precision at all times, not numbers judged by whether a static Word document would be a flickering hell. "Barely good enough as long as you don't really care" is not much of a goal to shoot for.
And, as you are well aware, there are a number of rendering techniques that might see wide use in the future if the fillrate is there to support them.
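To put some rough numbers on the 1600x1200 4XFSAA @ 60fps scenario quoted above (a back-of-envelope sketch of my own, with an assumed depth complexity of 4, not measured data):

[code]
# Rough fill estimate for 1600x1200 @ 60 fps with 4x FSAA.
# Depth complexity of 4 is an assumption for "large depth complexities";
# real scenes vary, and early-Z rejection will recover some of it.
width, height, fps = 1600, 1200, 60
fsaa_samples = 4          # 4x multisampling: 4 depth/colour samples per pixel
depth_complexity = 4      # assumed average number of surfaces covering each pixel

visible_pixels = width * height * fps                          # ~115 Mpixels/s
samples_per_sec = visible_pixels * fsaa_samples * depth_complexity

print(f"visible pixels/s : {visible_pixels / 1e6:.0f} M")
print(f"samples/s to fill: {samples_per_sec / 1e9:.2f} G")
# ~1.8 Gsamples/s of raw fill before a single texture is filtered at 16x aniso,
# which is already a large slice of a part with a theoretical rate of a few Gpix/s.
[/code]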
DemoCoder said:
I beg to differ. The NV30 is clocked at 500MHz so that it can execute shaders faster, not so that it can achieve a 4-gigapixel single-texturing fillrate. The Pentium 4 doesn't have enough system bandwidth to write out one longword per cycle either. If the NV30 were clocked at 250MHz, it could write 1 pixel from 1 pipe every cycle.
This is your main point, and it is valid. Fillrate is not the only measure of performance, and increasing the processing capabilities of the core increases performance for core-resident tasks. Whether such tasks will be what limits the overall performance of the NV30 on the programs it will run during its lifetime is another question that needs to be considered.
DemoCoder said:
It's not the 128-bit bus that constrains the fillrate; that has nothing to do with it. It is the ratio between the core clock and the memory clock. If you just blindly divide the bus width by the pipeline width * # pipelines, you are doing the wrong calculation.
It is exactly the right calculation, assuming the ratio of core clock to memory clock is constant (as it pretty much is in the R9700 vs NV30 case). Obviously width-per-pixel-pipe and clock ratios are independently variable. There are other issues as well for that matter, such as latencies, prefetching effectiveness, buffer sizes, cache hitrates, et cetera ad nauseam.
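For the record, here is the calculation we are arguing over, as a minimal sketch using the commonly quoted paper specs (8 pipes each, 64 bits written per pixel, no bandwidth-saving tricks; all of these figures are assumptions on my part, not measurements):

[code]
# Compare peak pixel output of the core against what the memory bus can absorb.
# All figures are paper specs and assumptions, not measured behaviour.

def fill_limits(pipes, core_mhz, bus_bits, mem_mhz_effective, bits_per_pixel=64):
    """bits_per_pixel = 32-bit colour + 32-bit Z/stencil per pixel, uncompressed."""
    core_fill = pipes * core_mhz * 1e6                                # pixels/s the pipes can emit
    bw_fill = (bus_bits * mem_mhz_effective * 1e6) / bits_per_pixel   # pixels/s the bus can take
    return core_fill, bw_fill

specs = {
    "NV30  (8 pipes, 500 core, 128-bit @ 1000 effective)": (8, 500, 128, 1000),
    "R9700 (8 pipes, 325 core, 256-bit @ 620 effective)":  (8, 325, 256, 620),
}
for name, spec in specs.items():
    core, bw = fill_limits(*spec)
    print(f"{name}: core {core / 1e9:.2f} Gpix/s, bus {bw / 1e9:.2f} Gpix/s, ratio {core / bw:.2f}")

# With the core:memory clock ratios roughly equal, the 2x difference between the
# two ratios above is exactly the 2x difference in bus width per pipe -- which is
# the "blind division" of bus width by pipeline width * # pipelines.
[/code]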
We need benchmark data.
DemoCoder said:
You might see a 1GHz or 2GHz GPU in the next few quarters, but you won't see a corresponding increase in bandwidth; the shader execution rate will have quadrupled. Is that so horrible? Memory bandwidth is expensive, so I maintain it is far better to be bandwidth limited, since it is cheaper to increase shader op rate.
Again, this is a solid point, but there isn't data around to support it. In the talks surrounding GDDR3 (Ack! Pfft!), datarates of 1.4 Gbit/pin were mentioned. That's 2.25 times what the R9700 uses today if they retain a 256-bit bus, and if nVidia goes 256-bit wide we are talking a total bandwidth factor of 2.8. And this is coming in less than a year for sure, which I doubt is the case for 2GHz GPUs.
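The arithmetic behind those factors, for anyone who wants to check it (the per-pin rates are the commonly quoted ones and should be treated as assumptions):

[code]
# Bandwidth scaling factors implied by the 1.4 Gbit/s/pin GDDR3 figure.
gddr3 = 1.4          # Gbit/s per pin, the figure mentioned in the GDDR3 talks
r9700_pin = 0.62     # Gbit/s per pin today (310 MHz DDR), 256-bit bus
nv30_pin = 1.0       # Gbit/s per pin today (500 MHz DDR-II), 128-bit bus

ati_factor = (gddr3 * 256) / (r9700_pin * 256)   # same 256-bit bus: ~2.26x
nv_factor  = (gddr3 * 256) / (nv30_pin * 128)    # moving from 128-bit to 256-bit: 2.8x

print(f"ATI    (staying 256-bit): {ati_factor:.2f}x")
print(f"nVidia (going 256-bit)  : {nv_factor:.2f}x")
# A 2.25-2.8x bandwidth jump well before a 4x jump in GPU clock looks plausible.
[/code]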
Besides, GPUs are basically SIMD ASICs (with very deep pipelines). As soon as you step away from problems that fit into that paradigm, you will see performance plummet. There are a lot of reasons why CPUs today run at 2-3 GHz with half a MB of full speed cache and tons of logic devoted to scheduling instruction flow. People who think GPUs should take over the tasks of CPUs have some rude awakenings in store. Your architectural comparison is invalid. The two processor classes have different jobs to do.
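To make the "different jobs" point concrete, a contrived little sketch (the names and structure are mine, purely illustrative): the first function is the regular, data-parallel work a deep SIMD pipeline thrives on; the second is the branchy, dependent, pointer-chasing work that all that CPU cache and scheduling logic exists for.

[code]
# Illustrative only -- two workload classes, not a real renderer or database.

# 1) GPU-friendly: the same short operation applied independently to every
#    element, with no data-dependent branching -- maps directly onto wide,
#    deeply pipelined SIMD hardware.
def shade(pixels, light):
    return [min(255, int(p * light)) for p in pixels]

# 2) CPU-friendly: data-dependent branches and pointer chasing; each step
#    depends on the previous one, so there is nothing to run in lockstep.
def find(node, key):
    while node is not None:
        if node["key"] == key:
            return node["value"]
        node = node["left"] if key < node["key"] else node["right"]
    return None
[/code]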
Entropy