BenSkywalker said:
As one could argue is any with free AA.
Wrong. You completely ignored every reason I gave why this is false, and also ignored the analogy I gave.
BenSkywalker said:
Mintmaster said:
Why would you restrict the No-SSAA rendering to one quarter the speed the hardware is capable of?
You can make the same argument for spending a hundred million transistors or so on eDRAM- that is the point I was making.
No, you can't, because eDRAM speeds up No-MSAA rendering too, and eDRAM does not cost 4x the area of all non-vertex hardware.
Which is the case the majority of the time.
And this affects only part of the 25M transistors of logic on the daughter die, not the full 105M. Maybe 3% of total transistors.
I could have simply pointed you back to the 6600GT benches showing that a part with a 128bit memory bus becomes less bandwidth limited enabling 2x MSAA.
Yet again you rant about the single data point that deviates a mere 4% from the expected result.
Of course you can show benches where any part takes very little performance hit running AF, you can with AA too.
Yet that's exactly what you did with the 7800GTX to "prove" your point. Pick a benchmark with a low AF hit for the 7800, pick one with a high hit for the 6600, and conclude that more ROPs mean a lower hit.
We don't have NVidia chips where the ROP:TMU ratio is the only difference. The best I can show you is 6600GT vs. 6800 (here and here). The latter has 3 times the ROPs, but has a higher AF hit.
No, Dave isn't. Dave is saying with a different hardware configuration performance characteristics will be different.
You have a short memory.
Dave Baumann said:
BenSkyWalker said:
If the chip needed extra cycles for AF then having additional ROPs could grant them 'free' AF under most circumstances.
No. If you are limited by the texture samples then you are limited by the sampling end of the chip (texture samplers) not by the pixel output end of the chip (ROPs), so having extra ROPs isn't going to make much difference in this case. If you want cheaper AF then you want more texture sampling capabilities.
That makes absolutely no sense. The extra cycles of AF are going to stall the sampling units of a pipe- no matter how many ROPs there are.
You've obviously been paying no attention to anything I've said.
4 ROP, 8 TMU, single texturing from one mipmap.
No AF: 4 pix/clock.
2xAF: 4 pix/clock.
0% hit.
8 ROP, 8 TMU, single texturing from one mipmap.
No AF: 8 pix/clock.
2xAF: 4 pix/clock.
50% hit.
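The same arithmetic as a toy model, for anyone who wants to plug in other configurations. It assumes (my simplification, not a description of any real chip) that each TMU delivers one bilinear sample per clock, that 2xAF needs two bilinear samples per pixel, and that nothing else (bandwidth, shader, setup) is the bottleneck. The function names are made up for illustration.

```python
def pixels_per_clock(rops: int, tmus: int, samples_per_pixel: int) -> float:
    """Fill rate is capped by whichever is slower: ROP output or TMU sampling.

    Toy assumption: one bilinear sample per TMU per clock, no other limits.
    """
    rop_limit = rops
    tmu_limit = tmus / samples_per_pixel
    return min(rop_limit, tmu_limit)


def af_hit(rops: int, tmus: int, af_samples: int) -> float:
    """Fractional slowdown from AF relative to one sample per pixel."""
    base = pixels_per_clock(rops, tmus, 1)
    with_af = pixels_per_clock(rops, tmus, af_samples)
    return 1.0 - with_af / base


# Reproduce the two cases above: same 8 TMUs, different ROP counts.
for rops in (4, 8):
    no_af = pixels_per_clock(rops, 8, 1)
    af2 = pixels_per_clock(rops, 8, 2)
    print(f"{rops} ROP, 8 TMU: no AF {no_af:g}, 2xAF {af2:g}, hit {af_hit(rops, 8, 2):.0%}")
```

The 4 ROP part shows no hit only because its ROPs were already the limit without AF; the TMUs deliver exactly 4 pix/clock with 2xAF in both cases.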
Not unless you are talking about the particular layout of the current NV4x based parts. As Marco made mention of, the sampling units right now are tied to the shader ALUs.
Nope, I'm talking about all hardware. For NV4x-based parts, the address calculation is tied to the shader ALUs. In all hardware, the filtering is elsewhere, the texture cache is elsewhere, the memory controller is elsewhere. Read the last paragraph here (I'm talking about single-cycle trilinear in that post, but the same argument applies to accelerated AF).
Also- you seem to want to compare a 16ROP part in terms of percentage performance hit with one of lesser ROPs and look at the relative performance. What you need to do is look at the absolute numbers of the 16ROP part compared to the 8ROP parts. The absolute performance enabling AF isn't close.
*sigh* That's because every 16-ROP part has more TMUs than the 8-ROP parts. And relative performance is all that matters with respect to the thread topic: AF performance. You claim that we can speed up some non-AF pixels to increase absolute framerate. Bravo, Einstein.
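To separate absolute from relative, here is the same toy model (assumed: one bilinear sample per TMU per clock, 2xAF costs two samples per pixel, no other bottleneck) run on three hypothetical configurations. None of these is meant to be a specific shipping chip.

```python
def pixels_per_clock(rops: int, tmus: int, samples_per_pixel: int) -> float:
    """Toy fill-rate model: min of ROP output and TMU sampling throughput."""
    return min(rops, tmus / samples_per_pixel)


# Hypothetical configurations, chosen to isolate ROPs from TMUs.
configs = {
    "8 ROP / 8 TMU": (8, 8),
    "16 ROP / 8 TMU": (16, 8),    # more ROPs only
    "16 ROP / 16 TMU": (16, 16),  # more ROPs and more TMUs
}

for name, (rops, tmus) in configs.items():
    no_af = pixels_per_clock(rops, tmus, 1)
    af = pixels_per_clock(rops, tmus, 2)
    hit = 1 - af / no_af
    print(f"{name}: no AF {no_af:g}, 2xAF {af:g}, hit {hit:.0%}")
```

Doubling ROPs alone leaves 2xAF throughput at 4 pix/clock; the 16/16 part's higher absolute AF number comes entirely from its extra TMUs, and the relative hit is the same for all three.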
-------------------------------------------------------------------------------------
BenSkyWalker, you're just arguing for the sake of arguing. You have ignored most of the things I'm telling you, and are just wasting my time. Unless you miraculously put forth an intelligent post, I'm going to ignore you from now on in this thread.