horvendile
Regular
I've discovered that I, being a layman, don't understand some basic concepts.
Premises: In a traditional 4x2 design (four pixel pipelines with two TMUs each), single-texture pixel fillrate is about equal to dual-texture pixel fillrate. When going to three textures, pixel fillrate drops by approximately 50% because of the extra, er, not pass, thingy (loopback?) that's needed. From there, there is no significant performance drop when going to four textures. And so forth. All this is because of the second TMU, which is sometimes used and sometimes not.
Correct?
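To make the premise concrete, here is a minimal sketch of the arithmetic as I understand it. It assumes an idealized pipeline where each extra trip through the TMUs (pass or loopback) costs one full cycle and nothing else limits throughput; the function names are my own, not anything from a driver or spec.

```python
import math

def passes_needed(textures: int, tmus_per_pipe: int = 2) -> int:
    """Trips through the pipe needed to apply `textures` layers (idealized)."""
    return math.ceil(textures / tmus_per_pipe)

def relative_pixel_fillrate(textures: int, tmus_per_pipe: int = 2) -> float:
    """Pixel fillrate relative to the single-texture case, TMU-limited only."""
    return 1.0 / passes_needed(textures, tmus_per_pipe)

# Idealized 4x2-style pipe: two TMUs per pipeline.
for n in range(1, 5):
    print(n, "textures:", passes_needed(n), "trip(s), relative fillrate",
          relative_pixel_fillrate(n))
```

Under these assumptions one and two textures cost the same, and so do three and four, which is exactly the staircase described above.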
What I don't understand is why the performance does not drop more going from one to two textures (or three to four). Don't we still have to read the second texture from memory? Isn't memory bandwidth a limitation? (For the sake of clarity, let's limit ourselves to "yesterday" cards.)
Admittedly, I don't know which memory operations are required for rendering a pixel. But could it be that there is so much "fixed" bandwidth demand (z-buffer, write to framebuffer and whatnot) that an extra texture read doesn't make that much of a difference? Some of the numbers in the 4x2 or 8x1 thread seem to indicate this, with small but noticeable performance decreases for, say, the Ti4600 where there should be none (?) from a pure TMU standpoint. Without pen and paper and just from my memory, I also think that fits quite well with how the 9700 and 9500 Pro behave.
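The guess can be written down as a toy per-pixel bandwidth budget. This is only a sketch: the byte counts are hypothetical placeholders (a 32-bit z read, z write and colour write for the fixed part, and a made-up effective cost per texture after caching), not measurements from any card.

```python
def bytes_per_pixel(textures: int, fixed_bytes: float, bytes_per_texture: float) -> float:
    """Memory traffic per rendered pixel in this toy model."""
    return fixed_bytes + textures * bytes_per_texture

def cost_ratio(textures_a: int, textures_b: int,
               fixed_bytes: float, bytes_per_texture: float) -> float:
    """How much more traffic `textures_b` layers need than `textures_a` layers."""
    return (bytes_per_pixel(textures_b, fixed_bytes, bytes_per_texture)
            / bytes_per_pixel(textures_a, fixed_bytes, bytes_per_texture))

# Hypothetical numbers: 4 B z read + 4 B z write + 4 B colour write = 12 B fixed.
FIXED = 12.0
# The effective per-texture cost is the big unknown; try a few assumed values.
for per_texture in (4.0, 1.0, 0.5):
    print(per_texture, "B/texture ->", cost_ratio(1, 2, FIXED, per_texture))
```

The smaller the effective per-texture cost is relative to the fixed part, the closer the ratio gets to 1, which would look like "small but noticeable" drops when going from one to two textures. Whether real cards behave that way is exactly the question.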
But, I'm just guessing. Anybody care to set me right?