This is an amusing thread...
I love the title:
"NV40: 6x2/12x1/8x2/16x1? Meh. Summary of what I believe"
You've done it again, Uttar...
Among the more amusing of my observations in the thread:
*I'm not sure what the number of instructions in a shader chain has to do with the number of pixel pipes in a gpu. It seems to me that whether there's 1 instruction in the chain or 10,000, the number of physical pixel pipes in the gpu is fixed, and it describes the maximum number of pixels per clock the gpu can render under any conditions. Unlike shader instruction chains, which are software, the number of pixel pipelines is a physical property of the gpu: fixed, absolute, and quite distinct from software, it seems to me.
*Hence, how could "6x2/12x1/8x2/16x1" all be considered descriptions of the same gpu? For instance, I cannot see how a gpu might be described as both "6x2" and "12x1" at the same time, or simultaneously be both an "8x2" and a "16x1" gpu. The term "6x2" breaks down as follows:
The first number, the "6," tells us how many pixel pipelines the gpu has, and the second number, the "2" in this case, tells us how many texturing units are attached to each of those pipelines. So "6x2" tells us the gpu has exactly 6 pixel pipelines, each with two texturing units attached. In total, then, a 6-pipeline gpu can generate at most 6 pixels per clock, and each pixel rendered in a clock can have 0, 1, or 2 texels applied to it.
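As a rough sketch of the "NxM" reading described above, the snippet below treats a label as N pipelines times M texturing units and computes the two peak per-clock figures that follow from it. `PipelineConfig` and its methods are hypothetical names for illustration, and the model deliberately ignores everything except the two numbers in the label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical model of an 'NxM' layout: N pixel pipes, M texturing units per pipe."""
    pipes: int
    tmus_per_pipe: int

    @classmethod
    def parse(cls, label: str) -> "PipelineConfig":
        # "6x2" -> PipelineConfig(pipes=6, tmus_per_pipe=2)
        pipes, tmus = label.lower().split("x")
        return cls(int(pipes), int(tmus))

    def peak_pixels_per_clock(self) -> int:
        # Each pipeline emits at most one pixel per clock.
        return self.pipes

    def peak_texels_per_clock(self) -> int:
        # Every texturing unit on every pipeline busy in the same clock.
        return self.pipes * self.tmus_per_pipe


for label in ("6x2", "12x1", "8x2", "16x1"):
    cfg = PipelineConfig.parse(label)
    print(f"{label:>4}: {cfg.peak_pixels_per_clock():2d} pixels/clock, "
          f"{cfg.peak_texels_per_clock():2d} texels/clock")
```

Under this reading, 6x2 and 12x1 share the same peak texel rate (12 per clock) but not the same peak pixel rate (6 versus 12), which is exactly the distinction being drawn here.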
Therefore, it is physically impossible for a "6x2" gpu to ever be "12x1", since in the first case the gpu has 6 pixel pipelines and in the second case it has 12, and a gpu cannot be both, obviously.
However, if we assume the gpu's pixel pipeline organization is actually 12x1, which is to say it physically has 12 pixel pipelines to begin with, each with 1 texturing unit attached, then such a gpu could operate per clock as 12x1, 12x0, or 6x2. In the last case, 6 of the 12 pipelines each render 1 pixel with 1 texel attached, while the other 6 pipelines contribute only a texel by way of their texturing units, so the gpu renders 6 pixels per clock with 2 texels attached to each: 6x2.
The relationship between 8x2 and 16x1 is exactly the same as the one described above for 6x2 and 12x1. So...what's actually being said here is that nV40 is either 12x1 or 16x1, but obviously it cannot be both. And whether it is 12x1 or 16x1 will determine whether it is capable of 6x2 or 8x2 when multitexturing; again, obviously, the same gpu can't do both. So the statement I assume Uttar meant to make is that he isn't sure whether nV40 has a 12x1 or a 16x1 pixel pipeline organization, but it necessarily has to be one or the other (I lean toward thinking nV40 is 8x1 or 8x2, but that's neither here nor there at the moment...).
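One simplified way to see how 12x1 and 6x2 (or 16x1 and 8x2) can look alike when multitexturing is to assume each pipeline emits at most one pixel per clock and loops back for extra clocks whenever a pixel needs more textures than the pipeline has texturing units. That loopback model is an assumption made for illustration, not a description of how nV40 or any particular gpu actually works, and `pixels_per_clock` is a hypothetical helper.

```python
import math


def pixels_per_clock(pipes: int, tmus_per_pipe: int, textures_per_pixel: int) -> float:
    """Sustained pixels/clock under an assumed, simplified loopback model.

    Each pipeline emits at most one pixel per clock. A pixel that needs more
    textures than the pipeline has texturing units is assumed to take
    ceil(textures / tmus) clocks in total.
    """
    if textures_per_pixel <= 0:
        return float(pipes)  # untextured: limited only by the pipe count
    clocks_per_pixel = math.ceil(textures_per_pixel / tmus_per_pipe)
    return pipes / clocks_per_pixel


# Pixel rate for 0, 1, 2 and 3 textures per pixel.
for pipes, tmus in ((12, 1), (6, 2), (16, 1), (8, 2)):
    rates = [pixels_per_clock(pipes, tmus, k) for k in range(4)]
    print(f"{pipes}x{tmus}: " + ", ".join(f"{r:g}" for r in rates))
```

With two textures per pixel, 12x1 and 6x2 both come out at 6 pixels per clock and 16x1 and 8x2 both at 8; with one texture per pixel they diverge (12 versus 6, 16 versus 8). So matching multitexture figures still don't make the two layouts the same hardware.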
In other words, it just isn't possible to say that "it doesn't matter" what nV40's pixel pipeline organization is, since without knowing that number, which corresponds to the physical architecture of the gpu, I don't think it's possible to rationally discuss any of the gpu's performance characteristics. Basically, once you have determined the physical pixel pipeline organization of a gpu, you can work backwards from there to factor in variables such as shader instructions, trilinear filtering, texels attached per pixel, and so on, and understand the likely impact of each on performance. If you don't know the fixed pixel pipeline organization, then it seems to me you cannot figure out anything else about performance, either...
(As an aside, this exactly corresponds to the initial frustration I felt in trying to decipher nV30's performance. Once it became clear that the organization was 4x2, instead of 8x1, the picture at last began to make sense.)
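Working backwards from a known layout, as described above, might look something like the sketch below. It assumes each texturing unit produces one bilinear sample per clock, so a trilinear texture costs two unit-clocks; that cost model and the name `estimate_pixels_per_clock` are assumptions for illustration only. Shader instruction cost is not modeled at all.

```python
import math


def estimate_pixels_per_clock(pipes: int, tmus_per_pipe: int,
                              textures_per_pixel: int, trilinear: bool = False) -> float:
    """Rough pixels/clock for a known NxM layout (hypothetical cost model).

    Assumption: one bilinear sample per texturing unit per clock, so each
    trilinear texture needs two samples. A pixel takes at least one clock.
    """
    samples = textures_per_pixel * (2 if trilinear else 1)
    clocks_per_pixel = max(1, math.ceil(samples / tmus_per_pipe))
    return pipes / clocks_per_pixel


# Compare the two candidate nV30 layouts under a few texturing conditions.
for label, pipes, tmus in (("4x2", 4, 2), ("8x1", 8, 1)):
    for textures, trilinear in ((1, False), (2, False), (2, True)):
        rate = estimate_pixels_per_clock(pipes, tmus, textures, trilinear)
        filt = "trilinear" if trilinear else "bilinear"
        print(f"{label}, {textures} x {filt}: {rate:g} pixels/clock")
```

Under these assumptions, 4x2 and 8x1 differ only for single-textured pixels (4 versus 8 per clock); with two textures, bilinear or trilinear, they come out identical, which suggests why multitexture results alone would not distinguish the two layouts.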
*I don't see how the term "double pumped" applies to pixel pipeline organization in a gpu. In DDR ram and cpu FSBs, "double pumped" refers to transferring data on both the rising and falling edges of the clock instead of on a single edge, which gets 2x as much data per clock as SDR. How is this related to pixel pipelines in a gpu?
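The arithmetic behind "double pumped" memory can be sketched as peak bandwidth = clock times bus width in bytes times transfers per clock, with 1 transfer per clock for SDR and 2 for DDR. The 200 MHz clock and 64-bit bus below are hypothetical numbers chosen only to make the 2x ratio visible, and `bytes_per_second` is a made-up helper name.

```python
def bytes_per_second(clock_hz: float, bus_width_bits: int, transfers_per_clock: int) -> float:
    """Peak bandwidth = clock * bus width in bytes * transfers per clock.

    transfers_per_clock is 1 for SDR (one clock edge) and 2 for DDR
    ("double pumped": rising and falling edges).
    """
    return clock_hz * (bus_width_bits / 8) * transfers_per_clock


clock = 200e6  # hypothetical 200 MHz clock
width = 64     # hypothetical 64-bit bus
sdr = bytes_per_second(clock, width, 1)
ddr = bytes_per_second(clock, width, 2)
print(f"SDR: {sdr / 1e9:.1f} GB/s, DDR: {ddr / 1e9:.1f} GB/s")
```

The clock rate is the same in both cases; only the number of transfers per clock changes, which is what "double pumped" describes.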
I mean, it isn't possible to "double pump" pixels and get two pixels per clock out of a single pixel pipeline, is it? So I've no idea what's meant by "double pumped" when it's used to describe pixel pipeline organization in a gpu. As I understand it, each of a gpu's pixel pipelines can produce an absolute maximum of 1 pixel per clock. Hence, a gpu with 6 pixel pipes could generate at most 6 pixels per clock, but never twelve, since I don't see how that would be physically possible. Eh?
This is an amusing, if confusing, thread...