MuFu said:
Remember that it has 175 million transistors and nV have seen fit to give it 50GB/sec+ memory bandwidth. Current rumours are that it can work on 4 quads in certain situations. That would seem to suggest that it's an 8x2/16x0 design (in the same way that NV35 can be thought of as 4x2/8x0) and may have approximately twice the pixel throughput of NV35, per clock. I've heard that PS performance is already well above current parts, even on the A0 samples. VS shows less of an improvement; maybe they have just incorporated a single, extra VS unit.
They might ramp with A1; it's apparently not as clock-limited as they thought it might have been.
MuFu.
I don't understand how you can think of nV30/5/8 as anything but 4x2. nVidia has stated plainly that only 4 color pixels per clock can be rendered to screen--under no circumstances may 8 color pixels per clock be rendered to screen, regardless of whether or not a texel is attached to a pixel. "8x0" means to me "8 color pixels rendered to screen per clock without texels attached."
"8 black & white z-pixels per clock" rendered internally in nV3x, which is actually what nVidia claims, does not equal "8 color pixels per clock rendered to screen without texels," it seems to me. This information, coming directly from nVidia, indicates that nV3x has a maximum of 4 pixel pipelines and may not render more than 4 pixels per clock to the screen, regardless of whether there are 0, 1, or 2 texels attached to those pixels. So I would label the pipeline organization of nV30/5/8 as "4x0 or 1 or 2," depending on the software demands.
R3x0, likewise, is "8x0 or 1," depending on software. In the case of multitexturing software it is able to use 4 of its pixel pipes for texel generation sans pixels, and becomes 4 (pixels) x2 (texels attached to each pixel) per clock rendered to screen. I don't see how this formula applies to nV3x, because nV3x has a ceiling of 4 pixel pipelines, and R3x0's is 8.
It seems to me that if nV4x is capable of 16x0 per clock, it must have 16 pixel pipelines. I consider 8 (pixel pipes) x2 (texel units per pipe) per clock much more likely than 16 pixel pipes. While I can't see how an unused texel unit attached to a pixel pipeline can be used for per-clock, render-to-screen pixel generation, it's easy to see how a full pixel pipe may be used exclusively for per-clock texel creation (since a texel is a sub-unit of a final pixel, and texels are never rendered to screen independently of pixels). I.e., there's a big difference between texel units and pixel pipes, IMO.
Basically, in R3x0, multitexturing uses 4 of its total of 8 pixel pipes for the creation of 4 pixels per clock rendered to screen, but it uses all 8 of its texel units per clock, each of which is attached to a pixel pipeline. nV30/5/8 cannot do that, because they have only 4 pixel pipelines, and so can only render 4 pixels per clock to screen, whether 0, 1, or 2 texels are attached per clock per pixel. The difference is that in single texturing, R3x0 can do 8 (pixels per clock) x1 (texel per clock per pixel), but nV30/5/8 can do only 4 (pixels per clock) x1 (texel per clock per pixel). So I just can't see how nV30/5/8 might be accused of the "8x0" organization you mention, since that would mean they would have to be able to generate 8 pixels per clock to screen, but nV30/5/8 have only 4 pixel pipelines, so that won't work.
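The arithmetic above can be put into a rough Python sketch. This is only a toy model of the argument in this post, not anything taken from nVidia or ATI documentation: the function name pixels_per_clock and the rule "pixels per clock = the smaller of the pipe count and total texel units divided by textures per pixel" are assumptions made for illustration, and the "8x2" entry is the rumoured nV4x layout, not a confirmed one.

```python
def pixels_per_clock(pixel_pipes, texel_units_per_pipe, textures_per_pixel):
    """Peak color pixels written to screen per clock in a toy pipes/texel-units model.

    Assumption: output is capped by the number of pixel pipes, and when
    texturing, also by how many pixels the texel units can feed per clock.
    """
    if textures_per_pixel == 0:
        # No texels attached: only the pixel pipe count limits output.
        return float(pixel_pipes)
    total_texel_units = pixel_pipes * texel_units_per_pipe
    return min(float(pixel_pipes), total_texel_units / textures_per_pixel)


# (pixel pipes, texel units per pipe) as described in this thread.
configs = {
    "R3x0 (8x1)": (8, 1),
    "nV30/5/8 (4x2)": (4, 2),
    "rumoured nV4x (8x2)": (8, 2),  # hypothetical, from the rumour only
}

for name, (pipes, tmus) in configs.items():
    rates = [pixels_per_clock(pipes, tmus, t) for t in (0, 1, 2)]
    print(f"{name}: 0/1/2 textures -> {rates} pixels per clock")
```

Under these assumptions R3x0 comes out at 8, 8, and 4 pixels per clock for 0, 1, and 2 textures, nV30/5/8 stays at 4 in all three cases, and an 8x2 part would stay at 8, which is why it could look like roughly twice nV35 per clock without ever being "16x0."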
As to "175 million" transistors having any sort of bearing on performance, I can't see raw numbers as being relevant (even assuming the current rumor is correct), except peripherally to yields/heat/power/clocking considerations. Otherwise, simply reciting the raw bulk transistor count is about as meaningful, or as accurate, as declaring that because a GM engine has "more parts" than a Ford engine, it will be the faster engine. "It's not the size of the boat, but it's the motion of the ocean that counts," as the saying goes...
Likewise, it's not the number of transistors in a chip that counts for performance--rather, it's what the transistors do, and how efficiently they do it, that makes the performance difference. IIRC, nV3x has more transistors than R3x0, and is a lot slower at many things.