I have produced an alternative picture to better illustrate the layout as I understand it:
silent_guy said:
I see it differently. I see multiples of 6x everywhere.
Take the left strip of what you have marked as US1-4. The left side of it has 12 small RAMs, and so does the right side.
The right strip also has 12 RAMs per US on its left side. Only the right side of the right strip has 4 RAMs. You say that it has a 16-pipe texture unit, so this could explain the multiple of 4.
A factor of 6 is quite unusual. Connect this with the fact that there are 48 pipelines and it becomes fairly obvious to me.
As you can see from the picture above, the die looks like it's easily split into four groups of four, each block being a "quad-pipeline".
I think it's highly unlikely. Going forward, I'm sure Microsoft plans to shrink this chip to smaller technologies. A 200mm2 die in 90nm becomes 100mm2 in 65nm, resulting in much better yields. Go to 45nm and you're at 50mm2, at which point the redundancy really becomes pointless.
I doubt M$ can do those shrinks, so ATI will be contracted to perform them. There's no reason why redundancy couldn't be revisited at each node. Is there?
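For reference, the shrink figures quoted above can be checked with a rough sketch. It assumes ideal area scaling (area goes with the square of the feature size), which real shrinks tend to fall short of, and a simple Poisson yield model. The function names and the defect density are hypothetical and purely illustrative, not foundry data:

```python
import math

def shrunk_area(area_mm2, old_nm, new_nm):
    """Die area after a shrink, assuming ideal (quadratic) scaling."""
    return area_mm2 * (new_nm / old_nm) ** 2

def poisson_yield(area_mm2, defects_per_mm2):
    """Fraction of defect-free dies under a simple Poisson defect model."""
    return math.exp(-area_mm2 * defects_per_mm2)

# Hypothetical defect density, chosen only to show the trend.
DEFECTS_PER_MM2 = 0.005

for node_nm in (90, 65, 45):
    area = shrunk_area(200, 90, node_nm)
    print(node_nm, round(area, 1), round(poisson_yield(area, DEFECTS_PER_MM2), 3))
```

Under ideal scaling the 65nm die comes out at about 104mm2 rather than exactly 100mm2, and the 45nm die at 50mm2. The yield column only shows the direction of the effect, since the defect density is made up.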
Even more unlikely: the math simply doesn't add up. Redundancy pays off much more when you can selectively enable or disable small portions of a die, like a row or a column of a RAM. (Or, like the PS3 Cell with 1 redundant SPE out of 8.) A 33% redundancy won't help if you have 2 defects in 2 different pipelines.
Cell is partly the basis for my view of R580 redundancy - an entirely dark SPE is no different, conceptually, from an entirely dark shader array in Xenos.
In Cell each SPE is 21M transistors: 14M of RAM and 7M of logic. Each SPE is ~6% of the die (taken from a die photo I have here).
In Xenos each shader array is ~8% of the die.
In R580, if my estimates are reasonable, 8.3% of the die is given up in redundancy - 2M transistors per pipeline, with four pipelines being redundant per shader unit (16 redundant pipelines in all), gives a total of 32M transistors (out of 384M). The 2M figure actually needs revising down - because it's based on 64M extra transistors in R580 over R520, corresponding to an extra 32 pipelines, which doesn't include the redundant 16 pipelines that I'm hypothesizing, nor other tweaks such as Fetch-4... But, anyway, 8% seems like a reasonable starting point, even if it's high.
I think it's pretty interesting that all three come out in the range of 6-8%.
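The R580 arithmetic above, spelled out as a quick sketch. The transistor counts are the estimates from this post. The "revised" figure is only my assumption about how far the 2M would drop if the 64M were spread over all 48 extra pipelines, and it still ignores Fetch-4 and other tweaks. The names are made up for illustration:

```python
R580_TRANSISTORS_M = 384   # total, in millions (figure used in this post)
EXTRA_OVER_R520_M = 64     # extra transistors in R580 over R520, in millions
VISIBLE_EXTRA_PIPES = 32   # extra pipelines that are actually exposed
SPARE_PIPES = 16           # hypothesized redundant pipelines (4 per shader unit)

def redundancy_share(per_pipe_m):
    """Fraction of the die spent on the hypothesized spare pipelines."""
    return SPARE_PIPES * per_pipe_m / R580_TRANSISTORS_M

# Starting estimate: all 64M attributed to the 32 visible pipelines.
naive_per_pipe = EXTRA_OVER_R520_M / VISIBLE_EXTRA_PIPES

# Revised down (assumption): the 64M also has to cover the 16 spares.
revised_per_pipe = EXTRA_OVER_R520_M / (VISIBLE_EXTRA_PIPES + SPARE_PIPES)

print(round(100 * redundancy_share(naive_per_pipe), 1))    # 8.3
print(round(100 * redundancy_share(revised_per_pipe), 1))  # 5.6
```

So the starting point is 8.3%, and sharing the 64M across all 48 pipelines would pull it down to roughly 5.6%, which is why I call 8% high.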
Will Sony revisit 1-SPE redundancy in PS3's version of Cell, in the 65/45nm future? Whilst Cell is on 90nm, what percentage of all Cells produced are going into PS3? 99%?...
Let's assume for a moment that there were 64 shaders: that means there must be a number of dies coming out of production with all pipes operational. ATI would be crazy not to market those as some kind of ultra high performance $800 card that beats the competitor silly.
If the despatcher is architected to issue 4-phase x 12 fragment threads (48 fragments per thread), it might not have the capability to issue 4x16 threads (64 fragments per thread).
That's also why I don't think that the G70 has a dark quad, as suggested by geo and others: the majority of the dies go to the relatively high-volume 7800GT and 7800GS products which have 1 defective quad. Only the few parts with all quads working go to the GTX, which you price high enough to prevent mass demand. Makes perfect sense: you never waste silicon and you get the highest return per wafer.
ATI used to do this with X800. All I'm suggesting is that ATI's moved on to a yield model that tries to ensure that every die works, with redundancy hidden inside the die due to the massively parallel nature of the architecture - rather than relying upon binned (or trashed) dies.
I'm not saying every R580 on a wafer will work, simply that it's possible to cut the yield cake in different ways to get ~ the same margins.
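To put a rough number on the "2 defects in 2 different pipelines" objection, here is a toy model of the hidden-redundancy idea. It assumes four shader units of four quads each, with one spare quad per unit, and that every defect lands in a different quad with all placements equally likely. None of that is confirmed about R580; it is just my hypothesis turned into code:

```python
from itertools import combinations

UNITS = 4           # shader units (assumed layout)
QUADS_PER_UNIT = 4  # 3 active + 1 spare quad per unit (hypothesis)

def salvageable(defective_quads):
    """A die survives if no shader unit has more than one defective quad."""
    per_unit = [0] * UNITS
    for quad in defective_quads:
        per_unit[quad // QUADS_PER_UNIT] += 1
    return all(count <= 1 for count in per_unit)

def salvage_fraction(num_defects):
    """Share of equally likely defect placements that leave a working die."""
    total_quads = UNITS * QUADS_PER_UNIT
    placements = list(combinations(range(total_quads), num_defects))
    good = sum(salvageable(p) for p in placements)
    return good / len(placements)

print(salvage_fraction(1))  # 1.0
print(salvage_fraction(2))  # 0.8
```

In this model a single bad quad is always absorbed, and two bad quads are absorbed in 80% of placements (whenever they fall in different shader units). Two defects in the same unit still kill the die, so the objection holds for that case.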
Jawed