TSMC's explanation is simple and reasonable, just read between the lines of their latest financial CC: because of high customer (i.e. AMD/NVIDIA) demand, they tried ramping the process too fast and therefore skipped *some* of the metrology to save time. This is classic upper management pressure on engineers: telling them they need to hit a goal even though it's not realistic, and then having it come back to bite them big time. No overly complicated theories needed.
Umm, the people I talk to say they test about once every hour or four (depends on wafer rates on the tool more than anything else). Now if TSMC took metrology to 1/10 of what it was, they should have caught it in a day at most.
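To put rough numbers on that, here is a back-of-envelope sketch. It takes the one-to-four-hour sampling interval above and assumes that cutting metrology to 1/10 simply stretches the interval tenfold. The function name is made up for illustration, and the scaling is my assumption, not anything TSMC has described.

```python
def worst_case_detection_hours(base_interval_hours, reduction_factor):
    """Longest wait before the next metrology sample, assuming the
    sampling interval scales linearly with the cut in metrology."""
    return base_interval_hours * reduction_factor


# Sampling interval from the post: roughly once every 1 to 4 hours.
# Reduction factor 10 corresponds to "1/10 of what it was".
for base in (1, 4):
    hours = worst_case_detection_hours(base, 10)
    print(f"every {base} h -> every {hours} h ({hours / 24:.1f} days)")
```

Under those assumptions the gap between samples is somewhere between ten hours and a bit under two days. Either way it is nowhere near months.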
On top of that, you can't ramp a process without massive metrology input and feedback. If you are trying to up yields from crap to less crap, you NEED that feedback. Even if management tells the engineers to save a very small bit of time by skipping that step, all they will do is make sure process improvement goes from science to guesswork.
To miss it for multiple months is not plausible. To not test is not plausible. To lessen tests to a degree that this would go undetected is also not plausible. If you have a good explanation for how you ramp a process and new equipment without feedback until the chip is done, let me know, we can make a lot of money on it.
If only AMD was allocated to those not-properly-tested chambers/tools (which I massively doubt), then that'd get a fair bit more suspicious, but even then it'd seem ridiculous to me because TSMC incurred large losses because of this problem. There's no way they did this voluntarily.
[conspiracy hat on] One scenario could be that someone will lose less money by paying TSMC to spike yields on the whole process than they would by their competitor eating them alive in the market. [conspiracy hat off] I am not saying this is happening, nor am I saying it is only affecting ATI, I am just saying that something is really really wrong. The explanations don't add up, or even come close.
Now if they had said, "we are ramping new lines, and during that, XYZ", that would explain why output is not going up, but not why it went DOWN from what it was. Please note I am not talking about yield as a percentage of die candidates, but about the overall number of dies coming off the line. That should not go down at all, ever, or at least not down a lot. It did.
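The difference between percentage yield and total good dies off the line can be sketched like this. Every number below is invented purely for illustration (wafer counts, die candidates per wafer, and yields are not TSMC figures), and `good_dies` is a hypothetical helper.

```python
def good_dies(wafers, candidates_per_wafer, yield_percent):
    """Good dies off a line: wafers x die candidates x yield."""
    return wafers * candidates_per_wafer * yield_percent // 100


# Hypothetical established line (made-up numbers).
old_line = good_dies(wafers=1000, candidates_per_wafer=100, yield_percent=40)

# Hypothetical new line still ramping, with much worse yield.
new_line = good_dies(wafers=500, candidates_per_wafer=100, yield_percent=10)

total_good = old_line + new_line
total_candidates = (1000 + 500) * 100
blended_yield_percent = 100 * total_good / total_candidates

print(old_line, total_good, blended_yield_percent)
```

With these made-up inputs the blended yield percentage drops from 40 to 30, yet the number of good dies still goes up, from 40000 to 45000. Bringing up a bad new line can drag the percentage down, but by itself it cannot take away dies the old lines were already producing.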
You are making a massive conceptual mistake here. What matters is not the percentage of users that need some functionality, it is the percentage of gross profit that derives from it. What you need to compare is the total gross profit you'd get from a gaming-only chip (via higher gross margins) versus the total gross profit from a gaming+HPC chip (via lower gaming gross margins but extra HPC gross profits).
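As a minimal sketch of that comparison: the revenue and margin figures below are placeholders chosen only to show the arithmetic, not estimates for any real product, and `total_gross_profit` is a hypothetical helper.

```python
def total_gross_profit(segments):
    """Sum gross profit over (revenue, gross_margin_percent) pairs."""
    return sum(revenue * margin_percent / 100 for revenue, margin_percent in segments)


# Placeholder numbers in arbitrary units, for illustration only.
# Gaming-only chip: smaller die, so a higher gaming gross margin.
gaming_only = total_gross_profit([(1000, 50)])

# Gaming+HPC chip: lower gaming margin, plus extra HPC gross profit.
gaming_plus_hpc = total_gross_profit([(1000, 45), (200, 70)])

print(gaming_only, gaming_plus_hpc)
```

Which side wins depends entirely on the inputs. The point is only that the comparison is between two totals, not between the share of users who need each feature.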
I was just referring to the graphics portion. I agree with what you say on the overall picture, for now. It will be a different game in ~6 months though, but I can't say why yet.
Based on very realistic Fermi HPC revenue predictions, I think from that (correct) point of view, GF100's area efficiency is noticeably *higher* than if it was a gaming-only chip. On the other hand, its derivatives would be noticeably less area efficient if they couldn't remove the functionality, but they've indicated they could at least remove half-DP, which is probably the single most important element. GF100 would still be very slightly less power efficient for gaming, but that doesn't look like a big deal to me.
Let's see how they do that. It is going to be funny to watch them spin that one. "It is _THE_ most important thing since the invention of knee pads" said one NV spinner when asked about Fermi, "but it is only important in chips measuring over 500mm^2 because of technical reasons that are 'beyond our scientific understanding'*". Spin till ya puke.
-Charlie
* They actually used that on me when they were trying to convince me that the bad bumps were not catchable at an earlier stage. Really. The other five process/packaging people I talked to all gave me an answer that was within the understanding of then-current science, and all five had the same answer too.