NVIDIA Fermi: Architecture discussion

Frankly, I can't see Nvidia making a 384-SP chip, because that would mean too many distinct DX11 dies:

GF100 with 512SPs, one with 384SPs, one with ~256SPs, one with ~128SPs, one with ~64, one with ~32, and probably one with ~16 to replace the GeForce 310. Hopefully they won't replace the 8-SP GeForce 205.

I guess they could just skip the 32-SP part, but still...
Fermi-based GPUs probably won't go below 32 SPs at all.

Leaked or not, they are possible :)
Yeah, 40 ROPs on 5870 especially.
 
Management just doesn't simply override procedures for the benefit of their clients without justifiable reasons to do so. It's that simple. And pressure from the client isn't reason enough.

ATI still has yield issues, although they are getting better. And where do you get two times the size? You know Fermi's top-end chip isn't two times the size of AMD's top end, right?

Yield issues, yes; anything less than 100% could be considered such. The yields now are fine, from what I hear.

As for numbers, Cypress is ~334 mm^2 and Fermi is 23.x mm * 23.y mm, so depending on what x and y are, that could range from 530 to a rounding error under 576 mm^2. I had heard one of the dimensions was 23.8mm, but I can't hard confirm that.

If you use 23*23, 23.5*23.5 and 24*24 for low/middle/high, you have Cypress being .63, .60 and .58 the size of Fermi. Let's use .6 for the sake of round numbers, so if yields are linear WRT die size, Cypress will have notably fewer bad parts. If you model it using the assumption that defect rate goes up with the square of die size, things get downright ugly for Fermi if ATI has anything less than amazing yields.
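
For concreteness, here's a quick Python sketch of that low/middle/high arithmetic; the Cypress figure is the ~334 mm^2 quoted above, and the Fermi edge lengths are guesses, not confirmed measurements:

Code:
cypress_mm2 = 334.0                      # Cypress area quoted above
for edge_mm in (23.0, 23.5, 24.0):       # low / middle / high guesses for Fermi
    fermi_mm2 = edge_mm * edge_mm
    print("%.1f mm square -> %.0f mm^2, Cypress is %.2fx the size of Fermi"
          % (edge_mm, fermi_mm2, cypress_mm2 / fermi_mm2))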

Edit: Forgot the NVIO chip. That could increase costs and lower assembled end product yields a bit. It is only tangentially relevant to the discussion though.

-Charlie
 
silent_guy's first paragraph in his latest reply puts it much better than I ever could, but I just want to make sure this is a typo/brainfart: you do realize that DFM is, by definition, not a *rule*, right? DFM manuals are full of *suggestions*. If you don't follow any of them, chances are your design is going to be a disaster yield/leakage/etc.-wise, and if you follow every single one of them in the most conservative way possible, chances are you'll leave some area/power on the table but get extremely good yields (assuming everything else works out too, of course).

If your scenario is really that TSMC told NV that they really should follow these few extra DFM points and NV said it wasn't worth the extra engineering effort and/or area, then that's a perfectly normal thing to happen and it's ridiculous to make a big deal out of it. If it was the sole reason why they needed two respins, then maybe, but I'm still skeptical that's true (and while you do seem to have heard that it was a problem, you seem to phrase everything as if you concluded it by logical elimination, which, as silent_guy clearly explains, doesn't work here).

Of course, you probably know this and you simply typed 'DFM rule violation' by mistake - if so, just ignore what I said here. Better safe than sorry though...

I don't know exactly what the rule violations were, but the end result is clear: the thing was horribly late for what should have been a relative no-brainer. Where I guess/try to eliminate possibilities, I say so. That said, none of the 'experts' who deny my premise can come up with a better reason.

It is easy to shoot down a theory, no question there. It is hard to find real answers. I have seen a lot of sniping, but so far, no one can explain why it took so many spins (and yes, I was wrong, it was 2, not 3. :( I don't fact check every post externally).

-Charlie
 
Yield issues, yes; anything less than 100% could be considered such. The yields now are fine, from what I hear.

As for numbers, Cypress is ~334 mm^2 and Fermi is 23.x mm * 23.y mm, so depending on what x and y are, that could range from 530 to a rounding error under 576 mm^2. I had heard one of the dimensions was 23.8mm, but I can't hard confirm that.

If you use 23*23, 23.5*23.5 and 24*24 for low/middle/high, you have Cypress being .63, .60 and .58 the size of Fermi. Let's use .6 for the sake of round numbers, so if yields are linear WRT die size, Cypress will have notably fewer bad parts. If you model it using the assumption that defect rate goes up with the square of die size, things get downright ugly for Fermi if ATI has anything less than amazing yields.

Edit: Forgot the NVIO chip. That could increase costs and lower assembled end product yields a bit. It is only tangentially relevant to the discussion though.

-Charlie


So you are guessing it's two times, even though the numbers you just put up aren't two times and aren't actual numbers? :D

I will agree with you that it is larger than Cypress, but not as large as you are suggesting.
 
If you model it using the assumption that defect rate goes up with the square of die size, things get downright ugly for Fermi if ATI has anything less than amazing yields.

Your units are already mm^2, so defects are already scaling quadratically. What would be the physical mechanism behind defects scaling with the square of an area? Most defect models are bounded by a Poisson model, which underestimates actual yields. I have never seen a paper on defect modeling where the defect rate is a function of area, rather than assumed constant, bounded by some upper limit.

Perhaps you meant yield, not defects. I'd expect both AMD and NVidia chips to have the same number of defects per unit area. The question then becomes the probability distribution of good chips, which is influenced by how easily defects can be tolerated or dealt with. (With a simple Poisson model, assuming the only good chips are those with zero defects, you end up with yield being bounded by exp(-DefectRate * Area), so YieldAMD = exp(-DefectRate * AreaAMD) and YieldNVidia = exp(-DefectRate * 1.6 * AreaAMD), so YieldAMD/YieldNVidia = exp(DefectRate * 0.6 * AreaAMD), or something like 2.7x better yields for AMD. That's roughly 1.6 * 1.6, so in line with a "square of area" model, but that's yield as a square of area, not defects as a square of area.)

I should underline that it is well known that a simple Poisson model underestimates actual yields by a big margin.
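
To put rough numbers on that simple Poisson bound, here's a minimal Python sketch using the ~334 mm^2 Cypress figure from earlier, an assumed ~23.5 mm square Fermi, and a few placeholder defect densities chosen purely for illustration (none of these are actual TSMC numbers):

Code:
import math

def poisson_yield(defect_density_per_cm2, area_mm2):
    # Fraction of dies with zero defects under the simple Poisson bound.
    area_cm2 = area_mm2 / 100.0
    return math.exp(-defect_density_per_cm2 * area_cm2)

cypress_mm2 = 334.0          # figure quoted earlier in the thread
fermi_mm2 = 23.5 * 23.5      # assumed mid-range guess, ~552 mm^2

for d0 in (0.25, 0.5, 1.0):  # placeholder defect densities, defects per cm^2
    yc = poisson_yield(d0, cypress_mm2)
    yf = poisson_yield(d0, fermi_mm2)
    print("D0=%.2f/cm^2: Cypress %.0f%%, Fermi %.0f%%, ratio %.2fx"
          % (d0, 100 * yc, 100 * yf, yc / yf))

With these toy inputs the zero-defect yield ratio ranges from roughly 1.7x in Cypress' favour at 0.25 defects/cm^2 to nearly 9x at 1.0, so the assumed defect density matters far more than the exact die dimensions.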
 
It is easy to shoot down a theory, no question there. It is hard to find real answers. I have seen a lot of sniping, but so far, no one can explain why it took so many spins (and yes, I was wrong, it was 2, not 3. :( I don't fact check every post externally).

-Charlie

So, the basis of your theories is false facts? :LOL:
And what is so bad about a second respin when you can defeat the physical laws of our world?

Normally, you would do what ATI did with the 3870X2 and downclock it a bit. Not much, just a little, and take the power savings. The problem is that the 260 already loses to the cheaper 4870, and two of them wouldn't be much of a fight against the 4870X2. At a minimum, you need two 260-216s, or better yet, two upclocked 260-216s, to pip the 4870X2 and claim a hollow victory.
NV is in a real bind here: it needs a halo, but the parts won't let it do it. If they jack up the power to get the performance they need, they can't power it. Then there is the added complication of how the heck you cool the damn thing. With a dual PCB, you have less than one slot to cool something that runs hot with a two-slot cooler. In engineering terms, this is what you call a mess.
http://www.theinquirer.net/inquirer/news/1028302/nvidia-270-290-gx2-roll



That brings us to the GX2/dual card. Suppliers tell us that it is quite dead. We thought it was an impossible thing to pull off when we first analysed the part, and it looks like several universal physical constants agreed with us. Thermodynamics is a bitch.
http://www.theinquirer.net/inquirer/news/1018019/nvidia-270-290-deep-trouble

And there is a good chance that Fermi doesn't need NVIO anymore. Look at the pictures: no NVIO!
 
Your units are already mm^2, so defects are already scaling quadratically. What would be the physical mechanism behind defects scaling with the square of an area? Most defect models are bounded by a Poisson model, which underestimates actual yields. I have never seen a paper on defect modeling where the defect rate is a function of area, rather than assumed constant, bounded by some upper limit.

Perhaps you meant yield, not defects. I'd expect both AMD and NVidia chips to have the same number of defects per unit area. The question then becomes the probability distribution of good chips, which is influenced by how easily defects can be tolerated or dealt with. (With a simple Poisson model, assuming the only good chips are those with zero defects, you end up with yield being bounded by exp(-DefectRate * Area), so YieldAMD = exp(-DefectRate * AreaAMD) and YieldNVidia = exp(-DefectRate * 1.6 * AreaAMD), so YieldAMD/YieldNVidia = exp(DefectRate * 0.6 * AreaAMD), or something like 2.7x better yields for AMD. That's roughly 1.6 * 1.6, so in line with a "square of area" model, but that's yield as a square of area, not defects as a square of area.)

I should underline that it is well known that a simple Poisson model underestimates actual yields by a big margin.

Yes, you are right, that is what I meant.

On a related topic, I seem to recall that a month or two ago TSMC had a statement saying they had world-class defect rates, and attached a number. I can't find that release anywhere; does anyone here remember it, and have a link?

I think it was 0.25 defects per square cm, but I am not sure.

-Charlie
 
That said, none of the 'experts' who deny my premise can come up with a better reason.
I believe I gave 6... And they're actually plausible.

If you model it using the assumption that defect rate goes up with the square of die size, things get downright ugly for Fermi if ATI has anything less than amazing yields.
That's one more thing: the 2% yield number. It makes for a spectacular story, but it doesn't say anything about how many dies can be sold as working, lower-performance parts. The moment you add redundancy to the mix, the yield story changes dramatically. It also becomes much harder to compare two different chips based on area alone.
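
To illustrate how much redundancy moves the needle, here's a minimal Python sketch under the same simple Poisson model; the die area, defect density, block count and the simplification that the whole die consists of redundant blocks are all made-up illustrative assumptions, not Fermi specifics:

Code:
import math

def sellable_yield(defect_density_per_cm2, area_mm2, n_blocks, max_bad):
    # Fraction of dies sellable when up to max_bad of n_blocks equal-sized
    # redundant blocks may contain defects (simple Poisson model; treating
    # the whole die as redundant blocks is a deliberate simplification).
    lam = defect_density_per_cm2 * area_mm2 / 100.0   # expected defects per die
    p_bad = 1.0 - math.exp(-lam / n_blocks)           # a block has >= 1 defect
    return sum(math.comb(n_blocks, k) * p_bad**k * (1.0 - p_bad)**(n_blocks - k)
               for k in range(max_bad + 1))

# Illustrative numbers only: ~550 mm^2 die, 16 SM-like blocks, 0.5 defects/cm^2.
perfect = sellable_yield(0.5, 550.0, 16, 0)   # every block must be clean
harvest = sellable_yield(0.5, 550.0, 16, 2)   # up to two blocks fused off
print("perfect dies: %.0f%%, sellable with 2 blocks disabled: %.0f%%"
      % (100 * perfect, 100 * harvest))

Even with those toy numbers, allowing a couple of disabled blocks turns a single-digit perfect-die yield into a healthy fraction of sellable parts, which is exactly why comparing two chips on raw area alone says so little.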
 
I believe I gave 6... And they're actually plausible.


That's one more thing: the 2% yield number. It makes for a spectacular story, but it doesn't say anything about how many dies can be sold as working, lower-performance parts. The moment you add redundancy to the mix, the yield story changes dramatically. It also becomes much harder to compare two different chips based on area alone.

Why does anyone bother with him? He makes stuff up, admits he doesn't 100% fact-check, and when presented with plausible facts that could be the root causes of things, he either ignores them or dismisses them. His hatred of Nvidia is so deep-rooted I don't think he cares about facts as long as he gets hits for his stories.
 
So how is A3 doing? Did the yields magically improve? Did NV work wonders and manage to re-design the chip in a few weeks to match TSMC's design recommendations?
 
Which pics?

-Charlie
There are usually mounting holes visible around the NVIO chip on the backside of the board (on G80 and GT200 SKUs), but looking at the available GF100 board pictures, there are no such signs on this one.
On the other hand, looking at the Fermi die-shot, I personally can't spot any array of recognizable display I/O pad assembly... :???:

p.s.:
Now looking once again more carefully, this could be the display output set: click!
This is zoomed in on the lower-left corner of the die-shot. Those pad structures are really thin for a full display I/O block -- previously I thought it was a new kind of NVIO phy link interface, but now I'm not sure anymore, looking back at the GT215 and G92b shots.
 