NVIDIA Fermi: Architecture discussion

I don't believe the war will be won in the top-end battle, but in the mid-range performance battle. Merely matching the 5970 with Fermi, while being larger and more expensive to produce for a top-end, low-volume SKU, is not a viable approach to the whole market. Especially so if AMD have a refresh product ready to beat Fermi over the head.

I think that's why Nvidia is looking at HPC as a new market to skirt that fight, and is gambling on a future of more programmable GPUs further down the line.

Not sure I follow... the chip in the GTX 380 won't be more expensive than the two chips in the HD 5970. Far from it. If it manages to get close enough in performance, NVIDIA can price the GTX 380 near the HD 5970 and then beat it with a more cost-effective dual-GPU part.

Also, what kind of refresh can AMD release that isn't like what the HD 4890 was relative to the HD 4870? That was barely 10% faster in most instances.
Maybe an HD 5950 (two HD 5850s on one PCB), but that's still more expensive than a single GTX 380 chip.
 
I doubt that.


Why not? You have two chips whose combined die area is greater than Fermi's single die. OK, those two dies together may still be cheaper than, or around the same price as, one Fermi die, that's true. But Fermi gets a less complex PCB, a lower memory cost (Fermi only needs 1.5 GB of memory instead of 2 GB), and a less complex fan solution.
 
The yield on Cypress is bound to be higher than Fermi's, though. You can't simply say that because ATi has to use two chips to build the 5970, it is more expensive than a single-chip GTX 380. Defect rate increases exponentially with die size, and Fermi is a heck of a lot bigger than Cypress. Sure, ATi won't get twice the number of candidate dies per wafer that NV does, but you'd better believe the number of workable chips will be more than twice as high, even given identical defect rates (highly unlikely).
 


That is true, but you still have salvage parts on NV's chips too. I don't think it's an exponential increase; it's not 1:1, that's for sure.
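For what it's worth, one common way to reason about whether the scaling is "exponential" is a simple Poisson defect model, where the chance of a die coming out defect-free is exp(-D·A). Here's a minimal sketch of that, using the die sizes quoted later in this thread and a guessed defect density; none of these numbers come from TSMC, NVIDIA, or AMD:

```python
import math

# Simple Poisson yield model: the probability that a die of area A (mm^2)
# is defect-free, given a defect density D (defects per mm^2), is exp(-D * A).
def poisson_yield(die_area_mm2, defect_density_per_mm2):
    return math.exp(-defect_density_per_mm2 * die_area_mm2)

D = 0.001              # assumed defect density, a pure guess (as elsewhere in the thread)
cypress_area = 334.0   # mm^2, Cypress die size quoted later in the thread
fermi_area = 530.0     # mm^2, Fermi die size quoted later in the thread

y_cypress = poisson_yield(cypress_area, D)   # ~0.716
y_fermi = poisson_yield(fermi_area, D)       # ~0.589

print(f"Cypress per-die yield: {y_cypress:.1%}")
print(f"Fermi per-die yield:   {y_fermi:.1%}")
print(f"Ratio:                 {y_cypress / y_fermi:.2f}x")   # ~1.22x per die
```

Under those guessed numbers the per-die yield gap alone is modest; the bigger cost difference comes from the larger die also leaving fewer candidates per wafer, which is what the estimate later in the thread rolls in.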
 

NV will sell ~10x more Fermi chips in the Quadro and Tesla markets versus Cypress, with ~10x the absolute margins of the gaming market. So it will make them a lot of profit in those segments, even if it loses in gaming.
 
It's endearing to see how everyone believes that metal spins magically increase clock speeds. In some cases from 500 to 750MHz, no less! I would love to know how that works...
 
NV will sell ~10x more Fermi chips in the Quadro and Tesla markets versus Cypress, with ~10x the absolute margins of the gaming market. So it will make them a lot of profit in those segments, even if it loses in gaming.

10x more *in those markets* or 10x more total volume sales as a result of sales to those markets? The first I can believe, the second I highly doubt.
 
You'd think that after 3 previous chips in the same technology, the core IP has been debugged.

Wrt clock speeds, the one thing that can be fixed with metal is noise. But it's not as big a deal as people seem to think it is. This is obvious if you go through a regular place-and-route iteration: noise checks are usually not enabled because they just take too long to calculate. Only when the chip timing starts to stabilize do you enable them once in a while (say, once per week) and fix things. The first time you do this, you'll see some paths slow down by 5%, maybe 10% if you're unlucky. Fixing this is a matter of buffering up nets or rerouting a few. The paths that are susceptible to noise are usually not on the critical path in the first place: because they are non-critical, the gates are not buffered up to the max and can thus fall victim to noise aggressors. That's why buffering up is often an easy solution that doesn't negatively impact critical paths.

If things go badly, a few paths may fall through the cracks, but this is a matter of human judgement: where you put the bar for noise fixing. The tools are pessimistic and based on their own models. Are you going to fix violations of 20ps? Probably not. 200ps? Always.
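To make that thresholding concrete, here's a toy sketch of how a noise report might be triaged. Every path, delay, and threshold here is invented for illustration; real flows rely on the tool's own noise and timing models, not a hand-written list like this.

```python
# Toy triage of a noise check. All numbers are made up for illustration.

CLOCK_PERIOD_PS = 2000   # a 2ns cycle, matching the example below
WAIVE_BELOW_PS  = 20     # "are you going to fix violations of 20ps? Probably not."
ALWAYS_FIX_PS   = 200    # "200ps? Always."

paths = [
    # (name, delay without noise, extra delay from noise aggressors), all in ps
    ("critical_path",  1980,  10),   # buffered up to the max, barely affected
    ("noncritical_a",  1700, 170),   # ~10% noise bump, still inside the cycle
    ("noncritical_b",  1995,  15),   # tiny violation, likely waived
    ("noncritical_c",  1900, 300),   # large bump (exaggerated), buffer up or reroute
]

for name, nominal, noise in paths:
    slack = CLOCK_PERIOD_PS - (nominal + noise)
    if slack >= 0:
        verdict = "meets timing"
    elif -slack < WAIVE_BELOW_PS:
        verdict = f"{-slack}ps violation, probably waived"
    elif -slack < ALWAYS_FIX_PS:
        verdict = f"{-slack}ps violation, judgement call"
    else:
        verdict = f"{-slack}ps violation, always fixed"
    print(f"{name}: {verdict}")
```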

Anyway: a long story to say that improving from 2ns to 1.3ns (in other words, being off by 700ps!!!) is just not going to happen. (A 2ns cycle is 500MHz; 1.3ns is roughly 750MHz, the jump mentioned above.)

My personal experience is that first-revision silicon pretty much always runs faster than its expected speed but is unstable in the lab. Not because of the silicon itself, but because the whole setup is immature: the power supply may be cranky, PCBs noisy, PLLs programmed incorrectly, quick-release sockets unreliable, etc. (And that's on chips that consume two orders of magnitude less power than a GPU.) So, yeah, you run the whole thing at a lower speed until those issues are worked out.
 
What if the low clocks are due to a thermal/power constraint from leakage, rather than some inherent inability to scale clocks?
 
Why not? You have two chips whose combined die area is greater than Fermi's single die. OK, those two dies together may still be cheaper than, or around the same price as, one Fermi die, that's true. But Fermi gets a less complex PCB, a lower memory cost (Fermi only needs 1.5 GB of memory instead of 2 GB), and a less complex fan solution.

I tried to estimate the cost of a Fermi GPU compared with a Cypress GPU based on the following:

- 40nm process with the same defect rate for both designs (not correct but best guess)
- 334mm² for Cypress, 530mm² for Fermi
- defect rate of 0.001 per mm² (also a pure guess), defects evenly spaced, and each defect means a scrapped GPU (also not correct, but again, it's a guess).


=> with a 300mm wafer and 334mm² per die, AMD would get 211 candidate GPUs per wafer. With ~70 defects per wafer, AMD would have roughly 140 working GPUs per wafer.

=> with a 300mm wafer and 530mm² per die, Nvidia would get 133 candidate GPUs per wafer. With ~70 defects per wafer, Nvidia would have 63 working Fermi GPUs per wafer.

Therefore, with the same defect rate, a Fermi GPU would be roughly 2.22 times as expensive as a Cypress GPU.

Well, clearly, all of the above is a first rough estimate, nothing more.
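For what it's worth, the arithmetic above is easy to reproduce. Here's a small Python sketch using the same assumed numbers and the same simplifications (candidate dies approximated as wafer area divided by die area, edge loss ignored, every defect assumed to kill exactly one die); it's only a restatement of the guess above, not a real yield model:

```python
import math

WAFER_DIAMETER_MM = 300
WAFER_AREA_MM2 = math.pi * (WAFER_DIAMETER_MM / 2) ** 2   # ~70,686 mm^2
DEFECT_DENSITY = 0.001                                    # assumed defects per mm^2 (pure guess)

def good_dies_per_wafer(die_area_mm2):
    # Candidate dies approximated as wafer area / die area (edge loss ignored),
    # and every defect is assumed to kill exactly one die.
    candidates = int(WAFER_AREA_MM2 // die_area_mm2)
    defects = int(WAFER_AREA_MM2 * DEFECT_DENSITY)         # ~70 defects per wafer
    return max(candidates - defects, 0)

cypress_good = good_dies_per_wafer(334)   # 141 (rounded to ~140 in the post above)
fermi_good   = good_dies_per_wafer(530)   # 63

# With identical wafer cost, per-die cost scales inversely with good dies per wafer.
print(f"Cypress good dies per wafer: {cypress_good}")
print(f"Fermi good dies per wafer:   {fermi_good}")
print(f"Fermi cost per good die:     ~{cypress_good / fermi_good:.2f}x Cypress")  # ~2.2x
```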
 