NVIDIA GF100 & Friends speculation

aaronspink · Feb 21, 2010

rpg.314 said:
My argument against GF100 stems from the fact that even companies with widely acknowledged process advantage like Intel don't build reticle sized chips on cutting edge process. That could very well be affecting yields. GT200 being late only adds to this, though it wasn't on a problematic or new process.

Basically, two main things affect large die designs: defects and variability. Defects naturally go down as a process matures and the process engineers have more time to fine-tune the recipes. Variability also improves over time but always has an effect. Basically, the smaller the die, the smaller the maximum variability in transistor and metal characteristics; the larger the die, the greater the variability. A large die will always suffer more from variability than a small die, simply because of the distance from one part of the die to another.

Where variability becomes an issue is that you generally have to spec the part for the worst part of the die. That means that if one corner of the die is in the SS (slow-slow) corner, then you have to spec the voltages and frequency for the whole die based on that SS corner.

Besides the defect issues, it sounds like Nvidia is dealing with a lot of variability issues as well, and it is severely affecting their performance.
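
A minimal sketch of the worst-corner argument, assuming a toy model that does not come from any foundry data: each die is split into regions, each region draws its attainable frequency independently from a normal distribution, and the whole die is specced at its slowest region. The names `die_fmax` and `mean_spec_frequency` and all numbers are hypothetical. The model treats a larger die only as "more regions", which if anything understates the distance effect described above.

```python
import random

def die_fmax(region_fmax):
    """A die has to be specced for its slowest region."""
    return min(region_fmax)

def mean_spec_frequency(num_regions, sigma, nominal=1.0, trials=2000, seed=0):
    """Average spec frequency over many simulated dies (toy model)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Each region gets its own attainable frequency (normalized to nominal).
        regions = [rng.gauss(nominal, sigma) for _ in range(num_regions)]
        total += die_fmax(regions)
    return total / trials

# Hypothetical comparison: a small die (4 regions) vs a large die (16 regions)
# with the same per-region spread.
small = mean_spec_frequency(num_regions=4, sigma=0.05)
large = mean_spec_frequency(num_regions=16, sigma=0.05)
print(f"small die: {small:.3f} x nominal, large die: {large:.3f} x nominal")
```

With identical per-region spread, the larger die ends up with the lower average spec frequency in this model, because it has more chances to contain one slow region.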

silent_guy · Feb 21, 2010

rpg.314 said:
As for GDDR5, I think it has been said earlier by neliz too that nv is having trouble designing the memory controllers for gddr5 on 40 nm. The delay in their first gddr5 only adds to it.

Nice example, thank you: Neliz, silicon expert par excellence, also claimed that Nvidia tried to metal spin GT215 to reduce leakage. I didn't realize Nvidia was aiming for a Nobel prize in physics, but what do I know?

Why would designing a GDDR5 memory controller be so much harder than GDDR3? I thought there were GT215 boards on the market with GDDR5 and they have A-something version silicon. Apparently, it didn't seem to require a base spin. Doesn't that make it somewhat dubious that GDDR5 was the main reason? It's not hard to come up with more likely causes: TSMC low 40nm yields and unusually low shader clock speeds come to mind.

GT215 was also the first part with 96 shader cores. It can't be a coincidence that the first part with 96 was late. Nvidia doesn't know how to handle the number 96.

I agree armchair silicon experts do a lot of "look for correlation, if true then it's probably the cause", but often it comes out right too.

How will you know?

CarstenS · Feb 21, 2010

WRT variability and the supposedly increased independence of TPCs/SMs/SIMDs/whatchamacallits: Would it be a possible solution to have them (or rather their shader cores) run at independent max frequencies? Since their command streams seem to be quite independent after being scheduled, and they talk to each other (almost?) exclusively via the L2 cache, the only problem I can think of is how to give Marketing some fixed numbers to include in their PDFs.

I know it sounds a bit ridiculous, but why not?
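
A rough sketch of what independent clocks could buy, assuming aggregate throughput simply scales with the sum of cluster clocks (which ignores memory and scheduling limits). The function names and the per-cluster frequencies are made up for illustration.

```python
def throughput_common_clock(cluster_fmax_mhz):
    """All clusters run at the slowest cluster's maximum frequency."""
    return min(cluster_fmax_mhz) * len(cluster_fmax_mhz)

def throughput_independent_clocks(cluster_fmax_mhz):
    """Each cluster runs at its own maximum frequency."""
    return sum(cluster_fmax_mhz)

# Hypothetical per-cluster maximum shader clocks in MHz (made-up values).
clusters = [1400, 1350, 1450, 1200]

common = throughput_common_clock(clusters)
independent = throughput_independent_clocks(clusters)
gain = independent / common - 1.0
print(f"common: {common}, independent: {independent}, gain: {gain:.1%}")
```

The gain depends entirely on how far the slowest cluster lags the rest, which is also why a single fixed number for the spec sheet becomes awkward.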

CarstenS · Feb 21, 2010

rpg.314 said:
My argument against GF100 stems from the fact that even companies with widely acknowledged process advantage like Intel don't build reticle sized chips on cutting edge process.

After all we've heard, LRB wasn't up to scratch performance-wise. IMO there's a difference between building something big that has the chance to be a performance leader and halo provider, at least at launch, and taking that risk for something that has mediocre performance and cannot be sold as a premium part or provide any marketing-usable halo for smaller derivatives.

BTW: AFAIH, G80 was originally scheduled in place of G71 for spring/summer 2006 rather than November - could be wrong though.

KimB · Feb 21, 2010

CarstenS said:
WRT variability and the supposedly increased independence of TPCs/SMs/SIMDs/whatchamacallits: Would it be a possible solution to have them (or rather their shader cores) run at independent max frequencies? Since their command streams seem to be quite independent after being scheduled, and they talk to each other (almost?) exclusively via the L2 cache, the only problem I can think of is how to give Marketing some fixed numbers to include in their PDFs.

I know it sounds a bit ridiculous, but why not?

A more reasonable feature would be to just have the ability to turn off specific shader cores to save power for laptop parts.

Sontin · Feb 21, 2010

rpg.314 said:
GT200 being late only adds to this, though it wasn't on a problematic or new process.

Why was GT200 late? Did you expect them to bring GT200 as their first 65nm product? :?:

mapel110 · Feb 21, 2010

A word on the cherry-picked benchmarks from Nvidia: these benchmarks might be the best cases, but they should be real and verifiable after the launch. AFAIR that was always the case in the past with vendor benchmarks: cherry-picked, but real and verifiable.
I think this 5% can only be true if Nvidia lied about their own numbers.

rpg.314 · Feb 21, 2010

aaronspink said:
Basically, two main things affect large die designs: defects and variability. Defects naturally go down as a process matures and the process engineers have more time to fine-tune the recipes. Variability also improves over time but always has an effect. Basically, the smaller the die, the smaller the maximum variability in transistor and metal characteristics; the larger the die, the greater the variability. A large die will always suffer more from variability than a small die, simply because of the distance from one part of the die to another.

Where variability becomes an issue is that you generally have to spec the part for the worst part of the die. That means that if one corner of the die is in the SS (slow-slow) corner, then you have to spec the voltages and frequency for the whole die based on that SS corner.

Besides the defect issues, it sounds like Nvidia is dealing with a lot of variability issues as well, and it is severely affecting their performance.

Is there a correlation (at least statistically) between defects and variability? Intuitively, it would appear so.

GZ007 · Feb 21, 2010

Anyway, for me the 70C idling in 2D with 70% fan speed would be a bigger problem than the performance.

rpg.314 · Feb 21, 2010

silent_guy said:
Nice example, thank you: Neliz, silicon expert par excellence, also claimed that Nvidia tried to metal spin GT215 to reduce leakage. I didn't realize Nvidia was aiming for a Nobel prize in physics, but what do I know?

The effect of metal spins on leakage of transistors is weak at best, I agree.

Why would designing a GDDR5 memory controller be so much harder than GDDR3? I thought there were GT215 boards on the market with GDDR5 and they have A-something version silicon. Apparently, it didn't seem to require a base spin. Doesn't that make it somewhat dubious that GDDR5 was the main reason? It's not hard to come up with more likely causes: TSMC low 40nm yields and unusually low shader clock speeds come to mind.

I stand corrected then. GDDR5 is unlikely to be a cause of GF100's delay. However, the possibility that large variability, higher defect rates, higher leakage, or some combination of them is toasting GF100 still stands. We'll know for sure in a month or so.

(BTW: I don't buy the GF100 via story from Charlie either. He's very reliable about tape-out dates, but the moment he steps into anything technical he's a loose cannon who really has no clue. I've given up correcting him, it's pointless.)

Well, one IHV specifically used two vias in place of one everywhere on their first 40 nm part, and apparently did the same across the rest of their 40 nm lineup. NV specifically pressed TSMC last year for fewer via defects. Seems rather likely to me. The fact that Charlie also said the same thing about another part doesn't make it any less plausible.

trinibwoy · Feb 21, 2010

mapel110 said:
I think this 5% can only be true if Nvidia lied about their own numbers.

Or if they were run at clocks that are currently unattainable in volume production.

mapel110 · Feb 21, 2010

trinibwoy said:
Or if they were run at clocks that are currently unattainable in volume production.

If Nvidia had problems with clock rates, would they be so dumb as to go to an A2 AND an A3 instead of a B1?! I have trouble believing that. AND they are planning to release a dual-GPU card quite soon. How would that be possible?!

Sontin · Feb 21, 2010

nVidia will sell Tesla cards with 1200MHz - 1400MHz. Clocks are not the problem.

BTW:

Q: How did you get so behind schedule on the Fermi? I just saw that it was delayed to 2010. How will you recover from lost sales to AMD/ATi?

Jason Paul, GeForce product manager: On the GF100 schedule: I think Ujesh Desai (our Vice President of Marketing) said it best when he said "designing GPUs is f'ing hard!" :) With GF100, we chose to tackle some of the toughest problems of graphics and compute. If we merely doubled up on GT200, we may have shipped earlier, but essential elements for DX11 gaming, like support for scalable tessellation in hardware, would have remained unsolved.

While we all wish GF100 would have been completed earlier, our investment in a new graphics and compute architecture is showing fantastic results, and we're glad that we took the time to do it right so gamers can get a truly great experience.

http://forums.nvidia.com/index.php?showtopic=109093&view=findpost&p=1004893

CarstenS · Feb 21, 2010

rpg.314 said:
Well, one IHV specifically used two vias in place of one everywhere on their first 40 nm part, and apparently did the same across the rest of their 40 nm lineup.

I find that hard to believe, given Anand's reporting of Cypress being trimmed down from ~480ish mm² to 340ish already. If the former is without doubled vias and the latter with them in place, they must have removed half the chip to do so.

aaronspink · Feb 21, 2010

rpg.314 said:
Is there a correlation (at least statistically) between defects and variability? Intuitively, it would appear so.

Generally, no. Defects are generally caused by external factors or statistical error issues involved in complex lithography. Variability is generally a result of things like dopants, etch times, etc. Over time, defects can generally be reduced to the point where they are minor factors. Process variability always exists and is always an issue. It's the whole reason you have things like binning further down the pipeline.

Given a single wafer, you will have wide variability across the performance of the individual dies based on location of die and to some extent random factors. For example, different thickness of resist, different doping levels, metal thickness, which parts of the wafer hit the acid first, and which parts hit it last, etc. All the results will be within the margins of the process and all the parts will function, they just won't function the same. Some parts of the wafer will be higher leakage, some lower, some will need a higher voltage to hit a given frequency, others won't be able to hit a high frequency but will need a lower voltage to hit a lower frequency.
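
As a rough illustration of how binning absorbs that spread, here is a minimal sketch. The bin names, clock thresholds, leakage limits and the four example dies are all invented; real binning uses many more parameters.

```python
from dataclasses import dataclass

@dataclass
class Die:
    fmax_mhz: float   # highest stable clock at the nominal voltage
    leakage_w: float  # static (leakage) power

# Hypothetical bins: (name, minimum clock in MHz, maximum leakage in W),
# ordered from most to least demanding.
BINS = [
    ("top", 1400.0, 40.0),
    ("mid", 1200.0, 55.0),
    ("salvage", 1000.0, 70.0),
]

def assign_bin(die):
    """Return the first (best) bin whose limits the die meets, else None."""
    for name, min_fmax, max_leakage in BINS:
        if die.fmax_mhz >= min_fmax and die.leakage_w <= max_leakage:
            return name
    return None

# Four functional dies from one imaginary wafer, each behaving differently.
wafer = [Die(1450, 38), Die(1450, 60), Die(1250, 50), Die(950, 30)]
bins = [assign_bin(d) for d in wafer]
print(bins)
```

Note that a fast but leaky die can land in a lower bin than a slower, cooler one, and a die that works but misses every bin is still lost.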

nagus · Feb 21, 2010

Sontin said:
nVidia will sell Tesla cards with 1200MHz - 1400MHz. Clocks are not the problem.

BTW:

http://forums.nvidia.com/index.php?showtopic=109093&view=findpost&p=1004893

Wow, I am convinced now... I will sell my 5870 and wait 3 to 4 months for a 512sp Fermi

:|

rpg.314 · Feb 21, 2010

I thought vias didn't take much area.

rpg.314 · Feb 21, 2010

Generally, no. Defects are generally caused by external factors or statistical error issues involved in complex lithography. Variability is generally a result of things like dopants, etch times, etc. Over time, defects can generally be reduced to the point where they are minor factors. Process variability always exists and is always an issue. It's the whole reason you have things like binning further down the pipeline.

Like you said, variability also comes down over time. I was wondering whether it is uncommon for a process to have a high defect rate and low variability, or vice versa.
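
One way to picture the two cases separately is a sketch that assumes defects and variability are independent: the standard Poisson defect model, exp(-area x defect density), multiplied by a parametric yield (the fraction of defect-free dies that meet the voltage and frequency spec). The die area and both scenarios below are made-up numbers.

```python
import math

def defect_yield(die_area_cm2, defect_density_per_cm2):
    """Poisson model: probability that a die has zero killer defects."""
    return math.exp(-die_area_cm2 * defect_density_per_cm2)

def combined_yield(die_area_cm2, defect_density_per_cm2, parametric_yield):
    """Sellable fraction, assuming defects and variability are independent."""
    return defect_yield(die_area_cm2, defect_density_per_cm2) * parametric_yield

# Hypothetical large die of about 5 cm^2 under two made-up process conditions.
area = 5.0
high_defects_low_variability = combined_yield(area, 0.4, parametric_yield=0.9)
low_defects_high_variability = combined_yield(area, 0.1, parametric_yield=0.5)
print(f"{high_defects_low_variability:.2f} vs {low_defects_high_variability:.2f}")
```

Either factor alone can sink a big die in this model, so a high overall loss does not by itself say which of the two is responsible.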

Jawed · Feb 21, 2010

silent_guy said:
If even Jawed is starting to spout this kind of nonsense...

GF100 gives the appearance of needing a B refresh to achieve decent performance/yields. Such a refresh (if it happens) makes it 3 to 4 quarters late.

Anyway, I'm not counting chickens till the damned thing has been on the market a while. Demand will be "insane" if it's at all good, so it'll be a while before we know whether NVidia can keep up with demand. Then we'll get a feel for whether it's yielding well.

Of course if the reviewed chips are as bad as Charlie asserts then the case will be closed. I don't believe the "5% on current games" thing.

(I don't think texturing capability is going to kill performance, though being 59% of HD5870's theoretical does cause some qualms - I'm assuming NVidia's managed a monster boost in efficiency there and most games seem to show little dependency on texturing. Also, ROP performance - Z rate specifically - appears to be considerably better in GF100, and current games tend to indicate this is where most pain lies.)

NVidia's architecture, with its hot clock, seems to require custom implementation for those parts of the die at TSMC. Though I'm not sure of the extent of that. That's more difficult than going fully synthesisable is it not?

G94 is the only chip from the last few years that NVidia's apparently delivered "on time". NVidia has also cancelled two chips (GT212 and GT214 - a third if we count G88 which I'm still not sure about). The hot-clock based architecture appears to be making things quite difficult for NVidia. In the same period ATI chips with greater feature increments (D3D10.1, two variations of LDS, GDDR5) and higher performance have shown considerably less susceptibility to delays - with RV740 having the worst problems.

You like to assert there's no causation. Well feel free to provide an argument against the repetitions, rather than hand waving.

silent_guy said:
I would love to hear specific details from Jawed about exactly what would make an architecture unmanufacturable. And how GDDR5 fits in that picture is a similar mystery.

I never said it was unmanufacturable, I said Charlie's theory appears to hold some water, emphasis on "some".

I'll ask again: feel free to explain why NVidia has consistently struggled with chips that aren't feature increments (e.g. GT200b is A3), let alone the feature incrementing chips, in the same period on the same fab's nodes that AMD has executed on, usually in advance of NVidia.

Apart from the difficulties of custom design the other factors I can think of include packaging-related stuff (bump-gate) and NVidia's apparent reticence to be first to a node (or inability). Though NVidia did boast that it would be first to 40nm, I'm not quite sure why. Unless it was an attempt to assuage rumblings that 40nm was going to be a problem and NVidia wanted to keep Wall Street off its back by saying it was ahead of AMD for 40nm.

Jawed

Jawed · Feb 21, 2010

mapel110 said:
If Nvidia had problems with clock rates, would they be so dumb as to go to an A2 AND an A3 instead of a B1?! I have trouble believing that.

NVidia's alluded to problems in implementing the distributed setup scheme. Is it possible that metal spins can reduce these problems?

Jawed
