NVIDIA Fermi: Architecture discussion

This is why you should stick to reporting tape-out dates: all that requires is a cosy relationship with a back-room fab worker in Taiwan. When you fantasize about technical issues that are above your level of understanding, you have this strange tendency to run with the most unlikely (most sensational?) story line.

You obviously can't come up with a single example of how bullying your fab partner into violating a particular process rule when doing a shrink could result in some kind of advantage (at the cost of more risk). Here's the funny part: neither can I.

Yes I can. NV thought they could do it better, and TSMC disagreed. TSMC let them do it their own way if they shouldered the risk, and they did. TSMC won.

You are dealing with egos and management fed by a circle of yes-men, not necessarily rational decisions.

-Charlie
 

Ah no, even in software design, design processes aren't that flexible, even in a client-driven market, specifically because the consequences of not following design rules are cost overruns and poor products. Just as an example, when they aren't followed, unless you get very, very lucky, you can expect a twofold increase in cost. In an engineering environment those costs would be substantially higher. TSMC and nV wouldn't be stupid enough to do something like that.
 
Isn't that kind of the situation nV is in right now?
 


No, not at all. And it's not just nV's problem; it would be TSMC's as well. You would have to have stupidity on both sides of the fence for something like what Charlie is suggesting. There is absolutely no logical explanation for skipping design steps unless it was carefully looked into and found to give a time advantage versus a small increase in cost in the short term, with substantial long-term benefits.

Charlie, you are painting a picture where they never looked into risk management and mitigation for the steps not being followed, or into what the repercussions of those risks are. If this was a choice that was made, it's just stupidity; no one in their right mind would actually do something like this without looking into what I have just said. Because we aren't talking about a hundred grand or something like that here, it's more like hundreds of millions and billions of dollars on both sides.
 
I know that, but I mean more about management decisions that go against technical recommendations just so some sort of short-term milestone can be reached.


Not really, again: the problems weren't fully solved until they went to a lower process node on the processor. They had no choice; either that or come out well after the PS3, which would have been very bad for MS.

http://en.wikipedia.org/wiki/Xbox_360_technical_problems

Pretty accurate account of what happened.

The article also revealed that representatives of the three largest Xbox 360 resellers in the world (EB Games, Gamestop and Best Buy) claimed that the failure rate of the Xbox 360 was between 30% and 33%, and that Micromart, the largest repair shop in the United Kingdom, stopped repairing Xbox 360s because it was unable to fully repair the defective systems. Because of the nature of the problem, Micromart could only make temporary repairs, which led to many of the "repaired" systems failing again after a few weeks. At that time Micromart was receiving 2,500 defective consoles per day from the U.K. alone.[30]
 

The management mistake was the decision that 'we need a slimmer system than the gigantic Xbox 1, because that will look cooler', which precluded the implementation of adequate cooling.
 
Has Fermi shrunk from here

[image: Nvidia%20Fermi1.jpg]

to here?

[image: nvda_geforce_fermi_sli_2.jpg]
 
Charlie said:
Yes I can. NV thought they could do it better, and TSMC disagreed. TSMC let them do it their own way if they shouldered the risk, and they did. TSMC won.
Before you intervened to say that those alleged rule violations were about the 65 to 55nm shrink, there was a very small possibility that things were not done as they should have been for 40nm. After all, it's a new process, and a lot of new things have to be designed with a lot of uncertainty at the beginning.

But the fact that it was about 55nm closed the door on that. And even more so the fact that it was a shrink of a chip that actually went into production. That's because a shrink is laid out in a 65nm flow and shrunk after the fact.
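
To put rough numbers on what a plain optical shrink buys (a back-of-the-envelope sketch; the 576 mm^2 starting die is just an illustrative GT200-class figure, not a claim about any actual part):

Code:
# Back-of-the-envelope math for a 65nm -> 55nm optical shrink.
# A half-node shrink scales layout dimensions by a linear factor, so
# area scales with its square. Real flows recover somewhat less than
# the pure geometric factor; the die size below is illustrative only.

linear_factor = 55 / 65            # nominal linear shrink, ~0.846
area_factor = linear_factor ** 2   # ~0.716, i.e. roughly 28% area saved

die_65nm_mm2 = 576.0               # hypothetical GT200-class 65nm die
die_55nm_mm2 = die_65nm_mm2 * area_factor

print(f"linear factor {linear_factor:.3f}, area factor {area_factor:.3f}")
print(f"{die_65nm_mm2:.0f} mm^2 at 65nm -> ~{die_55nm_mm2:.0f} mm^2 at 55nm")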

Charlie said:
You are dealing with egos and management fed by a circle of yes-men, not necessarily rational decisions.
Can we at least give them the benefit of the doubt that they're not batshit insane? That *if* they decide to change process rules, it's at least because they hope to get some benefit from it? Such as:
- accelerate schedule: NO. On the contrary. Changing process rules requires you to redesign your standard cells or your memory generators. It would take many, many man-years to do that.
- increase speed/performance: NO. You don't need to violate process rules to trade off yield against increased speed or power consumption. It's a simple matter of asking the fab to skew the process. (If you want to know: you ask the fab to increase or decrease the N or P doping level; see the sketch after this list.) No need to violate anything. In fact, this is exactly what a fab does when you order corner lots for prototype characterization.
- reduce area: NO. The potential additional area gain over a shrink is too small. And you run into the same objection as schedule, because this too would require redesigning your full library.
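
On the corner-lot point, a minimal sketch of the trade-off you buy by skewing the process instead of breaking any rule; every number here is made up purely for illustration:

Code:
# Minimal sketch of "skewing the process", using the standard SS/TT/FF
# corner vocabulary (slow/typical/fast transistors). The numbers are
# invented for illustration: centering the process fast buys speed at
# the cost of leakage and parametric yield, with no design-rule
# violations anywhere.

corners = {
    "SS": {"rel_speed": 0.85, "rel_leakage": 0.5},  # slow: safe, low leakage
    "TT": {"rel_speed": 1.00, "rel_leakage": 1.0},  # typical: nominal
    "FF": {"rel_speed": 1.15, "rel_leakage": 2.5},  # fast: leakier, worse yield
}

for name, c in corners.items():
    print(f"{name}: speed x{c['rel_speed']:.2f}, leakage x{c['rel_leakage']:.1f}")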

So, please do give me a plausible reason why an irrational egomaniac executive would order his minions to violate process rules for a shrink because I can't come up with any.

An irrational egomaniac executive would ask his yes-men to cut corners in established procedures to accelerate schedules. (e.g. tape-out when certain blocks are known not to be completely verified, reduce hold margins in the fast timing corner, waive inconvenient noise numbers, etc.) But violate process rules for no benefit? No way.

It's really mind-boggling that you're making up these stories without bothering to check whether they actually make technical sense.
 
If anyone brings up pixel counting to measure the board, I swear something bad is going to happen.
 
How did it scale badly? By the same token you could say that RV530 scaled badly from R580 compared to G73 scaling down from G71.

For the most part they scaled as expected. The starting points were just too disparate. Similarly, GT200 was huge compared to RV770 with minimal performance gain, so similarly sized derivatives based on these architectures were not expected to be close in performance.
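
To put rough numbers behind the "disparate starting points" argument (approximate public die sizes, and a deliberately naive 0.5 factor for a hypothetical half-configuration derivative):

Code:
# Rough illustration: derivatives inherit the disparity of their parents.
# Die sizes are approximate public figures; the 0.5 derivative scale is
# a deliberate oversimplification (I/O, PHYs, display logic don't halve).

parents_mm2 = {"GT200 (65nm)": 576, "RV770 (55nm)": 256}
derivative_scale = 0.5  # hypothetical half-configuration derivative

for name, area in parents_mm2.items():
    print(f"{name}: {area} mm^2 -> half derivative ~{area * derivative_scale:.0f} mm^2")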


I think compared to the G80/G9x family, GT200/GT21x are really not scaling too well. The tiny GT21x parts imho just show that GT200 was not one of NV's best designs lately.

Comparing them to another vendor's silicon is pointless if you want to draw conclusions on how their line of thought went, where it might have gone in regard to Fermi, and what we can expect of the Fermi line. I would shy away from drawing many conclusions from GT21x towards the Fermi line, apart from the fact that they also had a rough start with 40nm at TSMC, but who hasn't?
 
silent_guy, charlie:

It always seemed very likely to me that what Charlie's source meant wasn't design rule violations, but simply not following as many DFM suggestions as TSMC thought would be desirable. It's easy to see how this kind of thing can get lost in rumour mill intermediaries (or even more simply in translation...) - given what TSMC claims about 55nm, I cannot imagine that the design rules are different at all for digital logic, but I could imagine that they added some DFM rules and NVIDIA decided not to bother. If this is correct though, it would hardly be as big a deal as what Charlie makes out of it.

Regarding 40nm, neliz just made the strange comment that:
neliz said:
NV skimped on the 40nm design rules and that's causing huge leakage everywhere on the die, but especially the MC.
I'll ignore the first part (it really doesn't say much), but the MC part is intriguing. It doesn't really make sense if you take it literally, but is it just me or did he/his source rather mean the memory PHY than the digital MC? Neliz, any idea?

There have been tidbits implying analogue & I/O are actually the most problematic parts on 40nm because of how greater variability affects them (one example: http://danielnenni.com/2009/08/05/tsmc-40nm-yield-explained/). Given that AMD had more impressive PHY results on 65/55nm than NVIDIA (RV770 PHY being barely bigger than NV's GDDR3-only PHYs for example), we could very naively assume that maybe NVIDIA's team isn't as talented (I obviously mean no disrespect) and that they're responsible for the GT215 delays (their only GDDR5 chip pre-Fermi) for example.
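
For the first-order intuition of why an immature process punishes big dies disproportionately, the textbook Poisson defect model is enough; a sketch with invented defect densities, since TSMC's real 40nm numbers were never public:

Code:
import math

# First-order Poisson yield model: Y = exp(-A * D), with A the die area
# and D the random defect density. D values are invented. Note this only
# models random defects; the parametric/variability losses that hit
# analogue blocks like a memory PHY are a separate term it ignores.

def poisson_yield(area_cm2, defects_per_cm2):
    """Fraction of dies free of random defects."""
    return math.exp(-area_cm2 * defects_per_cm2)

for d in (0.2, 0.5, 1.0):            # defects/cm^2, mature -> immature
    small = poisson_yield(1.0, d)    # ~100 mm^2 die
    big = poisson_yield(5.0, d)      # ~500 mm^2 die
    print(f"D={d}/cm^2: 100 mm^2 die {small:.0%}, 500 mm^2 die {big:.0%}")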

It's certainly much easier for me to believe that a company might have screwed up an analogue part which still requires a lot of skill just to make it work at all and that now needs a very deep understanding of variability effects to make it work well. Whether that's actually what happened, I don't know...

I guess this could also have something to do with Charlie's claims wrt the 55nm problems although I'm very skeptical; to be honest I still don't understand how TSMC's claim that 65nm I/O could be shrunk linearly to 55nm makes technical sense so I'm hardly qualified to say whether there might be a problem/potential shortcut there.
 

This was back in July, before the first tape-out of GT300, talking about why GT216 and GT218 hadn't shown up yet despite the chips coming out of TSMC since Jan/Feb.
The only things that came out after that were related to power distribution, all inherent to the design of GT2/21/3.

I'm quite sure that NV's engineers spent a lot of time on the subject. The A2 revision of the 216 wasn't supposed to be the last; they tried another spin but couldn't get the issues out, and after that decided to go ahead with A2. 215/6/8 would've been a lot more competitive on speeds without these "issues", and would certainly have been here a lot quicker.
 
The GeForce 360's rumored specs are right in line with my own speculation, although I doubt that this is just a full chip with units disabled.

Frankly, I can't see Nvidia making a 384-SP chip, because that would mean too many distinct DX11 dies:

GF100 with 512SPs, one with 384SPs, one with ~256SPs, one with ~128SPs, one with ~64, one with ~32, and probably one with ~16 to replace the GeForce 310. Hopefully they won't replace the 8-SP GeForce 205.

I guess they could just skip the 32-SP part, but still...
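
Just to make the die-count argument concrete (a toy sketch; every SP count here is this thread's speculation, not a confirmed spec):

Code:
# Toy sketch of the two possible DX11 stacks being argued about above.
# All SP counts are speculation from this thread, not confirmed specs.

full = 512
halving_stack = [full >> i for i in range(6)]      # 512, 256, ..., 16
with_384_die = sorted(halving_stack + [384], reverse=True)

print("pure halving:      ", halving_stack)   # 6 distinct dies
print("with a 384-SP die: ", with_384_die)    # 7 distinct dies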
 