NVIDIA GF100 & Friends speculation

There's no problem legitimately using FP10 when it's called for by the dev; where the argument comes from is that what AMD did is akin to the shader-replacement controversies in their drivers from a few years back.
Yeah although it's a bit more grey here because DX9 doesn't include that format for the devs (on PC). ATI also claims they aren't doing this for any DX10+ apps (which could explicitly use the format), so it's maybe less reprehensible than it seems.
 
Yeah although it's a bit more grey here because DX9 doesn't include that format for the devs (on PC). ATI also claims they aren't doing this for any DX10+ apps (which could explicitly use the format), so it's maybe less reprehensible than it seems.
I obviously agree this is massively less reprehensible than what NVIDIA did way back then, but this is a very slippery slope: DX9 doesn't support the GeForce FX's FX12 format either. Does that mean NVIDIA should have been allowed (without fuss) to manually replace FP32/FP24 instructions with their FX12 equivalents wherever they could honestly prove it did not reduce image quality? And if image quality does go down, how are we to know they made a genuine mistake rather than cheating on purpose? Needless to say, that's already a lot more controversial.
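Just to put rough numbers on the precision gap being argued about here, a minimal sketch in Python (the mantissa widths and the FX12 range are my own assumptions from public descriptions of these formats, not anything confirmed about the driver substitutions):

Code:
# Rough precision comparison of the formats under discussion.
# Mantissa widths and the FX12 range are assumptions from public format
# descriptions, purely for illustration.
formats = {
    "FP32 (s23e8)":           ("float", 23),
    "FP24 (s16e7, R3xx ALU)": ("float", 16),
    "FP16 (s10e5)":           ("float", 10),
    "FP11 (e5m6, packed)":    ("float", 6),
    "FP10 (e5m5, packed)":    ("float", 5),
    "FX12 (~s1.10 fixed)":    ("fixed", 10),  # roughly [-2, 2) with a 10-bit fraction
}

for name, (kind, frac_bits) in formats.items():
    step = 2.0 ** -frac_bits
    if kind == "float":
        # step between adjacent representable values just above 1.0 (one ulp)
        print(f"{name:24s} relative step near 1.0 ~ {step:.1e}")
    else:
        # fixed point: the step is absolute, but the dynamic range is tiny
        print(f"{name:24s} absolute step ~ {step:.1e}, range roughly [-2, 2)")

The packed small floats give up a lot of mantissa relative to FP16, while FX12 keeps a fine step but throws away dynamic range, which is exactly where the image-quality arguments come from.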
 
It just seems to me that if they put these sorts of optimizations in their driver, they should make it optional, something along the lines of a checkbox that says, "Enable further performance improvements that may lower image quality slightly in some titles."

It's not so bad to offer this sort of thing at all. What's bad is doing it by default with no way to disable it (without disabling a whole host of other things as well).
 
It's not so bad to offer this sort of thing at all. What's bad is doing it by default with no way to disable it (without disabling a whole host of other things as well).

Sure and they'll just say "disable Catalyst AI", which is one step, but they probably need another notch in there that includes "optimizations" but only ones that produce functionally the same result. Unless that's what "off" means...
 
Dang, I'm not sure even Charlie thought it was that bad. It's amazing, looking back in hindsight, how foolish Nvidia management looks for booking all those wafers off a hot lot.

Regards,
SB
 
From what I understood with my less-than-brilliant English, what Huang talked about had nothing to do with yields, as was suggested here. It was a matter of getting dead silicon back from the fab.
 
From what I understood with my less-than-brilliant English, what Huang talked about had nothing to do with yields, as was suggested here. It was a matter of getting dead silicon back from the fab.

So they yielded perfectly, but there was a problem with the silicon?

I don't think he said they were dead, just that there was massive interference in the fabric layer that left them useless. That's the point where he blames TSMC for the interference in their fabric layer.
To me, it just sounds like their design yields badly, but if anyone can explain it better, please share.
 
To me, it just sounds like their design yields badly, but if anyone can explain it better, please share.
He explicitly mentions a management issue at the end of the video (in other words: incompetent management). They didn't pay attention to the interconnect layer (both in terms of physics and logic design); as this is a new process, it needs to be specifically designed for.
 
Well actually Charlie stated very clearly and repeatedly that there were 7 good chips in there according to his sources.

So I guess he needs to seduce some other TSMC janitor next.
 
So they yielded perfectly, but there was a problem with the silicon?
No, there was no mention of yields, and yes, of course there was a problem; otherwise you wouldn't be soldering A3 silicon onto your cards. But I am sure you understand it much better than you're trying to imply here, working at a major IHV yourself.

I don't think he said they were dead, just that there was massive interference in the fabric layer that left them useless. That's the point where he blames TSMC for the interference in their fabric layer.
To me, it just sounds like their design yields badly, but if anyone can explain it better, please share.
You should really watch the linked video. :)
 
I thought he explained it quite well.

No one took charge of the process engineering to implement the interconnection fabric twixt GPCs and twixt the rest of the chip. With the physical reality of 40nm at TSMC being troublesome and with the interconnection fabric being technically difficult anyway, lack of focus in making sure the fabric was going to work on 40nm became the key issue.

I presume this boils down to huge amounts of noise which meant the fabric could only clock very slowly.
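For what it's worth, here's a minimal sketch of the "long cross-chip wires can only clock slowly" part (Elmore-style RC delay in Python; the per-mm resistance and capacitance are invented ballpark values, not TSMC 40nm numbers, and this ignores the crosstalk/noise side entirely):

Code:
# Delay of an unrepeated on-chip wire grows with length squared, which is why
# a wide cross-die fabric needs repeaters/pipelining or a slower clock.
# r_per_mm and c_per_mm are invented ballpark figures, for illustration only.
r_per_mm = 1000.0     # ohms per mm (assumed)
c_per_mm = 0.2e-12    # farads per mm (assumed)

def wire_delay_ns(length_mm):
    """Distributed-RC (Elmore) delay ~ 0.38 * R_total * C_total."""
    r_total = r_per_mm * length_mm
    c_total = c_per_mm * length_mm
    return 0.38 * r_total * c_total * 1e9

for mm in (1, 5, 10, 20):   # from local wiring up to crossing a big die
    d = wire_delay_ns(mm)
    print(f"{mm:>3} mm wire: ~{d:6.2f} ns -> max toggle rate ~{1.0 / d:5.2f} GHz")

Breaking long wires up with repeaters and pipeline stages fixes the quadratic blow-up, but that is exactly the kind of physical-design work that, by his account, nobody owned for the fabric.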
 
I'm surprised that simulations were so far off that yields tanked. Those are some shoddy tools. Luckily, wiring issues are a relatively easy fix. BTW, "broken" in this context is really poor diction.
 
He explicitly mentions a management issue at the end of the video (in other words: incompetent management). They didn't pay attention to the interconnect layer (both in terms of physics and logic design); as this is a new process, it needs to be specifically designed for.

He specifically said they simulated the interconnect. Ideally this should provide an idea of what is going on; simulations are far more accurate than any hand calculation. Basically, when they validated through simulation that Fermi worked, in reality it didn't.

The only thing that stood out as abnormal to me is the fact that their engineers are specialized. This is a bad idea because a chip is not just an architecture or a circuit or a physical device; it is all of them, and they all affect each other. Hence you must know what architecture makes a good physical design and vice versa. See the Gajski-Kuhn Y-chart for more on this.
 
I thought he explained it quite well.

No one took charge of the process engineering to implement the interconnection fabric twixt GPCs and twixt the rest of the chip. With the physical reality of 40nm at TSMC being troublesome and with the interconnection fabric being technically difficult anyway, lack of focus in making sure the fabric was going to work on 40nm became the key issue.

I presume this boils down to huge amounts of noise which meant the fabric could only clock very slowly.

Yeah he certainly doesn't seem to harbor any illusions about what went wrong. I'm still baffled though that they didn't do proofs of concept for something this complicated. I had assumed all designs went through some sort of physical prototyping phase. Is it all just based on simulations and claims made by TSMC?

NV already had 40nm parts way before GF100 came out; that's no excuse at all.

Did you watch the video? The problem wasn't 40nm alone.
 
NV already had 40nm parts way before GF100 came out; that's no excuse at all.

Well, are we going to assume that GF100 would have the same problems that another chip on 40nm would? There is more going on than that. It is likely that the interconnect in Fermi is not like the one in GT21x, and it will not encounter the same problems.
 
I'm still baffled though that they didn't do proofs of concept for something this complicated.
A similar question arises with GDDR5, in theory. Though, in practice, NVidia's first GDDR5 was supposedly planned to be ready for the end of 2008 on its first 40nm chips.

But with those chips ending-up rather late, NVidia didn't get much time to sort out GDDR5.

I don't know how similar or different the GDDR5 issues are in comparison with the fabric issues. Fabric is very wide (as well as being many<->many), whereas GDDR5 channels are fairly well constrained and point-to-point on the ultra-high-speed side - though clocks are very high there.
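As a rough illustration of that contrast, a quick bandwidth-per-wire sketch (the data rates are made-up ballpark figures, not any particular product's spec):

Code:
# A few very fast point-to-point wires (a GDDR5 channel) versus many slower
# wires (a wide internal bus/fabric). Data rates are illustrative only.
def bandwidth_gb_s(wires, gbps_per_wire):
    return wires * gbps_per_wire / 8.0

print("32-bit GDDR5 channel @ 4 Gbps/pin  :", bandwidth_gb_s(32, 4.0), "GB/s over 32 data wires")
print("512-wire internal bus @ 1 Gbps/wire:", bandwidth_gb_s(512, 1.0), "GB/s over 512 wires")

Similar bandwidth, very different signal-integrity problems: a handful of carefully tuned point-to-point links versus a huge number of wires that all have to route across the die.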

I had assumed all designs went through some sort of physical prototyping phase. Is it all just based on simulations and claims made by TSMC?
I interpret the fabric in GF100 as an increased-complexity version of what's seen in GT200. GT200 doesn't require clusters to talk to each other, but there is a wide crossbar between the clusters and the ROPs/MCs. GF100 requires triangle data exchange amongst the GPCs, which adds a new dimension of complexity. Not sure what other data is inter-GPC. (The ring bus in ATI was a solution to this kind of everyone-speaks-to-everyone problem...)
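To put a number on why an everyone-speaks-to-everyone arrangement gets ugly fast, a minimal link-counting sketch (the client counts are illustrative guesses, not the actual GT200/GF100 wiring):

Code:
# Link counts for a few interconnect topologies. Client counts are illustrative.
def crossbar(sources, sinks):
    """Full crossbar: every source has a path to every sink."""
    return sources * sinks

def all_to_all(nodes):
    """Dedicated link between every pair of nodes."""
    return nodes * (nodes - 1) // 2

def ring(nodes):
    """Ring bus: each node connects only to its two neighbours."""
    return nodes

print("10 clusters x 8 ROP/MC partitions, full crossbar:", crossbar(10, 8), "links")
print("4 GPCs exchanging triangle data, all-to-all     :", all_to_all(4), "links")
print("10 clients all-to-all                           :", all_to_all(10), "links")
print("Same 10 clients on a ring bus                   :", ring(10), "links")

Crossbar and all-to-all wiring grow roughly quadratically with client count, while a ring keeps the wiring linear at the cost of extra hops and latency, which is presumably the trade-off ATI was making.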

Conceptually it is "just wires" as he says. Obviously the tricky bit is the physical environment. And everyone who's using 40nm at TSMC has struggled with the mismatch between the specification and reality.

But as far as I can tell everyone expects there to be a mismatch, for any process. And for it to be worse when it's new.

So then you get into questions of managing the transition to a new process. The ATI guys seem to have a better handle on that, but R600 and 110nm are their lows. And 90nm was some kind of third party problem.
 