NVIDIA shows signs ... [2008 - 2017]

Status
Not open for further replies.
Does the agreement NUFI and NVidia signed say NUFI has a right to participate in negotiations? If it doesn't then I don't see why they are upset over it. Obviously the reason they are in court is b/c NUFI thinks it does and Nvidia thinks it doesn't. NUFI is also saying the policy doesn't cover it and Nvidia is saying it does. Once again that is why they are in court.

Which was part of everything I was trying to explain. Yes it is their right (and obligation to their customers) to make sure they are party to any financial agreements that they are obliged to pay. It's one of the most basic principles of trying to avoid fraud.

As others have pointed out (better than I did obviously), the insurance company has a right and obligation to make sure that the money being paid out isn't too much or too little.

Pay out too much and that increases the financial burden on all of your customers. Pay out too little and you open yourself up to lawsuits and bad publicity.

Now this isn't to say that Nvidia is trying to defraud the insurance company. However, by denying them access, it prevents the company from determining whether compensation is fair and whether or not it thinks Nvidia is trying to defraud them.

Regards,
SB
 
Charlie, I don't read your stuff much; I was discussing the TG Daily piece as it was coherent. Specifically, whether NUFI has a right to be part of negotiations or not.

As I stated, the policies IN FULL are attached to the suit as Exhibits 1 and 2. I have them. I read them. Fully. It was about as exciting as reading 58 + 39 pages of insurance verbiage could be. Is that so hard to comprehend?

All the things you wonder about are contained in the filings. It is pretty obvious you didn't look for them, don't have them, and didn't do much other than skim a couple of articles. Fair enough.

The problem comes in two parts. The first is when you complain about not being able to comment unless you read the filings, which Mike Magee pretty obviously did. The second is when people say they have read it, and it does say what they claim, and you still speculatively wonder.

Should you want them, write me and I'll send you all three relevant docs, they are a few meg each, so I will send them separately unless you state otherwise.

As far as your piece goes, with a wee bit less self aggrandizement and a bit of clearer writing it might be readable. One tip is to avoid a ridiculous amount of hyperlinks in the text.

Yeah, but once again, you (and to be fair, most others) aren't doing the barest minimum of research before you make statements that contradict the facts in the field. Then again, at least you are not doing the "I have an 8600GT, and it has never broken, so no NV card can ever go bad." thing. :)

I put the links in there so people can follow the story, if I didn't, the morons out there would say "that never happened, you teh sux0rz" again and again.

From the links in the article (I might be adding a few too, I got them from another article).

When I stated G92 and G94s are bad -
http://www.theinquirer.net/inquirer/news/1038400/nvidia-g92s-g94-reportedly
The link in it that matters is locked out unless you have a digitimes subscription, but it was found here:
http://www.digitimes.com/NewRegister/join.asp?view=Article&DATEPUBLISH=2008/07/25&PAGES=PD&SEQ=206
and it was titled "Channel vendors demand card makers recall faulty Nvidia products". The only quote I have from it is, "Due to Nvidia did not clearly explaining the details of the faults reported in its notebook GPUs, some channel vendors have demanded graphics card makers issue a recall for desktop-based discrete graphics cards using the same GPU core, according to sources at graphics card makers."

To answer your next questions preemptively, yes I did contact multiple Taiwanese GPU vendors, yes they did all confirm very high failure rates in G92 and G94, and they backed up the story. The NUFI suit also mentions several vendors from Taiwan, and given NV's lack of forthrightness, it would not be much of a stretch to believe that the bad parts list will grow if/when they come clean.

Also, those AIBs directly named multiple desktop G92 and G94s to me. I won't (note, didn't say can't) prove that statement though. :)

More important is Apple. I posted two stories on this,
http://www.theinquirer.net/inquirer/news/1047022/apple-notebooks-defective
That is the first, and to quote the main link here:
http://support.apple.com/kb/TS2377
"In July 2008, NVIDIA publicly acknowledged a higher than normal failure rate for some of their graphics processors due to a packaging defect. At that same time, NVIDIA assured Apple that Mac computers with these graphics processors were not affected. However, after an Apple-led investigation, Apple has determined that some MacBook Pro computers with the NVIDIA GeForce 8600M GT graphics processor may be affected."

Can you feel the love between those two companies? Have you ever seen a more direct "they lied to us!" in public? I sure haven't. No, this is not G92/94, but it sets the stage.

The second link is this:
http://www.theinquirer.net/inquirer/news/1010836/apple-knowledgebase-g92s
No links, it is the internal Apple repair KB. Several of the parts listed there are towers. They directly name G92s "for known Nvidia bump crack issue". Hmmm, yeah, that is narrowing it down a bit, don't you think?

The next RADAR article names it in an MXM, so I assume that is mobile, but both KB articles directly name G92.

How much more do you need? Honestly.

I put in the hyperlinks so people can do the barest minimum research with the least effort. You didn't seem to go that far, but you can ponder questions answered by them.

Ctrl+F for "exhibit" turns up nothing except in the comments. You have 15 links to your own site in your article that is just repeating what TG Daily turned up anyway. If you want people to wade through the morass you create, then provide the links to the NUFI suit at the beginning. Or if you refuse to document any sources except yourself over and over, don't expect people to take you too seriously. Thanks for wandering by, it was enlightening in some respects.

I did document sources other than myself, you just didn't follow the links. As for the suit, it was linked in the article for a bit, but the Inq engine blows hard, and is run by barely competent people who seem not to have a clue. Shortly after going up, it was pulled down, but I was out of town and couldn't do much from where I was.

Anyone who asked has gotten a copy mailed to them. I tried to put that into the story, but the admins were on walkabout at the time, and haven't bothered since.

You can also get it on your own from PACER, it is a public service, but they do charge you minimally for replication. I think I paid $7.20 for all three docs.

Now, how is this not documenting? All I can see is you not reading.

-Charlie
 
Removed the last 3 posts since they're just flamebait and/or OT.

Anyway Charlie, FWIW, I think you've done a very good job at analyzing this issue (despite my relatively low opinion of your leaks in general, which you could argue is more a factor of your sources than yourself - also, for the love of god, please stop assuming A0 is the first spin for NV/ATI. Unlike Intel, I promise you it's actually A11) and that you're probably right on most things. As your post above highlights, if it does turn out you're wrong on a few things it certainly won't be because of lack of documentation.

One thing I *honestly* wonder though is whether the failure rate is so high that it makes sense for the OEMs/AIBs to want NV to do much more (especially earlier on when many of the worst offender chips were still on the market)? Remember the OEM/AIB's reputation is hurt a bit too, and nobody likes having their supply chain disrupted too much.

The other thing I've never seen you addressing and I'm curious about (but maybe I missed it) is Hara's claim in a few CCs that the problem is in a specific temperature range, presumably both staying in it and getting in/out of it (i.e. thermal cycling to/out of that specific range). Some of your articles clearly highlight that the entire solution is very fragile, but isn't it still possible that the absurdly high failure rates are mostly caused by that range? In other words, if you never got there, your failure rate would be (ridiculous made up numbers) 5% versus 2% normally, but if you do regularly go in that range it jumps to 25%. Wouldn't that at least seem plausible?

And wouldn't that make the BIOS/driver/packaging pseudo-fixes potentially more effective? I'm not saying they are, but I'll admit to having some difficulty believing a company the size of NV couldn't analyze a problem of this nature relatively efficiently. As you implied many times, there probably weren't any perfect fixes here because some of these decisions must be made earlier in the design phase - but that doesn't mean the hackish fixes are that bad. I really don't know, it's just that as I said it is the one point I'm the most uncertain about - unlike the fact that early G92s are failing, which is why I moved my passive(!) G92 to a PC where it'll do as little thermal cycling as possible. Ah well...
 
But if they made the fan run faster to keep the card cooler, wouldn't that kill one of their selling points I see mentioned so often in threads: not as noisy as ATI?
 
Anecdotal evidence but so far I've only seen one G92 failure in the field and no G94 failures out of perhaps 20 cards. Not a big enough set to be conclusive but certainly not irrelevant.
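
To put a rough number on "not big enough to be conclusive", here is a minimal sketch of a 95% Wilson score interval for the failure rate. It assumes the count is exactly 1 failure out of 20 cards total (the post only says "perhaps 20"), and the function name is made up for illustration.

```python
import math
from typing import Tuple


def wilson_interval(failures: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 is about 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = failures / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    # Clamp to [0, 1] so rounding never produces an impossible rate.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Assumed sample: 1 failure in 20 cards (the "perhaps 20" from the post).
low, high = wilson_interval(failures=1, n=20)
print(f"observed 1/20 = 5.0%, 95% interval about {low:.1%} to {high:.1%}")
```

With a sample this small the interval runs from under 1% to over 20%, so the observation is compatible with both a normal failure rate and a very bad one.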
 
Anecdotal evidence but so far I've only seen one G92 failure in the field and no G94 failures out of perhaps 20 cards. Not a big enough set to be conclusive but certainly not irrelevant.

Well, apparently internal Apple knowledge base articles specifically name G92 parts as being prone to this type of failure. Whether it's large scale or small scale is hard to tell, but it was worrisome enough for technical support that they created some KBs for it.

And I'd say Apple has a fairly large sample size.

Likewise, there are claims that the AIBs themselves have noted higher than normal failures for G92/94. Unfortunately, there's no real way to verify that without having an inside source. AIBs would be hesitant to bite the hand that feeds them. Nvidia hasn't been shy in the past about reducing chip allocation to AIBs that don't toe the line.

I'm sure most of this stuff will get aired in court if Nvidia doesn't choose to settle out of court.

Regards,
SB
 
ISTR some info coming out a short while ago WRT RMA rates for various graphics cards but can't recall whether G92/G94 were included in the list. Anyone else remember that and know where to find it? I believe it was on these very forums.
 
ISTR some info coming out a short while ago WRT RMA rates for various graphics cards but can't recall whether G92/G94 were included in the list. Anyone else remember that and know where to find it? I believe it was on these very forums.

It was, but it was low signal-to-noise as I recall, from some (French?) European high-street store, and the sample size was fairly small.
 
Anyway Charlie, FWIW, I think you've done a very good job at analyzing this issue (despite my relatively low opinion of your leaks in general, which you could argue is more a factor of your sources than yourself - also, for the love of god, please stop assuming A0 is the first spin for NV/ATI. Unlike Intel, I promise you it's actually A11) and that you're probably right on most things. As your post above highlights, if it does turn out you're wrong on a few things it certainly won't be because of lack of documentation.

OK, well, my guys call it A0....Ax, for minor, B0.....Bx for the next major step. All the NV docs I have seen have the steps listed as [letter][single digit number], not two digit number as you suggest. First may be A11, followed by A1, but who really cares about semantics on that level. If NV wants to change names on a whim, why don't they just call the first step Ion GTS250?

One thing I *honestly* wonder though is whether the failure rate is so high that it makes sense for the OEMs/AIBs to want NV to do much more (especially earlier on when many of the worst offender chips were still on the market)? Remember the OEM/AIB's reputation is hurt a bit too, and nobody likes having their supply chain disrupted too much.

I was specifically told that NV was paying HP $150 per failure on average, and that was from NV IR directly to a stock buyer. RegFD issues aside, I will take that number as being fairly real; I have heard it echoed from other sources. I also hear that the percentage of payments given out goes Dell, HP, others, with the little guys getting shut out totally. NV at least appears to be stiffing everyone they can get away with.

The OEMs are pissed off in a way that you will understand shortly. I can't say how without scooping a few articles I am researching now, but things look VERY grim for NV later this year, and next. NV screwed their customers, both OEM and end users. It is payback time.

Go read the Apple public KB article I linked, they say, in no uncertain terms, that NV lied to them. Other OEMs say the same thing, just not publicly.

The other thing I've never seen you addressing and I'm curious about (but maybe I missed it) is Hara's claim in a few CCs that the problem is in a specific temperature range, presumably both staying in it and getting in/out of it (i.e. thermal cycling to/out of that specific range). Some of your articles clearly highlight that the entire solution is very fragile, but isn't it still possible that the absurdly high failure rates are mostly caused by that range? In other words, if you never got there, your failure rate would be (ridiculous made up numbers) 5% versus 2% normally, but if you do regularly go in that range it jumps to 25%. Wouldn't that at least seem plausible?

Yeah, I had a LONG chat with Hara about that, and the summary of NV's claim is that what is going on is a new form of failure that they don't understand, and science hasn't caught up with yet. I was told this LONG before it cropped up in a CC, and I laughed at him when he said it.

I laughed because I talked to 4 or 5 packaging experts, and they ALL told me what was happening, some with micrographs to illustrate. They all said the exact same thing, and it totally lined up with what I researched for my articles, and what the guys in the teardown shop found as well.

To date, Nvidia has not had any response to these:
http://www.theinquirer.net/inquirer/news/1004378/why-nvidia-chips-defective
http://www.theinquirer.net/inquirer/news/1013947/nvidia-should-defective-chips
http://www.theinquirer.net/inquirer/news/1036374/nv-should
They just say "we don't get it, it is beyond science". Everyone else says that it is a fairly simple and well understood problem. Intel caught it BEFORE it was a problem. MS didn't listen to ATI and found out about the problem the hard way. ATI caught it long before it bit them. Every other silicon house did too.

Then again, NV has several lawsuits aimed at them over this. If they admit it, they will likely sink the company. Really, it is that bad.

The failures are caused by heat cycling, and no one I have EVER talked to has given me one iota of evidence of how Hara's claim can be right. Since NV can't come up with the science either, I will call bullshit until they do.

Lowering the temps lessens the strain on the parts, but does not eliminate it. It does lengthen the time to failure, but that isn't a fix. I have said this many times as well.

And wouldn't that make the BIOS/driver/packaging pseudo-fixes potentially more effective? I'm not saying they are, but I'll admit to having some difficulty believing a company the size of NV couldn't analyze a problem of this nature relatively efficiently. As you implied many times, there probably weren't any perfect fixes here because some of these decisions must be made earlier in the design phase - but that doesn't mean the hackish fixes are that bad. I really don't know, it's just that as I said it is the one point I'm the most uncertain about - unlike the fact that early G92s are failing, which is why I moved my passive(!) G92 to a PC where it'll do as little thermal cycling as possible. Ah well...

There were perfect fixes, it just seems NV didn't want to bite the bullet and do them because it would cost money and down time. They were selling defective parts to Apple until September if Apple is to be believed. NV is screwing everyone to protect its pocketbook, and that is a short term protection. OEMs do see this kind of misbehavior.

The fixes are a joke, do you want your laptop fan on 24/7? How much battery does that suck? How much noise does that make? NV is obligated to do a recall, but they won't. Instead of doing it, they put out a BIOS patch that actively and retroactively hurts the end user. You are having capabilities taken away from you after purchase because NV won't fix their problems.

Why is this good, ethical, or even acceptable? It sure as hell isn't to me, no matter how they spin it.

-Charlie
 
If all this pans out as Charlie claims (and so far it seems that way), then for the sake of short-term damage control NVIDIA has made long-term enemies of the very people that feed it. This could put AMD/ATI in the enviable position of being the only real company to turn to for fast discrete graphics, especially in the notebook arena.

The only issue at this time is that AMD/ATI do not have much of a market in Intel-based notebooks (talking about the discrete graphics side here).
 
Thermal power cycling failure depends on the difference in thermal expansion among the solder, substrate, and packaging materials, as well as the temperature delta (the highest temperature reached in operation minus the lowest operating temperature), the thickness of the solder, the dimensions of the interface (length, width), and the elasticity of the solder. There is also a correlation with the average temperature reached in operation compared to room temperature. So it's correct that a chip operating with a higher delta (as in a notebook) will fail sooner rather than later. This happens for all chips, however; the only difference is the number of cycles. Now, for returns to come in from the field so soon, I'd guess the number of cycles survived in accelerated thermal stress cycling (where you cycle the parts for less time with higher temperature deltas) must have been really low. I mean, normally you see things like this in the prototypes.
So it could be that time to market did not allow the problem to be corrected before production started, or the problem was not identified, or something went wrong in the production phase (maybe the high-lead material makes the thickness harder to control) for some batches.
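
The dependencies listed above can be sketched with two textbook first-order relations: the mismatch shear strain in a joint, roughly (L / h) * Δα * ΔT, and a Coffin-Manson style acceleration factor, (ΔT_test / ΔT_field)^n, for translating accelerated-test cycles into field cycles. This is only a sketch under stated assumptions: the function names, the exponent n = 2, and every number below are hypothetical, not data for any Nvidia part, and the simplified form ignores the average-temperature and cycle-frequency terms that fuller models (e.g. Norris-Landzberg) include.

```python
def mismatch_shear_strain(delta_alpha_ppm_per_k: float, delta_t_k: float,
                          dist_from_center_mm: float, joint_height_mm: float) -> float:
    """First-order shear strain in a solder joint: (L / h) * d_alpha * dT."""
    return (dist_from_center_mm / joint_height_mm) * delta_alpha_ppm_per_k * 1e-6 * delta_t_k


def acceleration_factor(delta_t_test_k: float, delta_t_field_k: float,
                        exponent: float = 2.0) -> float:
    """Coffin-Manson style acceleration factor: (dT_test / dT_field) ** n."""
    return (delta_t_test_k / delta_t_field_k) ** exponent


def field_cycles_to_failure(test_cycles: float, delta_t_test_k: float,
                            delta_t_field_k: float, exponent: float = 2.0) -> float:
    """Scale cycles survived in the accelerated test to the milder field delta."""
    return test_cycles * acceleration_factor(delta_t_test_k, delta_t_field_k, exponent)


# Hypothetical numbers: parts survive 1000 cycles in a 100 K accelerated test.
for field_delta in (60.0, 40.0):
    cycles = field_cycles_to_failure(1000, 100.0, field_delta)
    print(f"field delta {field_delta:.0f} K -> about {cycles:.0f} cycles to failure")

# Hypothetical geometry: 14 ppm/K expansion mismatch, joint 5 mm from the
# package center, 0.08 mm joint height, 60 K swing.
print(f"shear strain per cycle: {mismatch_shear_strain(14.0, 60.0, 5.0, 0.08):.4f}")
```

Under this model a smaller field delta buys more cycles, but the count stays finite, which fits the point that a low cycle count in accelerated testing should have shown up before production.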
 
The fixes are a joke, do you want your laptop fan on 24/7? How much battery does that suck? How much noise does that make? NV is obligated to do a recall, but they won't. Instead of doing it, they put out a BIOS patch that actively and retroactively hurts the end user.

I don't see how these fixes are Nvidia's solution to the problem. It's up to Dell and HP to do right by their laptop customers, not Nvidia. Nvidia should just make sure they're doing whatever they can to ease the burden on their own customers, the OEMs. And if the average failure compensation is indeed $150, that seems a fair bit more than the actual purchase price of a G84/G86.

And I really do wish you would get off this single-minded focus on Nvidia as if they somehow stand out as being particularly ruthless in their corporate evil. There's hardly a chip company out there that hasn't made seriously problematic products at one time or another and tried to make the best of the situation.

Other than that, I've yet to see data that effectively shows that models other than G84 and G86 (in laptops) are showing really high failure rates in the field.
 
I don't see how these fixes are Nvidia's solution to the problem. It's up to Dell and HP to do right by their laptop customers, not Nvidia. Nvidia should just make sure they're doing whatever they can to ease the burden on their own customers, the OEMs. And if the average failure compensation is indeed $150, that seems a fair bit more than the actual purchase price of a G84/G86.

And I really do wish you would get off this single-minded focus on Nvidia as if they somehow stand out as being particularly ruthless in their corporate evil. There's hardly a chip company out there that hasn't made seriously problematic products at one time or another and tried to make the best of the situation.

Other than that, I've yet to see data that effectively shows that models other than G84 and G86 (in laptops) are showing really high failure rates in the field.

I think your whole mindset about 'focus on your customers', ie OEMS is bogus.

You care about end-users as well, since it is end users that buy notebooks.

I also agree that there are lots of chip companies that have produced defective products and had to deal with resulting issues. For example, the FDIV bug.

Intel denied it was an issue for a while, since most people didn't use FP. Some OEMs were fine with that, although apparently IBM was sitting on a ton of 386 and 486 chips (some they bought, some they made internally) and squawked a lot in public, since they thought it would help them sell those 486s. Then Intel finally realized how big a mess it was and offered to replace all the Pentiums in the field with fully functional ones.

It was a huge PR success for them and the Intel brand, albeit a costly one (it was at least $150M, probably more like half a billion).

Let's look at another example, the 180nm Itanium speedpath bug. McKinley was a 1GHz part, but after several months of shipments an OEM found a bug that could only be fixed by lowering the frequency to 800MHz.

What did Intel do? They gave free 1GHz parts (made at 130nm, not 180nm) to anyone with a defective McKinley. That was probably cheap, since not many bought McKinley.

What about Barcelona's TLB bug? I don't think that was handled all that well, but they at least had the decency to delay most shipments until they had a fix in place.

Anyway, the moral of the story is that in the internet era, you can't cover up stuff like this, the public will find out. And you should just address the problem head on, instead of trying to deny it's a problem, blaming other people, etc.

DK
 