AMD: R8xx Speculation

How soon will Nvidia respond with GT300 to the upcoming ATI RV870 lineup of GPUs?

  • Within 1 or 2 weeks

    Votes: 1 (0.6%)
  • Within a month

    Votes: 5 (3.2%)
  • Within a couple of months

    Votes: 28 (18.1%)
  • Very late this year

    Votes: 52 (33.5%)
  • Not until next year

    Votes: 69 (44.5%)

  • Total voters: 155
  • Poll closed.
http://www.semiaccurate.com/2009/07/23/how-peek-chip-guts-without-killing-them/

Please explain the third picture; that is a G200 with lots of missing TIM under the lid. Does this mean NV is incompetent? That would be a yes, but I do look forward to hearing your opinion.



OK, so if all these AIBs are incompetent, why are they only incompetent making NV laptops? Why don't ATI parts suffer from the same catastrophic failure rates as NV ones do, your rather biased, if real, sub-sub-sub-sample notwithstanding?

The part that you don't understand is that the half-connected parts are indeed running in spec. As long as they maintain a temperature as specified by NV, they are in spec. The chips maintain that temp.

Look at the graph here:
http://www.theinquirer.net/inquirer/news/1013947/nvidia-should-defective-chips
Look at the recommended temps for the GPUs that you are talking about. See a problem? The GPUs are failing within the recommended temperature range; overheating has nothing to do with it. THIS IS NOT AN OVERHEATING PROBLEM, it is an incompetent engineering problem at the chip level, not the laptop level.

Any heatsink attach problem is not an issue here.



It doesn't matter. If you do believe it matters, then you really need to answer the question of why 10+ OEMs failed to engineer proper cooling solutions only on their NV-based models. In fact, on some models, where there was a choice of GPU via MCM modules, only the NV parts had 'faulty thermals', and only they failed. Hmmm.....



No, they were not, and if you think they are, you are an idiot. There is _NO_ thermal solution specified in the design guides, only temps to remain within. If it is what you say it is, why did Nvidia design such a crappy thermal solution?

You need to keep a GPU in a certain temp range, and the OEMs did. It doesn't matter how they do it, with an aluminum slug, a vapor chamber, or fairy dust, the temp matters, and only the temp matters.



Yeah, actually I do think they treat it that way, and far worse. Having just spent a week in Taipei discussing thermal solutions for laptops at a conference, I do actually know how they test, and have been to several of their labs. You are dead wrong here.

So, to conclude, you seem to not want to answer the question about why only NV laptops have badly designed and connected HSFs. Why is it?

-Charlie

What does the GT200 chip have to do with the G86? Are they failing in droves? And I would like to know how in the hell a G86 in a laptop, non-MXM version, isn't going to go beyond allowable temps if the HS/P assembly IS NOT making complete contact with the GPU. Take a desktop G86-based card, lift the HSF to one side say 1/32nd of an inch, and see how fricken long it lasts before poof. And why aren't the G86 desktop GPUs failing in droves like the laptop version? They are the same chip, are they not? So again, cooling plays a role in this mess. Most of the blame belongs on Nvidia, I AM NOT FUCKING DENYING THAT, but the OEMs are not freakin blameless.
 
Still does not explain why several manufacturers had the problem.
Still does not explain why ATI chips did not have the same problem.

Or the fact that they have allocated hundreds of millions to service these parts, or the insurance snafu, on and on. All he has is anecdotal evidence that ignores the pretty condemning picture: OEMs following standard tolerances and expected production deviations had *massive* issues with this specific series of chips, issues that are unknown with the same deviations on other NV chips and competitor chips.

They are bad chips. Anyone disagreeing has an agenda or is "branded."

NV knew this and chose to continue selling them. They harmed OEMs and, more importantly, harmed consumers (e.g. driver updates that throttle performance and wind the fan up significantly to diminish heat fluctuations, at the cost of performance and noise, with the explicit intent not to save the chip but to get it to survive outside the warranty period so said dealers don't have to guarantee the product). What OEMs are complicit in is that, after knowing about the mass failures, they chose to go hush-hush and clear their stocks instead of ceasing to sell the product. OEMs knew for about 6 months there was an issue before even making a concession that there was one, and months more before working out "extended" warranties. Of course, now that all the cards are on the table, I expect to see more of these.
 
I really doubt this. If the HSF were underspecced, you would get thermal runaway/overheating. In this case, you are not getting that; the failures occur at normal temps.

-Charlie

I've yet to see a laptop with a G86 whose idle and load temps come anywhere within 10C of the eVGA card I have. It idles around 40C and under load gets close to 60C. Every laptop I've seen still working with a G86 idles at 55-60C with load temps near 90-100C.
 
Those are normal temps of the era, even normal today in new Griffin + GPU laptops.

If that was the norm for them and within Nvidia's own spec'd allowable heat tolerances, then I stand corrected and apologize. Nvidia was freaking NUTS for going with temps that high.
 
That high????? Let me introduce you to my friend the X800XTPE; he has a thing or two to show you about high temps: 117C under load :oops: Up until about 6 months ago (that PC got upgraded) it was still running fine.
 
What does the GT200 chip have to do with the G86? Are they failing in droves? And I would like to know how in the hell a G86 in a laptop, non-MXM version, isn't going to go beyond allowable temps if the HS/P assembly IS NOT making complete contact with the GPU. Take a desktop G86-based card, lift the HSF to one side say 1/32nd of an inch, and see how fricken long it lasts before poof. And why aren't the G86 desktop GPUs failing in droves like the laptop version? They are the same chip, are they not? So again, cooling plays a role in this mess. Most of the blame belongs on Nvidia, I AM NOT FUCKING DENYING THAT, but the OEMs are not freakin blameless.

Well, let me use small words and spell it out for you again.

You said bad HSF contact kills chips, or contributes to their failure. I showed you how bad G200 HSF contacts are, and that they are not failing in droves (yet). If it is poor contacts that are killing chips, the G200s should be crapping out all over the place. Are they?

If they are not, then your HSF contact issue with the laptops is a red herring. To use smaller words, the HSF issue is not the problem. Which is it?

You are dead wrong here.

So, can we go back to the question of why only NV chips have such crappy OEM installations of HSFs across 10+ OEMs and hundreds of models? Why don't ATI chips die from the same issue?

Don't change the subject again, answer the question, it can't be that hard.

-Charlie
 
I've yet to see a laptop with a G86 whose idle and load temps come anywhere within 10C of the eVGA card I have. It idles around 40C and under load gets close to 60C. Every laptop I've seen still working with a G86 idles at 55-60C with load temps near 90-100C.

Did you read the article I posted, and look at the graph? The temp range that kills GPUs is the cycle from 60 to 80. So if you fire up a game, you are hurting it. Your numbers show that you are exceeding the Tg of the underfill under normal operating conditions.

See the problem?

-Charlie
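To put rough numbers on the Tg argument, here is a back-of-the-envelope sketch. The Tg and CTE figures below are typical textbook values for flip-chip underfills, not actual G86 material data, and the 70C Tg is simply assumed to sit inside the 60-80 "kill zone" from the graph. The point it illustrates: once a thermal cycle crosses Tg, the underfill's expansion rate jumps roughly threefold, so a laptop cycling 55C to 95C strains the bumps far more per cycle than a desktop card cycling 40C to 60C.

```python
# Back-of-the-envelope sketch of why cycling through the underfill's Tg
# matters. The numbers are illustrative, typical of flip-chip underfills,
# NOT actual G86 material data.

TG = 70.0            # assumed glass transition temp of the underfill, degrees C
CTE_BELOW = 30e-6    # underfill CTE below Tg, 1/degC (typical)
CTE_ABOVE = 100e-6   # underfill CTE above Tg, roughly 3x higher (typical)
CTE_SILICON = 3e-6   # silicon die CTE, 1/degC

def mismatch_strain(t_low, t_high):
    """Thermal-expansion mismatch accumulated over one heat-up cycle."""
    strain = 0.0
    # portion of the cycle spent below Tg
    below = max(0.0, min(t_high, TG) - t_low)
    strain += below * (CTE_BELOW - CTE_SILICON)
    # portion spent above Tg, where the underfill expands much faster
    above = max(0.0, t_high - max(t_low, TG))
    strain += above * (CTE_ABOVE - CTE_SILICON)
    return strain

desktop = mismatch_strain(40, 60)   # desktop G86: idles ~40C, loads ~60C
laptop = mismatch_strain(55, 95)    # laptop G86: idles ~55C, loads 90-100C
print(f"desktop cycle strain: {desktop:.2e}")
print(f"laptop  cycle strain: {laptop:.2e}  ({laptop / desktop:.1f}x worse)")
```

With these made-up properties, the desktop card never crosses the assumed Tg and accumulates about 5x less mismatch strain per cycle, which is consistent with the temps quoted above.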
 
Well, let me use small words and spell it out for you again.

You said bad HSF contact kills chips, or contributes to their failure. I showed you how bad G200 HSF contacts are, and that they are not failing in droves (yet). If it is poor contacts that are killing chips, the G200s should be crapping out all over the place. Are they?

If they are not, then your HSF contact issue with the laptops is a red herring. To use smaller words, the HSF issue is not the problem. Which is it?

You are dead wrong here.

So, can we go back to the question of why only NV chips have such crappy OEM installations of HSFs across 10+ OEMs and hundreds of models? Why don't ATI chips die from the same issue?

Don't change the subject again, answer the question, it can't be that hard.

-Charlie

Go back up and re-read one of my posts, I did say we have had ATI-based systems in the shop for FAILED GPUs. Care to guess who made the GPU? ATI. Reason for failure? HEAT FUCKING RELATED ISSUES! Cause of heat related issues? IMPROPER FREAKING HS/P ASSEMBLY CONTACT PATCH! It does happen, just not to the massive extent as with substrate-affected Nvidia GPUs, but it does happen. Something YOU REFUSE TO ADMIT TO as you CLAIM IT DOESN'T HAPPEN!

Charlie, your biggest problem is you have SUCH A DAMN HARD ON for anything bad concerning Nvidia that even WHEN you are right, you are still considered a nut job. The HardOCP, MaximumPC, Anand, Tom's, and FiringSquad communities all view you in about the same light: a nut job who will make up stories about them to get hits.

Now, I have said Nvidia is at fault in this thread SEVERAL DAMN TIMES NOW! And if the given thermal designs were for such high temps, then they should take even more of the blame, BUT THAT DOESN'T MEAN THE FREAKING OEMS DESERVE A FREE FREAKING PASS FOR SHITTY QA OF PART ASSEMBLIES! And as I have stated before, a simple augmentation to the HS/P assembly has been shown to DECREASE, can you say that word or even understand its meaning, temps of G86 GPUs by AS MUCH as 20C. Not gonna take it down below the substrate thermal threshold, but lower it enough to allow it to live that much longer. Still, Nvidia should pay for the fuck up, but it ISN'T ALL THEIR FAULT!

And your G200 thing DOESN'T pertain to G86s. The G200 has this nice big heat spreader and a HUGE HSF assembly that does about a 1000-times-better job of moving heat away from the GPU. Whereas the laptop usually has a flattened-out heatpipe, with the cooling fins anywhere from 4 to 12" away from the GPU itself, and the GPU has NO HEAT SPREADER ON IT.
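For what it's worth, the contact-patch claim is easy to sanity-check with a simple one-dimensional thermal resistance model. Every number below is hypothetical (the power draw, ambient temp, and resistance values are placeholders, not measured G86 figures); it just shows that halving the die-to-heatpipe contact resistance produces a drop of the same order as the ~20C improvement described above.

```python
# Junction temp under a crude 1-D thermal resistance model:
#   T_junction = T_ambient + P * (theta_contact + theta_rest)
# All figures are hypothetical placeholders, not measured G86 data.

P = 25.0           # assumed GPU power dissipation, watts
T_AMBIENT = 35.0   # assumed inside-chassis air temperature, degrees C
THETA_REST = 1.6   # assumed heatpipe + fins + fan resistance, degC/W

for theta_contact, label in [(1.6, "poor contact patch       "),
                             (0.8, "heatpipe bent to full fit")]:
    t_junction = T_AMBIENT + P * (theta_contact + THETA_REST)
    print(f"{label}: T_junction = {t_junction:.0f} C")
```

With these placeholders the junction drops from 115C to 95C, a 20C swing from the contact patch alone.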
 
Did you read the article I posted, and look at the graph? The temp range that kills GPUs is the cycle from 60 to 80. So if you fire up a game, you are hurting it. Your numbers show that you are exceeding the Tg of the underfill under normal operating conditions.

See the problem?

-Charlie

Your graph points to 60-80C as being the killing zone. I said "CLOSE TO 60C". Do you know what that means? It means it gets close, but DOESN'T QUITE MAKE IT. On the other hand, laptop GPUs idle around there and then go thru the damn roof. But yet a simple augmentation of the heatpipe for the heat sink, giving it better contact, WILL lower both IDLE and LOAD temps. A POINT YOU STILL FREAKING REFUSE TO FREAKING ACKNOWLEDGE!
 
And SB, it may be SOP to test the setup, but do you honestly think any of the OEMs treat the laptops like most people do? Play heavy gaming for 3-4 hours, turn off, start up, repeat, turn off. Do some homework, surfing, light work, sleep/standby/hibernate, back to heavy gaming. Turn it off. Also, while doing all this, sometimes using it in their laps, on pillows, beds, blankets, over top of loose papers, for months on end? I somehow seriously doubt they stress test their laptops anywhere near that strenuously.

Actually, all OEMs test their laptops under far more stressful thermal loads than any potential customer will ever run into, short of a completely failed (fan no longer operating) cooling system. Their entire business relies upon it.

Regards,
SB
 
Actually, all OEMs test their laptops under far more stressful thermal loads than any potential customer will ever run into, short of a completely failed (fan no longer operating) cooling system. Their entire business relies upon it.

Regards,
SB

If that's the case, then the G86 issue should have been picked up long before all these issues became known. I'm sorry, but Nvidia is mostly at fault for this whole mess; still, to say the OEMs are faultless is wrong.
 
If that's the case, then the G86 issue should have been picked up long before all these issues became known. I'm sorry, but Nvidia is mostly at fault for this whole mess; still, to say the OEMs are faultless is wrong.

How do you come to that? Unless an OEM is going to run lots of test systems for thousands or tens of thousands of hours, how are they supposed to find faults that are reducing the expected MTTF? Heat kills tend to be quick; never turn on an AMD Athlon XP without a HSF, for example, it's dead before you can blink.

Generally there has been a localised event to kill something via heat; otherwise it's heat affecting something else that is causing the failure. And as I said before, unless they run lots of systems out to their MTTF, how is an OEM supposed to find those faults?

So again it comes back to the operating conditions that NV specified.
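To make the MTTF point concrete, a toy calculation (every number is invented for illustration): if the bump fatigue only sets in after a few hundred deep thermal cycles, a short qualification run, however aggressive, simply cannot accumulate enough cycles to see it, while field units get there in a few months.

```python
# Why an OEM qualification run can miss a cycle-driven wear-out fault.
# All numbers are invented for illustration, not real qual-lab figures.

ONSET_CYCLES = 300          # assumed deep thermal cycles before bump cracks appear

# Qualification: aggressive cycling, but only for a couple of weeks
qual_cycles = 10 * 14       # 10 power cycles/day for 14 days

# Field use: modest cycling, but sustained for months
field_cycles_per_month = 2 * 30   # ~2 deep heat-up/cool-down cycles per day

print(f"qual run accumulates {qual_cycles} cycles -> fault never shows")
print(f"field units reach {ONSET_CYCLES} cycles after "
      f"~{ONSET_CYCLES / field_cycles_per_month:.0f} months -> failures start")
```

With these made-up figures, the qual run tops out at 140 cycles while field units cross the onset threshold after roughly five months of ordinary use.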
 
http://www.eetimes.com/news/latest/...ZO0ITDQE1GHPSKHWATMY32JVN?articleID=221600875

This is curious; is it just "damage control", or are the reports of new yield issues false?
"The 40-nm yield didn't drop as reported. As a matter of fact, yield on the 40-nm process remained flat. TSMC is confident that the 40-nm yield will improve at the beginning of next year."
Here was my thought process when I first read about the yields, before this new article.

LordEC911 said:
I think it needs to be said that yield most likely isn't something that was affected by the problem that occurred at TSMC. Back when Cypress and Juniper were ramping at TSMC, there was talk of ~60% yields, up from the original ~20-25% on RV740. The main problem facing TSMC is capacity. I believe the machines that caused the problems were new equipment that was meant to increase capacity, which didn't work out so well.

Edit- Hmmm... reading my post again, I guess I need to clarify the point I was trying to make. TSMC has two options: either run the new equipment as it is now, taking lower yields in exchange for the capacity it adds, or idle the new equipment until it is fixed and keep capacity where it is. It is obviously in TSMC's hands.
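A toy good-die calculation makes that tradeoff visible (the wafer-start and die-count figures are invented placeholders, not TSMC data):

```python
# Toy model of the capacity-vs-yield tradeoff described above.
# Wafer-start and die-count figures are invented, not TSMC data.

DIES_PER_WAFER = 160   # hypothetical die count for a Cypress-class chip

def good_dies(wafer_starts_per_month, yield_pct):
    """Good dies shipped per month at a given wafer volume and yield."""
    return int(wafer_starts_per_month * DIES_PER_WAFER * yield_pct / 100)

# Option A: idle the faulty new tools -- lower capacity, healthy yield
print("old tools only :", good_dies(10_000, 60), "good dies/month")
# Option B: run the new tools as-is -- more wafers, but worse line yield
print("with new tools :", good_dies(14_000, 45), "good dies/month")
```

With these made-up figures the two options nearly cancel out, which is why the call really is in TSMC's hands.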
 
If that's the case, then the G86 issue should have been picked up long before all these issues became known. I'm sorry, but Nvidia is mostly at fault for this whole mess; still, to say the OEMs are faultless is wrong.

Because this problem creeps up over time, with extended-on then extended-off cycles. Quickly turning the GPU on/off/on/off isn't going to reproduce the issue. Otherwise, yes, the OEMs would have run into this. Or, just as likely, they got a good batch of chips that were more resistant to this over-time issue.

Considering the bulk of OEM sales is to businesses, it's ludicrous to think they don't do extensive testing or have no experience with testing against customer usage patterns.

And to think that upwards of 10 major OEMs ALL designed faulty cooling systems and ALL didn't test it...

Ummm, yeah...

As I said previously, if only one OEM exhibited this problem, it's probably the OEM's fault. Two OEMs? Pretty damn unlikely, but OK, maybe. Three or more? Ummmm, yeah...

And, of course, all that speculation ignores the fact that the failures occur within the Nvidia-specified operating temperatures of the chip. Meaning the cooling system is doing the job exactly as Nvidia specified it should.

Regards,
SB
 
Because this problem creeps up over time, with extended-on then extended-off cycles. Quickly turning the GPU on/off/on/off isn't going to reproduce the issue. Otherwise, yes, the OEMs would have run into this. Or, just as likely, they got a good batch of chips that were more resistant to this over-time issue.

Considering the bulk of OEM sales is to businesses, it's ludicrous to think they don't do extensive testing or have no experience with testing against customer usage patterns.

And to think that upwards of 10 major OEMs ALL designed faulty cooling systems and ALL didn't test it...

Ummm, yeah...

As I said previously, if only one OEM exhibited this problem, it's probably the OEM's fault. Two OEMs? Pretty damn unlikely, but OK, maybe. Three or more? Ummmm, yeah...

And, of course, all that speculation ignores the fact that the failures occur within the Nvidia-specified operating temperatures of the chip. Meaning the cooling system is doing the job exactly as Nvidia specified it should.

Regards,
SB

I didn't say the design was faulty, I said the assembly was faulty. Augment (bend) the heatpipe just slightly to allow for better contact with the GPU, and the temps decrease because of the better contact patch.
 