"GeForce 7900 Inferno"
http://enthusiast.hardocp.com/article.html?art=MTA2OSwxLCxoZW50aHVzaWFzdA==
Hrrm. Is this the same issue that pcperspective (I think --or was it TechReport? Edit: Duh, it's right in the article; pcper) addressed originally and NV assured everyone had been dealt with? Or something new?
sonyps35
26-May-2006, 04:58
I was just about to post this. And actually thought of posting something like this days ago based on forum posts.
Basically it seems over at Hardocp forums a lot of 7900 GTX's are dying..
Nvidia's hard launch pressure and vendor overclocking biting them in the backside?
Probably. NV knew it would be time to upgrade again when G80 arrives very soon (as rumour has it). So the 7900GTX will only serve you as a stopgap (if the GTX gives out, the owner will buy a new one anyway), and it sounds like what ATi's slide said about the GTX dying quickly is coming true :???:
PS. Don't take it seriously
digitalwanderer
26-May-2006, 06:27
Well at least it sounds like the companies are honoring their warranties and aren't shirking the blame, although it sounds like Kyle ain't real enthused with XFX at all....and oddly enough I just spoke to Ryan Dumas last week on the phone my ownself and I can sort of see where he's coming from.
nutball
26-May-2006, 09:47
Gosh. Running chips out of specification isn't without consequences. Who'd'a thunk it?
digitalwanderer
26-May-2006, 13:41
But it's the whole "manufacturers clocking cards beyond their specs and selling them as stable" bit that doesn't sit right with me.
When an enthusiast does it that's one thing, but when a company is selling an OCed product they should make sure the product can handle the OC. :(
It seems that manufacturers weren't running the factory OCed cards long enough to weed out the problems. If you read through the evga forums, you'll see that most of the cards worked fine for at least a couple of days/weeks, then they started having issues. After they started having problems, even running at non-OCed speeds didn't seem to help. And they are handling the RMA process quite well, so that's a plus.
My eVGA 7900GT CO has been fine since March, though we'll see what happens when I step up to the 7900GT KO SC. I just want the quieter cooler, but the bump in speed is nice (as long as it works :))
trinibwoy
26-May-2006, 17:11
Or something new?
Think it's the same issue. Overclocked boards dying.
Bouncing Zabaglione Bros.
26-May-2006, 17:38
If you read the forum comments, it seems people are complaining of exactly the same symptoms at standard speeds too.
When a manufacturer overclocks, do they do it with the explicit permission of Nvidia? Do they get guidelines on how far they can go, or are they on their own with trial and error?
Think it's the same issue. Overclocked boards dying.
Well, that's not a comfort. Part of the pcper piece from April 10th that [H] did not quote was this:
The vendors, and NVIDIA, are being much stricter at the fab where G71 is being manufactured and NVIDIA tells me the new chips going out will be able to run at the speeds the vendors will sell at.
Yet here we are 6 weeks later and "not so much"? Possibly still sell-thru of "pre-much stricter"? One would like to hope, because otherwise some rosy scenario soap got sold to the community six weeks ago, and some people who took that at face value may be regretting it now.
Yep, it's the same issue that was discussed here on April 17th ...
from the HardOCP article:
Well today NVIDIA officially got back to me with an answer to all my inquiries on the subject. The short of it is: the GPUs on the cards that were having problems didn’t have the headroom necessary for the vendor’s overclocked specifications. The two most prominent problem areas were the vertex clock (which remember runs 50 MHz faster than the pixel clock) and the memory clock. These GPU subsystems were running far enough out of spec that the chips were having physical issues with stability; causing the random “freezes” we were seeing.
from the TechReport article:
http://www.pcper.com/article.php?aid=235
Recently an issue came up with the first batch of 7900 GTX retail cards that were "freezing" in games for 10-30 seconds at time that we have determined is related (http://www.pcper.com/article.php?aid=235) to the overclock retail cards only. The issue seems to be more related to the vertex clock on the GPU that was running 50 MHz higher than the pixel clock on the G71 by NVIDIA's design. That engine was 50 MHz over whatever the core clock is set at, so a GPU running at 670 MHz had a 720 MHz vertex shader clock. This apparently caused some issues with the chips themselves, causing these freezes.
After talking with BFG and NVIDIA, as well as the other vendors on this issue, it has been mostly resolved. The answer turns out to lower the vertex clock difference or remove it completely in order to get the entire core to run at the same speeds. This doesn't affect performance very much, but does place some concerns on both NVIDIA's and the vendors QA processes.
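As a quick illustration of the clock relationship described in the quote above, here is a minimal Python sketch. The 50 MHz offset and the 670/720 MHz figures come from the article, and 650 MHz is the stock GTX core clock mentioned in this thread. The function name and the configurable offset are my own assumptions for illustration, not anything from NVIDIA's drivers or BIOS tools.

```python
# Hypothetical helper: the names here are illustrative, not an NVIDIA API.
VERTEX_OFFSET_MHZ = 50  # offset quoted in the article for G71


def vertex_clock(core_mhz, offset_mhz=VERTEX_OFFSET_MHZ):
    """Return the vertex shader clock for a given core (pixel) clock."""
    return core_mhz + offset_mhz


if __name__ == "__main__":
    for core in (650, 670):
        print(core, "MHz core ->", vertex_clock(core), "MHz vertex")
    # The reported fix: shrink or remove the offset so the whole core
    # runs at one speed.
    print(670, "MHz core ->", vertex_clock(670, offset_mhz=0), "MHz vertex (offset removed)")
```

So a vendor overclock of the core silently pushes the vertex engine a further 50 MHz past it, which is presumably why the factory OC cards ran out of headroom there first.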
To date I've had no issues with my PNY 7900 GTX OC running at 675MHz.
Pharma
Goragoth
26-May-2006, 23:26
My brand new (three days old) MSI 7900 GTX at stock speeds (650MHz/800MHz) is having the same problems many others are reporting. So far Dawn of War crashed to desktop after a couple of hours of playing, and the screen was partially corrupted and would flicker on and off. I was able to see enough to reboot and it seemed fine after that until I played F.E.A.R. for a few hours and when exiting the game bam! same thing again.
Someone at guru3d speculated that this is due to the card not being able to switch from 3d mode to 2d mode properly. Is this possible? Anyway, it seems that 3dmark06 is particularly problematic, so I'm downloading that now to test and I'll grab the beta drivers to see if those fix anything. If it remains broken I guess I'll have to RMA :(
Are there really that many bad cards out there or is there a deeper problem in the BIOS or driver for this card? I would feel somewhat stupid doing the RMA and having the same thing happen with the next card.
fallguy
27-May-2006, 01:26
While it may be the overclocked boards dying more, stock-clocked boards are as well. This is one of the reasons ATi put a clamp on tampering with the specs of a card years ago. It can give NV a bad rep for producing bad cards.
Goragoth
27-May-2006, 02:22
Okay, I've tried 3dmark06 and the Deep Freeze test kills it pretty reliably. First time through going through all the tests it froze completely part way through, had to hard reboot, and the second time (just running the Deep Freeze test) it froze at exactly the same place as the first time but this time it recovered after a couple of seconds and carried on (with a big dip in framerate) but once the test exited the desktop was corrupted and it made with flashing on and off until I managed a soft reboot.
Can anybody here with a 7900GTX confirm being able to run through the Deep Freeze test without problems? At least in that case I can put it down to a bad card, although it seems weird that so many of these cards are failing in pretty much the same way.
I'm grabbing the beta drivers now to see if they fix anything but I'm not holding my breath since the release notes don't mention anything like this.
SugarCoat
27-May-2006, 03:42
While it may be the overclocked boards dying more, stock-clocked boards are as well. This is one of the reasons ATi put a clamp on tampering with the specs of a card years ago. It can give NV a bad rep for producing bad cards.
The clamp has been off for some weeks now. Very soon cometh the 700/1700 X1900XTXs
Only solution to this is water cooling for everyone!
On a serious note, I don't mean any disrespect to those who like Nvidia cards, but I have had lots of problems like this going back to the 7800s, when the gloves were really taken off for overclocking. They don't test many of the cards that go out. The little write-up seems to give the impression that they test each one. A few of the manufacturers have released thousands of units with the completely wrong BIOS, in a couple of instances with GTX and GT cards from the 7800 series. The AIBs pay little attention to what they're releasing, and rather than testing every card, usually test a random 1 in 20 or 1 in 50 that go out the door. It's obvious the cards are being pushed too hard for the cooling they have. In the case of many of the 7800 series problems, the solution was a BIOS that removed all dynamic clock function. A beefier cooler is always welcome as well, but that doesn't stop long-term damage.
Goragoth, drop your clock by 50MHz and see what happens if the drivers don't do anything. You'll lose about 100 points in that test and at most 3 FPS in games. If that works and you're adventurous like me, you can start hunting around for a new BIOS. eVGA mods/techs in their forum used to build ones on request last time I was there (7 months ago).
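To put rough numbers on the "1 in 20 or 1 in 50" sampling mentioned above, here is a small Python sketch. The sampling ratios are from the post; the 5% defect rate and the 1000-card lot are made-up figures purely for illustration, and the function names are hypothetical.

```python
def share_shipped_untested(sample_every):
    """Fraction of cards that never see a test bench when 1 in `sample_every` is tested."""
    return 1 - 1 / sample_every


def chance_sample_finds_fault(defect_rate, cards_tested):
    """Probability that at least one of `cards_tested` randomly picked cards is faulty."""
    return 1 - (1 - defect_rate) ** cards_tested


if __name__ == "__main__":
    ASSUMED_DEFECT_RATE = 0.05  # hypothetical, not a figure from any vendor
    LOT_SIZE = 1000             # hypothetical lot size
    for every in (20, 50):
        tested = LOT_SIZE // every
        untested = share_shipped_untested(every)
        found = chance_sample_finds_fault(ASSUMED_DEFECT_RATE, tested)
        print(f"1 in {every}: {untested:.0%} of cards ship untested, "
              f"chance the sample flags the lot: {found:.1%}")
```

Even when the sample does flag a bad lot, every untested card in it still ships unless the vendor pulls the whole lot. And a short bench test would not necessarily catch cards that, as reported on the eVGA forums, only start failing after days or weeks.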
Goragoth
27-May-2006, 04:31
Dropped core/memory clocks by 50MHz each, ran 3dmark06 Deep Freeze and bam! Crash. A little different this time, with the graphics in the test itself corrupting, textures vanishing and finally locking up after a few seconds. I suspect there's nothing left but to RMA the damn thing.
fallguy
27-May-2006, 05:00
The clamp has been off for some weeks now. Very soon cometh the 700/1700 X1900XTXs
Actually, some months now. But yeah, it's lifted somewhat. Which ATi needs with so many NV cards being overclocked.
Goragoth
27-May-2006, 05:32
Ugh, the underclocking actually seemed to make things worse. About 10-20 seconds into running Dawn of War it crashed with a completely corrupted display. Resetting the clocks to the defaults has at least made it possible to play the game again. Seems odd though.
It seems that manufacturers weren't running the factory OCed cards long enough to weed out the problems. If you read through the evga forums, you'll see that most of the cards worked fine for at least a couple of days/weeks, then they started having issues. After they started having problems, even running at non-OCed speeds didn't seem to help. And they are handling the RMA process quite well, so that's a plus.
My eVGA 7900GT CO has been fine since March, though we'll see what happens when I step up to the 7900GT KO SC. I just want the quieter cooler, but the bump in speed is nice (as long as it works :))
AMD K6-300/350 had this problem back in late '98. They start off at 350MHz. After a few hours of use, the workable clock speed starts to drop. (The chip might be good for 370MHz, but u won't notice it failing until it drops below 350MHz.) The rate of clock speed drop declines and it will stabilize at some lower frequency. Some just die. That problem caused AMD and vendors to implement a lot of additional screens. If the voltage is lowered, the rate of degradation slows down exponentially.
Earlier K6 (200, 233, 266) did not have the problem. AMD eventually fixed the problem but that was a painful period.
Nvidia and/or partners may be pushing clock speed (to compete with X1900XTX) by cranking the voltage up. (Just a wild guess)
I remember slides by ATI marketing on Nvidia pushing the G70 (7800GT) voltage beyond recommendations by TSMC for 110nm. Ironically, the 7800GTX seems to be reliable. When u flirt with danger, sometimes it is OK but sometimes - SHIT HAPPENS. Kind of like Nvidia always being aggressive with using the latest process tech from TSMC, until we all know what happened with 130nm/NV30.
I have not read thru the various forums on this but I suspect the problem is more with the 7900GTX than the GT. Nvidia did not need to push the voltage to achieve GT speed even with the small cooler.
Is Nvidia voltage stressing the G71 as well? Does anyone have info on the voltage vs recommendations by TSMC?
http://www.techpowerup.com/reviews/ATI/NVDesperate/images/NVDesperateStand-5.jpg
Richteralan
28-May-2006, 08:01
From my experience with a BFG Geforce 7900GT OC and eVGA e-Geforce 7900GT Signature Series,
I can confidently say the problem isn't with the graphic core.
The first BFG 7900GT worked absolutely perfect for me. NO errors NO artifacts NO lock-ups. Able to overclock pretty high as 570/1720 WITHOUT any artifacts/lock-ups.
The eVGA 7900GT SS is just a bad card. RMAed 3 times and got 3 bad cards. None of them worked through the 2nd day. And all of them had the same problems that most of the forum users posted. Downclocking the graphics memory frequency solved the problem. Downclocking the graphics memory frequency while overclocking the core frequency has NO problems AT ALL.
Obviously it's a bad batch of GDDR3 from Samsung OR it's a fundamental PCB design/vreg. component failure.
And the funnier thing is, NVIDIA/eVGA NEVER acknowledged this. NVIDIA is saying the graphics core is overclocked too much and eVGA is just repeating that they will take care of the customers who need to return the card. NOBODY has explained WTH is happening and WHY so many users are experiencing the same problems.
I'm already getting a RMA with newegg. I don't like to play with these frequent RMA stuffs. I grabbed a X1900XTX from ebay for the same price as 7900GT SS.
Good luck NVIDIA/eVGA
Richteralan
28-May-2006, 08:06
But it's the whole "manufacturers clocking cards beyond their specs and selling them as stable" bit that doesn't sit right with me.
When an enthusiast does it that's one thing, but when a company is selling an OCed product they should make sure the product can handle the OC. :(
That's just NVIDIA blaming others and not wanting to take responsibility.
Richteralan
28-May-2006, 08:08
While it may be the overclocked boards dying more, stock-clocked boards are as well. This is one of the reasons ATi put a clamp on tampering with the specs of a card years ago. It can give NV a bad rep for producing bad cards.
I'm not sure if it's really related to overclocking. I could oc my BFG to pretty high clock and no signs of failure(yes, n+1 hrs of ATITool artifact scanning/n+1 hrs of RTHDRIBL/and much Oblivion and 3DMarking).
Richteralan
28-May-2006, 08:09
Nvidia and/or partners may be pushing clock speed (to compete with X1900XTX) by cranking the voltage up. (Just a wild guess)
I remember slides by ATI marketing on Nvidia pushing the G70 (7800GT) voltage beyond recommendations by TSMC for 110nm. Ironically, the 7800GTX seems to be reliable. When u flirt with danger, sometimes it is OK but sometimes - SHIT HAPPENS. Kind of like Nvidia always being aggressive with using the latest process tech from TSMC, until we all know what happened with 130nm/NV30.
I have not read thru the various forums on this but I suspect the problem is more with the 7900GTX than the GT. Nvidia did not need to push the voltage to achieve GT speed even with the small cooler.
The eVGA 7900GT Signature Series increased 3D core voltage from 1.2V to 1.45V.
This may be dangerous because the HSF is too lame compared with the 7900GTX's.
But I wouldn't think this is the cause, because some 7900GTXs are failing as well.
From my experience with a BFG Geforce 7900GT OC and eVGA e-Geforce 7900GT Signature Series,
...
I'm already getting a RMA with newegg. I don't like to play with these frequent RMA stuffs. I grabbed a X1900XTX from ebay for the same price as 7900GT SS.
Good luck NVIDIA/eVGA
Agreed. Even if the manufacturers honor their responsibility on RMAs, it is not WORTH our time and EFFORT to do it frequently! The RMA replacement, at least, should be tested before it is handed out to us again.... sigh.... :evil:
The eVGA 7900GT Signature Series increased 3D core voltage from 1.2V to 1.45V.
This may be dangerous because the HSF is too lame compared with the 7900GTX's.
But I wouldn't think this is the cause, because some 7900GTXs are failing as well.
OC without raising the voltage is "safe". If the heatsink cannot cool it sufficiently, the game/software crashes or shows artifacts, but there is no permanent damage. (Thermal overruns at the junction need to be >~170C before causing permanent damage.) So when the clock is lowered, everything is fine. Running at high temps will shorten life, but the failures should still take many months and not hours/days.
However jacking up the voltage may exceed the recommendation for the gate oxide thickness used for 90nm. Too high a voltage may rupture the gate oxide or accelerate other failure mechanisms.
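For a rough sense of what the 1.2V to 1.45V bump quoted above means on the heat side, here is a back-of-the-envelope Python sketch. It uses the standard first-order approximation that dynamic CMOS power scales with capacitance times voltage squared times frequency. The voltages are the ones reported in this thread; the function name is hypothetical, and the model deliberately ignores leakage (which also rises with voltage and temperature), so treat the result as a floor rather than a measurement.

```python
def relative_dynamic_power(v_old, v_new, f_old=1.0, f_new=1.0):
    """Ratio of new to old dynamic power under the P ~ C * V^2 * f approximation."""
    return (v_new / v_old) ** 2 * (f_new / f_old)


if __name__ == "__main__":
    # Voltage change only, clocks held equal (thread figures: 1.2V -> 1.45V).
    ratio = relative_dynamic_power(1.2, 1.45)
    print(f"Voltage bump alone: about {ratio:.2f}x the dynamic power")
```

Any factory clock increase multiplies on top of that via the frequency term, which is why the small GT cooler looks marginal here. None of this says anything about oxide stress directly; it only covers the extra heat.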
From my experience with a BFG Geforce 7900GT OC and eVGA e-Geforce 7900GT Signature Series,
I can confidently say the problem isn't with the graphic core.
The first BFG 7900GT worked absolutely perfect for me. NO errors NO artifacts NO lock-ups. Able to overclock pretty high as 570/1720 WITHOUT any artifacts/lock-ups.
The eVGA 7900GT SS is just a bad card. RMAed 3 times and got 3 bad cards. None of them worked through the 2nd day. And all of them had the same problems that most of the forum users posted. Downclocking the graphics memory frequency solved the problem. Downclocking the graphics memory frequency while overclocking the core frequency has NO problems AT ALL.
Obviously it's a bad batch of GDDR3 from Samsung OR it's a fundamental PCB design/vreg. component failure.
And the funnier thing is, NVIDIA/eVGA NEVER acknowledged this. NVIDIA is saying the graphics core is overclocked too much and eVGA is just repeating that they will take care of the customers who need to return the card. NOBODY has explained WTH is happening and WHY so many users are experiencing the same problems.
I'm already getting a RMA with newegg. I don't like to play with these frequent RMA stuffs. I grabbed a X1900XTX from ebay for the same price as 7900GT SS.
Good luck NVIDIA/eVGA
U would assume that after an RMA, eVGA would ensure that u get a good card. After the 2nd, they would at least wake up.
Someone posted about this problem at a major on-line retailer in Canada. Staff who were monitoring the on-line posts responded within hours:
"RMAs with the 7900 GTX series is almost negligible and currently at 1% to 2% or so (this also includes the odd return, where the card gets open boxed instead of it actually being defective).
With the 7900 GTs across all the brands, even with all the hoopla and paranoia, the RMA is only a couple higher. There've been the odd bad board, but not as many as people seem to think.
There would be a portion of boards coming back because of other things (poor quality PSU, heat, etc.)"
The funny thing is that a couple of guys posted right after the administrator detailing the problems with their cards.
Well, that's not a comfort. Part of the pcper piece from April 10th that [H] did not quote was this:
"The vendors, and NVIDIA, are being much stricter at the fab where G71 is being manufactured and NVIDIA tells me the new chips going out will be able to run at the speeds the vendors will sell at."
Yet here we are 6 weeks later and "not so much"? Possibly still sell-thru of "pre-much stricter"? One would like to hope, because otherwise some rosy scenario soap got sold to the community six weeks ago, and some people who took that at face value may be regretting it now.
When has TSMC fab processing ever been non-strict? Hundreds of process steps, and any one step can screw things up. How can Nvidia be much stricter at the fab?
Someone at Nvidia PR messed up on this. Does that mean that the chips out there today are not that robust? If it can't handle the overclock, will it still work after the warranty expires? If u really must have a 7900-family card, make sure u get one that offers a >= 3-year warranty.
Bouncing Zabaglione Bros.
28-May-2006, 11:04
Obviously it's a bad batch of GDDR3 from Samsung OR it's a fundamental PCB design/vreg. component failure.
And the funnier thing is, NVIDIA/eVGA NEVER acknowledged this. NVIDIA is saying the graphics core is overclocked too much and eVGA is just repeating that they will take care of the customers who need to return the card. NOBODY has explained WTH is happening and WHY so many users are experiencing the same problems.
Good luck NVIDIA/eVGA
Doesn't the [H] article say that all cards are built by one company for Nvidia and the other card companies just rebadge/heatsink/maybe change BIOS? If there is a design flaw or operating out of spec, it looks like it's really down to Nvidia as the ultimate source of not just the chips, but the card designs and card manufacturing too.
Dave Baumann
28-May-2006, 11:46
From my experience with a BFG Geforce 7900GT OC and eVGA e-Geforce 7900GT Signature Series,
I can confidently say the problem isn't with the graphic core.
To make this statement, I assume your sample size is greater than two? Chips never come out of the oven exactly alike, there will always be variances with their tolerances - they vary just from their location on the wafer.
It certainly does not seem like it ought to be a problem --after all, it is a much smaller chip than R580, on the same process. They ought to be able to make that work at same/somewhat higher clocks than R580, and that's pretty much what we're talking about here, within < 10%.
I have found myself wondering from time to time at those 22 million transistors they "took out" going from G70 to G71, and if they have any impact on some of the issues reported.
trinibwoy
28-May-2006, 14:04
Well they did shorten the pipeline and that would be of more significance than transistor count or die-size which I would think impact power consumption and heat more than max clockspeed.
Richteralan
28-May-2006, 16:24
Doesn't the [H] article say that all cards are built by one company for Nvidia and the other card companies just rebadge/heatsink/maybe change BIOS? If there is a design flaw or operating out of spec, it looks like it's really down to Nvidia as the ultimate source of not just the chips, but the card designs and card manufacturing too.
The eVGA 7900GT Signature Series has more components soldered when compared with reference 7900GT PCB.
Richteralan
28-May-2006, 16:25
To make this statement, I assume your sample size is greater than two? Chips never come out of the oven exactly alike, there will always be variances with their tolerances - they vary just from their location on the wafer.
Well, then tell me why most users downclocked memory ONLY and problems are gone?
Richteralan
28-May-2006, 16:26
U would assume that after an RMA, eVGA would ensure that u get a good card. After the 2nd, they would at least wake up.
Someone posted about this problem at a major on-line retailer in Canada. Staff who were monitoring the on-line posts responded within hours:
"RMAs with the 7900 GTX series is almost negligible and currently at 1% to 2% or so (this also includes the odd return, where the card gets open boxed instead of it actually being defective).
With the 7900 GTs across all the brands, even with all the hoopla and paranoia, the RMA is only a couple higher. There've been the odd bad board, but not as many as people seem to think.
There would be a portion of boards coming back because of other things (poor quality PSU, heat, etc.)"
The funny thing is that a couple of guys posted right after the administrator detailing the problems with their cards.
Well, some mods over at the eVGA forum even say 3DMark/other benchmarks stress the card too much and cause the failure. :lol:
SugarCoat
28-May-2006, 16:59
It certainly does not seem like it ought to be a problem --after all, it is a much smaller chip than R580, on the same process. They ought to be able to make that work at same/somewhat higher clocks than R580, and that's pretty much what we're talking about here, within < 10%.
I have found myself wondering from time to time at those 22 million transistors they "took out" going from G70 to G71, and if they have any impact on some of the issues reported.
R520/R580 are more robust as well though. One thing that would worry me is that, due to the size of the G71, heat would attack it far worse and have an effect at a lower temperature than on a chip twice its size. Especially at the frequency/voltage it operates at.
Well, then tell me why most users downclocked memory ONLY and problems are gone?
Memory is fabbed as well, so Dave's comment can be carried over to the GDDR. GDDR can fail just like a GPU can. There may also be an issue with the G71's memory controller here, unable to cope with the speed of the GDDR or otherwise being affected by heat/voltage, causing it to fail. Reducing the speed of the memory may in fact be reducing stress on the core.
It is kinda confusing why G71, being so small compared with R580, needs as much cooling (if not more, that is some metalwork on 7900GTX!) at the "same" clockspeeds...
How much of that "over-engineering" is just to enable low fanspeeds?
Has anyone tried to put a 1900XT cooler onto a 7900GTX or vice versa to see how they compare? Or put the same third-party cooler onto both?...
Jawed
trinibwoy
28-May-2006, 17:59
How much of that "over-engineering" is just to enable low fanspeeds?
Are temps similar?
I dunno, I'm really in the dark about temps on the two GPUs.
The only thing I "know" is that the fan on the 7900GTX runs quieter. I presume it runs at much lower revs...
From pix it seems that 7900GTX's cooler is much heavier-duty - hence the question about "over-engineering".
Jawed
SugarCoat
28-May-2006, 18:20
It is kinda confusing why G71, being so small compared with R580, needs as much cooling (if not more, that is some metalwork on 7900GTX!) at the "same" clockspeeds...
How much of that "over-engineering" is just to enable low fanspeeds?
Has anyone tried to put a 1900XT cooler onto a 7900GTX or vice versa to see how they compare? Or put the same third-party cooler onto both?...
Jawed
I know for the Zalman VF900cu the GTX operates at a 5-8C lower temp than the XTX at stock frequencies, the XTX being at around 68-75C @ load. One thing about the GTX is that it seems to get extremely hot very quickly in certain applications like BF2, synth benchmarks and FEAR. So much so that it does benefit the card to have all that metal to remove the heat like the stock GTX cooler has. Something like the VF900cu doesn't seem to remove it fast enough to be worthwhile for the GTX, but it is worthwhile for the XTX, which is odd.
These two parts of your posting contradict each other:
I know for the Zalman VF900cu the GTX operates at a 5C lower temp then the XTX at stock frequencies.
Something like the VF900cu doesnt seem to remove it fast enough to be worthwhile for the GTX but it is worthwhile for the XTX which is odd.
I expect that was your intent, but it's kinda confusing what you actually mean...
EDIT: Hmm, I take it back. They don't contradict each other, but they indicate something unpleasant is happening.
Jawed
Richteralan
28-May-2006, 18:24
Memory is fabbed as well, so Dave's comment can be carried over to the GDDR. GDDR can fail just like a GPU can. There may also be an issue with the G71's memory controller here, unable to cope with the speed of the GDDR or otherwise being affected by heat/voltage, causing it to fail. Reducing the speed of the memory may in fact be reducing stress on the core.
Yes I am aware of this, too.
Let me share some of the experience of my eVGA 7900GT SS problems:
If I downclock memory and keep the core at eVGA's factory default clock (600MHz), no problems at all.
If I downclock memory to 1400MHz and overclock the core to 650MHz, no problems at all either!
If I downclock the core to 550MHz and keep memory at eVGA's factory default clock (1600MHz), the same problems appear.
If I downclock both, no problems at all.
Some users reported that they modified the memory timings in the BIOS and it helped a lot.
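The four combinations above amount to a small test matrix, and it can be checked mechanically which clock actually tracks the failures. Below is a minimal Python sketch: the four rows are the results reported in the post, while the table layout, labels and function name are my own assumptions for illustration.

```python
# Each row: (core setting, memory setting, stable?) as reported in the post.
RUNS = [
    ("default", "lowered", True),   # core 600MHz, memory downclocked
    ("raised",  "lowered", True),   # core 650MHz, memory 1400MHz
    ("lowered", "default", False),  # core 550MHz, memory 1600MHz
    ("lowered", "lowered", True),   # both downclocked
]

CORE, MEMORY = 0, 1  # column indexes into each row


def lowering_predicts_stability(runs, column):
    """True if 'this clock was lowered' matches 'card was stable' in every run."""
    return all((run[column] == "lowered") == run[2] for run in runs)


print("core clock explains results:  ", lowering_predicts_stability(RUNS, CORE))
print("memory clock explains results:", lowering_predicts_stability(RUNS, MEMORY))
```

Only the memory clock lines up with every result, and the card was even stable with the core raised to 650MHz. As was pointed out earlier in the thread, though, this cannot by itself tell a bad batch of GDDR3 apart from a problem in the GPU's memory controller, since both run off the memory clock.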
Richteralan
28-May-2006, 18:26
Are temps similar?
For eVGA 7900GT SS with stock HSF, I'm getting around 75C under full load condition. 45C idle.
P.S. the fan setting for SS is 2D: 25%, 3D 100%
Well, some mods over at the eVGA forum even say 3DMark/other benchmarks stress the card too much and cause the failure. :lol:
Shouldn't the manufacturer make sure that the chip can take the stress, since running 3DMark is not unreasonable? Likewise with FEAR and Oblivion.
There is a manufacturing process called "burn-in" to improve reliability. Basically, parts are stressed at elevated voltage and/or temperature (in ovens, but at reduced clock rates to keep power manageable, with lots of chips per board).
The military mandates it for their parts. The automotive guys also mandate it for certain components (such as those used in Engine Control Modules). Intel and AMD do it for 100% of their CPUs. Glitches in graphics are not as critical as glitches in calculation for a computer doing payroll.
Burn-in is a very expensive process. Not only do u need to design special boards, but the chips have to be designed for it. U can't have the whole motherboard in the oven (it would be stressing the MB + drives, etc.). So CPUs have BIST (built-in self test) that exercises the chip repeatedly with very little external stimulus, or they can be exercised by boundary scan.
For consumer grade chips such as those in DVD players, cellphones, ... no burn-in.
They rely on the reliability provided by the foundry (TSMC, UMC, VSM) when those guys qualify their process. TSMC monitors its process by requiring customers to embed test structures in the scribe line (the part that gets sawn away) that it can use to monitor production processes and predict reliability.
However, the reliability data that TSMC provides is for the process, ASSUMING that ATI/Nvidia/Broadcom... design/layout per the design rules provided by TSMC for each of its processes.
Individual designs have differing reliability levels depending on how aggressive the design is and whether they are pushing the limits recommended by TSMC.
ATI RV3xx did not have problems with TSMC 0.13 while Nvidia NV30 was a disaster.
Some problems show up early as non-functional parts (yields), while others could be degradation like what we are observing with the 7900 family.
I guess Nvidia can't point fingers at TSMC because u don't want to piss TSMC off now, since their fabs are running close to capacity. IBM doesn't have any capacity to spare now with the consoles ramping.
SugarCoat
28-May-2006, 18:32
These two parts of your posting contradict each other:
I expect that was your intent, but it's kinda confusing what you actually mean...
EDIT: Hmm, I take it back. They don't contradict each other, but they indicate something unpleasant is happening.
Jawed
Well, both cards have different load temps, Jawed. I think it may seem confusing if one were to assume both cards top out at the same load temps. People report GTX cards as running anywhere from 60-70C at load, whereas the XTX commonly hits 80-85C. The VF900cu simply isn't robust enough for some reason to remove enough heat to be beneficial over the stock cooler in the case of the GTX at load (it does help at idle temps), but it does help in the case of the XTX, reducing its temperature to around 72-75C. What I said before was a bit confusing, but I think this is a bit better.
SugarCoat
28-May-2006, 18:41
Yes I am aware of this, too.
Let me share some of the experience of my eVGA 7900GT SS problems:
If I downclock memory and keep the core at eVGA's factory default clock (600MHz), no problems at all.
If I downclock memory to 1400MHz and overclock the core to 650MHz, no problems at all either!
If I downclock the core to 550MHz and keep memory at eVGA's factory default clock (1600MHz), the same problems appear.
If I downclock both, no problems at all.
Some users reported that they modified the memory timings in the BIOS and it helped a lot.
Sounds like the memory is being pushed too far, which is exactly what was going on with both eVGA 7800GTs I had. It depends how much you care, but if you dropped the memory speed to 1500MHz it would probably stabilize. Alternately, you can try to get hold of their KO Superclocked BIOS, which will set your clocks to 580/1580 (go on their forums and try to get the attention of a tech). Otherwise I would just RMA.
trinibwoy
28-May-2006, 18:43
From pix it seems that 7900GTX's cooler is much heavier-duty - hence the question about "over-engineering".
Or it could be something simple like they designed the heatsink for G70 Quadro and GTX-512, it works very well so they use it on 7900 as well. I have no idea what the cost structure is for GPU coolers so I wouldn't speculate on what it's costing them to forego a redesign to come up with a cheaper/simpler cooler. They're already making a killing on the chips themselves so that could absorb some of the cost as well.
Well, both cards have different load temps, Jawed. I think it may seem confusing if one were to assume both cards top out at the same load temps. People report GTX cards as running anywhere from 60-70C at load, whereas the XTX commonly hits 80-85C. For some reason the VF900cu simply isn't robust enough to remove enough heat to beat the stock cooler on the GTX at load (it does help at idle), but on the XTX it does bring the load temperature down to around 72-75C. What I said before was a bit confusing, but I think this is a bit better.
OK, well it seems this is a question of "density" then, as there's such a marked difference in "maximum tolerable" temperature between the two GPUs. Wow, that's a huge difference.
Richteralan's problems with "memory speeds" may actually be problems with "memory controller" speed/power - and that point about ATI's ring-bus being designed to avoid thermal hot spots could be relevant in a comparison of the two GPUs :?:
Jawed
Or it could be something simple like they designed the heatsink for G70 Quadro and GTX-512, it works very well so they use it on 7900 as well. I have no idea what the cost structure is for GPU coolers so I wouldn't speculate on what it's costing them to forego a redesign to come up with a cheaper/simpler cooler. They're already making a killing on the chips themselves so that could absorb some of the cost as well.
I've no idea either - but you see fairly cheap cards with "silent" coolers consisting of lots of metalwork and heatpipes, and the premium for them isn't a huge amount, so I guess cost isn't a big factor. (It amazes me how a hugely complex mobo can sell so cheaply compared to a value graphics card, so what do I know?...)
But Sugarcoat's comments seem to imply that the Quadro cooler is essential to G71's viability at GTX clocks, so I guess the argument over cost is moot.
Jawed
Well, both cards have different load temps, Jawed. I think it may seem confusing if one were to assume both cards top out at the same load temps. People report GTX cards as running anywhere from 60-70C at load, whereas the XTX commonly hits 80-85C. For some reason the VF900cu simply isn't robust enough to remove enough heat to beat the stock cooler on the GTX at load (it does help at idle), but on the XTX it does bring the load temperature down to around 72-75C. What I said before was a bit confusing, but I think this is a bit better.
I'd just like to say that to transfer heat effectively, you need to consider not only the thermal conductivity coefficient but also the surface area available to carry the heat away. If the surface area is much smaller, a material with higher thermal conductivity than Cu must be used to get the same effectiveness (assuming the large-area and small-area cores have the same heat flow rate).
The only metallic conductor better than copper is silver.
http://hypertextbook.com/physics/thermal/conduction/
Note that AMD's single-core 90nm Athlon 64 is < 100mm^2. Intel has a version of the P4 at 90nm (1M cache) with a die size of 112mm^2. A bit smaller than G71. Similar power envelope.
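The area argument can be put into numbers with the one-dimensional steady-state conduction formula, delta_T = P * L / (k * A). A rough sketch, assuming a flat slab of metal between the die and the fins: the 80 W load, 3 mm thickness and both die areas are invented for illustration, and the conductivities are approximate room-temperature values (about 401 W/(m*K) for copper, 429 W/(m*K) for silver). Real coolers spread heat in three dimensions, so treat this as an order-of-magnitude picture only.

```python
K_COPPER = 401.0  # W/(m*K), approximate room-temperature value
K_SILVER = 429.0  # W/(m*K), approximate room-temperature value


def conduction_delta_t(power_w, thickness_m, area_m2, k):
    """Temperature drop across a slab for 1-D steady conduction."""
    return power_w * thickness_m / (k * area_m2)


def required_conductivity(power_w, thickness_m, area_m2, delta_t_k):
    """Conductivity needed to hold the drop across the slab at delta_t_k."""
    return power_w * thickness_m / (area_m2 * delta_t_k)


# Hypothetical inputs: same heat flow through a large and a small contact area.
POWER_W = 80.0
THICKNESS_M = 0.003
LARGE_AREA_M2 = 200e-6  # 200 mm^2
SMALL_AREA_M2 = 100e-6  # 100 mm^2

dt_large = conduction_delta_t(POWER_W, THICKNESS_M, LARGE_AREA_M2, K_COPPER)
dt_small = conduction_delta_t(POWER_W, THICKNESS_M, SMALL_AREA_M2, K_COPPER)

# Conductivity the small die would need to match the large die's drop.
k_needed = required_conductivity(POWER_W, THICKNESS_M, SMALL_AREA_M2, dt_large)

print(f"copper, large area: {dt_large:.2f} K drop")
print(f"copper, small area: {dt_small:.2f} K drop")
print(f"conductivity needed on the small area: {k_needed:.0f} W/(m*K)")
```

With these made-up inputs, halving the contact area doubles the temperature drop, and matching the larger die would take about twice copper's conductivity. Silver is only around 7% better than copper, so in this simple model a better metal alone does not close the gap.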
Using silver might cost an arm and a leg :???: (anyway, that is why they put silver in thermal compound too). As for the AMD and Intel chip sizes, how many transistors do the two have... and will you could the ball with AMD and Intel method :razz:
PS. I think it may be comparable for cpu and gpu but the clock speed is much different too.
Richteralan
28-May-2006, 20:10
Sounds like the memory is being pushed too far, which is exactly what was going on with both eVGA 7800GTs I had. It depends how much you care, but if you dropped the memory speed to 1500MHz it would probably stabilize. Alternatively, you can try to get hold of their KO Superclocked BIOS, which will set your clocks to 580/1580 (go on their forums and try to get the attention of a tech). Otherwise I would just RMA.
The KO and SS cards are PHYSICALLY different. So I wouldn't want to do that.
Using silver might cost an arm and a leg :???: (anyway, that is why they put silver in thermal compound too). As for the AMD and Intel chip sizes, how many transistors do the two have... and will you could the ball with AMD and Intel method :razz:
PS. I think it may be comparable for cpu and gpu but the clock speed is much different too.
Bottom line is the power that generates the heat. A CPU may be running at a higher clock speed, but half the chip is cache, which uses little power since only a small section is active at any one time.
Poor layout could result in hot spots where small areas within the chip have very high temp.
As I read through the forum at HardOCP, the problem shows up even on regular non-OC 7900GTs, so the thermal conductivity theory doesn't seem to hold. However, since we don't have the statistics, those failures might be outliers.
One poster thinks that single 7900 cards are junk, and he has had tons of problems. He gave up on them but will wait and step up to 2 x 7950s.
Yes I am aware of this, too.
Let me share some of the experience of my eVGA 7900GT SS problems:
If I downclock memory and keep core as eVGA's factory default clock(600MHz), no problems at all.
If I downclock memory to 1400MHz, and overclock core to 650MHz, no problems at all, too!
If I downclock core to 550MHz, and keep memory at eVGA's factory default clock (1600MHz), the same problems appear.
If I downclock both, no problems at all.
Some users reported that modifying the memory timings in the BIOS helped a lot.
Someone reportedly wrote on Wikipedia about the "Fourth Day Syndrome" and the theory that it is caused by bad RAM. Nvidia blames overclocking, but others have differing opinions.
http://www.maximumpc.com/forums/viewtopic.php?t=40909
"Wikipedia wrote:
Shortly after the initial launch of the 7900GT CO/KO/Superclock series, a trend of hardware instability became more and more prevalent. Some symptoms of the unusually large number of defective cards include: artifacting while rendering graphics in benchmarks such as 3DMark03, 3DMark05, 3DMark06 (all by Futuremark, Inc.) and Aquamark3, artifacting while playing games, BSODs (blue screens of death), total system restarts, and a blinking screen.
A large batch of the 7900GT XX (note: XX may signify CO/KO/SC variants of the 7900GT) are believed to have defective and/or malfunctioning memory modules, thus causing instability and ultimately, total card failure. Another proposed cause of large-scale instability among the 7900GT XX include undervolting from the factory. That is, the 7900GT XX run at a 1.2 volt GPU core voltage, while their higher end relatives, the 7900GTX, have 1.4 volt GPU voltages, thus permitting higher clock frequencies (GPU/RAM). The 1.2 core volt coupled with factory clocks of up to 520/770 (1540 effective) may suggest that the core voltage is simply too low to allow for higher clock speeds. Another point to note is that the 7900GT XX and the 7900GTX are both based around the exact same core, featuring a 90nm process, allowing for a smaller die size and fewer total transistors within the core itself; it now becomes apparent that the 7900GT XX are actually meant to run at 1.4 volts, much like the 7900GTX, but are factory undervolted to 1.2 volts, for whatever reason nVidia may propose for doing this.
A term coined by myself and another fellow, the 'Fourth Day Syndrome', refers to the fact that many nVidia patrons who have purchased a 7900GT XX from a sub-manufacturer such as eVGA have frequently run into the aforementioned hardware failures, typical of a significant number of 7900GT XX cards, after the fourth day of using the card. The Fourth Day Syndrome itself is very likely a mere coincidence, but it may be a trend within a subset of the defective cards; something triggers massive card failure after four days, for an unknown reason. Recent speculation suggests that this Fourth Day Syndrome is most likely a continuation of the batch of 7900GTs with defective RAM. This batch first appeared with stock clocks of 520/1540; however, the current KOs are clocked at 500/1500.
We believe that nVidia decided to downclock the core and RAM so that the problems with the defective RAM would show up less often for the average user (who leaves their system at stock clock speeds). We are assuming of course that nVidia and its subcontractors are working to resolve this problem, and the future revisions of the card should be free of any defects. Currently, cards containing BIOS revisions up to 05.71.22.14.15 have had confirmed instances of Fourth Day Syndrome. Also note that even though this trend is common in enthusiast groups (such as EOCF), the actual number of defective GPUs makes up somewhere between five and ten percent of total 7900GT sales, based on RMA numbers."
trinibwoy
29-May-2006, 13:53
the actual number of defective GPUs makes up somewhere between five and ten percent of total 7900GT sales, based on RMA numbers."
Wow that's a lot. What's the source for the RMA numbers?
Although, I'm not sure how one can come to the conclusion that GT's are meant to run at 1.4v based on the failure rate when overclocked. Please choose memory, voltage or overclocking as the culprit. When everything is thrown in there it just looks like a Fuad special.
Richteralan
29-May-2006, 15:08
Wow that's a lot. What's the source for the RMA numbers?
Although, I'm not sure how one can come to the conclusion that GT's are meant to run at 1.4v based on the failure rate when overclocked. Please choose memory, voltage or overclocking as the culprit. When everything is thrown in there it just looks like a Fuad special.
I guess the numbers are just an estimate.
But I'm sure it's not related to GPU core undervoltage. My BFG 7900GT runs at 1.2V and couldn't be more stable.
Wow that's a lot. What's the source for the RMA numbers?
Although, I'm not sure how one can come to the conclusion that GT's are meant to run at 1.4v based on the failure rate when overclocked. Please choose memory, voltage or overclocking as the culprit. When everything is thrown in there it just looks like a Fuad special.
Finally found it at Wikipedia. Written by Matt Ferguson and AriusDante
http://en.wikipedia.org/wiki/GeForce_7_Series#GeForce_7900_GT
I presume it came from here. There are too many posts, but it is in post #10, and in that case it is just wild speculation. Quite a number of folks seem to need 2-3 RMAs before they get a satisfactory card. Either those guys have a problematic PC to begin with (insufficient power, etc.) or the RMA rate cannot be as low as the 3-4% claimed by the retailer (NCIX; they also don't count customers who RMA directly to eVGA, etc.).
"it seems like a high percentage of us are having problems even though it's only a small portion of actual 7900GT's is because we're the only ones who push them hard enough to actually notice. I bet people that actually OC these things like us really don't make up more than 15% of total sales, so if 1/3 of us have problems, then that's the 5% of total 7900GT's with problems right there."
__________________
http://forums.extremeoverclocking.com/showthread.php?t=221201
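The back-of-envelope figure in that quote is just a weighted average. A small sketch of the arithmetic, using the poster's own guesses (15% of buyers overclock, a third of those see problems); the function name and the optional stock-user failure rate are illustrative additions, not numbers from any vendor.

```python
def overall_problem_rate(overclocker_share, overclocker_failure_rate,
                         stock_failure_rate=0.0):
    """Blend the failure rates of overclockers and stock users by market share."""
    stock_share = 1.0 - overclocker_share
    return (overclocker_share * overclocker_failure_rate
            + stock_share * stock_failure_rate)


# The poster's guess: 15% of buyers overclock, 1/3 of them have problems,
# and nobody running stock clocks ever notices a fault.
estimate = overall_problem_rate(0.15, 1 / 3)

# Same guess, but with a hypothetical 1% failure rate among stock users.
with_stock_failures = overall_problem_rate(0.15, 1 / 3, stock_failure_rate=0.01)

print(f"poster's estimate: {estimate:.1%}")
print(f"with 1% stock failures: {with_stock_failures:.1%}")
```

The result depends entirely on the two guessed inputs, so it says how the 5% was reached rather than whether 5% is right.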
I don't buy faulty RAM as the cause. In that case it would just be a batch and Nvidia could
finger Samsung/Hynix. If anything, it is the memory controller within G71 or some
card layout or power regulation issue.
Nvidia viral marketing asleep at the wheel. Shouldn't they be fixing the inaccuracies in Wikipedia?
Fourth Day Syndrome - a better movie title than MI-3.
References of Fourth Day Syndrome have been removed from Wikipedia.
Thanks for the prompt change.
Heh. That's anti-viral anti-marketing, isn't it? :lol:
Apparently based on feedback received, [H] feels a need to put this on the front page: http://www.hardocp.com/news.html?news=MTkyNjYsLCxobmV3cywsLDE=
Heh. That's anti-viral anti-marketing, isn't it? :lol:
He also took the effort to highlight in color Nvidia technologies such as Intellisample, CineFX, Ultrashadow, Purevideo and Scalable Link Interface.
http://en.wikipedia.org/w/index.php?title=GeForce_7_Series&diff=55924925&oldid=55641503
Apparently based on feedback received, [H] feels a need to put this on the front page: http://www.hardocp.com/news.html?news=MTkyNjYsLCxobmV3cywsLDE=
Kyle was pro-Nvidia last year. Nvidia must have forgotten to pay his bills. He is all over the forum on this issue.
Here is the detailed tally of the failures on evga.
http://www.evga.com/community/messageboard/topic.asp?TOPIC_ID=15366
eVGA and Nvidia claim low failure rate. However there are a number of users on their 2nd or 3rd RMA. So either those folks are very unlucky or the failure rate is higher.
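To see how unlucky "very unlucky" would be, here is a rough sketch that assumes every replacement card fails independently at the same rate. The 4% figure is the upper end of the 3-4% retailer claim mentioned earlier in the thread; the 10,000-buyer population and the function names are hypothetical.

```python
def prob_consecutive_failures(failure_rate, cards):
    """Chance that the same buyer receives `cards` bad cards in a row,
    assuming each card fails independently at `failure_rate`."""
    return failure_rate ** cards


def expected_repeat_buyers(buyers, failure_rate, cards):
    """Expected number of buyers who hit `cards` bad cards in a row."""
    return buyers * prob_consecutive_failures(failure_rate, cards)


FAILURE_RATE = 0.04   # upper end of the retailer's claimed 3-4%
BUYERS = 10_000       # hypothetical population, for scale only

p_second_rma = prob_consecutive_failures(FAILURE_RATE, 2)
p_third_rma = prob_consecutive_failures(FAILURE_RATE, 3)

print(f"two bad cards in a row: {p_second_rma:.4%}")
print(f"three bad cards in a row: {p_third_rma:.4%}")
print(f"buyers on a 2nd RMA: {expected_repeat_buyers(BUYERS, FAILURE_RATE, 2):.1f}")
print(f"buyers on a 3rd RMA: {expected_repeat_buyers(BUYERS, FAILURE_RATE, 3):.2f}")
```

If noticeably more people than this report second and third RMAs, then either the true rate is higher, the failures are not independent (a bad batch, or something in the buyer's own PC such as the power supply), or forum posters are simply not a random sample.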
Kyle was pro-Nvidia last year. Nvidia must have forgotten to pay his bills.
Puh-leese... Kyle just criticized ATI because of the lacking availability and such, nothing more than that. He may have his bad moments, but certainly nothing of that kind.
Banner headlines on ATI and harping about it regularly but pretty quiet on 7800GTX 512MB.
Look at the reviews and the comments. Kyle has a history of a love-hate relationship with Nvidia dating back to GeForce FX, if not earlier.
He already made his point with the 7900 Inferno article. Does he need to continue with the headline "Good GeForce 7900s"? It is fine to have this topic in the forum for discussion and to collect more info, but does he need to put it on his homepage on June 1st?
He has good relationships with one or two AIB manufacturers though, not nVidia! ;)
But whatever, that's irrelevant. He expresses his opinion on things a bit too often and is quite vocal sometimes, but there was never any "planted" misinformation or cheating etc. on his side that I could think of.
I guess the numbers are just an estimate.
But I'm sure it's not related to GPU core undervoltage. My BFG 7900GT runs at 1.2V and couldn't be more stable.
NCIX Admin says RMA rate is low and normal but posted this today :
"Just a note about recent shipment quality.
While some earlier batches of EVGA 7900 GTs had higher than normal RMA rates according to EVGA (if your board has no visible problems, it's all OK!), they asked us to send back old batches for re-testing and re-certification.
As of earlier this week, all 7900 GT products from EVGA are from the "new" replacement shipment and should have no problems at all.
As usual, EVGA was very proactive with us in making sure even "higher than normal RMAs" issues were fixed even before there were more problems. Kudos to EVGA on this..."
http://forum.ncix.com/forums/index.php?mode=showthread&msg_id=1102276&threadid=1102276&forum=101&product_id=17950&msgcount=1&overclockid=0#msg1102276