What Causes Most Xbox 360 Red Light Errors (RROD)?
- Xbox 360 Flexing Zones Caused by Chassis Design and X-clamps
At xbox-experts.com we discovered that the standoffs and the outer lip of the metal chassis in which the motherboard rests are not completely level with each other. The 2 standoffs are 0.75mm too high. This might not seem like much, but since they are supposed to be 3mm high, that is 25% taller than designed, which is definitely noticeable with a straightedge or level. In addition, we discovered that the middle area of the chassis, where the X-clamp bolts screw down, is about 0.5mm lower than the rest of the chassis, so the mainboard is pulled down in the center as soon as the screws are tightened. This puts the entire mainboard under extreme stress, which goes a long way toward explaining why there is such a wide range of errors that are not related to the CPU or GPU. This mainly applies to older units, as we have seen some chassis revisions in which MS corrected the error themselves (an indication that they knew of the problem and that it did indeed exist). We advise using three 1mm thick washers to measure your standoffs first to check whether you have a revised chassis. An easy way to spot the difference is that the older faulty standoffs usually have rounded tops and the newer fixed ones have flat tops. This flexing could also explain why certain errors occur more frequently than others. The flexing zones can be divided into three parts...
Zone 1:
The most frequent error codes are 0102 (along with 0100, 0101 and 0103) and 0020. These are mainly GPU and CPU related errors; 0020 can also be caused by the RAM in rare cases. About 80% of the errors fall into this category, as the solder balls under the CPU and GPU experience the most flexing of all. This is caused by the x-clamps flexing the board upwards in the area directly under the CPU/GPU, with the force concentrated on a few small points, in addition to the natural flexing caused by the metal case layout. Zone 1 is the area right under the CPU and GPU.
Zone 2:
The next most frequently occurring error codes are E74, 0022 and 0110, which are usually either RAM or ANA/HANA chip related. Sometimes E74 and 0022 can also be GPU related, depending on the trace. Since the RAM and ANA/HANA chips are close to the GPU, they can be affected by the Zone 1 flexing in addition to the flexing caused by the two standoffs. These errors may not occur as often as the Zone 1 errors, but they still make up approximately 12.5% of the overall error codes.
Zone 3:
The Zone 3 area experiences the least flexing of the three, but can still account for about 7.5% of the error codes. The related error codes are E73, 0021, E79 (if hardware related) and E71 (if hardware related). The components affected by Zone 3 flexing are the Southbridge, ethernet chip, NAND and the entire motherboard to some extent. Since the NAND and ethernet chips are not BGA (ball grid array) chips like the Southbridge, GPU, CPU, HANA, etc., they are a bit more resistant to the flexing.
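As a quick reference, the zone groupings above can be written as a small lookup table. This is only an illustrative sketch in Python: the names `ZONES` and `zone_for` are made up for this example, the codes and percentages are the ones listed above, and codes that can have more than one cause (such as E74 or 0022) are simply filed under the zone where the text lists them.

```python
# Error codes and approximate shares as listed in the three zones above.
# ZONES and zone_for are hypothetical names used only for this sketch.
ZONES = {
    1: {
        "codes": {"0100", "0101", "0102", "0103", "0020"},
        "share_percent": 80.0,
        "parts": "CPU, GPU",
    },
    2: {
        "codes": {"E74", "0022", "0110"},
        "share_percent": 12.5,
        "parts": "RAM, ANA/HANA",
    },
    3: {
        "codes": {"E73", "0021", "E79", "E71"},
        "share_percent": 7.5,
        "parts": "Southbridge, ethernet chip, NAND",
    },
}


def zone_for(code):
    """Return the flexing zone (1, 2 or 3) for an error code, or None if unlisted."""
    normalized = code.strip().upper()
    for zone, info in ZONES.items():
        if normalized in info["codes"]:
            return zone
    return None


if __name__ == "__main__":
    for code in ("0102", "E74", "E79", "9999"):
        print(code, "-> zone", zone_for(code))
    # The three approximate shares quoted above add up to the whole.
    print(sum(info["share_percent"] for info in ZONES.values()))
```

A lookup like this only tells you where to start looking. As noted above, a code such as E74 can point at the GPU rather than the RAM or ANA/HANA chip, depending on the trace.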
- Xbox 360 Overheating / Thermal Runaway
Many of the 3 red light errors are blamed on overheating due to heat buildup caused by thermal runaway. Microsoft tried to combat this issue by installing better GPU heatsinks with an added heatpipe attachment, which didn't help much as the problems still persisted. Eventually MS got sick of the red rings and came out with the SLIM model, which has an entirely new thermal design. The SLIM models seem to handle the heat a bit better, but still suffer from overheating and insufficient air flow.
In 2007, thermal design experts Naoki Asakawa and Mayuko Uno from Nikkei Electronics in Japan analyzed the 360's heat radiation system in early models to determine whether overheating was a problem. Some of their findings...
- The airflow cooling the heat sink is proportional to the cross-sectional area of the flow path, and in this case the cross-sectional area for the graphics IC heat sink was only about one-seventh that of the microprocessor. "Almost all of the air pulled in by the fan is used to cool the microprocessor, it looks like. They've made some effort to increase the cross-sectional area by widening the heat sink, but it doesn't look like it's very effective." "In PCs it is common practice to enclose the heat sink in a duct. There might not have been enough space available in the Xbox 360, but the duct stops just short of the heat sink. The heat sink is instead enclosed on top by the DVD drive, the case, etc. If the duct should happen to be dislodged in transport, for example, the airflow cooling the heat sink would drop significantly."
- There was a temperature gap of 22C between the exhaust and room air. "When designing consumer products, it is common to seek a temperature gap of around 10C between exhaust and room temperatures," the thermal design expert said.
- The maximum wind speed of the exhaust air is only 1.1 meters per second, just one-half to one-third of what normal desktop PCs produce. The expert noted, "The amount of switched air is slightly in short considering the chassis' size (309 x 258 x 83 mm3)."
- It takes only 5 minutes of gaming for the GPU heatsink to reach 70C, a thermal gradient of about 10C/min, and after 15 minutes of play the GPU heatsink reaches 80C. In a hot room the chip itself may well exceed 100C.
- The heat sink temperature for the microprocessor was stable at 59°C, but the heat sink on the graphics IC reached 70°C within only five minutes of starting the game. The incline was about 10°C/min, and by 15 minutes it reached 80°C, representing a difference of 57°C from room temperature. Assuming a summer room temperature of 35°C, estimates indicate that heat sink temperature would exceed 90°C, and IC temperature might well exceed 100°C.
- When the IC, board, etc. reach excessive temperatures, the difference in their coefficients of thermal expansion causes board warpage, which in turn applies severe stress to the periphery of the ball grid array (BGA) connecting the two. Repeated exposure to elevated temperatures would cause cracks in the solder balls from heat fatigue, leading to failure.
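The temperature figures in the findings above hang together arithmetically, and a short Python sketch makes the steps explicit. It uses only the numbers quoted in the list. The one assumption (flagged in the code) is that the heat sink's rise above room temperature stays the same when the room gets hotter, which appears to be how the 35C summer estimate was reached. The function names are made up for this example.

```python
# Figures quoted in the Nikkei Electronics findings above.
GPU_HEATSINK_AFTER_5_MIN_C = 70.0
GPU_HEATSINK_AFTER_15_MIN_C = 80.0
RISE_ABOVE_ROOM_C = 57.0  # reported difference from room temperature


def implied_room_temp_c():
    """Room temperature implied by the 80C reading and the 57C difference."""
    return GPU_HEATSINK_AFTER_15_MIN_C - RISE_ABOVE_ROOM_C


def average_ramp_c_per_min():
    """Average heat sink ramp over the first five minutes of play."""
    return (GPU_HEATSINK_AFTER_5_MIN_C - implied_room_temp_c()) / 5.0


def estimated_heatsink_temp_c(room_c):
    # ASSUMPTION: the rise above ambient is unchanged at a higher room temperature.
    return room_c + RISE_ABOVE_ROOM_C


if __name__ == "__main__":
    print(implied_room_temp_c())            # room temperature during the test
    print(average_ramp_c_per_min())         # compare with "about 10C/min"
    print(estimated_heatsink_temp_c(35.0))  # compare with "would exceed 90C"
```

Under that assumption the test room works out to 23C, the early ramp to a little under 10C per minute, and the summer estimate to 92C, which matches the "would exceed 90C" claim.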
- Lead-Free Solder and Improperly Reflowed Solder During Manufacture
The next issue many people blame problems on is the use of lead-free solder and the "cold" solder joints that can result from it. The E.U.'s RoHS Directive, which took effect in July 2006, restricted the use of lead (along with several other hazardous substances) in most electrical and electronic equipment sold in the E.U. For nearly 50 years, the standard solder used was a tin and lead combo which had a melting point of around 183C. The new lead-free solder needs temperatures of at least 217C. For fear of damage from overheating, it is speculated that Microsoft's engineers opted for the low end of the temperature profiles needed for reflow.
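To see why a "low end" profile matters, compare how far a given reflow peak sits above each solder's melting temperature. The Python sketch below uses the two melting points quoted above. The 225C peak is a hypothetical value picked purely for illustration, not a figure from Microsoft or any solder datasheet, and `reflow_margin_c` is a made-up helper name.

```python
LEADED_MELT_C = 183.0     # tin/lead solder melting point quoted above
LEAD_FREE_MELT_C = 217.0  # minimum temperature quoted above for lead-free solder


def reflow_margin_c(peak_c, melt_c):
    """Degrees by which a reflow peak exceeds the solder's melting temperature."""
    return peak_c - melt_c


if __name__ == "__main__":
    # HYPOTHETICAL peak temperature, chosen only to illustrate the comparison.
    example_peak_c = 225.0
    print(reflow_margin_c(example_peak_c, LEADED_MELT_C))     # margin with tin/lead
    print(reflow_margin_c(example_peak_c, LEAD_FREE_MELT_C))  # margin with lead-free
    print(LEAD_FREE_MELT_C - LEADED_MELT_C)                   # extra heat lead-free needs
```

With this made-up peak, tin/lead solder would have 42C of headroom while lead-free solder would have only 8C, so a profile that was comfortable for the old solder leaves very little room for error with the new one.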
Seattle PI's "Digital Joystick" interviewed an inside source who has worked on the Xbox 360 project for many years, who stated:
"RROD is caused by anything that fails in the “digital backbone” on the mother board. Also known as a core digital error. CPU, GPU, memory, etc. Bad parts, incompatible parts (timing problems) bad manufacturing process (like solder joints), misapplied heat sinks or thermal interface material, missing parts, broken parts, parts of the wrong value, missed test coverage. Any one or more, on any chip, or many other discrete components, would cause this. And many of the failures were obviously infant mortality, where they work when they leave the factory and fail early in use. The main design flaw was the excessive heat on the GPU warping the mother board around it. This would stress the solder joints on the GPU and any bad joints would then fail in early life."
"Some defective parts, like BGAs where the solder balls are not of sufficient and uniform size, so they don’t solder down evenly, or the substrate is warped, causing some joints to have insufficient solder. Bad chips from marginal or under tested wafers. Others are deficient processes, like misaligning the solder paste to the circuit board, or same on the parts, or not having the thermal profile right in the reflow oven during soldering."
"Manufacturers new to PB free tend to err on the low temp side thinking they are saving the parts reliability wise from a large thermal load. What they are really doing is not reflowing the PB free solder enough to make a good joint. PB free solder is non eutectic, which means the different metals in the solder alloy melt at different temperatures, unlike leaded solder where everything melts at the same temperature. If you under heat it, it won’t bond well to the board or parts, won’t form a good joint, leaving voids and other defects in the joints that lead to early failure under normal circumstances. But when you add the extraordinary heat and mother board warpage that goes with it, well you get a catastrophic failure rate like we’ve all seen on 360."
Reflow experts from Manncorp did an extensive investigation into the quality of the lead-free solder joints in the 360 and found that with an x-ray they could actually see solder balls that did not look like they had been reflowed properly in the first place. On a pair of x-rays from the GPU, different sized solder balls are clearly visible, which is an indication that some spheres were not completely reflowed during manufacture.
Manncorp also noted, "While a "cold" solder joint may provide an adequate electrical connection, long-term reliability is jeopardized, especially in application where the solder bonds are subject to wide temperature fluctuations. In such an environment, continuous expansion and contraction of materials with varying thermal coefficients will quickly destroy the integrity of a "cold" solder joint, creating intermittent problems or even complete failure. This is precisely the environment of the Xbox 360 motherboard, due to the high amounts of heat generated by the CPU, GPU and memory components when running graphics-intensive gaming applications."
The guys at bunniestudios decided to send in a 360 to MEFAS for failure analysis, with the solder joints on the GPU inspected through a process called “dye and pry”. In this process, the motherboard is flooded with red ink, and then the GPU is mechanically pried off the board. The red ink flows into any of the tiny cracks in the solder balls and, at least in theory, when you pry the GPU off, the cracked regions will shear first, so you will be left with visible red spots at the points of failure.
[Image: normal solder joint]
Several balls on the GPU exhibited signs of partial failure, with some “voiding” visible in the balls, i.e. trapped gas bubbles inside the solder that might serve as starting points for mechanical failure.
- Bad Heatsink Mounting due to X-clamp Setup
In many units with 3 red lights or no video problems, we noticed that the GPU heatsink was sitting slightly crooked, with one side raised 1mm or so above the other. This leads to extra pressure being applied at one corner and not enough on the opposite side. The x-clamp setup allows the heatsink to sit unlevel because of its free-floating design. On top of that, the factory uses way too much thermal paste, which is applied to the heatsink first, before the heatsink is secured with the x-clamps.
The x-clamps essentially act like a little prying device that stresses the connections more and more with each thermal cycle until the already poor solder connections are broken. The pics below were from a "no video" unit....
- Sloppy Thermal Compound Application
Almost every unit from the factory has a very sloppy application of thermal compound. Since the x-clamps have a hard time keeping the heatsink level, the compound oozes out all over other components, possibly causing risk of a short. Also since the heatsink is not flat and level, it has a horrible thermal connection with the chips leading to overheating. In the pics below you can see how they used way too much compound that spread out over surrounding components. Notice all the air pockets that were formed which would act like little insulators. Also the compound in general is very dry/crusty and not likely to be very high quality...
As you can clearly see, there are many factors which contribute to the Xbox 360 3 red light error problems. Controlling heat and replacing the x-clamp retention system is the first step to take to protect your console. The "screws & bolts" x-clamp fixes are usually very temporary and still allow the motherboard to flex over time. They also do not address the overheating issues like the Hybrid eXtreme Uniclamp system does. eXtreme-Cool 360 thermal compound is super easy to install and provides the best thermal interface connection possible. Working units can benefit from the kits, which prevent further flexing and damage and fix the problem before it starts. Most 3RROD units can be easily repaired in less than an hour, although a few stubborn units will require a reflow for proper repair. After a reflow, install the x-clamp repair kit to keep the problems from coming back. Make sure to get the real Hybrid eXtreme Uniclamp X-clamp repair kits made by xbox-experts.com.