My math earlier was off, so a billion-transistor chip is doable within the confines of the 65nm process at a die size smaller than G80.

Fair enough. We'll really have to see what's going to happen. My belief is not that GX2-like solutions won't happen, but that they don't necessarily mean larger dies than R600 and G80 are impossible. I think we'll see both solutions for quite some time.
I'd expect that as long as SLI and Crossfire continue to have headaches, whichever GPU manufacturer can keep single-die solutions at a higher market segment will win out, if they don't lose money in the process.
Yes, but the wafer size, and its effect on the area of silicon that can be produced by a fab, has an impact on the cost of manufacturing a given die size.

I'm walking a bit out of my comfort zone here, but there aren't really that many technology parameters tied to wafer diameter. The move to larger wafers is driven mainly by two reasons, both economic:
- increase fab capacity: higher die throughput per handled wafer
- reduce the amount of unusable wafer real estate: this is important when dies grow larger. Even with current large die sizes, it's still not that much of a factor, though it's definitely part of some equation in some cost calculation spreadsheet. (A rough sketch of the dies-per-wafer math follows below.)
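Here's that math as a minimal sketch, using the usual first-order dies-per-wafer approximation. The ~480 mm^2 die area is just a G80-ish ballpark figure for illustration, not a vendor number:

```python
import math

def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """First-order approximation: wafer area divided by die area,
    minus an edge-loss term for partial dies along the circumference."""
    r = wafer_diameter_mm / 2.0
    return int(math.pi * r * r / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2))

# A G80-class die is roughly 480 mm^2 (illustrative figure).
for diameter in (200, 300):
    print(diameter, "mm wafer:", gross_dies_per_wafer(diameter, 480.0), "gross dies")
```

Under those assumptions a 300 mm wafer has 2.25x the area of a 200 mm one but yields closer to 2.6x the gross dies, because proportionally less of it is lost to partial dies at the edge. That's exactly why the edge-loss term matters more as dies grow.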
My argument that larger die sizes than G80 and R600 are undesirable is a primarily economic one.
The cost per good die from a foundry goes up, and variation starts to hurt binning out of that smaller pool of dies.
Lower-binned chips can't be sold with good margins, even if the silicon itself is functional.
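To put a number on the "cost per good die goes up" point, here's a minimal sketch using a negative-binomial yield model. The defect density and clustering parameter are guesses purely for illustration:

```python
def die_yield(area_mm2: float, d0_per_mm2: float = 0.003, alpha: float = 3.0) -> float:
    """Negative-binomial yield model; d0 and alpha are illustrative guesses."""
    return (1.0 + area_mm2 * d0_per_mm2 / alpha) ** -alpha

def relative_cost_per_good_die(area_mm2: float) -> float:
    """Silicon cost scales with area; dividing by yield gives cost per *good* die.
    Edge losses on the wafer would make large dies look slightly worse still."""
    return area_mm2 / die_yield(area_mm2)

for area in (240, 480, 720):
    print(f"{area} mm^2 -> yield {die_yield(area):.2f}, relative cost "
          f"{relative_cost_per_good_die(area) / relative_cost_per_good_die(240):.2f}x")
```

With those made-up numbers, doubling the die area more than triples the cost per good die, which is the superlinear scaling I'm worried about.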
I tend to conveniently ignore these kinds of issues. They are really not much of a concern for vendors who use external fabs and standard-cell design. So let me clarify: worst-case electrical characteristics (which are always used to calculate the critical timing path of a chip) are really quite reliable, even at 65nm. Going forward, I don't expect major changes here.
I suspect this is because fabs, for one, tend to add a certain margin of error precisely to make sure customers get what they expect.
I put too much emphasis on timings as opposed to the leakage variance.
On that note, G80 has parts that are not standard cell design, and R600 is an example where clock timings are likely good.
What has turned up is that AMD (ATI) with R600 has discovered what the CPU guys in every market but the extreme high end have known for years: TDP and power draw are a first-order limiting factor, regardless of circuit performance.
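A quick back-of-the-envelope with the classic switching-power formula (P ≈ a·C·V²·f) shows why; every number below is invented for illustration:

```python
def dynamic_power(cap_nf: float, vdd: float, freq_ghz: float, activity: float = 0.15) -> float:
    """Classic switching-power estimate P = a * C * V^2 * f, in watts."""
    return activity * cap_nf * 1e-9 * vdd ** 2 * freq_ghz * 1e9

# Hypothetical chip: 400 nF of switched capacitance.
base = dynamic_power(400, 1.1, 0.75)
bumped = dynamic_power(400, 1.2, 0.85)  # ~13% more clock, but it needs a voltage bump
print(f"base: {base:.0f} W, bumped: {bumped:.0f} W ({bumped / base - 1:.0%} more power)")
```

Roughly 13% more clock ends up costing about 35% more dynamic power once the voltage bump is counted, and leakage only makes it worse, so the TDP ceiling bites long before the circuits run out of speed.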
GPU price segmentation (and by extension the required binning) is odd.

This is less of a concern for the CPU fabs of Intel and AMD, so they'll try to get closer to the limits of their process. As long as AMD continues to produce GPUs externally (which is obviously not a given), I'd like to stick with that model, and there I believe my argument still holds.
For reasons I'm not sure I fully understand, the number of speed grades a given GPU die can be assigned is remarkably small compared to CPUs these days.
CPUs have over a half-dozen speed grades per chip stepping that become products.
A GPU like R600 will, at the end of its life, have only three, and one of those is a cut-down version, probably due to defects.
It can't go higher because of power draw, and it can't go too much lower with the cheaper-to-produce RV cores in the way.
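As a toy illustration of how a fixed TDP bracket squeezes the top bin, here's a small Monte Carlo binning sketch; the SKU names, clock floors, power model and distributions are all made up:

```python
import random

random.seed(1)

TDP_LIMIT_W = 200.0
SKUS = [("XT", 740), ("PRO", 600), ("cut-down", 500)]  # MHz clock floors, illustrative

def bin_die(fmax_mhz: float, leakage_w: float) -> str:
    """Assign the highest SKU whose clock floor the die reaches while still
    fitting in the fixed TDP bracket at that clock (crude linear power proxy)."""
    for name, floor_mhz in SKUS:
        est_power_w = leakage_w + 0.22 * floor_mhz
        if fmax_mhz >= floor_mhz and est_power_w <= TDP_LIMIT_W:
            return name
    return "scrap"

counts: dict[str, int] = {}
for _ in range(10_000):
    fmax = random.gauss(700, 60)   # max stable clock, MHz
    leak = random.gauss(35, 12)    # static power, W
    sku = bin_die(fmax, leak)
    counts[sku] = counts.get(sku, 0) + 1
print(counts)
```

The point is just that many of the fastest dies fail the power bracket rather than the timing floor, so the top SKU stays thin, and anything below the bottom floor is territory the smaller RV dies cover more cheaply.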
I'm betting there is a fair amount of selection bias going on with GPU fabbing that neither Nvidia nor AMD will disclose.
The CPU side is little better; they don't really give details on binning either, but the wider selection of products provides more data points.
That may be more of a market segmentation thing and an engineering concern with regard to fixed TDP brackets for marketability.

If there are multiple speed bins for R670/G92/... it will be interesting to see how much the clocks differ from each other. They are really close for e.g. the 8800 Ultra/GTX/GTS, so there is clearly not yet a problem.
If wires continue to play a larger role in the overall delay, I expect the variance in speed to actually go down. (Just like we're already seeing now.) You can't significantly reduce wire delays by increasing voltages.
It does help with crummy drive currents when you want that last GHz.
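Here's a small sketch of why, putting an alpha-power-law gate-delay model next to a fixed RC wire component; all the constants are illustrative:

```python
def gate_delay(vdd: float, vt: float = 0.35, alpha: float = 1.3) -> float:
    """Alpha-power-law gate delay in arbitrary units: drops as Vdd rises."""
    return vdd / (vdd - vt) ** alpha

def path_delay(vdd: float, wire_fraction: float) -> float:
    """Critical path = gate part (voltage-sensitive) + wire RC part (which isn't)."""
    gate = gate_delay(vdd) / gate_delay(1.0)   # gate part normalised to Vdd = 1.0 V
    return (1.0 - wire_fraction) * gate + wire_fraction

for wf in (0.2, 0.5):
    speedup = path_delay(1.0, wf) / path_delay(1.2, wf) - 1.0
    print(f"wire fraction {wf:.0%}: a 20% Vdd bump buys about {speedup:.1%} more clock")
```

The bigger the wire fraction of the critical path, the less a voltage bump buys you, which is why I'd expect speed variance (and the value of overvolting) to shrink as wires dominate.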
Intel does well on this account as well.
Most of those products don't seem to need to push the envelope on performance or die size like top-flight GPUs, which in turn don't push the envelope on circuit performance like CPUs do.

I agree that leakage variation can be quite high within the same process. Speed variation is much less so, once again keeping my more restricted rules in mind. Unlike the GPU world, there are a lot of silicon products where all chips have to run at the same speed. (Think cell phones, modems, TV chips, ...)
That sort of begs the question of why AMD's going the other route was so much slower...

I'd argue that this is more a matter of getting to market faster. Just like the 7950GX2 was a nice way to crash the R580 party while the next big thing (in the same process!) was getting ready backstage.
I agree that as long as the design is highly repetitive internally, the debugging effort need only increase incrementally over a smaller design based on the same building blocks.

Anyway, my main initial argument was that debugging chips with 1B transistors didn't have to be a major burden. We've deviated quite a bit from that.
My concern is that the company might not make much money if the die size places the product at the wrong end of scaling trends.
The foundry isn't going to shield GPU designers from increased costs and fewer good dies per wafer start.
Once we hit the inflection point where the cost of reaching a given level of performance on one large die makes a multi-chip approach more cost-effective, why not go multi-die?
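As a closing sketch of where that inflection point might sit, reusing the same made-up yield model from earlier: compare one big die against two half-size dies carrying a packaging/board overhead penalty (the overhead factor is pure guesswork):

```python
def die_yield(area_mm2: float, d0_per_mm2: float = 0.003, alpha: float = 3.0) -> float:
    """Same illustrative negative-binomial yield model as before."""
    return (1.0 + area_mm2 * d0_per_mm2 / alpha) ** -alpha

def silicon_cost(area_mm2: float) -> float:
    """Relative silicon cost per good die: area divided by yield."""
    return area_mm2 / die_yield(area_mm2)

PACKAGING_OVERHEAD = 1.15   # made-up penalty for a dual-die board/package

big = silicon_cost(600)
dual = 2 * silicon_cost(300) * PACKAGING_OVERHEAD
print(f"one 600 mm^2 die: {big:.0f}, two 300 mm^2 dies: {dual:.0f} (relative units)")
```

Under these invented numbers the dual-die option already wins on silicon cost; the real question is whether the SLI/Crossfire software headaches and the duplicated memory eat that advantage back up.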