SilentGuy, your arguments are all based on some sort of weird claim that "the chip may have a large number of transistors and interconnects globally, but this doesn't affect the operation of transistors and interconnects locally".
Yes, that's exactly what I'm saying, because it's my practical experience. (It could be, of course, that design and the laws of physics for GPUs are entirely different, in which case I have to pass.)
Now it doesn't really matter if there are tons of complex buses on the other side of the bus?
Yes, that's entirely correct. We don't give a shit about those buses interfering with other signals. It's a back-end problem:
The presence of a lot of buses in a design can give you major headaches wrt routing congestion: your logic density decreases in congested areas, which increases the area of the chip, which increases cost. Tough. But that's about it, and you plan for it during the architecture phase anyway by using lower fill factor estimates.
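To put a rough number on what that lower fill factor costs, here's a trivial back-of-the-envelope sketch; the area and utilization figures are assumptions I made up for illustration, not data from any real design:

```python
# Back-of-the-envelope sketch of why a lower fill factor costs area (and money).
# All numbers here are made up for illustration, not from any real design.

def placed_area_mm2(cell_area_mm2: float, utilization: float) -> float:
    """Placed block area = total standard-cell area / target placement utilization."""
    return cell_area_mm2 / utilization

cell_area = 20.0        # mm^2 of pure standard-cell area in the block (assumed)
normal_util = 0.70      # typical fill factor for an uncongested block (assumed)
congested_util = 0.55   # lower fill factor planned for a bus-heavy, congested block (assumed)

print(placed_area_mm2(cell_area, normal_util))     # ~28.6 mm^2
print(placed_area_mm2(cell_area, congested_util))  # ~36.4 mm^2, i.e. ~27% more silicon
```

That extra area is the whole penalty: more silicon per die, so more cost, but no mystery effect on the rest of the chip.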
Signal integrity problems are largely orthogonal to that kind of issue, because they happen on very small scales: an aggressor net (= a net with a very big driver) can impact the victim nets that surround it, but the distance over which this happens is really small: typically only the nets routed right next to it, and even then the only problem nets are those that are far away from their driver AND that have critical timing at the same time.
In the past, the hard part was finding those nets, but that's standard practice now. Fixing them was and is easy: you insert another buffer on the victim net. Or you downsize the driver of the aggressor net and add a driver in the middle. Or you do a small routing tweak where you swap 2 wires, so the victim net gets farther away from the aggressor. 99% of those fixes are done automatically by the tools.
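If it helps to see why those fixes work, here's a toy Elmore-style sketch; every number in it is an assumption for illustration, not real extraction data. The aggressor-induced delay grows with the coupling cap and with how weakly the victim is driven, so buffering the victim or adding spacing attacks it directly:

```python
# Toy Elmore-style sketch of crosstalk-induced delay on a victim net.
# Illustrative numbers only; not a sign-off noise/timing model.

def delay_ps(r_drv_ohm, r_wire_ohm, c_gnd_ff, c_cpl_ff, miller):
    """Lumped-driver + distributed-wire delay. The coupling cap is scaled by a
    Miller factor: ~2.0 when the aggressor switches the opposite way, 1.0 when quiet."""
    c_tot_pf = (c_gnd_ff + miller * c_cpl_ff) * 1e-3   # fF -> pF
    return 0.69 * r_drv_ohm * c_tot_pf + 0.38 * r_wire_ohm * c_tot_pf

def crosstalk_delta_ps(r_drv, r_wire, c_gnd, c_cpl):
    """Extra delay the switching aggressor adds on top of the quiet-net delay."""
    return delay_ps(r_drv, r_wire, c_gnd, c_cpl, 2.0) - delay_ps(r_drv, r_wire, c_gnd, c_cpl, 1.0)

base     = crosstalk_delta_ps(1000, 2000, 100, 50)      # long victim, far from its driver
buffered = 2 * crosstalk_delta_ps(1000, 1000, 50, 25)   # buffer at midpoint: two short halves
spaced   = crosstalk_delta_ps(1000, 2000, 100, 25)      # routing tweak: swap wires, halve coupling

print(base, buffered, spaced)   # ~72 ps vs ~54 ps vs ~36 ps of aggressor-induced delay
```

The exact numbers don't matter; the point is that both the effect and the fixes are local.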
Signals just have to pass through these complex buses.
Exactly. Do you have a problem with that?
Edit: I have this little feeling that somehow you're under the naive impression that buses are treated differently from other signals during the back-end phase. Typically, they're not. Per macro block, back-end gets a gigantic, often flattened, blob of nets and cells and that's about it. Most of the time, they don't even know what the blob is doing, but if they do, they don't care. At All.
For top-level routing purposes, it can happen that certain channels are reserved for buses to cross a macro block. That's fine. It just means that cell placement in those channels will be at lower density, but even then bus signals are not treated differently from other signals.
And hand-tuning can't do any better, because it can't overcome the laws of physics. For a given integration scale, if you want a transistor to work at a higher frequency, you need more power; run it at 4 times the frequency and you need roughly 4 times the power. So if a G80 were to double the frequency of its shader cores via hand tuning and quadruple the frequency of the rest of the chip (ROPs + TMUs + memory controller), it would need at least 3 times the power, assuming the shader ALUs and the rest of the chip are equal in transistor count, though if I remember correctly the shader ALUs aren't more than 30% of the total transistors. Now 3 times the power makes for about 300 W+ for just the graphics card. What cooling do you need for that?
Duh. So, basically, all you were initially saying is that GPUs have to take power into account? Why didn't you just write "GPUs have to take power into account" and leave it at that???
This has nothing to do with the implication that high frequency is not possible because the chip has a lot of transistors. High frequencies by themselves are not a problem, even on large-area designs, as long as you only use them where it makes sense.
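Going back to the power arithmetic in the quote: it only needs dynamic power scaling roughly linearly with frequency (P ≈ C·V²·f at fixed voltage) plus the quote's own 50/50 transistor split. A quick sanity check, with a baseline board power I picked purely for illustration:

```python
# Sanity check of the quoted "at least 3x the power" estimate. Dynamic power is
# roughly C * V^2 * f, so at fixed voltage it scales linearly with frequency.
# (In reality, 4x the clock would also need a higher voltage, making it even worse.)

baseline_power_w = 100.0   # assumed baseline board power, for illustration only
shader_fraction = 0.5      # the quote assumes shader ALUs == rest of chip in size

shader_scale = 2.0         # double the shader clock
rest_scale   = 4.0         # quadruple ROPs + TMUs + memory controller

new_power = baseline_power_w * (shader_fraction * shader_scale +
                                (1 - shader_fraction) * rest_scale)
print(new_power)           # 300.0 W -> the quote's "3 times the power" figure
```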