Are you saying some parts of the GPU are running in the gigahertz range, but because of how deep the pipelines are, the chip outside the pipeline is running in the hundreds of megahertz to counter, say, the propagation time for data to get all the way through a 10-stage, 1,000 MHz pipeline?
I'm saying a single transistor can switch in a length of time equivalent to hundreds of GHz. That is, 500 GHz = 1 / 2 ps.
A transistor can take 2 to 3 ps to switch (a 333 to 500 GHz rate). However, a single pipeline stage takes a lot longer than the switching time of a single transistor; it is made up of a series of transistors and wires. The signal delay in the wires between the transistors is important, and it is the total sum of switching and wire delays through the single pipeline stage that must come to a number LESS than the clock period or else it won't work.
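To make that concrete, here is a back-of-envelope sketch of the point. All the numbers are invented for illustration (the per-transistor switch time is the 2-3 ps figure above; the wire delay and logic depth are assumptions, not measurements of any real chip): the max clock rate falls out of the *total* delay through a stage, not the speed of one transistor.

```python
# Hypothetical numbers: a stage's delay is the sum of switching and wire
# delays along its critical path, and the clock period must exceed it.
TRANSISTOR_SWITCH_PS = 2.5   # ~2-3 ps per transistor (from the discussion)
WIRE_DELAY_PS = 1.0          # assumed average wire delay between transistors
LOGIC_DEPTH = 80             # assumed transistor levels on the critical path

stage_delay_ps = LOGIC_DEPTH * (TRANSISTOR_SWITCH_PS + WIRE_DELAY_PS)
max_freq_ghz = 1000.0 / stage_delay_ps   # 1 GHz clock has a 1000 ps period

print(f"stage delay: {stage_delay_ps:.0f} ps -> max clock ~{max_freq_ghz:.2f} GHz")
```

So even though each transistor could toggle at 300+ GHz on its own, a chain of them plus wire delay caps the stage at a few GHz at best.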
I'm not sure that's true, as it determines how quickly a signal can propagate through the chip. Given infinite cooling, the time it takes for a signal to propagate through the longest pathway in the chip determines the maximum clock speed of that chip. I claim that this is one reason why, even with extreme cooling, more modern processors clock higher than older ones.
If by "pathway" you mean a series of transistors with wire delays that make up a single pipeline stage, then yes. That determines the fastest frequency.
And the wire delays and transistor switching speeds get faster with each generation of manufacturing process. However, transistor speed improves at a greater rate than wire delay, so wire delay is a larger component of the stage delay than it was in the past.
I really doubt this is true, for the simple reason that GPUs need to hide many cycles of latency to keep texture reads moving at a good clip. Also consider that the performance hit for state changes (which is, essentially, flushing the pipelines) on a GPU is huge (on the order of hundreds of cycles).
You doubt that a single pipeline in a GPU can do a bilinear filter per clock? I never said it did the WHOLE thing -- fetching the data and writing it out. I was talking about one of the pipeline stages in that deep pipeline you talk about. One of the stages in a GPU does work on the order of a bilinear filter per clock. That is roughly the granularity of a pipeline stage in a GPU. In a CPU the pipeline stage granularity is about one 64-bit add per clock (though there are many pipeline stages and the latency is much more than one clock, the throughput of a pipeline is one add per clock). On a GPU the throughput per stage is much more than that; the pipeline stages are larger.
Thus, with longer pipeline stages (and many, many total stages in a pipeline), GPUs run at lower frequencies.
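The latency-vs-throughput distinction above is easy to show in a toy model. This is just arithmetic, not a model of any real part, and the pipeline depths are invented: once the pipeline is full, one result retires per clock regardless of depth, but each individual result still pays the full depth in latency.

```python
# Toy pipeline model: one operation enters and one retires per clock once
# the pipeline is full, so total clocks = fill latency + remaining ops.
def cycles_for(n_ops: int, depth: int) -> int:
    """Clocks to push n_ops through a depth-stage pipeline at 1 op/clock."""
    return depth + (n_ops - 1)

cpu_depth = 14    # assumed CPU pipeline depth (illustrative)
gpu_depth = 200   # assumed much deeper GPU pipeline (illustrative)

print(cycles_for(1000, cpu_depth))   # depth barely matters at high op counts
print(cycles_for(1000, gpu_depth))   # ...but a flush/refill costs ~depth clocks
```

This is also why the state-change penalty mentioned earlier is so large: flushing means paying the full `depth` again before the first result of the new state appears.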
If power density were the only reason, you could slap liquid nitrogen cooling on a GPU and run it at 2 GHz.
But there are limits to clock speed that are not heat related and GPUs run into those much earlier because they are doing more per clock.
It is as simple as that.
Note that the clock in a chip is a signal that kicks off the start of processing on a pipeline stage. It does not specify how fast each transistor works in a stage... each stage is essentially an asynchronous domain that is triggered by the clock and that must finish its work and put the resulting data onto the inputs of the next stage before the next clock signal comes around.
The length of time between these signals is longer on a GPU because more work is expected to be done (more transistors and wires between pipeline start and finish) per clock than on a CPU.
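Putting the whole argument in numbers: a bigger stage forces a longer clock period, but total throughput can still come out ahead if the work per clock grows faster than the period does. The figures below are invented round numbers purely to illustrate the tradeoff, not specs of any actual CPU or GPU.

```python
# Work per unit time = (work per clock) / (clock period).
def throughput(work_per_clock: float, period_ns: float) -> float:
    """Work units completed per nanosecond."""
    return work_per_clock / period_ns

# Illustrative only: a ~3 GHz CPU stage doing 1 add/clock vs. a ~500 MHz
# GPU stage doing bilinear-filter-sized work (called 16 units here).
cpu = throughput(work_per_clock=1.0, period_ns=1.0 / 3.0)
gpu = throughput(work_per_clock=16.0, period_ns=2.0)

print(f"CPU: {cpu:.1f} units/ns, GPU: {gpu:.1f} units/ns")
```

The lower clock doesn't mean less gets done; the longer period is exactly the cost of packing more transistors and wire between one clock edge and the next.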