g__day said:
Am I off base here assuming a 200M transistor GPU that switched at several gigahertz would need a staggering amount of current to drive all those active transistors?
Think of how much current a modern GPU needs today just to switch those transistors at 400MHz.
The above indicates a significant misunderstanding of how a chip's MHz rating relates to transistor switching speed.
Transistors in nodes below 90 nm can switch at almost 1 THz (1,000 GHz), and this is true for CPUs and GPUs alike, though Intel's and AMD's transistors are faster and more highly tuned.
http://www.reed-electronics.com/electronicnews/article/CA185610?pubdate=12/10/2001
Transistor switching speeds are in the hundreds of GHz today. Switching speed is only one factor in a chip's clock frequency, and a small one.
A GPU runs at 400 MHz not because that is the speed at which its transistors switch, but because that is the rate at which each pipeline stage completes whatever work it is meant to do in a single clock. Each pipeline stage has many cascaded transistors (and wire delays) and completes a task of larger scope than one switch turning on or off. That larger-scope task is typically some sort of calculation, or reading or writing some data.
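To put rough numbers on that (all of them illustrative, not measured): if a transistor switches in about a picosecond, a clock set by a single switch would be near 1 THz, but a stage that chains a few hundred switch delays plus wire delay lands in the low GHz. A quick back-of-the-envelope sketch:

```c
#include <stdio.h>

int main(void) {
    /* ~1 THz switching, as in the article linked above -> ~1 ps per switch */
    double switch_delay_ps = 1.0;

    /* If a single switch set the clock, the chip would run near 1 THz: */
    printf("single-switch 'clock': %.0f GHz\n", 1000.0 / switch_delay_ps);

    /* Hypothetically: 300 cascaded switch delays plus 200 ps of wire
     * delay per pipeline stage (invented but plausible magnitudes). */
    double stage_delay_ps = 300.0 * switch_delay_ps + 200.0;
    printf("stage-limited clock:  %.1f GHz\n", 1000.0 / stage_delay_ps);
    return 0;
}
```

The stage-limited figure (2 GHz here) is what the clock rating actually reports; the raw switching speed only sets the size of each small delay in the chain.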
g__day said:
I presume a CPU can reach such high clock speeds today because it is far less parallel than a GPU; at peak load at any moment I believe only 5% of a CPU's instruction transistors may be in use, while for a GPU that figure can be 95% of all transistors in use. That 20 times more transistors in use means 20 times more current demand and a lot more than 20 times more heat produced. I thought heat management of all the active transistors is the key reason GPUs are clocked 10 times slower than a CPU, so they only need 20 / 10 = about twice the power, and thereby generate a manageable amount of heat.
Nope, the reason is nothing of the sort.
A single CPU pipeline stage does very little work compared to a GPU's. A CPU pipeline stage might add two 64-bit numbers together. A multiply takes 4 clocks (on a K8 core, for reference). An L1 cache read may take 3 clocks on a CPU.
Clearly, if all you have to do is add two numbers in a clock cycle, you can run at high frequencies.
What does a GPU typically do in one clock? Maybe a full dot product of two 4-vectors (four multiplies and three dependent adds). Or a texture blend (bilinear filter = a handful of adds and multiplies). Both of these tasks are far larger than a single add and should be expected to take longer, meaning a lower clock.
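To make that concrete, here's a hypothetical sketch in C of the per-clock work of each style of stage (the function names are mine, just for illustration):

```c
#include <stdint.h>

/* CPU-style stage: one 64-bit add; critical path = a single adder. */
uint64_t cpu_stage(uint64_t a, uint64_t b) {
    return a + b;
}

/* GPU-style stage: dot product of two 4-vectors.
 * Four multiplies (which can happen in parallel) feeding three
 * dependent adds -- a much longer chain of logic that must settle
 * within one clock. */
float gpu_stage(const float a[4], const float b[4]) {
    float p0 = a[0] * b[0];
    float p1 = a[1] * b[1];
    float p2 = a[2] * b[2];
    float p3 = a[3] * b[3];
    return ((p0 + p1) + p2) + p3;  /* serial add chain */
}
```

Even if each individual operation is built from transistors switching at identical speed, the second function's critical path is several operations deep rather than one.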
Quite simply, GPUs are clocked lower because they do more work per clock. The transistors switch at speeds similar to a CPU's; there are just more of them cascaded together in a row within a single pipeline stage. Per clock, more transistors have to switch.
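Here's that idea as arithmetic, with made-up but plausible magnitudes: the same effective delay per cascaded gate, but different logic depth per stage:

```c
#include <stdio.h>

int main(void) {
    /* Hypothetical effective delay per cascaded gate, identical on both chips. */
    double gate_delay_ps = 25.0;

    double cpu_gates_per_stage = 16.0;   /* shallow stage: e.g. one add       */
    double gpu_gates_per_stage = 100.0;  /* deep stage: e.g. a dot product    */

    double cpu_clock_ghz = 1000.0 / (cpu_gates_per_stage * gate_delay_ps);
    double gpu_clock_ghz = 1000.0 / (gpu_gates_per_stage * gate_delay_ps);

    printf("CPU-like stage: %.2f GHz\n", cpu_clock_ghz);  /* 2.50 GHz          */
    printf("GPU-like stage: %.2f GHz\n", gpu_clock_ghz);  /* 0.40 GHz = 400 MHz */
    return 0;
}
```

With those (invented) stage depths, the deeper stage lands right around the 400 MHz figure discussed above, with no appeal to heat or parallelism at all.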
Although parallelism does play a role in the design differences between a CPU and a GPU, it does not directly affect clock speed. Otherwise, you'd see chips with half the pipelines running at twice the clock rate. But that isn't the case.