No one said 9W at theoretical peak; french toast implied 9W while running real games.
Anandtech's scenario was running a game and then running CoreMark in the background. That's not simulating a real load; that's pathological. Their power consumption numbers show that the game isn't coming anywhere close to heavily loading the CPU, and this won't change with Tegra 4. Even if the OS scheduling is bad at powering off idle cores, clock gating will still bring down a large amount of the power consumption even if there's a "heavy" thread keeping the clocks up.
I doubt you'll be physically able to use all four cores at the peak 1.9GHz, because that's how nVidia handled Tegra 3. These aren't Intel or AMD CPUs with automatic turbo capabilities regulated by on-chip logic; nVidia isn't going to trust the OS to reliably restrict that kind of boost to short bursts.
Actually, it's not as pathological as you may think. The CPU is being throttled even when only the game is running. It's not quite on the level of the synthetic multitasking test they did, but if you look at only the game run...
http://www.anandtech.com/show/6536/arm-vs-x86-the-real-showdown/12
You'll see that the CPU spikes up to 2.5 W but then throttles back to 0.5-1.0 W whenever the GPU requires 3.5-4.0 W. That's basically the same behavior as in the synthetic multitasking case. It's not as extreme here, but it does show that the CPU is generally throttled in favor of the GPU in order to maintain a 4-5 W TDP, which is what the synthetic test was showing.
But you also see this in the synthetic test...
http://www.anandtech.com/show/6536/arm-vs-x86-the-real-showdown/13
If you look only at the Modern Combat section (with the yellow bar), you'll notice that the CPU spikes up to 4 W but is rapidly throttled when the GPU ramps up to 4 W. Any time the CPU spikes up, the GPU is simultaneously reduced. Or it could be that any time GPU load is reduced, the CPU is allowed to spike up because it is no longer as heavily throttled.
So it's true that the hardware itself has a max TDP of about 8 W, but you will almost never see that, because the CPU and/or GPU are heavily throttled whenever one or the other approaches the limit.
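Just to make that budget-sharing behavior concrete, here's a toy "GPU gets first claim" allocator in Python. To be clear, this is not nVidia's actual governor; the 4.5 W budget and the allocation policy are assumptions I'm using purely to mimic the shape of Anandtech's traces:

# Toy shared-TDP allocator. NOT Tegra's real algorithm: the 4.5 W budget and
# the "GPU gets first claim" policy are assumptions chosen to mimic the traces.

TDP_BUDGET_W = 4.5

def allocate_power(cpu_demand_w, gpu_demand_w, budget_w=TDP_BUDGET_W):
    """Give the GPU what it asks for first, then hand the CPU whatever is left."""
    gpu_w = min(gpu_demand_w, budget_w)
    cpu_w = min(cpu_demand_w, max(budget_w - gpu_w, 0.0))
    return cpu_w, gpu_w

# CPU wants ~2.5 W throughout; watch what happens as the GPU ramps to 3.5-4.0 W.
for gpu_want in (1.0, 3.5, 4.0):
    cpu_w, gpu_w = allocate_power(2.5, gpu_want)
    print(f"GPU demand {gpu_want} W -> CPU gets {cpu_w} W, GPU gets {gpu_w} W")

Run it and the CPU's 2.5 W collapses to 0.5-1.0 W as soon as the GPU pulls 3.5-4.0 W, which is exactly the pattern in the game trace above.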
So it's not that the game doesn't need more CPU power than it uses; it's that it isn't allowed to use as much CPU power as a developer might want if they also want to use the GPU.
Hence, you'll never really see AAA-level games on mobile devices that push the CPU while simultaneously pushing the GPU.
Something like Crysis 3 or StarCraft 2, for example, would never be able to exist on something like this no matter how theoretically powerful it is, because they both push the CPU and GPU fairly equally. As in this case, I'd imagine either the CPU or the GPU would get throttled down, or both would get throttled to half their theoretical performance. Imagine what would happen with those games if the CPU was throttled every time the GPU was pushed. It'd be a disaster.
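To put a rough number on the "half their theoretical performance" point, here's the back-of-the-envelope arithmetic, using an assumed ~4.5 W shared budget and ~4 W peak draw per side (made-up figures, not published Tegra numbers):

# Back-of-the-envelope for a balanced CPU+GPU-heavy game inside a shared budget.
# All figures are assumptions for illustration, not measured Tegra numbers.
BUDGET_W   = 4.5   # shared CPU+GPU power budget
CPU_PEAK_W = 4.0   # what the CPU would draw flat out
GPU_PEAK_W = 4.0   # what the GPU would draw flat out

cpu_w = gpu_w = BUDGET_W / 2   # balanced game, so the budget splits evenly

print(f"CPU held to ~{cpu_w / CPU_PEAK_W:.0%} of peak, GPU to ~{gpu_w / GPU_PEAK_W:.0%} of peak")
# -> each side ends up around 56% of its theoretical performance.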
Which then makes you realize why there are no good benchmarks. If you bench the CPU and then bench the GPU separately, that gives no indication of what the hardware will actually do, because if you push both, the combined performance of the CPU and GPU will not be able to reach the theoretical limits. And if you bench both at once, then it comes down to: which hardware throttles the CPU and/or GPU preferentially? How quickly do they throttle? Do they deal gracefully with having to throttle both? Etc. If hardware from IHV X throttles the GPU more heavily than the CPU while hardware from IHV Y does the opposite, what do the benchmarks even mean when the two are theoretically equal if you bench the CPU and GPU separately? Does it even reliably translate to application performance? If App C uses the GPU slightly more than the CPU while App D uses the CPU slightly more than the GPU, they'll perform completely differently on IHV X hardware versus IHV Y hardware.
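And a quick sketch of why that wrecks cross-vendor benchmarking. Everything here is hypothetical (the budget split each IHV chooses and the per-frame work each app needs are invented numbers), but it shows two chips with identical isolated CPU and GPU scores swapping places depending on which side the app leans on:

# Two hypothetical SoCs with identical isolated CPU and GPU benchmark scores,
# but different policies for splitting a shared power budget under combined load.
# All numbers are invented for illustration.

def fps(cpu_w, gpu_w, cpu_work, gpu_work):
    """Toy frame rate: each watt does 10 'work units' per second on either side;
    the slower side sets the frame rate."""
    return min(cpu_w * 10 / cpu_work, gpu_w * 10 / gpu_work)

# How each IHV splits a 4.5 W budget when both sides are loaded: (cpu_w, gpu_w).
splits = {"IHV X (favors CPU)": (2.7, 1.8), "IHV Y (favors GPU)": (1.8, 2.7)}

# Per-frame work for two apps: (cpu_work, gpu_work). App C leans on the GPU,
# App D leans on the CPU.
apps = {"App C": (0.8, 1.2), "App D": (1.2, 0.8)}

for ihv, (cpu_w, gpu_w) in splits.items():
    for app, (cw, gw) in apps.items():
        print(f"{ihv}, {app}: ~{fps(cpu_w, gpu_w, cw, gw):.1f} fps")

App C comes out faster on IHV Y, App D comes out faster on IHV X, yet both chips look identical if you only ever bench the CPU and GPU in isolation.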
Things like that don't happen on PCs unless the hardware is just really bad (CPU/GPU throttling) or people are overclocking way too much. Hence, benchmarks on PC can reliably be used to gauge performance across different types of hardware.
Regards,
SB