Arun, your arguments are correct but you're making the general mistake of only looking at the high-end. The average system sold today does not come with a GeForce 8800, but it does have dual-core.
And they'll play old games that already use the CPU for physics anyway. Your argument only stands if this dynamic remains true going forward, which it very possibly won't IMO (more below).
In fact more than half of all systems are laptops
I don't know where you get your stats, but I suspect you should reconsider your source. Even the most optimistic estimates don't expect that to happen before 2010.
Lots of people, including occasional gamers, are even content with integrated graphics or a low-end card.
There's the gaming market, and then there's the gaming market. I could have a lot of fun playing casual games and 3-5 year old games, but that's not what I'm doing. There will ALWAYS be a market for games that run on IGPs. I say IGPs because low-end cards will die within the next 2 years, as they'll become essentially pointless: if you look at AMD's and NVIDIA's upcoming DX10 IGPs, they're practically good enough for Windows 7 and for a few years after that. All you'll see beyond that are incremental increases in performance & video decoding quality, imo.
But the existence of a segment for games on what will be a $300 PC market doesn't mean there won't be a market for games above that; and as has traditionally been the case, the two will remain completely separate.
Sure we'll see multi-teraflop GPU's in the not too distant future, but we'll see the average system equipped with quad-cores sooner.
I have massive doubts that more than 2-3 cores make sense in the 'low-end commodity PC market'. If a game aims at the low-end, it should aim at that, not at some artificial segment with average performance that nobody really fits into. Anyway, that's arguable, but on to the next point...
So basically the average system has a potent CPU, but a modest GPU. For games this means that the GPU should be used only for what it's most efficient at: graphics.
You assume this to remain true: once again, it will not. Rather than looking at the present, it might be a good idea to try and look at the future instead. In the 2010-2011 timeframe (32nm), dual-cores will remain widely available for the low-end of the market. These will be paired with G86-level graphics performance in the ultra-low-end, with probably a higher ALU:TEX ratio. So in that segment of the market, you'll see maybe 200GFlops on the GPU and 75GFlops on the CPU.
And indeed, I can't really imagine any circumstance where offloading the physics to the GPU makes sense there, but GFlops still aren't massively in the CPU's favor and this market would mostly play casual and old games; amusingly, given the performance of current GPUs in DX10 games, they would presumably still play DX9 games!
Now, look at another segment of the market: $120 CPU, $120 GPU, $60 Chipset. In 2010-2011, that would probably correspond to a 3GHz+ quad-core on the CPU side of things (with a higher IPC than Penryn). That's about 150GFlops, maybe. On the GPU side of things, however, you'll easily have more than 1TFlop: just take RV670, which manages 500GFlops easily on 55nm at 190mm2. It's not exactly hard to predict where things will go with 40nm and 32nm...
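For reference, those ballpark figures fall out of the standard peak-FLOPS arithmetic: units × clock × flops per unit per cycle. A quick sketch (the clock speeds and per-cycle throughputs here are illustrative assumptions, not vendor specs; the GPU numbers are RV670-class as in the example above):

```python
# Peak single-precision GFlops = execution units * clock (GHz) * flops per unit per cycle.
def peak_gflops(units, clock_ghz, flops_per_cycle):
    return units * clock_ghz * flops_per_cycle

# Assumed 2010-ish quad-core: 4 cores at ~3 GHz, 8 SP flops/cycle/core
# (4-wide SSE multiply + 4-wide SSE add). A wider SIMD unit or higher
# IPC would push this toward the ~150 GFlops ballpark mentioned above.
cpu = peak_gflops(4, 3.0, 8)

# RV670-class GPU: 320 stream processors at ~775 MHz, MAD = 2 flops/cycle.
gpu = peak_gflops(320, 0.775, 2)

print(round(cpu), "GFlops CPU vs", round(gpu), "GFlops GPU")
```

Which is how a ~$120 GPU already lands near 500 GFlops while a same-price CPU sits around 100-150, before you even get to 40nm/32nm shrinks.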
Only with high-end GPU's do they achieve a good speedup, but actual work throughput versus available GFLOPS is often laughable.
Uhm, that's just wrong. In apps that make sense for their current architecture, and there are *plenty* of them, the efficiency in terms of either GFlops or bandwidth (whichever is the bottleneck) is perfectly fine.
The megahertz race is over, but the multi-core race has just begun and has some catching up to do.
Yes, it has a lot of catching up to do in terms of, as you kinda said yourself, politics and hype. You don't seem to realise that when predicting the future configurations of PCs (i.e. are most consumers going to go with a $300 CPU with a $150 GPU, or a $100 CPU with a $350 GPU?) what matters is what decisions the developers take. If the CPU is never the bottleneck, then why would you want more than a $100 CPU anyway?
If physics acceleration on the GPU doesn't take off, then obviously you'll want more than a $100 CPU. But if it does, then who knows - and that's why NVIDIA and ATI are so interested in it. They want to increase their ASPs at the CPU's expense, and there is no fundamental reason why they cannot succeed. It's all about their execution against Intel's.
And if what happens is that GPUs capture more out of a PC's ASPs, then you're looking at $100 CPUs being paired with $500 GPUs. Heck, as I said I'm already a pioneer in that category - slightly overclocked E4300 ($150) with a $600 GPU. The difference in GFlops between the two is kind of laughable, really, and in this case GPU Physics would clearly make sense. *That* is the dynamic that NVIDIA and AMD are trying to encourage, and that's why it's a political question, not really a technical one (although perf/watt for CPUs vs GPUs for physics also matters).
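The same peak-flops arithmetic applied to that kind of box makes the gap concrete (the E4300 overclock and the G80-class per-cycle throughputs are assumptions for illustration, not measured figures):

```python
# Peak single-precision GFlops = execution units * clock (GHz) * flops per unit per cycle.
def peak_gflops(units, clock_ghz, flops_per_cycle):
    return units * clock_ghz * flops_per_cycle

# Assumed slightly overclocked E4300: 2 cores at ~2.4 GHz, 8 SP flops/cycle/core.
cpu = peak_gflops(2, 2.4, 8)

# G80-class $600 card: 128 SPs at 1.35 GHz, MAD = 2 flops/cycle
# (counting the extra MUL would push the ratio even higher).
gpu = peak_gflops(128, 1.35, 2)

print(f"GPU/CPU peak ratio: ~{gpu / cpu:.0f}x")
```

Roughly an order of magnitude between the two, which is why GPU physics is a no-brainer in that configuration.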
I might be right, or you might be right, but we aren't personally handling any of these companies' mid-term strategies, so I wouldn't dare claim anything with absolute certainty given that it's not even really a technical debate from my POV. I do agree that a 100GFlops CPU is just fine for very nice physics, but I do not believe that closes the debate either.
So we'd have to see major changes in GPU architecture to make them more suited for non-graphics tasks, likely affecting their graphics performance.
Those are already happening and barely affecting graphics performance, as they are mostly minor things and the major things can easily be reused for graphics. There is no fundamental reason why this will not keep happening.
P.S.: I just thought I'd point out that I do NOT consider Larrabee to be a CPU here, but that obviously it might be a very interesting target for GPGPU/Physics. A heterogeneous chip with Sandy Bridge and Larrabee cores on 32nm, if the ISA takes off, ought to be much better than a GPU for Physics and other similar workloads either way (assuming there's enough memory bandwidth). So this is another important dynamic of course, and arguably much nearer to the subject of this thread...