So as we can see, the recent history of CPUs shows them growing FLOPs by a staggering ~2x, while GPUs scaled by a mere 4x, on top of an inconsequential order-of-magnitude difference in base capability.
Are you being sarcastic? The P4 was released in 2000, Nehalem in 2008. So by your metric, it took them 8 years to double it. Meanwhile, in your GPU comparison, it took AMD/ATI only 2 years to quadruple it. At 3 GHz, that means CPUs went from roughly 12 GFLOPS to 96 GFLOPS over those 8 years, an 8x increase. I don't even want to bother comparing the R200 to the RV770. If you look at the memory bandwidth available to CPUs vs GPUs in 2000 and in 2008, it's a similar story.
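For concreteness, here's roughly where those 12 and 96 GFLOPS figures come from. The per-cycle FLOP counts are my own assumptions about peak single-precision SSE throughput, just a back-of-the-envelope sketch, not anything the parent stated:

```python
# Rough peak single-precision FLOPS, assuming:
#  - Pentium 4 @ 3 GHz: ~4 SSE FLOPs/cycle (128-bit packed ops issued over
#    two cycles, with the add and mul units running in parallel)
#  - Nehalem @ 3 GHz: 4 cores x 8 SSE FLOPs/cycle (4-wide add + 4-wide mul per core)
def peak_gflops(ghz, cores, flops_per_cycle):
    return ghz * cores * flops_per_cycle

p4      = peak_gflops(3.0, 1, 4)    # ~12 GFLOPS
nehalem = peak_gflops(3.0, 4, 8)    # ~96 GFLOPS
print(p4, nehalem, nehalem / p4)    # 12.0 96.0 8.0
```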
The fact that CPUs must operate with commodity DIMMs is not really relevant; it's purely a manufacturing decision by the PC industry that sacrifices performance for flexibility. The vast majority of consumers don't really care how the RAM is wired up. Lots of people buy iPhones and Macs with non-removable batteries or hard-to-upgrade RAM, and plenty of consumer electronics and consoles are built with soldered, non-serviceable parts. Laptops seem to be extremely popular these days, and people often just buy a whole new machine rather than try to upgrade it. There is an assumption that what people love about PCs is how interchangeable the parts are, but I think that assumption really only applies to a niche market, and that the broad market for computers could be very vertically integrated (like Apple has done) without most people giving a shit.
The reality is, an ALU is far cheaper than a full CPU core, and if your problem is embarrassingly parallel, then packing in ALUs is better than throwing in more general-purpose cores. For most of the workloads modern PCs do (outside games and multimedia), there is very little gain from additional CPU-core-level parallelism.
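To make "embarrassingly parallel" concrete, here's a toy illustration of my own (not anything from the posts above): a per-pixel operation where no iteration depends on any other, which is exactly the shape of problem you'd rather hand to thousands of ALU lanes than to a handful of big cores:

```python
# Toy embarrassingly parallel workload: scale every pixel independently.
# Each iteration touches only its own element, so a GPU can map one ALU
# lane per pixel; extra general-purpose CPU cores mostly add overhead here.
def brighten(pixels, gain):
    return [min(255, int(p * gain)) for p in pixels]

frame = list(range(256)) * 10        # fake 8-bit image data
print(brighten(frame, 1.2)[:8])
```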
Moreover, with the move to cloud-based computing and the web browser dominating the CPU time of most PCs out there, the largely single-threaded nature of JavaScript and the browser core means a lot of that power is simply going to waste. You could put 64 Nehalem cores on a chip and it wouldn't speed up the subjective latency of most applications people use on a daily basis, whereas scaling GPGPUs certainly does lead to a very measurable difference in games.
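This is basically Amdahl's law. A quick sketch, where the 95% serial fraction is purely an assumed number for illustration:

```python
# Amdahl's law: speedup from N cores when only a fraction `p` of the work
# can run in parallel. If the browser's hot path is ~95% serial (assumed,
# purely illustrative), 64 cores buy you almost nothing over 4.
def amdahl_speedup(parallel_fraction, cores):
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

print(amdahl_speedup(0.05, 64))   # ~1.05x
print(amdahl_speedup(0.05, 4))    # ~1.04x -- barely any different
```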
Thus, I would say scaling CPU cores these days matters more for server environments, like Google's data centers. On the client side, at this point, no one will notice much of a difference except in the few applications that actually tax a system: games, multimedia, content-creation apps.