It's a (logarithmic) hockey stick curve. In the Pentium 4 days, performance improvements had become almost linear.
That was the case for clock speeds, after Northwood.
There was no hockey stick when it came to FP throughput.
The P3 to P4 transition gave the same kind of vector bump as the one between P4 and Core2. Given the significant clock bump NetBurst had over the last P3s, the gain in vector throughput was more than double.
The general lack of peak scaling with Prescott (that is, until they went dual-core) is very much the same kind of lack of scaling seen between Penryn and Nehalem.
It's actually worse with Nehalem, since there are quad-core MCM Penryns, whereas the first Pentium D was a Prescott derivative.
The curve in FP throughput so far tracks with Moore's law, with stops and starts over 3-4 years.
Intel is basically due for another doubling of vector width, and that is what AVX offers.
In keeping with the trend the P4 established, we can expect core counts to scale later, although going by the roadmaps, symmetric designs for the desktop don't look likely to go much beyond 4-6 cores.
FMAC comes in later, which will add a bump depending on how it is implemented.
What you're doing is going back to 2003 on the flat part of the curve, and connecting that with today. Indeed that's fairly unimpressive compared to how GPUs have been doing (though better than the flat part itself), but it doesn't mean that this is the (logarithmic) slope at which things will continue to evolve.
There's a good chance we can see this happen with AMD in 2011 with Llano.
Going by the speculative diagrams, we can expect an FP unit per core with an output of 8 64-bit results, which I'm interpreting to also allow 16 32-bit results per clock.
Half of those come from an FMAC unit.
If, and this is not quite certain from the diagram, the FADD and FMAC pipes can issue at the same time, we could see the quad-core Llano yield 96 SP ops a cycle.
Otherwise, it's still a nice 64 a cycle.
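
To make the arithmetic behind those two figures explicit, here is a rough back-of-envelope sketch. It assumes my reading of the speculative diagram (16 SP results per clock per core, half of them from the FMAC pipe, each FMAC result counted as two ops); the function and parameter names are made up for illustration and none of this is a confirmed spec.

```python
def peak_sp_ops_per_cycle(cores, sp_results_per_clock, dual_issue):
    """Peak single-precision ops per cycle for the whole chip.

    Assumptions (from the speculative diagram, not a confirmed spec):
    - half of the per-core SP results come from the FMAC pipe,
    - each FMAC result counts as 2 ops (multiply + add),
    - the other half come from the FADD pipe at 1 op per result,
    - the FADD pipe only contributes if it can issue alongside FMAC.
    """
    fmac_results = sp_results_per_clock // 2
    fadd_results = sp_results_per_clock - fmac_results
    fmac_ops = fmac_results * 2
    fadd_ops = fadd_results if dual_issue else 0
    return cores * (fmac_ops + fadd_ops)


# Quad-core, 8 x 64-bit results per clock read as 16 x 32-bit results.
both_pipes = peak_sp_ops_per_cycle(cores=4, sp_results_per_clock=16, dual_issue=True)
fmac_only = peak_sp_ops_per_cycle(cores=4, sp_results_per_clock=16, dual_issue=False)
print(both_pipes, fmac_only)  # 96 64
```

If the diagram turns out to mean something else (say, the 16 results are only reachable with both pipes busy), only the two constants at the call site need to change.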
The latter is in keeping with Moore's law; the former is somewhat better, which is good growth for CPU cores.
Going by the roadmap and assuming good AMD execution, I think the Llano chip may potentially double or triple that FLOP count.