IMHO (layman perspective):
The 6000 series, Bulldozer, or even Larrabee isn't the biggest threat to Fermi in GPGPU; Fusion is.
Fusion is said to have 480 ALUs and ~1 TFLOPS of SP performance. Assuming AMD keeps the one-fifth DP rate of its current VLIW parts, DP throughput should be around 200 GFLOPS. On top of that, Fusion has access to all of main RAM and four K10.5 cores.
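For reference, the back-of-envelope arithmetic, assuming the RV770-style one-fifth DP rate and a shader clock near 1 GHz (both assumptions, since final Fusion clocks aren't public):

\[
480\ \text{ALUs} \times 2\ \tfrac{\text{FLOPs}}{\text{clock}} \times 1.04\ \text{GHz} \approx 1\ \text{TFLOPS (SP)}, \qquad 1\ \text{TFLOPS} / 5 \approx 200\ \text{GFLOPS (DP)}
\]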
If AMD made no big mistakes and included ECC support and more than one HT3 interface, it should be simple to put 2-4 Fusion chips on a server board and, voilà, HPC performance en masse. Memory bandwidth could be a problem, though.
The big problem there is that HPC apps tend to be written either for CPUs or for GPUs, not for some load-balanced simultaneous combination. So for a GPU-centric app, any significant die area spent on a serial CPU is likely wasted, and conversely for a CPU-centric app.
Even for an application that has been efficiently ported to both domains (something like Folding@home, perhaps), it's likely much more efficient in one domain than the other. Fusion won't be great for Folding@home: the work runs so much faster on the GPU that a full, undiluted GPU is simply preferable.
What kind of apps will Fusion boost over a lone CPU or lone GPU? Apps that need to rapidly send data from serial code to parallel code and back with little overhead; a sketch of that pattern follows below. This isn't an obvious niche, though sparse matrix multiplies might be one application. One could argue that Cell already offers this kind of tight coupling between serial and parallel processors, and we've all seen how difficult it is to use to its full theoretical potential.
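To make the niche concrete, here's a minimal CUDA sketch of that tight serial/parallel loop; step_kernel and serial_update are hypothetical stand-ins for the two phases. On a discrete GPU every iteration pays two PCIe copies plus launch latency, which is exactly the overhead an integrated part like Fusion could remove:

```cuda
#include <cuda_runtime.h>
#include <cstdlib>

// Hypothetical stand-in for the parallel phase.
__global__ void step_kernel(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 0.5f;
}

// Hypothetical stand-in for the serial phase (a dependency chain GPUs handle poorly).
static void serial_update(float *x, int n) {
    for (int i = 1; i < n; ++i) x[i] += x[i - 1];
}

int main() {
    const int n = 1 << 16;
    float *h = (float *)malloc(n * sizeof(float));
    float *d = NULL;
    cudaMalloc((void **)&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 1.0f;

    for (int iter = 0; iter < 100; ++iter) {
        // Each trip pays PCIe transfers plus launch latency on a discrete GPU;
        // a CPU/GPU sharing main RAM would skip both copies entirely.
        cudaMemcpy(d, h, n * sizeof(float), cudaMemcpyHostToDevice);
        step_kernel<<<(n + 255) / 256, 256>>>(d, n);
        cudaMemcpy(h, d, n * sizeof(float), cudaMemcpyDeviceToHost);
        serial_update(h, n);
    }

    cudaFree(d);
    free(h);
    return 0;
}
```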
Fermi, with its more generalized kernel handling and parallel kernel scheduling, is likely a more practical approach. Concurrent kernels (and faster synchronization via on-chip atomics) will help with the scheduling work that current GPUs depend on CPUs to handle.
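A minimal sketch of what that looks like from CUDA, assuming Fermi-class hardware: independent kernels dropped into separate streams can overlap on the chip, and a global atomic serves as a cheap progress counter instead of a round trip through the CPU. small_task and the counter are illustrative, not any particular app:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

__device__ unsigned int done_count = 0;  // on-chip progress counter

// Illustrative kernel: does trivial work, then bumps the counter with an atomic.
__global__ void small_task(float *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] += 1.0f;
    if (i == 0) atomicAdd(&done_count, 1u);  // no CPU involvement needed
}

int main() {
    const int n = 1 << 10, kernels = 4;
    float *d = NULL;
    cudaMalloc((void **)&d, kernels * n * sizeof(float));
    cudaMemset(d, 0, kernels * n * sizeof(float));

    cudaStream_t s[4];
    for (int k = 0; k < kernels; ++k) cudaStreamCreate(&s[k]);

    // Pre-Fermi parts serialize these launches; Fermi can run them concurrently.
    for (int k = 0; k < kernels; ++k)
        small_task<<<(n + 255) / 256, 256, 0, s[k]>>>(d + k * n, n);

    cudaDeviceSynchronize();
    unsigned int done = 0;
    cudaMemcpyFromSymbol(&done, done_count, sizeof(done));
    printf("kernels completed: %u\n", done);

    for (int k = 0; k < kernels; ++k) cudaStreamDestroy(s[k]);
    cudaFree(d);
    return 0;
}
```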
So the lack of on-chip CPU cores is likely not a big deal. The lack of direct access to system RAM is a bigger problem, though in many (even most) apps it can be minimized as long as the GPU has enough RAM of its own. The other bottleneck is communication between multiple GPUs, which has a similar and related bandwidth issue. PCIe is very fast, but it's not fast enough for these modern computing approaches.
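For a feel of the PCIe ceiling, a rough sketch that times a pinned host-to-device copy with CUDA events; exact numbers are system-dependent, but a PCIe 2.0 x16 link peaks near 8 GB/s while a high-end GPU's own memory runs north of 100 GB/s:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const size_t bytes = 256u << 20;     // 256 MiB test buffer
    float *h = NULL, *d = NULL;
    cudaMallocHost((void **)&h, bytes);  // pinned host memory for a fair measurement
    cudaMalloc((void **)&d, bytes);

    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);  // warm-up copy

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);

    cudaEventRecord(t0);
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(t1);
    cudaEventSynchronize(t1);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, t0, t1);
    printf("host->device: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    cudaFree(d);
    cudaFreeHost(h);
    return 0;
}
```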