Considering what needs to be done to perform an FP multiply, for instance, that makes perfect sense. And I don't think the FP16 capabilities of the GX6650 are a fluke either; the changes were presumably made because licensees find them functional and desirable. I would assume that FP16 is primarily used for graphics, and FP32 for "other code", whatever the heck it may be that uses the FP32 capabilities of the GPU on these systems.
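To make the FP16-vs-FP32 distinction concrete, here's a back-of-envelope sketch of how peak GFLOPs figures are usually derived. The lane count, clock, and double-rate FP16 assumption below are purely illustrative, not official GX6650 specs:

```python
def peak_gflops(fma_lanes: int, clock_ghz: float, fp16_rate: int = 1) -> float:
    """Peak GFLOPs: each FMA counts as 2 ops; fp16_rate is the FP16
    issue-rate multiplier relative to FP32 (2 for double-rate FP16)."""
    return fma_lanes * 2 * clock_ghz * fp16_rate

# Hypothetical numbers: 192 FMA lanes at 600 MHz.
print(peak_gflops(192, 0.6))               # FP32 peak, ~230 GFLOPs
print(peak_gflops(192, 0.6, fp16_rate=2))  # FP16 peak, double that
```

The point is simply that double-rate FP16 doubles the headline number without any change to the FP32 figure, which is why quoting only one of the two can mislead.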
App developers want their products to look good and perform well, and they will use the available resources accordingly. Benchmarks, however, aren't under the same pressure, so this is a case where benchmarks conceivably don't give a good prediction of real-world app performance. The same goes for other comparative material. One occasion where the issue of FP formats was raised here was when it was noted that Anandtech was quoting only FP32 FLOPs in their comparison tables; when the question was raised as to why, the answer was simply that this was what they used in their desktop GPU tables. No attempt had been made to corroborate with actual use.
My layman's speculative math for Gfxbench3.0 is only because:
1. It's one of the more accurate GPU synthetic benchmarks out there.
2. Manhattan in particular is heavily ALU bound.
For example, when QCOM stated before the Adreno 420 launch that it would have 40% more arithmetic performance than their Adreno 330, it was fairly easy to predict more or less where the 420 would land in Manhattan.
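The kind of estimate described above can be sketched in a couple of lines: if Manhattan is heavily ALU bound, a claimed arithmetic-throughput increase translates almost linearly into frames per second. The baseline score below is made up for illustration:

```python
def predict_fps(baseline_fps: float, alu_gain: float,
                alu_bound_fraction: float = 1.0) -> float:
    """Scale a known score by an ALU throughput gain; alu_bound_fraction
    dampens the scaling if part of the frame time is not ALU limited."""
    return baseline_fps * (1 + alu_gain * alu_bound_fraction)

# Hypothetical 10 fps baseline with the quoted +40% ALU throughput:
print(predict_fps(10.0, 0.40))  # ~14 fps if fully ALU bound
```

In practice one would set `alu_bound_fraction` below 1.0, since no real frame is purely ALU limited, which is why such predictions land "more or less" rather than exactly.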
Now, the 6650 in Manhattan will most likely perform quite a bit better than "typical" 150 GFLOPs FP32 GPUs, but on the other hand not come even remotely close to its 300 GFLOPs FP16 peak value either.
While I agree with your points above - and unless a game is heavily tailored to work well on Rogues only - I'd think that it could have the efficiency of a 200-230 (FP32) GFLOP GPU at best.
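A rough way to see why a mixed workload lands between the two peaks: the effective throughput of a mix is the weighted harmonic mean of the two rates. The 150/300 GFLOPs figures echo the peak values discussed above; the 50% FP16 fraction is just an assumed shader mix:

```python
def effective_gflops(fp16_fraction: float,
                     fp32_rate: float = 150.0,
                     fp16_rate: float = 300.0) -> float:
    """Effective throughput when a given fraction of the math can run in
    FP16 at its higher rate, and the rest must stay in FP32."""
    return 1.0 / (fp16_fraction / fp16_rate
                  + (1.0 - fp16_fraction) / fp32_rate)

# If half of a shader's math can run in double-rate FP16, the GPU behaves
# like a ~200 GFLOPs FP32 part, consistent with the 200-230 guess:
print(effective_gflops(0.5))
```

Pushing the FP16 fraction higher (a game heavily tailored to Rogues) moves the effective figure toward the 300 GFLOPs peak; a mostly-FP32 workload keeps it near 150.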
It would be interesting to know (if the IMG folks can spare another dime) whether, and to what degree, the GPU's resources allow FP32 and FP16 to run in parallel.