So you are thinking of 2fpi/cycle for a special FPU? I see... I was thinking of using the VFPU in parallel for independent FP calculations. The FPU can only peak at 0.6+ GFLOPS: it should be no more than a standard FPU with FP MADD support (FMA + FDIV).
The VFPU would peak at 2.6+ GFLOPS using a 4-way vector MADD every cycle.
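For anyone who wants to check those peak numbers, here is a rough sketch of the arithmetic. The 333 MHz clock is my own assumption (it is not stated above; I picked it only because it is consistent with the 0.6+ and 2.6+ figures), and the function name is just illustrative. A MADD counts as 2 FLOPs (one multiply plus one add).

```python
# Peak throughput = FLOPs issued per cycle * clock frequency.
# ASSUMPTION: 333 MHz clock, chosen only to be consistent with the
# "0.6+" and "2.6+" GFLOPS figures quoted in the post.
ASSUMED_CLOCK_HZ = 333e6

def peak_gflops(flops_per_cycle, clock_hz=ASSUMED_CLOCK_HZ):
    """Theoretical peak in GFLOPS, assuming one instruction issued every cycle."""
    return flops_per_cycle * clock_hz / 1e9

# Scalar FPU: one MADD per cycle = 1 multiply + 1 add = 2 FLOPs.
fpu = peak_gflops(2)

# VFPU: one 4-way vector MADD per cycle = 4 lanes * 2 FLOPs = 8 FLOPs.
vfpu = peak_gflops(4 * 2)

print(f"FPU  peak: {fpu:.3f} GFLOPS")
print(f"VFPU peak: {vfpu:.3f} GFLOPS")
```

Under that assumed clock the two results land just above 0.6 and 2.6 GFLOPS, and the VFPU comes out at exactly 4x the FPU, since the only difference is the vector width. These are theoretical peaks; real code will not issue a MADD every single cycle.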
Edit:
Now that I think about it, this is stupid... An FPU is added precisely for that: so you don't have to use the VFPU for simple FP calculations.