Isn't that a little extreme? Forgive my lack of proper knowledge on this subject, but don't CPUs (especially Intel's) have separate integer and FP units? Why share the same execution port? Just to simplify scheduling?
Adding execution ports increases complexity for scheduling and instruction issue. The scheduling logic needs to be able to send out as many operations per clock simultaneously as there are ports.
A unified scheduler has a bigger set of operations it needs to scan through, and more places to put them.
A non-unified scheduler would tend to split things right where they are now.
The gains are also limited by the ability of the core to provide operations (decode/rename) and absorb them (bypass/retire).
Adding ports adds expense and can compromise the OoO engine's clock ceiling. Without also expanding the pipeline sections before and after, not much would come of it, and some of those sections are expensive to scale and require elements even further away in the pipeline to scale up as well.
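To put a rough number on that, here is a toy sketch of the idea that sustained throughput is capped by the narrowest stage. The function name and all the widths are made up for illustration and don't describe any real core, and the model ignores the way extra ports can still help with bursty instruction mixes.

```python
def sustained_ops_per_clock(decode_width, issue_ports, retire_width):
    """Toy bound: steady-state throughput cannot exceed the narrowest stage."""
    return min(decode_width, issue_ports, retire_width)

# Hypothetical widths, not taken from any real core.
baseline = sustained_ops_per_clock(decode_width=4, issue_ports=6, retire_width=4)
more_ports = sustained_ops_per_clock(decode_width=4, issue_ports=8, retire_width=4)

print(baseline, more_ports)  # 4 4
```

Going from 6 to 8 ports changes nothing in this model because decode and retire still cap things at 4 per clock.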
And my original question is why CPUs can't exploit the FPU at their disposal to maximize integer throughput, by running integer code on the FPU when FP code is not needed.
The FP unit's pipeline, latencies, and exception behavior are different from what the INT units need. It does simplify scheduling if the scheduler doesn't need to mix the requirements of the two.
Using the FP unit as an extra integer unit also means running against the separation of the register files, instruction issue, and bypass networks.
There are latency penalties when moving data between domains, but they exist because FP and INT units generally don't need to send results to each other that much.
The FP and INT units for these cores also don't have a direct means of reading from the register files for each type without an explicit operation moving the data from one to the other.
Accepting the penalty for crossing means each side has its own register file and result forwarding.
That's a bonus in terms of giving more registers without expanding register identifiers, and the register files and bypass networks can grow more independently. Two full-width domains can do much more for their own data types, while combining them can mean less performance in aggregate because a combined scheme puts pressure on expanding resources that can scale quadratically in cost.
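As a back-of-the-envelope illustration of the quadratic part, assume a full bypass network where every unit can forward its result to every unit in the same domain, so the path count grows as the square of the unit count. That is a simplification, and the 4+4 unit counts below are hypothetical, not taken from any specific core.

```python
def bypass_paths(units):
    """Forwarding paths in a full bypass network: each unit feeds every unit."""
    return units * units

# Hypothetical unit counts for illustration only.
int_units, fp_units = 4, 4

split = bypass_paths(int_units) + bypass_paths(fp_units)   # two separate domains
unified = bypass_paths(int_units + fp_units)               # one combined domain

print(split, unified)  # 32 64
```

Under that assumption the unified network needs twice the forwarding paths for the same number of units, and the gap widens as either side grows.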