The description of the FP unit shows support for FMAC and 64-128 bit maximum operand width.
From a silicon point of view, we have a total of 4 INT units per core, up 33% from the 3 in a current Opteron core.
The FP unit is going to support operations that could force its size up at least by that much. An FMAC would require at least 50% more operand bandwidth, and the bit width could be enough to bloat the FP unit up as well.
The proportion of idling silicon isn't massively changed, or it could be slanted even more in favor of the FP unit.
I think it could be that the design isn't sharing a deemphasized FP unit, but is instead balanced around several critical resources, some of which might be more related to a much more powerful FP unit than to integer execution.
Clustering points to a certain amount of deemphasis of peak integer execution.
The highest peak would require a big, expensive 4-way scheduler and a big, expensive crossbar servicing all 4 integer lanes.
AMD has cut these into two half-sized entries. This is actually a net savings, as a lot of common circuits for superscalar issue scale quadratically with peak width.
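To put rough numbers on that, here is a toy sketch. It assumes the cost of the scheduler and crossbar circuits grows exactly with the square of issue width, which is only an approximation of real designs; the function name and the unitless cost figures are made up for illustration.

```python
# Toy model (assumption): scheduler/crossbar cost scales as width squared.
# The numbers are relative units, not real area or power figures.

def relative_issue_cost(width: int) -> int:
    """Relative cost of the issue logic for a cluster of the given width."""
    return width ** 2

# One monolithic 4-wide scheduler and crossbar
monolithic = relative_issue_cost(4)

# Two independent 2-wide clusters, each with a private scheduler
clustered = 2 * relative_issue_cost(2)

print("monolithic 4-wide:", monolithic)
print("two 2-wide clusters:", clustered)
print("ratio:", clustered / monolithic)
```

Under that assumption the two half-width clusters together cost about half of what the single 4-wide design would, which is where the net savings come from.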
The front end has been increased significantly. It's 4-wide, but if AMD uses the same symmetric decoder, it is significantly more expensive to implement than the complex-simple-simple-simple scheme used by Intel.
The rename stage works in terms of 4 instructions, which is also expensive.
However, it then feeds integer clusters that are physically incapable of that kind of throughput.
As a result a very expensive front end is amortized over more threads.
The slimmer integer clusters with private schedulers can also do more speculation, since they do not speculate over as wide an integer pipeline.
Other patents hint at attempts to reduce the complexity of the integer register file.
The cache bandwidth is also much higher. 4 data cache loads per cycle in total doubles what Opteron can do.
However, each integer cluster has access to only one L1 capable of two loads.
The FPU, however, can hit both, which is something that data-hungry FP really needs.
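The load arithmetic can be written out as a quick sketch. It assumes two L1 data caches (one per integer cluster) and that the load counts are per cycle; the variable names are just for illustration, and the Opteron figure is only what the "doubles" comparison implies.

```python
# Assumptions: one L1 per integer cluster, two clusters, loads counted per cycle.
LOADS_PER_L1 = 2   # each L1 is capable of two loads
NUM_L1 = 2         # one L1 per integer cluster

# An integer cluster can only reach its own L1
int_cluster_loads = LOADS_PER_L1

# The shared FPU can pull from both L1s
fpu_loads = LOADS_PER_L1 * NUM_L1

# Implied by "4 loads in total doubles what Opteron can do"
opteron_loads = fpu_loads // 2

print("per integer cluster:", int_cluster_loads)
print("FPU (both L1s):", fpu_loads)
print("Opteron core:", opteron_loads)
```

So a single integer thread sees no more load bandwidth than an Opteron core does, while the FPU potentially sees twice as much.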
The FPU, being separate, can also go with less speculation that doesn't benefit it as much, and may also have more register ports to support FMAC.
Peak single-threaded integer performance would be increased, if clocks and other things oblige, but the FP unit looks like it might be the big winner.