Bulldozer was built on the concept that integer instructions are more common and more important than floating point. What has changed now?
The ratio of FP to INT between Zen and Bulldozer in terms of uops is mostly the same. Zen has 4 INT and 4 FP per core, which Bulldozer's narrow INT and shared FPU wound up halving per core.
The originating idea for BD was more ambitious, with what we know as BD "cores" actually being inner clusters of execution inside of a larger overarching core's scheduler. More complex speculation, advanced multi-threading, and a much tighter critical L1D+EXE loop for very high clocks were the original motivations. Area savings, such as they were, would have been a secondary benefit.
Physical reality gutted the high clocks, and the Bulldozer line did nothing interesting, or at least nothing good, with the other motivations.
The second-order upsides to shared resources were measured to be as compelling as BD was found to be wanting.
All that aside, AMD was outplayed pretty seriously on the FP front, given the SSE5/FMA4/FMA3/AVX+noFMA mess.
Some items BD had were nice, but in general Intel had more influence and eventually what proved to be a cleaner and more extensible vector solution.
It all looks pretty good to me. At worst, AMD should have a sound, solid design that they can refine over time without having to "fight" it, as they did with Bulldozer.
Those discrete scheduling queues seem a bit odd, though.
At least initially, much of this seems to be on the order of a Haswell core, with SMT potentially closer in complexity to Sandy Bridge, with AVX, LS width, and memory speculation being the most notable areas where AMD has not shown features that bring it into parity with Haswell/Broadwell. That level of software equivalence is what I've seen mooted as a rough design goal in order to piggy-back on software generally targeting Intel.
The split integer schedulers look like a possible power optimization. The IEUs are pretty generic in their support, with only certain instructions requiring specific lanes to be active. The znver1 patch indicated IE1 is the sole integer multiplier, and IE2 is the sole integer divider. Call instructions can choose between 0 and 3. There's almost nothing referenced where the integer units need to work in concert the way the FPU's pipes do.
Integer instructions that interact with the FP unit reside on 0 and 2.
A highly serial integer-only workload with little division could potentially ignore half the lanes, or more.
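To make the "ignore half the lanes" point concrete, here is a rough sketch that takes the lane restrictions as described above and works out the smallest set of integer lanes that can still execute a given mix of operation types. The op class names, the `LANES_FOR` table, and `min_active_lanes` are made up for illustration, and it only checks coverage, not throughput, so it really only applies to the highly serial case.

```python
from itertools import combinations

# One bit per integer lane: bit 0 = IE0 ... bit 3 = IE3.
# Hypothetical op classes; lane restrictions follow the post's reading
# of the znver1 patch, not any official AMD documentation.
LANES_FOR = {
    "alu":    0b1111,  # generic integer ops: any lane
    "mul":    0b0010,  # IE1 is the sole multiplier
    "div":    0b0100,  # IE2 is the sole divider
    "call":   0b1001,  # calls can use IE0 or IE3
    "int_fp": 0b0101,  # integer ops that interact with the FPU: IE0 or IE2
}

def min_active_lanes(op_classes):
    """Return a bitmask of the fewest lanes able to execute every op class seen.

    Coverage only: assumes the workload is serial enough that one lane
    per op class is sufficient.
    """
    classes = set(op_classes)
    if not classes:
        return 0
    for k in range(1, 5):
        for combo in combinations(range(4), k):
            mask = sum(1 << lane for lane in combo)
            if all(LANES_FOR[c] & mask for c in classes):
                return mask
    return 0b1111

if __name__ == "__main__":
    mask = min_active_lanes(["alu", "mul", "call"])
    print(f"lanes needed: {mask:04b}")
```

Under these assumptions a mix of generic ALU ops, multiplies, and calls needs only two of the four lanes, and a pure ALU stream needs one. A real core would of course also weigh throughput before idling anything.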
Splitting the schedulers may enable the more aggressive clock gating the IEUs have, and the core may be able to determine if specific units become less necessary for a stretch of cycles and gate them off. Perhaps other optimizations are possible if the integer cluster has banked resources for forwarding and the register file, where portions of those could be turned off if the schedulers detect that nothing their units need is valid due to a mispredict or the renamed registers they are linked to have been rendered obsolete.
The FPU is not really as capable of doing this, because its pipes are more heavily specialized and because some operations occupy multiple pipes at the same time.
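As a toy illustration of the integer-side gating speculated about above, the sketch below gates a single lane once it has gone unused for a run of cycles. Everything here is an assumption for the sake of the example: the `simulate_gating` name, the threshold of 4 idle cycles, and the idea that wake-up is free. None of it comes from AMD.

```python
def simulate_gating(lane_activity, idle_threshold=4):
    """Toy per-lane clock gating model.

    lane_activity: one bool per cycle, True if an op issued to the lane.
    Returns one bool per cycle, True if the lane is gated in that cycle.
    The lane is gated once it has already sat idle for idle_threshold
    cycles; wake-up latency is ignored (an op ungates it immediately).
    """
    gated = []
    idle = 0  # consecutive idle cycles seen before the current one
    for busy in lane_activity:
        if busy:
            idle = 0
            gated.append(False)
        else:
            gated.append(idle >= idle_threshold)
            idle += 1
    return gated

if __name__ == "__main__":
    activity = [True] + [False] * 6 + [True, False]
    print(simulate_gating(activity))
```

With a split scheduler, each lane's queue can make this call on its own. A unified scheduler, or an FPU where ops span several pipes, would have to track more state before deciding that a unit is safe to turn off.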