Ryzen has twice as much I$, which might make a difference when you have two contexts stomping around. It also has twice the L2 cache and, as importantly, twice the associativity: Sky/KabyLake's L2 is only four-way associative. Ideally you want three ways per context for when the code, stack, and heap segments alias, or you may end up evicting hot cache lines simply because you run out of ways.
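A rough way to see the aliasing problem is to compute set indices directly. The sketch below is a toy model, not a cache simulator; the cache parameters are Skylake/Kaby Lake-shaped (256 KB, 4-way, 64-byte lines) and the code/stack/heap addresses are made up purely to show three regions landing in the same set.

```python
# Toy model: set-index mapping for a 4-way, 256 KB L2 with 64-byte lines.
LINE_BYTES = 64
WAYS = 4
CACHE_BYTES = 256 * 1024
SETS = CACHE_BYTES // (LINE_BYTES * WAYS)  # 1024 sets -> index repeats every 64 KB

def set_index(addr):
    return (addr // LINE_BYTES) % SETS

# Hypothetical hot lines from one thread's code, stack, and heap that happen to
# share index bits (all 64 KB-aligned here, so they land in the same set):
code, stack, heap = 0x0040_0000, 0x7FFE_0000, 0x0201_0000
assert set_index(code) == set_index(stack) == set_index(heap)

# One thread's three aliasing regions still fit in 4 ways; two SMT contexts
# with the same bad luck want 6 lines in a 4-way set, so hot lines get evicted.
lines_wanted = 2 * 3
print(lines_wanted > WAYS)  # True: out of ways, conflict evictions begin
```

With 8 ways (Zen's L2), the same unlucky pair of contexts still fits.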
I'm curious what the implications may be for Skylake-X, with some of the early reporting/rumors indicating that its L2 leapfrogs Zen's with 1 MB of capacity. That would make sense given its expanded vector capability. Usually Intel would increase the number of ways as it increases capacity, although one complication is that the capacity and associativity of the per-core caches raise the burden on an inclusive L3, which seems unusually small per core.
The aggregate scheduling resources are also bigger on Ryzen than on Sky/KabyLake, something that might make a difference once you're limited by load/store throughput and have to schedule around extended latencies. Also, according to Agner Fog, execution throughput, AVX instructions excepted, is higher on Ryzen than on any Intel core.
Ryzen's integer schedulers are segmented, and are individually relatively shallow compared to how Intel describes its unified scheduler. That's not to say that Intel's is necessarily without some kind of internal subdivision, however.
While we do not know the exact limitations of Zen's schedulers or its mapping stage, it seems more likely to hit transient stalls on an unlucky confluence of dependences/hazards with a 14-entry queue inside a 192-entry window than Skylake does with a 97-entry scheduler inside a 224-entry window.
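The arithmetic behind that worry can be sketched as a toy model. This is not a pipeline simulator; the 14/97-entry figures are the ones quoted above, and the 40-op dependent chain is a hypothetical worst case steered entirely to one port.

```python
# Toy illustration of the segmented-vs-unified point: a run of ops all
# dependent on one long-latency load must wait somewhere. If they are steered
# to a single 14-entry scheduler segment, the segment fills and backs up
# rename long before a 97-entry unified scheduler would.
SEGMENT_ENTRIES = 14   # one Zen ALU scheduler queue (figure from the post)
UNIFIED_ENTRIES = 97   # Skylake's unified scheduler (figure from the post)

def stalls(waiting_ops, capacity):
    # True if the waiting ops overflow the queue that has to hold them
    return waiting_ops > capacity

chain = 40  # hypothetical dependent ALU ops behind a cache-missing load
print(stalls(chain, SEGMENT_ENTRIES))  # True: the segment backs up
print(stalls(chain, UNIFIED_ENTRIES))  # False: the unified window absorbs it
```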
On the other hand, a balanced SMT load mostly doubles the number of ops available before in-thread dependences can exhaust a scheduler, and might be more readily spread out between schedulers with the exception of the few operations like MUL and DIV that are not replicated across multiple ports.
More schedulers might also give more flexibility in forwarding and operand scheduling versus the higher costs of doing so in a unified manner, at least in the SMT case where half the operations' independence is trivially proven.
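The SMT argument above can be made concrete with another toy sketch. The steering policy (one dependence chain per segment) and the segment count here are made up for illustration; the point is only that two half-length chains from independent threads are easier to place than one full-length chain.

```python
# Toy sketch: ops from the other SMT thread are independent by construction,
# so steering can interleave two threads' chains across segments instead of
# piling one long chain into a single shallow queue.
SEGMENTS = 4   # hypothetical number of per-port scheduler segments
DEPTH = 14     # per-segment queue depth (figure from the post)

def overflows(chains):
    # naive hypothetical policy: each dependence chain gets its own segment
    return any(length > DEPTH for length in chains) or len(chains) > SEGMENTS

print(overflows([28]))      # True: one thread's 28-op chain overflows a segment
print(overflows([14, 14]))  # False: balanced SMT halves each chain's depth
```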
Agner didn't mention anything about AMD's segmented scheduling, which could mean it doesn't matter that much, or that his testing didn't try to push it.
For single-threaded code, Skylake's deeper scheduler can run further ahead, and I'm not sure I'd bet against its branch predictor still being better than Zen's at keeping more of those in-flight uops valid for a single thread.
If Zen's predictor is provisioned well enough, it's possible that cutting the per-thread speculation distance in half might hide more of the diminishing returns, even with a marginally weaker predictor.