Jaguar seems to be better at everything except L2 latency (at which it's much worse) and clock speed. All the OoOE buffers in Jaguar are much larger, and it has separate load and store units instead of a single load/store unit.
The numbers are difficult to compare as far as OoOE resources go, but it doesn't look to me like they are "much" larger overall for Jaguar. Maybe slightly larger. A pity the realworldtech article only compares Silvermont to Saltwell, not to Jaguar.
So far I see both having an integer PRF with 64 entries, though Jaguar is slightly more flexible (it has 33-44 entries available for renaming, whereas Silvermont just uses a fixed 32 for that). Store queues are also similar in size (20 for Jaguar, 16 for Silvermont). The float PRF does indeed seem larger on Jaguar, though (72 entries vs. just 32).
Ok, Jaguar seems to be able to have significantly more operations in flight (64 for the int pipe, 44 for the float pipe, vs. 32 for each on Silvermont), and similarly the schedulers hold more entries (20 alu, 18 float, 12 mem vs. 8+8 alu, 8+8 float, 6 mem) while being unified (well, separate for int/float/mem but not per execution unit), which should also help a bit.
So far I see advantages for Silvermont in the L2 cache (while theoretically smaller in the single-thread case, the much better latency will make much more of an impact) and in the smaller branch mispredict penalty (10 vs. 14 cycles - nothing to sneeze at).
Jaguar has an advantage due to its full, separate store and load pipes (though as far as single load/store pipes go, Silvermont's looks quite spiffy), and there's some advantage in the SIMD unit because Silvermont seems to retain the half-wide multiplier.
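To put a rough number on the mispredict penalty difference, here is a minimal back-of-the-envelope sketch. Only the 10 and 14 cycle penalties come from the comparison above; the branch fraction and mispredict rate are made-up illustrative values (not measurements of either chip), and the model ignores predictor quality entirely.

```python
# Rough first-order model: extra cycles per instruction lost to mispredicts.
# ASSUMPTIONS (not from any datasheet): 20% of instructions are branches and
# 5% of those are mispredicted, identically on both chips.
def mispredict_cpi(penalty_cycles, branch_fraction=0.2, mispredict_rate=0.05):
    """Return the average extra cycles per instruction due to mispredicts."""
    return branch_fraction * mispredict_rate * penalty_cycles

# Penalties as quoted in the post.
PENALTY_CYCLES = {"Silvermont": 10, "Jaguar": 14}

for name, cycles in PENALTY_CYCLES.items():
    print(f"{name}: {mispredict_cpi(cycles):.3f} extra cycles per instruction")
```

With equal predictors the cost simply scales with the penalty, so a better predictor on either side could easily outweigh the 4-cycle gap.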
Let's compare some other hard numbers, Silvermont listed first:
L1I TLB: 48/4KB entries vs. 40/4KB + 8/2MB entries, both fully associative
L1D TLB: 48/4KB entries vs. 40/4KB + 8/2MB entries, both fully associative
L2 TLB: 128/4KB/4-way + 16/2MB/4-way entries vs. 512/4KB/4-way + 256/2MB/2-way (serving twice as many cores)
L1I Cache: 32KB/8-way vs. 32KB/2-way, both with 64B line size
L1D Cache: 24KB/6-way vs. 32KB/8-way (both can handle 16B load and store simultaneously)
L2 Cache: 1MB/16-way (13-14 cycles) vs. 2MB/16-way (25 cycles) (serving twice as many cores)
So Jaguar can better handle large pages, but overall these chips look like they were designed for similar throughput (per clock). Oh, and Silvermont has the better L1I cache, though I guess unlike on BD the 2-way associativity of Jaguar doesn't hurt that much.
I can't quite judge things like branch predictors, loop buffers etc. when comparing these two chips, and those can probably make quite some difference.
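The large-page point is easier to see as TLB reach (entries times page size). A small sketch using only the L2 TLB figures listed above; the names `L2_TLB` and `tlb_reach` are just illustrative, and nothing here is normalized for the number of cores served.

```python
KB = 1024
MB = 1024 * KB

# (entries, page size in bytes) for each L2 TLB array, as listed in the post.
L2_TLB = {
    "Silvermont": [(128, 4 * KB), (16, 2 * MB)],
    "Jaguar": [(512, 4 * KB), (256, 2 * MB)],
}

def tlb_reach(arrays):
    """Total bytes of address space covered if every entry is in use."""
    return sum(entries * page_size for entries, page_size in arrays)

for name, arrays in L2_TLB.items():
    # Print the reach of each array separately, then the combined reach.
    for entries, page_size in arrays:
        print(f"{name}: {entries} x {page_size // KB}KB pages "
              f"-> {entries * page_size // KB}KB")
    print(f"{name}: total {tlb_reach(arrays) // KB}KB")
```

The gap is modest for 4KB pages but large for 2MB pages, which is where the 256 vs. 16 entries show up.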
But I guess you're probably right: the OoOE resources are somewhat larger in general for Jaguar, so you'd think it should perform a bit better per clock. That's particularly true on the SIMD side, I think. On the int side I'm not sure whether the better L2 cache latency and the lower branch misprediction penalty (though that would depend on branch predictor quality) couldn't make up for that. In any case, performance of these chips should track much more closely overall, unlike Bonnell vs. Bobcat (where, depending on the code, you get anything from Bobcat annihilating Bonnell to the two being about as fast).