I figured I'd link to this post concerning a Naughty Dog presentation that contains some discussion of the PS4 CPU and job system.
http://forum.beyond3d.com/showpost.php?p=1832042&postcount=134
The job system and CPU descriptions go with 6 cores being available.
There's some weirdness with the memory hierarchy description, for which I think there is some missing context.
There's discussion of the 2MB L2 for Jaguar, but then the presentation goes on to use split that suggests 1MB chunks, which I do not know the reason for.
That aside the latency numbers from L1 to L2 to DRAM are 3 to 26/190 to 220+ cycles.
A comparison with the Vgleaks numbers for Durango is 3 to 17/120 to 140-160 cycles.
I suspect there is a discrepancy with what is being measured for the L2 numbers, such as whether the ND presentation is going with general data for Jaguar and Durango is using best-case numbers.
Other measurements agree with 26, and Microsoft has stated they didn't mess with the clusters themselves.
The L2 sharing scenario may also be a worst-case number for Naught Dog's presentation, as Durango has something between 100 and 120 depending on how far up the other cache hierarchy you need to go.
Microsoft did claim they did more to update the interface used to share data between the L2s, which might explain some of the disparity.
Not knowing what exactly is measured, and the fact that reporting latencies in cycles leaves an uncontrolled variable, making a definitive comparison is not quite doable.
One thing I do think notable is a measure of the efficacy of AMD's on-die interconnect and the alleged burden of external GDDR5 or DDR3 memory.
We can sort of get a ballpark figure for the DRAM subsystem's latency contributions by looking at the difference between the remote L2 hit scenario and a miss to memory.
For Durango, it is 120 for a remote hit, and 140-160 for DRAM.
For Orbis, it is 190 for a remote hit and 220+ for DRAM.
DRAM has worst-case scenarios that likely fall outside the numbers indicated for both platforms, but the less pathological cases may be what is being reported.
Both get 30 or more cycles for DRAM, or 14-19% of total memory latency.
The in-cluster hierarchy takes another chunk of roughly the same size.
Something over half is taken up by on-chip broadcasts and interconnect traversal.
GDDR5 vs DDR3 is very much in the noise.
The heterogeneous memory subsystem of AMD's APUs is not breaking the overall trend from Llano, Trinity, Bobcat, Kabini, Kaveri, etc. in terms of latency.