On PS4, the Onion+ bus, which shares Onion's 10GB/s and bypasses the GPU caches, is coherent. Onion (20GB/s: 10GB/s read and 10GB/s write), which snoops the CPU L2/L1 caches, isn't coherent. Onion+ and Onion run over the same I/O controller, so they can't access it at the same time. Also, CPU bandwidth is 20GB/s on PS4.
On XB1, the GPU has a coherent read/write path (30GB/s) to the CPU’s L2 caches and to DRAM. CPU requests do not probe any non-CPU clients, even if those clients (the GPU, for example) have caches. The GPU’s coherent read bandwidth is limited to 30GB/s when there is a cache miss and to 10–15GB/s when there is a hit. CPU bandwidth is 30GB/s on XB1; they beefed up the CPU bandwidth.
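To put the figures quoted above in per-frame terms, here is a rough back-of-the-envelope sketch. The function name `mb_per_frame` is made up for illustration, and it assumes decimal units (1 GB = 1000 MB) and that the peak bandwidth is sustained for the whole frame with no contention, which real workloads will not achieve.

```python
# Per-frame data budget implied by a sustained bus bandwidth.
# Assumptions (not from the thread): decimal units (1 GB = 1000 MB),
# peak bandwidth held for the entire frame, no contention on the bus.

def mb_per_frame(bandwidth_gb_s: float, fps: float) -> float:
    """Return the megabytes that can cross the bus during one frame."""
    return bandwidth_gb_s * 1000.0 / fps

# Bandwidth figures exactly as quoted in the posts above (GB/s).
buses = {
    "PS4 Onion/Onion+ (one direction)": 10.0,
    "PS4 CPU": 20.0,
    "XB1 GPU coherent path": 30.0,
}

for name, gb_s in buses.items():
    for fps in (30, 60):
        print(f"{name}: {mb_per_frame(gb_s, fps):.1f} MB/frame at {fps} fps")
```

These are upper bounds only; they say nothing about latency or how the buses behave when CPU and GPU traffic compete.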
The PS4 solution is exactly like Kaveri's.
http://forum.beyond3d.com/showthread.php?t=64406
http://share.csdn.net/uploads/5232b691522ba/5232b691522ba.pdf
http://pc.watch.impress.co.jp/img/pcw/docs/632/794/html/06.jpg.html
http://pc.watch.impress.co.jp/docs/column/kaigai/20140129_632794.html
You don't need 2 more DMA engines for data transfer (AMD PC GPUs use 2 of them), and two of them are useful for texture streaming. Two of the display planes are for games and one is for the system (PS4 has one for the game and one for the system). Kinect uses most of the audio block right now, but in the future they could free up part of the resources (DSPs) for games (SHAPE is accessible by games).
Awesome links, thanks. That makes things much clearer.
My only question around this is how that relates to this slide:
I'm probably just misunderstanding something here, but doesn't that slide suggest that the CPU and GPU caches are coherent, whereas they are not if the CPU can't snoop the GPU cache?