AMD Kaveri APU features the Onion+ bus like the PlayStation 4

I doubt the CPU on the Xbox can read or write at a full 68GB/s; 30GB/s seems reasonable, and I doubt that 20GB/s would be much of a constraint for the Jaguar cores.
The 10GB/s coherent write limit is more of an issue when the GPU is involved, since it can be a real constraint on compute, but it only affects GPU->CPU interaction, and there are other ways to get the same effect; notably, have the GPU write to Garlic and DMA the results to Onion.
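To put these link speeds in perspective, it may help to convert them into a per-frame budget. This is a back-of-the-envelope sketch under stated assumptions: a 60 fps target, decimal units (1 GB = 1000 MB), and the peak figures quoted in this thread rather than measured sustained throughput. The function name is hypothetical.

```python
def per_frame_budget_mb(bandwidth_gb_per_s: float, fps: float) -> float:
    """Return how many megabytes fit through a link in one frame.

    Assumes decimal units and that the link runs at its quoted peak,
    which real sustained traffic will not reach.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return bandwidth_gb_per_s * 1000.0 / fps


# Peak figures quoted in the thread (GB/s).
links = {
    "coherent write (10GB/s)": 10.0,
    "CPU bus (~20GB/s)": 20.0,
    "XB1 coherent (30GB/s)": 30.0,
}

for name, bw in links.items():
    print(f"{name}: {per_frame_budget_mb(bw, 60):.1f} MB/frame at 60 fps")
```

Even the 10GB/s coherent write path works out to roughly 167 MB per frame at 60 fps, which is why it mainly bites on heavy GPU->CPU compute traffic.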

ERP, is it correct to say that when Onion+ is involved, it leaves only a theoretical/peak BW of 10GB/s for Onion? Because, if I have understood correctly, Onion+ bandwidth is shared with Onion.
This seems to me a hard challenge for developers.
 
Onion and Onion+ share a 10GB/s read/write bus.

What remains to be seen is what difference the two approaches make (10GB/s coherent read/write versus 30GB/s coherent read).
 
The only CPU link to DRAM on the XBONE is the 30GB/s coherent link.

Sorry for the OT and for any mistakes.
But from the VGleaks table, the 68GB/s BW for the CPU seems clear.

http://www.vgleaks.com/durango-memory-system-overview/

Did I read it wrong?


Onion and Onion+ share a 10GB/s read/write bus.

What remains to be seen is what difference the two approaches make (10GB/s coherent read/write versus 30GB/s coherent read).

Also, it seems to be 30GB/s read/write...
 
I believe that the other console you mentioned has 30GB/s for coherent read/write.
Non-coherent should be more than twice that amount, 68GB/s.

No
[Image: Xbox_neu_3-5145d885291b2272.png]
 
The slide declares <20GB/s bidirectional, but only 10GB/s for write and only 10GB/s for read.
Could it be something similar to the X1 eSRAM bandwidth (i.e. to get 20GB/s you have to read and write at the same time)?

For the PS4:
- the CPU has its own <20GB/s bus.
- onion/onion+ are a separate bus with 10GB/s read and 10GB/s write (you could consider that a 20GB/s coherent read/write "bus", but that might not be entirely true). [onion+ is basically a "flag" to change caching]
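The "20GB/s, but maybe not entirely true" caveat can be illustrated with a toy model that treats Onion/Onion+ as two independent 10GB/s channels, one per direction. This is a sketch under that assumption only (taken from the slide figures discussed here, not from any measured behaviour), and the function name is hypothetical.

```python
def onion_throughput(read_demand, write_demand, read_cap=10.0, write_cap=10.0):
    """Return (read, write, total) GB/s delivered by two independent channels.

    Each direction is clipped to its own cap, so the combined figure only
    reaches read_cap + write_cap when both directions are saturated.
    """
    read = min(read_demand, read_cap)
    write = min(write_demand, write_cap)
    return read, write, read + write


print(onion_throughput(20.0, 0.0))   # read-only traffic tops out at 10GB/s
print(onion_throughput(10.0, 10.0))  # balanced traffic reaches the 20GB/s combined figure
```

Under this model, a read-only or write-only workload never sees more than 10GB/s, which is the same question raised above about needing to read and write at the same time.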

For the XB1:
- there is a 30GB/s coherent read/write bus.

In terms of how much bandwidth the CPU is expected to use, AFAIK vgleaks 'borrowed' these docs from MS literature, so the figures are probably realistic (4GB/s read, 4GB/s write, 1.5GB/s write-combined per module). [i.e. ~20GB/s for the pair, including coherent usage]
http://www.vgleaks.com/durango-memory-system-example/
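The "~20GB/s for the pair" figure follows directly from the per-module numbers. A quick check, assuming the figures are GB/s and that "the pair" means the two Jaguar CPU modules:

```python
# Per-module CPU bandwidth figures from the vgleaks example (assumed GB/s).
PER_MODULE = {"read": 4.0, "write": 4.0, "write_combined": 1.5}
MODULES = 2  # the two Jaguar modules

per_module_total = sum(PER_MODULE.values())
total = per_module_total * MODULES
print(per_module_total, total)  # 9.5 19.0
```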
 
Also, it seems to be 30GB/s read/write...

It IS read/write. In the other thread someone has been trying to spin the lack of full cache coherency as meaning no coherent memory or, as someone likes to keep repeating, "coherent read only".

Let's try this one more time: coherent memory means that a pointer can be passed from the CPU to the GPU directly. Whether you need to flush the cache or not (cache coherency) is an implementation detail. In X1's case, the code decides whether the GPU snoops the CPU cache or not, and the CPU can't snoop other components' caches.
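The difference between "a pointer can be shared" and "caches are kept coherent by hardware" can be shown with a toy model. This is a hypothetical sketch, not how either console's hardware or SDK exposes it: the class and function names are invented, the CPU cache is just a dict of dirty lines, and the `snoop` flag stands in for the per-access choice described above.

```python
class CpuCache:
    """Toy write-back CPU cache: dirty lines stay here until flushed."""

    def __init__(self, memory):
        self.memory = memory  # shared DRAM, modelled as address -> value
        self.lines = {}       # dirty lines held only in the CPU cache

    def write(self, addr, value):
        self.lines[addr] = value

    def flush(self):
        # Software-managed coherency: push dirty lines out to DRAM.
        self.memory.update(self.lines)
        self.lines.clear()


def gpu_read(addr, memory, cpu_cache, snoop):
    """GPU dereferences a pointer handed over by the CPU.

    snoop=True models a coherent access that checks the CPU cache first;
    snoop=False reads DRAM only, so the CPU must flush beforehand.
    """
    if snoop and addr in cpu_cache.lines:
        return cpu_cache.lines[addr]
    return memory.get(addr)


memory = {0x100: 0}
cache = CpuCache(memory)
cache.write(0x100, 42)                              # CPU updates data behind the shared pointer
print(gpu_read(0x100, memory, cache, snoop=True))   # 42, snooped from the CPU cache
print(gpu_read(0x100, memory, cache, snoop=False))  # 0, stale until the CPU flushes
cache.flush()
print(gpu_read(0x100, memory, cache, snoop=False))  # 42 after the flush
```

In both cases the same pointer is valid on both sides; only the cost of making the data visible (snoop versus explicit flush) changes.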
 
For the PS4:
- the CPU has its own <20GB/s bus.
- onion/onion+ are a separate bus with 10GB/s read and 10GB/s write (you could consider that a 20GB/s coherent read/write "bus", but that might not be entirely true). [onion+ is basically a "flag" to change caching]

For the XB1:
- there is a 30GB/s coherent read/write bus.

In terms of how much bandwidth the CPU is expected to use, AFAIK vgleaks 'borrowed' these docs from MS literature, so the figures are probably realistic (4GB/s read, 4GB/s write, 1.5GB/s write-combined per module). [i.e. ~20GB/s for the pair, including coherent usage]
http://www.vgleaks.com/durango-memory-system-example/

This is just an example of memory usage (it is only one example), but the connection with the DDR3 block is still visible. I suspect that apart from the 30GB/s bus there is also a direct bus to DDR3 capable of 68GB/s bandwidth.
The diagram is clear!!!

To me it seems "Onion X1" is 68GB/s and the coherent bus is 30GB/s.
 
http://www.vgleaks.com/durango-memory-system-overview/

I know it's probably OT, but if we are heading down this path: the other diagram has a bunch of zeros near the bottom, so which one is which?
 
Old story...
One is the peak BW.
The other is just one example of BW usage by the X1; it is not the average or typical working values, just one example out of a thousand possible BW usage patterns.

Reading the vgleaks memory example, it is clear that there are 2 buses inside the X1, absolutely independent of each other:
DDR - Northbridge with 68GB/s bandwidth
Coherent bus with 30GB/s bandwidth.

Let me say that this is quite a huge difference from Sony's setup!
 
The slide declares <20GB/s bidirectional, but only 10GB/s for write and only 10GB/s for read.
I don't recall seeing anything to suggest a hard split into read and write.

Incidentally, what is this odd shorthand you have of using 'x' for 'for'? Where does that come from? It seems very non-standard and an unnecessary confusion.
 
Oops, sorry about that... It is an abbreviation used in my country.
I had never thought before that it wasn't used in other parts of the world!
 
Onion and Onion+ share a 10GB/s read/write bus.

What remains to be seen is what difference the two approaches make (10GB/s coherent read/write versus 30GB/s coherent read).

I think you are still spinning; there's no evidence suggesting that the PS4 CPU can snoop the GPU cache. In fact, it's quite the contrary.
 
Onion is the GPU bus that snoops the CPU caches.
The CPUs rarely draw data from it, although I haven't ruled out a very small time window where the memory pipeline might forward a GPU write to a CPU in the small number of cycles it has before it makes it to memory.

If people want to measure CPU bandwidth, it's the bandwidth of the connections leading out from the CPU section that would need to be measured. One case that doesn't appear to be possible, going by the descriptions of older APU bandwidth measurements, is adding coherent bandwidth on top of write-combining bandwidth. There was a common bandwidth ceiling they both had to share.
In those systems, memory bandwidth was much lower, so it isn't clear if that may have been tweaked this time around.
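A shared ceiling means coherent and write-combining traffic cannot simply be summed. The sketch below models that with hypothetical demand numbers; the ceiling value and the proportional split are assumptions, since the older APU measurements mentioned above only show that the two shared a limit, not how the hardware divided it.

```python
def shared_ceiling(coherent_demand, wc_demand, ceiling):
    """Split a shared bandwidth ceiling between coherent and write-combining traffic.

    If the combined demand fits, both get what they ask for; otherwise both are
    scaled down by the same factor (an assumed, simplistic arbitration policy).
    """
    total = coherent_demand + wc_demand
    if total <= ceiling:
        return coherent_demand, wc_demand
    scale = ceiling / total
    return coherent_demand * scale, wc_demand * scale


print(shared_ceiling(3.0, 4.0, 10.0))  # fits under the ceiling, nothing is throttled
print(shared_ceiling(8.0, 8.0, 10.0))  # 16 demanded, 10 available, each gets 5
```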
 
Onion is the GPU bus that snoops the CPU caches.
The CPUs rarely draw data from it, although I haven't ruled out a very small time window where the memory pipeline might forward a GPU write to a CPU in the small number of cycles it has before it makes it to memory.

It seems that a few people assume that's the case, that in Liverpool the CPU can snoop the GPU cache via Onion, though I have my doubts, because the whole Onion+ bus design implies that the GPU cache can't be snooped by the CPU and hence relies on selective flushing to make the data visible to the CPU.

A few have used this to spin the other design as not fully coherent, but rather "coherent read only".

http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?page=2

Per Cerny:

"First, we added another bus to the GPU that allows it to read directly from system memory or write directly to system memory, bypassing its own L1 and L2 caches. As a result, if the data that's being passed back and forth between CPU and GPU is small, you don't have issues with synchronization between them anymore. And by small, I just mean small in next-gen terms. We can pass almost 20 gigabytes a second down that bus. That's not very small in today’s terms -- it’s larger than the PCIe on most PCs!"
He's saying that full coherency is achieved at the memory level via uncached access, which is Onion+.

"Next, to support the case where you want to use the GPU L2 cache simultaneously for both graphics processing and asynchronous compute, we have added a bit in the tags of the cache lines, we call it the 'volatile' bit. You can then selectively mark all accesses by compute as 'volatile,' and when it's time for compute to read from system memory, it can invalidate, selectively, the lines it uses in the L2. When it comes time to write back the results, it can write back selectively the lines that it uses. This innovation allows compute to use the GPU L2 cache and perform the required operations without significantly impacting the graphics operations going on at the same time -- in other words, it radically reduces the overhead of running compute and graphics together on the GPU."
He's saying the same thing: even if compute data is cacheable in L2, you still need to flush to memory, though there's some optimization there for graphics so that the cache isn't flushed entirely.
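The selective flush Cerny describes can be sketched as a toy write-back L2 where each line carries a "volatile" tag. This is a hypothetical model for illustration only, not Sony's hardware interface: the class and method names are invented, and real hardware acts on tag bits in the cache rather than on a Python dict.

```python
from dataclasses import dataclass, field


@dataclass
class Line:
    value: int
    dirty: bool = False
    volatile: bool = False  # hypothetical stand-in for the 'volatile' tag bit


@dataclass
class GpuL2:
    memory: dict
    lines: dict = field(default_factory=dict)

    def write(self, addr, value, volatile=False):
        # Write-back cache: the new value stays in L2 and is marked dirty.
        self.lines[addr] = Line(value, dirty=True, volatile=volatile)

    def writeback_volatile(self):
        # Write back only dirty compute (volatile) lines; graphics lines stay put.
        for addr, line in self.lines.items():
            if line.volatile and line.dirty:
                self.memory[addr] = line.value
                line.dirty = False

    def invalidate_volatile(self):
        # Drop only volatile lines so compute re-reads fresh data from memory.
        # (This toy assumes they were written back first; dirty data would be lost.)
        for addr in [a for a, line in self.lines.items() if line.volatile]:
            del self.lines[addr]


mem = {}
l2 = GpuL2(mem)
l2.write(0x10, 1)                 # graphics data
l2.write(0x20, 2, volatile=True)  # compute result
l2.writeback_volatile()
print(mem)                        # {32: 2}
l2.invalidate_volatile()
print(sorted(l2.lines))           # [16]
```

Only the compute line reaches memory and gets invalidated; the graphics line stays cached and dirty, which is the "without significantly impacting the graphics operations" part of the quote.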

Therefore, I think I can conclude that the PS4 CPU does NOT snoop GPU cache, I wouldn't go as far as spinning it as coherent read only though ;-)

Also, given that HUMA requires HW cache coherency between CPU and GPU, neither console can be HUMA by definition, due to the lack of GPU cache -> CPU coherency.
 
It seems that a few people assume that's the case, that in Liverpool the CPU can snoop the GPU cache via Onion, though I have my doubts, because the whole Onion+ bus design implies that the GPU cache can't be snooped by the CPU and hence relies on selective flushing to make the data visible to the CPU.

Just to clarify my earlier post, the forwarding case I'm talking about is the possibility that the memory request queue that the Onion bus put the GPU write on can forward data if a CPU read to that same cache line hits the queue, or if the memory controllers can satisfy reads from their write buffers.

The GPU caches wouldn't be involved, hence why Onion+ meets AMD's threshold for coherence--although I am not sure that's the same one for HSA because HSA's bar seems a bit lower.
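The forwarding case above can be sketched as a toy memory controller with a posted-write buffer. It is speculative in the same way the post is ("haven't ruled out"), and the class and method names are hypothetical; the point it shows is that a CPU read can be satisfied from a pending GPU write without any GPU cache being consulted.

```python
from collections import OrderedDict


class MemoryController:
    """Toy memory controller with a posted-write buffer in front of DRAM."""

    def __init__(self):
        self.dram = {}
        self.write_buffer = OrderedDict()  # addr -> value, oldest first

    def post_write(self, addr, value):
        # A GPU write arriving over Onion waits here before it reaches DRAM.
        self.write_buffer[addr] = value
        self.write_buffer.move_to_end(addr)  # newest write goes to the back

    def drain_one(self):
        # Retire the oldest pending write to DRAM.
        if self.write_buffer:
            addr, value = self.write_buffer.popitem(last=False)
            self.dram[addr] = value

    def read(self, addr):
        # A CPU read that matches a pending write is forwarded from the buffer;
        # no GPU cache is consulted at any point.
        if addr in self.write_buffer:
            return self.write_buffer[addr]
        return self.dram.get(addr)


mc = MemoryController()
mc.post_write(0x40, 7)
print(mc.read(0x40), mc.dram)  # 7 {}
mc.drain_one()
print(mc.read(0x40), mc.dram)  # 7 {64: 7}
```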
 
Yeah, sorry, I re-read your post and got that part: the GPU cache is either already flushed or not involved at this point.

This is different from what some are claiming, that the CPU can snoop the GPU cache on the PS4.
 
I've seen no indication that the CPU can snoop the GPU cache on PS4.
 