(Alleged) Orbis vs Durango memory subsystem comparison

McHuj

Veteran
Supporter
Vgleaks put up a new article on the PS4 architecture evolution. It includes an interesting picture of the memory subsystem that shows the breakdown of the memory buses.

http://www.vgleaks.com/playstation-4-architecture-evolution-over-time/

PS4:
lvp2.jpg


Durango:
durango_memory.jpg


(If these are accurate) One thing that jumps out at me is that the CPU on Durango will have much higher bandwidth available. The coherent access between GPU and CPU is much faster on Durango as well.

I would assume that what Sony has for the CPU is sufficient (unless it's supposed to be 20 GB/sec per module), so I wonder why MS would need 2X the bandwidth. Perhaps if the GPU can execute entirely out of the ESRAM (with some streaming consuming the 25.6 GB/sec DMAs), the CPUs can saturate the rest of the bandwidth from main memory (20.8 + 20.8 + 25.6 = 67.2 GB/sec).

Seems to me that this would be a better model for full utilization of all resources.
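
As a quick sanity check on the arithmetic above, here is a minimal Python sketch that adds up the figures quoted in the post and compares the combined Durango CPU figure with the 20 GB/sec quoted for Orbis. The constant and function names are made up for illustration, and the numbers come from the leaked diagrams, so they are only as reliable as the leak.

```python
# Figures quoted in the post, in GB/sec. All names here are illustrative only.
DURANGO_CPU_MODULE_GBPS = 20.8  # per CPU module (two modules assumed)
DURANGO_DMA_GBPS = 25.6         # DMA streaming figure
ORBIS_CPU_GBPS = 20.0           # CPU figure quoted for Orbis


def durango_main_memory_demand(cpu_module=DURANGO_CPU_MODULE_GBPS,
                               modules=2,
                               dma=DURANGO_DMA_GBPS):
    """Sum the CPU module and DMA figures, rounded to one decimal place."""
    return round(modules * cpu_module + dma, 1)


def cpu_bandwidth_ratio(cpu_module=DURANGO_CPU_MODULE_GBPS,
                        modules=2,
                        orbis_cpu=ORBIS_CPU_GBPS):
    """Ratio of combined Durango CPU bandwidth to the Orbis CPU figure."""
    return modules * cpu_module / orbis_cpu


if __name__ == "__main__":
    print(durango_main_memory_demand())      # 67.2
    print(f"{cpu_bandwidth_ratio():.2f}x")   # 2.08x
```

If the 20 GB/sec on the Orbis diagram turned out to be per module, the ratio would drop to roughly 1x, which is why that caveat matters.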
 
Well if that's true then we know some of where MS spent all those years doing expensive customisation: memory and interconnects.

If MS have engineered in 50% more CPU <--> GPU BW then they must be anticipating that the CPU will be feeding the GPU lots of data (or possibly receiving it). But double the CPU <--> main ram BW? That's a mighty big difference...

... 256 bit vector units? :runaway:
 
I'm hesitant to allow this thread as we're not comparing yet, but let's give it a go and only close it if the fanbots start trolling or talking business. ;)

At first I see this as Orbis being BW limited on the CPU. But then I compare it to PS3. PS3 had 8 CPU cores with lots more float power on a more limited bus. Perhaps 20 GBps is all that's needed for the CPU and the limit is a sensible cost saving?

But double the CPU <--> main ram BW? That's a mighty big difference...
I think it's 3x. There are no terminal arrows on the Onion bus (really, Onion and Garlic??) at the CPU. Onion appears to be a CPU passthrough bus with optional access to the CPU's L1 and L2 caches. Snooping the caches at the L1 level sounds like very close CPU/GPU integration to me.
 

There are some major differences. It appears that there is a dedicated bus between the GPU and CPU in Orbis. Durango goes through the North Bridge to reach the CPU L2, and it shares that bandwidth with everything else in the North Bridge. Latency between the CPU and GPU will probably be lower in Orbis, particularly since the Orbis GPU can bypass its caches with Onion+.

So it comes down to whether you are bandwidth limited at 10GB/s between the CPU and GPU, or at 20GB/s between the CPU and main memory. I kind of doubt it; it's not like these CPUs are powerhouses. Not even Intel's fastest desktop processors can saturate 20GB/s between the CPU and RAM.
 

It's a simplified diagram for Orbis. It is highly unlikely that memory traffic won't pass through the "northbridge" when going from a CPU core to main memory or the GPU core, as the "northbridge" is part of the CPU module.

In the Durango diagram the "CPU module" is basically the 2 CPU modules + Northbridge while the "GPU module" is the GPU + GPU memory system. Orbis will be the same with the "GPU memory system" handling memory accesses for the "GPU".

Regards,
SB
 
I'm not certain the overall uncore organization and bus arrangement between the GPU, CPU, and system memory are entirely different between Orbis and Durango.

The Onion and Garlic links are part of the uncore and would be the arrow between the GPU and northbridge and the arrow between the GPU and DDR3 in the Durango diagram.

There could be a variation in the bus speeds and widths, however.
 

After researching it further, it appears you are correct. What is interesting is that it's using the same terminology AMD used two years ago for their Llano Fusion devices. The only difference is the addition of Onion+.

http://amddevcentral.com/afds/assets/presentations/1004_final.pdf
 
Onion+ might be something not available to Llano's VLIW5 GPU, which AMD's presentation was based on. GCN's read/write cache structure is different, and that actually creates the need and means to bypass them, although they'd still go out on the same Onion bus.

The questions I do have are whether Durango actually gives a full 20 GB/s port to each CPU cluster, or whether the Orbis diagram omitted a similar change.
The Orbis diagram tracks somewhat with the numbers put forward for Llano, whereas Durango's Onion bus numbers might correspond to something stronger or more recent, while Orbis has a significantly larger Garlic bus.

I hope there's some full disclosure on the architectures at some point. Part of my uncertainty is that I don't know how much of the Vgleaks data is being relayed by people who know what any of the words they put out mean. That we keep getting "different" diagrams for the same things makes me think not much.
 
I don't understand much of this :oops: , so looking at those diagrams, does Durango have the potential to be an effective system compared to Orbis? I don't mean more powerful, but effective at handling tasks. Is it a clever design? Or maybe it could produce more bottlenecks than it tries to fix?
 
So it comes down to whether you are bandwidth limited at 10GB/s between the CPU and GPU, or at 20GB/s between the CPU and main memory. I kind of doubt it; it's not like these CPUs are powerhouses. Not even Intel's fastest desktop processors can saturate 20GB/s between the CPU and RAM.
Since SNB, a single thread can pretty much max out the available bandwidth in burst reads/writes. Of course, Intel's memory pipeline is quite a bit more sophisticated than anything AMD can offer, methinks.
 
I hear this a lot, but I can't really replicate it on my ESXi rig with an 8350. Intel can get higher bandwidth to a single core, but the second I fire up a different memory bandwidth test on another guest VM and tie it to another module, I can get very close to the max theoretical bandwidth. On latency, though, Intel is quite a bit faster than AMD.
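
For anyone wanting to try a rough single-thread measurement like the one described above, here is a minimal Python sketch. It is not the tool used in the post (which isn't named), the function name and defaults are invented for illustration, and interpreter overhead means it will under-report what a tuned native benchmark would show.

```python
import time


def copy_bandwidth_gbps(size_bytes=64 * 1024 * 1024, repeats=5):
    """Estimate single-thread copy bandwidth in GB/s from a large buffer copy.

    Counts both the bytes read from the source and the bytes written to the
    copy, and keeps the fastest of `repeats` runs to reduce timing noise.
    """
    src = bytearray(size_bytes)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        dst = bytes(src)  # one full read pass plus one full write pass
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        del dst
    best = max(best, 1e-9)  # guard against a zero reading from a coarse timer
    return 2 * size_bytes / best / 1e9


if __name__ == "__main__":
    print(f"{copy_bandwidth_gbps():.1f} GB/s (single thread, copy)")
```

To approximate the multi-module test from the post, you would run one copy of this per process and pin each process to a different module with your OS's affinity tools, then add the results.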
 
Are Onion and Onion+ accesses mutually exclusive? Can the CPU use both at the same time?
 
The Park developer has said that the Xbox One and PS4 memory limitations are not optimum for developers.

http://gamingbolt.com/ps4xbox-one-8gb-memory-is-not-optimal-situation-for-devs-the-park-dev

Funcom creative director Joel Bylos, who is working on bringing The Park to current gen platforms, is a bit iffy on whether it will last for the next 5 odd years. “Hmmm, that’s a difficult question. It will last because it has to. Is it an optimal situation to put developers in? Absolutely not.

“And VR is coming – that’s going to come with hefty requirements. Who knows what will happen in the end? There are rumors that the PS4 VR has a separate box with unknown hardware inside. Maybe they’ve added a little extra hardware to help with the requirements.”
 
What would the 'optimum' be then?
Exactly. Optimum would be the exact amount of memory you need at any instant, and this can change thousands of times during the generation of a frame, let alone frame to frame. I fully expect Sony and Microsoft to employ MagicMemory(tm) in the next generation of consoles. It would be lazy not to! :yep2:
 