Why does Frostbite engine perform relatively better on PS4 than XB1? *spawn

The theory is that even when contention is eating into PS4's BW, there's plenty to spare. If XB1's DDR3 is being affected to the same degree, there's far less left to work with.
All shared memory systems have the same contention problem, and never come close to the theoretical maximum memory BW when both processors are active at the same time.
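As a rough back-of-envelope (the peak numbers are the commonly quoted figures; the contention loss is treated as an assumed fraction, not a measurement), the same relative loss leaves the two consoles with very different absolute headroom:

Code:
# Toy contention arithmetic: same fractional loss, very different headroom.
# Commonly quoted peaks: PS4 GDDR5 ~176 GB/s, XB1 DDR3 ~68 GB/s (ESRAM excluded).
PS4_PEAK = 176.0       # GB/s
XB1_DDR3_PEAK = 68.0   # GB/s

for loss in (0.1, 0.2, 0.3):
    ps4_left = PS4_PEAK * (1.0 - loss)
    xb1_left = XB1_DDR3_PEAK * (1.0 - loss)
    print(f"{loss:.0%} lost to contention -> PS4 {ps4_left:.0f} GB/s left, "
          f"XB1 main memory {xb1_left:.0f} GB/s left")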

The ESRAM improves things, as most BW-heavy GPU operations can use the dedicated ESRAM instead of the shared main memory. This reduces contention on main memory (reducing the BW loss problem). Big caches, such as Intel's CPU/GPU shared L3 cache (and the L4 Crystalwell), serve a similar purpose: one processor (CPU or GPU) has free rein over the memory system while the other is reading/writing data to/from its cache. Memory contention is one reason why we still see dedicated GPU memories even in the lowest-tier discrete mobile GPUs. Both CPUs and GPUs would need quite big caches to avoid the contention problem, and only Intel has caches big enough to really achieve this (64/128 MB Crystalwell).
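To see why 32 MB of ESRAM covers the most BW-hungry targets but still needs careful budgeting, here is a quick fit check (the G-buffer layout below is my own illustrative example, not any particular engine's):

Code:
# Does a 1080p deferred G-buffer fit into XB1's 32 MB of ESRAM?
WIDTH, HEIGHT = 1920, 1080
ESRAM_BYTES = 32 * 1024 * 1024

# Illustrative layout: three 32-bit colour targets plus a 32-bit depth buffer.
bytes_per_pixel = {"albedo": 4, "normals": 4, "material": 4, "depth": 4}

total = sum(WIDTH * HEIGHT * bpp for bpp in bytes_per_pixel.values())
print(f"G-buffer size: {total / 2**20:.1f} MB of {ESRAM_BYTES / 2**20:.0f} MB ESRAM")
# ~31.6 MB: it just fits, so the BW-heavy render targets live in ESRAM
# while textures, geometry and CPU data stay in the shared DDR3.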
 
It would be interesting to know how much faster a system with a separate 4 GB of fast GDDR5 GPU memory + 4 GB of (slow dual-channel) DDR3 CPU memory would have been. Of course such a system would have been more awkward to program, there would have been more memory capacity issues due to fragmentation (two pools practically lead to twice as much wasted memory), and some algorithms requiring tight CPU<->GPU communication would have been impossible.
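A toy illustration of the capacity side of that trade-off, with made-up per-frame numbers: a unified pool only has to cover the worst combined frame, while split pools must each cover their own worst case, and neither side can borrow the other's slack.

Code:
# Made-up CPU and GPU memory use (GB) sampled over a level.
cpu_use = [2.8, 3.4, 3.0, 2.6, 3.2]
gpu_use = [4.1, 3.6, 4.4, 4.0, 3.8]

split_needed = max(cpu_use) + max(gpu_use)                     # each pool sized for its own peak
unified_needed = max(c + g for c, g in zip(cpu_use, gpu_use))  # one pool sized for the combined peak

print(f"split pools need {split_needed:.1f} GB, unified pool needs {unified_needed:.1f} GB")
# On top of this, each separate pool carries its own fragmentation slack.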

I would assume that with Zen, AMD will have a big shared LLC for both the CPU and the GPU. This should reduce APU memory contention in the future. Intel's L3+L4 caches clearly show how much a big LLC helps performance in bandwidth-limited scenarios. Intel's (Broadwell 5775C) scores in gaming benchmarks are awesome compared to its tiny 29.8 GB/s of total bandwidth (shared between the CPU and the GPU).
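Where that 29.8 GB/s comes from, and why the eDRAM matters so much (the per-direction eDRAM figure is the commonly quoted one, so treat it as approximate):

Code:
# Dual-channel DDR3-1866: 2 channels x 8 bytes x 1866 MT/s.
ddr3_bw = 2 * 8 * 1866 / 1000        # ~29.9 GB/s, i.e. the ~29.8 GB/s figure above
print(f"System DRAM shared by CPU and GPU: {ddr3_bw:.1f} GB/s")

# Crystalwell's 128 MB eDRAM is commonly quoted at ~50 GB/s per direction,
# so every L4 hit is served without touching the DRAM bus at all.
print("L4 hits: ~50 GB/s read + ~50 GB/s write, zero DRAM contention")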
 
Yes, the contention problem is something we will see more often in future games, when the CPU and GPU get maxed out. Is it somehow possible to reserve one part of the memory pool/interface for the CPU and one part for the GPU, so that each would only use its own share? Just as an example, 192 bits for the GPU and 64 bits for the CPU... something like that? Or can a component only use all of the interface or none of it?
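Putting numbers on that 192/64 example with PS4-like GDDR5 figures (purely illustrative; real memory controllers arbitrate requests rather than statically splitting pins):

Code:
GBPS_PER_PIN = 5.5          # GDDR5 data rate per pin, PS4-like

def bw(bus_bits):
    return bus_bits * GBPS_PER_PIN / 8   # GB/s

print(f"256-bit shared bus : {bw(256):.0f} GB/s available to whoever asks")
print(f"192-bit GPU slice  : {bw(192):.0f} GB/s hard ceiling for the GPU")
print(f" 64-bit CPU slice  : {bw(64):.0f} GB/s hard ceiling for the CPU")
# A static split removes contention, but neither side can ever borrow
# the other's idle bandwidth.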

The "bad" thing about a shared memory pool is, that there is so much data in the memory, that only one component requires. Those things could be better handled in separate memory pools. But that would waste much of the resources in some situations.
A good thing would be separate memory pools, that all components can access, but each component has its preferred pool. As far as I know, the xbox one gpu does exatly that. The esram could be used by the cpu, but shouldn't (also the API doesn't allow it), because of the contention problem.
 
PS3 did pretty much exactly that, although each pool was gimped for one of the processors (the CPU accessing GDDR, or RSX accessing XDR). At the end of the day there are no ideal solutions, only compromises. Each has its pros and cons. Ultimately, a massive amount of BW is all that's really needed, and then contention doesn't matter. Who cares if you lose 50 GB/s to CPU and GPU sharing memory access when you have 500 GB/s available? ;)
 
If you have 500 GB/s with today's chips, yes, but you lose more than 10% of it through contention.
By the time we've got 500 GB/s we'll also have stronger chips that use that bandwidth. What use is bandwidth that isn't used? ;)

PS3 had two big pools, but the CPU/GPU were not really able to use both. The best solution would be small pools for each component (for its heavy lifting) and a big shared pool for everything else. This would minimize contention. Contention is not only bad for bandwidth, it is also bad for latency: your memory accesses simply take longer when something else gets in the way.
Well, but that wouldn't make it easier for developers.
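A toy queue model of the latency point (made-up request rates and a fixed service cost per request): the same controller, fed by two clients instead of one, makes every individual access wait longer.

Code:
import random
random.seed(1)

def avg_latency(arrivals, service=1.0):
    # One shared controller serving fixed-cost requests in arrival order.
    free_at, total = 0.0, 0.0
    for t in sorted(arrivals):
        start = max(t, free_at)      # wait if the controller is still busy
        free_at = start + service
        total += free_at - t         # completion time minus arrival time
    return total / len(arrivals)

cpu = [random.uniform(0, 100) for _ in range(40)]
gpu = [random.uniform(0, 100) for _ in range(40)]

print(f"CPU alone        : {avg_latency(cpu):.2f} time units per request")
print(f"CPU + GPU sharing: {avg_latency(cpu + gpu):.2f} time units per request")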
 
If you have 500 GB/s with today's chips, yes, but you lose more than 10% of it through contention.
By the time we've got 500 GB/s we'll also have stronger chips that use that bandwidth.
I don't think that's the case. More powerful chips move towards compute, not data gobbling. We still need high-res assets, but we're seeing more and more clever ways to use BW, like tiled assets, while displacement-type detail techniques remove the need for higher-res models than we already have. The only areas of unavoidable BW growth are 4K, stereoscopic rendering and buffer sizes. So I believe it is possible to have BW in excess of what's useful, and ultimately UMA, single-pool RAM on a massive bus will be happy enough with contention, especially coupled with a sizeable LLC as Sebbbi says. Perhaps stacked RAM with a split pool: cache on one tier and wide-IO shared access on the rest?
 