Why does Frostbite engine perform relatively better on PS4 than XB1? *spawn

Discussion in 'Console Technology' started by Recop, Nov 17, 2015.

  1. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    All shared memory systems have the same contention problem, and never come close to the theoretical maximum memory BW when both processors are active at the same time.

    The ESRAM improves things, as most BW-heavy GPU operations can use the dedicated ESRAM instead of the shared main memory. This reduces the contention on main memory (reducing the BW loss problem). Big caches, such as Intel's CPU/GPU shared L3 cache (and the L4 Crystalwell), serve a similar purpose. One processor (CPU or GPU) has free rein over the memory system while the other processor is reading/writing data to/from the cache. Memory contention is one reason why we still see dedicated GPU memories even in the lowest-tier discrete mobile GPUs. Both CPUs and GPUs would need quite big caches to avoid the contention problem. Only Intel has big enough caches to really achieve this (64/128 MB Crystalwell).
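    A toy Python model of the loss being described here. The numbers and the flat contention penalty are purely illustrative assumptions, not measured figures:

    ```python
    # Toy model of shared-bus contention (illustrative numbers only).
    # When CPU and GPU hit the same memory controller at once,
    # arbitration and page conflicts mean the combined throughput
    # falls short of the theoretical peak.

    PEAK_BW = 68.0  # GB/s -- e.g. a DDR3 main memory bus

    def effective_bw(cpu_demand, gpu_demand, contention_penalty=0.2):
        """Combined throughput (GB/s) when both clients are active.

        contention_penalty is an assumed fractional loss whenever
        both processors compete for the bus at the same time.
        """
        total = cpu_demand + gpu_demand
        if cpu_demand > 0 and gpu_demand > 0:
            return min(total, PEAK_BW * (1.0 - contention_penalty))
        return min(total, PEAK_BW)

    # The GPU alone can approach the peak...
    print(effective_bw(0, 68))    # 68.0
    # ...but with the CPU also active, the usable total drops.
    print(effective_bw(10, 68))   # 54.4
    ```

    Moving the GPU's heaviest traffic into ESRAM or a large LLC is, in this model, equivalent to shrinking `gpu_demand` on the shared bus, which is exactly why it relieves the penalty.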
     
  2. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    It would be interesting to know how much faster a system with a separate 4 GB of fast GDDR5 GPU memory + 4 GB of (slow dual-channel) DDR3 CPU memory would have been. Of course this system would have been more awkward to program, there would have been more memory capacity issues due to fragmentation (two pools practically lead to twice as much wasted memory), and some algorithms requiring tight CPU<->GPU communication would have been impossible.

    I would assume that with Zen, AMD will have a big shared LLC for both the CPU and the GPU. This should reduce APU memory contention in the future. Intel's L3+L4 caches clearly show how much a big LLC helps performance in bandwidth-limited scenarios. Intel's Broadwell (5775C) scores in gaming benchmarks are awesome compared to its tiny 29.8 GB/s total bandwidth (shared between the CPU and the GPU).
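    The "twice as much wasted memory" point about split pools can be shown with a tiny capacity check. The demand figures are hypothetical, chosen only to illustrate why a unified pool absorbs imbalance that a fixed split cannot:

    ```python
    # Sketch: why two fixed pools waste capacity compared to one.
    # A unified pool lets either client use the slack; a hard split
    # strands free memory in whichever pool is under-used.

    def fits(demands, pools):
        """True if the workload fits: for a single pool, the total
        must fit; for a split, each demand must fit its own pool."""
        if len(pools) == 1:
            return sum(demands) <= pools[0]
        return all(d <= p for d, p in zip(demands, pools))

    # Hypothetical load: 4.5 GB of CPU data + 3.0 GB of GPU data.
    print(fits([4.5, 3.0], [8.0]))       # True  -- unified 8 GB pool
    print(fits([4.5, 3.0], [4.0, 4.0]))  # False -- CPU pool overflows
    ```

    The 7.5 GB total fits comfortably in one 8 GB pool, but the same data overflows a 4+4 split even though 1 GB sits idle on the GPU side.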
     
  3. Allandor

    Regular

    Joined:
    Oct 6, 2013
    Messages:
    844
    Likes Received:
    881
    Yes, the contention problem is something we will see more often in future games, as the CPU and GPU both get maxed out. Is it somehow possible to reserve one part of the memory pool/interface for the CPU and another part for the GPU, so each would only use its own slice? Just as an example, 192 bits for the GPU and 64 bits for the CPU, something like that? Or can one component only claim all or nothing?

    The "bad" thing about a shared memory pool is that so much of the data in memory is only ever touched by one component. Those things could be handled better in separate memory pools, but that would waste a lot of the resources in some situations.
    A good approach would be separate memory pools that all components can access, but where each component has its preferred pool. As far as I know, the Xbox One does exactly that: the ESRAM could be used by the CPU, but shouldn't be (and the API doesn't allow it), because of the contention problem.
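    A back-of-envelope for the 192/64-bit split suggested above. The per-pin transfer rate is an assumption (DDR3-2133-class signalling); peak bandwidth is just bus width in bytes times transfers per second:

    ```python
    # Back-of-envelope for a static 192/64-bit bus split
    # (assumes DDR3-2133-class signalling: 2133 MT/s per pin).

    MT_PER_S = 2133e6  # transfers per second per pin (assumption)

    def bus_bw_gb(width_bits, mt_per_s=MT_PER_S):
        """Peak bandwidth of a bus slice in GB/s."""
        return width_bits / 8 * mt_per_s / 1e9

    print(round(bus_bw_gb(256), 1))  # 68.3 GB/s -- the whole bus
    print(round(bus_bw_gb(192), 1))  # 51.2 GB/s -- GPU's slice
    print(round(bus_bw_gb(64), 1))   # 17.1 GB/s -- CPU's slice
    ```

    The slices add back up to the full-bus figure, so a static split trades away peak throughput for either client in exchange for predictability.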
     
  4. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,107
    Likes Received:
    16,899
    Location:
    Under my bridge
    PS3 did pretty much that, although each pool was gimped for one of the processors (the CPU accessing GDDR, or RSX accessing DDR). At the end of the day there are no ideal solutions, only compromises; each has its pros and cons. Ultimately, a massive amount of BW is all that's really needed, and then contention doesn't matter. Who cares if you lose 50 GB/s to CPU/GPU sharing when you have 500 GB/s available? ;)
     
  5. Allandor

    Regular

    Joined:
    Oct 6, 2013
    Messages:
    844
    Likes Received:
    881
    If you have 500 GB/s with today's chips, yes, but you lose more than 10% through contention.
    By the time we've got 500 GB/s, we'll also have stronger chips that use that bandwidth. What use is bandwidth that never gets used? ;)

    PS3 had two big pools, but the CPU/GPU were not really able to use both. The best solution would be small pools for the components (for the heavy work) and a big pool for everything else. This would minimize contention. Contention is not only bad for bandwidth, it is also bad for timings: your memory accesses simply take longer when something interrupts them.
    Well, but that wouldn't make it easier for developers.
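    The latency point can be sketched with a standard queueing-theory rule of thumb. This is an idealized M/M/1-style model, not a description of any real memory controller:

    ```python
    # Sketch of why contention also hurts latency: as bus utilization
    # rises, each access waits longer behind others. The classic
    # M/M/1 queueing result scales delay by 1 / (1 - rho).

    def relative_latency(utilization):
        """Average access latency relative to an idle bus."""
        assert 0.0 <= utilization < 1.0
        return 1.0 / (1.0 - utilization)

    print(relative_latency(0.5))            # 2.0x the idle-bus latency
    print(round(relative_latency(0.9), 1))  # 10.0x
    ```

    The takeaway matches the post: bandwidth lost to contention is the visible cost, but the queueing delay behind it grows much faster as the bus saturates.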
     
  6. JPT

    JPT
    Veteran

    Joined:
    Apr 15, 2007
    Messages:
    2,507
    Likes Received:
    944
    Location:
    Oslo, Norway
    I would so buy the ShiftyBox1337 :D
     
  7. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,107
    Likes Received:
    16,899
    Location:
    Under my bridge
    I don't think that's the case. More powerful chips move towards compute, not data gobbling. We still need assets in high res, but we're seeing more and more clever ways to use BW, like tiled assets, while displacement-type detail techniques remove the need for higher-res models than we already have. The only areas of real, unavoidable BW growth are 4K, stereoscopic rendering and buffer sizes. So I believe it is possible to have BW in excess of what's useful, and ultimately a UMA, single-pool RAM on a massive bus will be happy enough with contention, especially coupled with a sizeable LLC as sebbbi says. Perhaps stacked RAM, with one tier split off as a cache pool and wide-IO shared access to the rest?
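    Some rough numbers behind the "4K and stereoscopic" point. This assumes plain 32-bit colour targets; real deferred-rendering G-buffers are considerably fatter:

    ```python
    # Rough render-target sizes for the resolution-growth argument.
    # Assumes simple 32-bit (4-byte) colour targets only.

    def target_mib(width, height, bytes_per_pixel=4, eyes=1):
        """Size of one colour target in MiB."""
        return width * height * bytes_per_pixel * eyes / 2**20

    print(round(target_mib(1920, 1080), 1))          # 7.9 MiB at 1080p
    print(round(target_mib(3840, 2160), 1))          # 31.6 MiB at 4K
    print(round(target_mib(3840, 2160, eyes=2), 1))  # 63.3 MiB, stereo 4K
    ```

    A 4x jump per target from 1080p to 4K, doubled again for stereo, is the kind of growth no amount of asset tiling or displacement trickery can compress away.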
     