Why does Frostbite engine perform relatively better on PS4 than XB1? *spawn

Discussion in 'Console Technology' started by Recop, Nov 17, 2015.

  1. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,838
    Likes Received:
    18,641
    Location:
    The North
    the possibilities are endless!
     
  2. AFAIK, it's 102GB/s full duplex. Meaning you can write data to the eSRAM at 102GB/s while reading data at 102GB/s at the same time, but you can never read data at over 102GB/s. This means latencies will be much better, but raw bandwidth isn't as large as the other GDDR5 implementation.

    Furthermore, trying to correct an AnandTech article with an Ars Technica one won't go well 99 times out of 100.
     
  3. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    It's not quite full duplex. Besides the fact that getting dual-issue is apparently dependent on some rather onerous banking considerations, it was indicated that the interface will not dual-issue a write alongside a read every 8th cycle.
     
    Globalisateur likes this.
  4. turkey

    Veteran

    Joined:
    Oct 21, 2014
    Messages:
    1,112
    Likes Received:
    883
    Location:
    London
    They said it's not quite full duplex, as there is a write bubble: you can do both operations simultaneously almost all the time, but not quite. That, coupled with the upclock, has led to many different figures being bandied about. It's 109 GB/s (853 MHz clock * 128 bytes) for pure read or pure write, and I believe 204 GB/s is the final combined value according to the Eurogamer interview, as they said every 8th write is impacted.
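The arithmetic behind those figures can be sketched in a few lines. This is only a back-of-the-envelope model, assuming that "853 * 128" means 853 MHz times 128 bytes per cycle in each direction, and that the write bubble removes a paired write on one cycle in every eight. The function names are made up for illustration.

```python
# Rough ESRAM peak-bandwidth arithmetic using the figures quoted in this thread.
# Assumption: 128 bytes move per cycle in each direction (the "* 128" above).
BYTES_PER_CYCLE = 128


def one_way_gbps(clock_mhz: float) -> float:
    """Peak pure-read or pure-write bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return clock_mhz * 1e6 * BYTES_PER_CYCLE / 1e9


def combined_gbps(clock_mhz: float, write_fraction: float = 7 / 8) -> float:
    """Peak read+write bandwidth when a write pairs with a read on only
    write_fraction of the cycles (7 of every 8, per the write bubble)."""
    return one_way_gbps(clock_mhz) * (1 + write_fraction)


if __name__ == "__main__":
    for clock in (800, 853):
        print(f"{clock} MHz: one-way {one_way_gbps(clock):.1f} GB/s, "
              f"combined {combined_gbps(clock):.1f} GB/s")
```

Under these assumptions the 800 MHz clock gives about 102 GB/s one-way and 192 GB/s combined, and the 853 MHz clock gives about 109 GB/s and 204 GB/s, which lines up with the numbers being quoted.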

    Edit: Too slow on the typing it seems.
     
  5. Recop

    Veteran

    Joined:
    Aug 28, 2015
    Messages:
    1,319
    Likes Received:
    649
    Even 102 GB/s is only the theoretical peak... but what about real-life scenarios?
     
  6. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Real life scenarios generally have reasons why they do not always draw theoretical peak bandwidth in the pure read case.
    We do not have enough information about the internal workings of the implementation to know how far it can fall short. However, at least in theory, the ESRAM has far less reason to fall below peak (and in theory should not fall as far) than GDDR5 would.
     
  7. turkey

    Veteran

    Joined:
    Oct 21, 2014
    Messages:
    1,112
    Likes Received:
    883
    Location:
    London
    Real life as in placing values into memory and reading the same things back over and over to verify on hardware, or real life as in the figure a game recorded over a frame, extrapolated to a full second?
     
  8. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,857
    Likes Received:
    4,414
    Location:
    Wrong thread
    The ~145 GB/s figure was from a presentation with figures seen in a real game, although we don't know what proportion of frame time those access rates were sustained over. A synthetic test would probably get higher figures, but would not be so interesting.

    That leaves the ~50 GB/s main memory BW untouched btw, for compute, CPU, other buffers, whatever.

    The PS4 memory setup seems better in the vast majority of cases, but that doesn't necessarily mean X1 has no areas of relative strength.
     
    temesgen likes this.
  9. Seven-eighths duplex, then :)
     
  10. HTupolev

    Regular

    Joined:
    Dec 8, 2012
    Messages:
    936
    Likes Received:
    564
    The correct marketing term is "suplex."
     
    I.S.T., TheAlSpark, DSoup and 2 others like this.
  11. forumaccount

    Newcomer

    Joined:
    Jan 30, 2009
    Messages:
    140
    Likes Received:
    86
    They use a 16 bytes per pixel G-buffer (http://www.frostbite.com/wp-content/uploads/2014/11/course_notes_moving_frostbite_to_pbr_v2.pdf), so it fits fine even at 1080p, albeit with very little room for anything else.

    ROP bound doesn't really mean anything on this hardware... probably you mean bandwidth bound. But there's no such thing as a single bound for a frame. If I had to guess, they're VGPR bound on their important shaders, just like everyone else this gen.

    As for why they're at 720p: they find the image quality acceptable for their goals, and there's not enough consumer pressure to get them to change their goals.
     
    Grall, Recop and TheAlSpark like this.
  12. Globalisateur

    Globalisateur Globby
    Veteran Subscriber

    Joined:
    Nov 6, 2013
    Messages:
    4,592
    Likes Received:
    3,412
    Location:
    France
    Yep, that's it, that's what they explained back then. Hence the maximum ~145GB/s number for operations that can read and write from the same location.

    Probably not your typical scenario though. For the other operations the theoretical max bandwidth (comparable to the 176GB/s PS4 number) is 109GB/s, not 102GB/s. Don't forget the overclock, guys (800 -> 853 MHz). :yep2:
     
  13. Barbarian

    Regular

    Joined:
    Jun 27, 2005
    Messages:
    289
    Likes Received:
    15
    Location:
    California, USA
    From page 15, they show 4 G-buffer MRTs (16 bytes), but since they'll also need a depth buffer, that would put the total at 20 bytes per pixel.
    At any rate, that should be enough to fit in ESRAM at 900p, but since it would leave very little space for other things (shadows, lit buffers, etc.), it will likely impact performance negatively, which I'm guessing is the main reason not to go there.
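To put rough numbers on the G-buffer sizes discussed here, a minimal sketch follows. It assumes the ESRAM is 32 MiB (a figure not stated in this thread) and that "900p" means 1600x900. It also ignores alignment, padding and tiling overhead, so real footprints would be somewhat larger.

```python
# Back-of-the-envelope G-buffer footprint versus ESRAM capacity.
# Assumption: 32 MiB of ESRAM; this size is not given anywhere in the thread.
ESRAM_BYTES = 32 * 1024 * 1024

# Assumption: "900p" is taken to be 1600x900.
RESOLUTIONS = {"720p": (1280, 720), "900p": (1600, 900), "1080p": (1920, 1080)}


def footprint_mib(width: int, height: int, bytes_per_pixel: int) -> float:
    """Size in MiB of a full-screen buffer set, with no padding or tiling overhead."""
    return width * height * bytes_per_pixel / (1024 * 1024)


def fits(width: int, height: int, bytes_per_pixel: int) -> bool:
    """True if the buffers fit entirely within the assumed ESRAM size."""
    return width * height * bytes_per_pixel <= ESRAM_BYTES


if __name__ == "__main__":
    for name, (w, h) in RESOLUTIONS.items():
        for bpp in (16, 20):  # 4 MRTs alone, then 4 MRTs plus a 4-byte depth buffer
            size = footprint_mib(w, h, bpp)
            verdict = "fits" if fits(w, h, bpp) else "does not fit"
            print(f"{name} at {bpp} B/px: {size:.2f} MiB, {verdict}")
```

With these assumptions, 16 bytes per pixel at 1080p comes to roughly 31.6 MiB, which only just fits, while 20 bytes per pixel fits at 900p (about 27.5 MiB) but not at 1080p (about 39.6 MiB).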
     
    TheAlSpark likes this.
  14. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,838
    Likes Received:
    18,641
    Location:
    The North
    Just curious, but does the compiler matter for shader performance on the consoles?

    Found Sébastien's blog
    https://seblagarde.wordpress.com/tag/atan/

    He talks about the shortage of available VGPRs, particularly with certain math functions. Interesting, so the new lighting is basically killing the GPUs, since you're forced to use the inverse trig functions. This is what he got from PlayStation compiler 2.0 (figures below).
    I'm not really sure what is considered bloat. Any thoughts?
    acos: 48 FR (40 FR, 2 QR), 2 DB, 12 VGPR
    asin: 48 FR (40 FR, 2 QR), 2 DB, 1 scalar instruction, 12 VGPR
    atan: 23 FR (19 FR, 1 QR), 2 scalar, 8 VGPR

    - VGPR count is more important than instruction count
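One way to see why register count can matter more than instruction count is to estimate how many wavefronts a SIMD can keep in flight for a given VGPR budget. The sketch below assumes GCN-style limits that are not stated in this thread: 256 VGPRs per lane per SIMD, at most 10 wavefronts per SIMD, and allocation in blocks of 4 registers. The per-function counts quoted above are only part of a full shader's register use, so treat this as illustrative.

```python
# Illustrative occupancy estimate for a GCN-style SIMD.
# Assumptions (not from the thread): 256 VGPRs per lane per SIMD,
# a cap of 10 wavefronts per SIMD, and VGPRs allocated in blocks of 4.
VGPRS_PER_SIMD_LANE = 256
MAX_WAVES_PER_SIMD = 10
ALLOC_GRANULARITY = 4


def waves_per_simd(vgprs_per_thread: int) -> int:
    """Wavefronts that can be resident on one SIMD for a given VGPR count."""
    if vgprs_per_thread <= 0:
        raise ValueError("a shader needs at least one VGPR")
    # Round the request up to the allocation granularity.
    allocated = -(-vgprs_per_thread // ALLOC_GRANULARITY) * ALLOC_GRANULARITY
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_SIMD_LANE // allocated)


if __name__ == "__main__":
    for vgprs in (12, 24, 32, 48, 64, 128):
        print(f"{vgprs:3d} VGPRs -> {waves_per_simd(vgprs)} waves per SIMD")
```

Under these assumptions a shader that needs more than 24 VGPRs starts losing resident wavefronts, and with them the ability to hide memory latency, even if its instruction count stays the same.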
     
    #34 iroboto, Nov 18, 2015
    Last edited: Nov 18, 2015
  15. Metal_Spirit

    Regular

    Joined:
    Jan 3, 2007
    Messages:
    632
    Likes Received:
    397
    Yes... You are correct. I admit talking about 204 GB/s is wrong because of that limit. Besides, it would never be 204 but 192 GB/s, because eSRAM cannot read and write on all clock cycles.

    eSRAM is a strange beast
     
  16. temesgen

    Veteran

    Joined:
    Jan 1, 2007
    Messages:
    1,680
    Likes Received:
    486
    My recollection from some of the early debate here is that XB1 should be capable of some very good particle effects which we might not see replicated on PS4. Hopefully developers showcase some of the unique advantages of the hardware in first-party exclusives soon.
     
    #36 temesgen, Nov 18, 2015
    Last edited: Nov 19, 2015
  17. forumaccount

    Newcomer

    Joined:
    Jan 30, 2009
    Messages:
    140
    Likes Received:
    86
    The compiler is a highly important part of shader performance on all hardware.

    (Or the hardware/software stack, if you want to think of it like that. This is one reason PC driver vendors replace shaders that generate terrible microcode under the default compiler with better ones.)
     
    iroboto, Clukos and chris1515 like this.
  18. Metal_Spirit

    Regular

    Joined:
    Jan 3, 2007
    Messages:
    632
    Likes Received:
    397
    Can you be more specific? We are talking about eSRAM and bandwidth! Xbox One bandwidth is funny, since the large memory pool is slow and the very small one is fast. But not as fast as we may think. On average it's the same as PS4, but with limits on both reads and writes!

    So, how come you say that? Besides, the use of GPGPU should allow for less memory bandwidth usage, and PS4 has more GPGPU!

    With the exception of the limits placed by raw power, I don't see any differences between the consoles, and I only see the Xbox internal memory and bandwidth fragmentation as an additional problem!

    Also, as many games have shown, Xbox usually has additional performance problems with alpha effects!
     
  19. hesido

    Regular

    Joined:
    Mar 28, 2004
    Messages:
    553
    Likes Received:
    85
    #39 hesido, Nov 19, 2015
    Last edited by a moderator: Nov 19, 2015
  20. Globalisateur

    Globalisateur Globby
    Veteran Subscriber

    Joined:
    Nov 6, 2013
    Messages:
    4,592
    Likes Received:
    3,412
    Location:
    France
    I don't think so, even in the opinion of Microsoft engineers. Actually, ESRAM's low latency already gives you simultaneous read and write for some operations, within the 7/8 ratio; see @3dilettante's post.
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.