Technical Comparison Sony PS4 and Microsoft Xbox

Discussion in 'Console Technology' started by BRiT, May 21, 2013.

Thread Status:
Not open for further replies.
  1. Betanumerical

    Veteran

    Joined:
    Aug 20, 2007
    Messages:
    1,763
    Likes Received:
    280
    Location:
    In the land of the drop bears
    Well, one advantage of the eSRAM is that DDR3 is dirt cheap in comparison to GDDR5. I'm not smart enough to know the advantages of the low latency in most situations, nor do I know the average size of textures per frame in a game.

    But if we're going to start working out that kind of stuff, we should at least start subtracting some bandwidth for the CPUs.

    So we have 68 GB/s - 10 GB/s (for the 8 CPU cores) = 58 GB/s left for the GPU.

    58 GB/s / 30 FPS ≈ 1.93 GB/frame
    1.93 × 1024 ≈ 1976.32 MB/frame
    1976.32 MB / 32 MB (eSRAM size) ≈ 61.76 full eSRAM fills per frame

    Half that for 60 FPS games, of course.
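    The same arithmetic as a small Python sketch, for illustration only; the 10 GB/s CPU share is the post's own rough guess, and 1 GB = 1024 MB follows the post. Without rounding 1.93 early it gives about 61.87 fills per frame at 30 FPS rather than 61.76, and about 30.93 at 60 FPS.

```python
# Back-of-the-envelope from the post: how many times per frame could the
# DDR3 bandwidth left after the CPUs refill the 32 MB eSRAM?
DDR3_BW_GBPS = 68.0      # peak DDR3 bandwidth, GB/s
CPU_RESERVE_GBPS = 10.0  # rough guess for the 8 CPU cores, as in the post
ESRAM_MB = 32.0          # eSRAM capacity, MB


def esram_fills_per_frame(fps: float) -> float:
    """Full eSRAM fills per frame using the DDR3 bandwidth left for the GPU."""
    gpu_bw_gbps = DDR3_BW_GBPS - CPU_RESERVE_GBPS  # 58 GB/s
    mb_per_frame = gpu_bw_gbps / fps * 1024          # GB/frame -> MB/frame (1 GB = 1024 MB)
    return mb_per_frame / ESRAM_MB


for fps in (30, 60):
    print(f"{fps} FPS: {esram_fills_per_frame(fps):.2f} fills/frame")
```

    Note this is an upper bound: it assumes every remaining DDR3 byte goes into eSRAM copies and nothing is read from DDR3 directly.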
     
  2. Ketto

    Newcomer

    Joined:
    Jul 30, 2012
    Messages:
    39
    Likes Received:
    0
    Location:
    Winter Park, Florida; and London UK.
    nvm I saw it was addressed. Honestly there isn't any real performance advantage with Xbox's setup, but I'd expect it to be cheaper, especially since eSRAM can be shrunk as MS goes along.
     
  3. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,296
    Location:
    Helsinki, Finland
    That might be true for a few games, but the majority of games run just fine on a 1 GB card at 1080p with the default (medium) quality settings (no AA, no AF). Of course, if you start ramping up the antialiasing and/or detail levels, 1 GB cards start to stutter badly in the most demanding games. However, these 1 GB cards aren't powerful enough to run those games at ultra settings (or at high resolutions / antialiasing) anyway, so the limited memory doesn't hurt that much in reality.

    Next gen will likely change the situation a bit. I expect 2 GB cards to become the minimum requirement to run games at console texture+AA quality at 1080p. But developers still need to design their games to scale down gracefully to 1 GB cards. It would be insane to throw away 70% of the existing PC market.
    Yes, it seems that many games are either TEX or ALU bound on the 7850 (not bandwidth bound). I don't think the 7850 is fill bound, since it has 72% more fill rate than the 7770, yet very few games show improvements of that magnitude. The 7870, on the other hand, seems to be a very well balanced card (its extra TEX and ALU seem to help it in many recent games).
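    Where the 72% comes from, as a sketch: peak fill rate is ROP count times clock. The unit counts and clocks below are the cards' reference specs as I understand them (HD 7770 meaning the GHz Edition); they are assumptions, not figures from the post. The same arithmetic puts texture and ALU throughput only about 38% ahead, so speedups that track roughly 38% rather than 72% would fit the TEX/ALU-bound reading.

```python
# Peak-rate ratios of the HD 7850 vs the HD 7770.
# ASSUMPTION: reference specs not stated in the post:
#   HD 7850: 32 ROPs, 64 TMUs, 1024 ALU lanes @ 860 MHz
#   HD 7770: 16 ROPs, 40 TMUs,  640 ALU lanes @ 1000 MHz
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    rops: int
    tmus: int
    alus: int
    clock_mhz: int

    def rates(self) -> dict[str, int]:
        # Peak throughput per unit type = unit count x clock (millions per second).
        return {
            "fill": self.rops * self.clock_mhz,
            "texture": self.tmus * self.clock_mhz,
            "alu": self.alus * self.clock_mhz,
        }


def advantage(a: Gpu, b: Gpu) -> dict[str, float]:
    """Fractional peak-rate advantage of `a` over `b` for each unit type."""
    ra, rb = a.rates(), b.rates()
    return {k: ra[k] / rb[k] - 1.0 for k in ra}


hd7850 = Gpu("HD 7850", rops=32, tmus=64, alus=1024, clock_mhz=860)
hd7770 = Gpu("HD 7770", rops=16, tmus=40, alus=640, clock_mhz=1000)

for metric, gain in advantage(hd7850, hd7770).items():
    print(f"{metric}: +{gain:.0%}")
```

    This prints +72% for fill and +38% for texture and ALU. Peak ratios say nothing about how often each unit is actually the bottleneck; they only bound the possible gain.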
     
  4. vjPiedPiper

    Newcomer

    Joined:
    Nov 23, 2005
    Messages:
    136
    Likes Received:
    88
    Location:
    Melbourne Aus.
    OK,
    Thinking about DDR3 + eSRAM vs GDDR5 bandwidth and game engines...

    I may well be very wrong here, and I'm hoping someone can point me in the right direction, but isn't a typical transaction going to look more like this...

    PS4 / standard PC setup
    1 GPU needs texture
    2 GPU gets texture at the correct mipmap from memory
    3 GPU needs next mipmap of the same texture
    4 GPU goes to GDDR again for the next mipmap
    repeat for every texel...

    XB1
    1 GPU needs texture
    2 Texture + all mipmaps get copied to eSRAM (or exist from previous frame)
    3 GPU gets texture from eSRAM
    4 GPU needs next mipmap of the same texture
    5 GPU gets next mipmap directly from eSRAM
    repeat for every texel...

    The idea being that steps 3, 4 and 5 are MUCH faster on the XB1, which makes up for the extra time step 2 takes on the XB1.
    This allows more operations/transfers to happen at once, making better use of the available system resources.
    This also completely ignores the more obvious benefit of keeping your working set of render targets in a local cache (i.e. the eSRAM).
    Of course, more render targets = less space for a texture cache...

    I'm talking way over my head here, but I would be really interested if anyone could provide more info on likely optimisations for a rendering engine with a large local cache.

    Cheers,
     
    #384 vjPiedPiper, May 27, 2013
    Last edited by a moderator: May 27, 2013
  5. Betanumerical

    Veteran

    Joined:
    Aug 20, 2007
    Messages:
    1,763
    Likes Received:
    280
    Location:
    In the land of the drop bears
    Faster in what way though? Latency-wise, sure, but bandwidth-wise it wouldn't be any faster. And do GPUs really only read in a single mipmap at a time?
     
  6. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    Each SIMD in a GCN CU can have 10 different wavefronts in flight, and it takes four cycles to complete one instruction. That's a total of 40 cycles of latency tolerance, or 50ns at an 800 MHz GPU clock. Hitting in caches is thus critical for performance.
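    The latency-tolerance figure as a sketch. It follows the post's simplified model (each other wavefront supplies one 4-cycle instruction of independent work while one waits); the 800 MHz clock is an assumption implied by "40 cycles ... or 50ns".

```python
# Rough latency-hiding budget of one GCN SIMD, following the post's model.
# ASSUMPTION: 800 MHz GPU clock (implied by "40 cycles ... or 50ns").
CYCLES_PER_INSTR = 4  # a 64-wide wavefront issues on a 16-lane SIMD over 4 cycles


def latency_tolerance_ns(waves_in_flight: int = 10, clock_mhz: int = 800) -> float:
    """Time the SIMD can cover with other wavefronts' work, in ns.

    Simplification: each wavefront in flight contributes one instruction of
    independent work before it, too, has to wait on memory.
    """
    cycles = waves_in_flight * CYCLES_PER_INSTR
    return cycles * 1000 / clock_mhz  # cycles / (cycles per ns)


print(latency_tolerance_ns())                   # 50.0
print(latency_tolerance_ns(waves_in_flight=4))  # 20.0
```

    Fewer wavefronts in flight (for example because of register pressure) shrink the budget proportionally, which is why a miss that goes all the way to DRAM is so hard to hide.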

    Worst case is a workload where all wavefronts miss the texture caches. If they sample the texture atlas and miss, they almost certainly miss the actual tile pointed to by the atlas as well.

    The ESRAM isn't of infinite size either, so I expect XB1 developers to work hard to fit the PRT (partially resident texture) atlas and active tiles into the part of the ESRAM not used by rendering buffers (let's say half, 16MB).

    Cheers
     
  7. liolio

    liolio Aquoiboniste
    Legend

    Joined:
    Jun 28, 2005
    Messages:
    5,724
    Likes Received:
    195
    Location:
    Stateless
    One of the things that "bothers" me, and that I would like to read more about, is the software side of things, or more precisely how some techniques would map to different memory configurations.

    Let's put aside the difference in raw throughput (different numbers of CUs, ROPs, etc.) for a moment, and assume these two set-ups:
    Systems 1 & 2: X CPU cores, Y GPU "cores", Z ROPs

    System 1 is UMA like the PS4; system 2 is like Durango, with matching bandwidth figures.

    This gen we've seen a couple of developer teams vouch for a tight G-buffer, though I don't know the pros and cons of a large versus a tight G-buffer.
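    One concrete difference is footprint. A minimal sketch with two hypothetical layouts (not taken from any particular engine), assuming 1080p, no MSAA and no padding: a 12 B/px layout takes about 74% of the 32 MB eSRAM, while a 24 B/px layout does not fit at all.

```python
# Rough 1080p G-buffer footprints for two HYPOTHETICAL layouts,
# compared with the 32 MiB eSRAM (and the ~16 MiB left in Gubbi's split).
WIDTH, HEIGHT = 1920, 1080
ESRAM_MIB = 32

LAYOUTS = {
    # depth/stencil (4 B) + 2 x RGBA8 targets (4 B each)
    "tight": 4 + 2 * 4,
    # depth/stencil (4 B) + 3 x RGBA8 + 1 x RGBA16F (8 B)
    "fat": 4 + 3 * 4 + 8,
}


def footprint_mib(bytes_per_pixel: int, width: int = WIDTH, height: int = HEIGHT) -> float:
    """Size of all full-resolution targets in MiB (no MSAA, no padding)."""
    return width * height * bytes_per_pixel / (1024 * 1024)


for name, bpp in LAYOUTS.items():
    mib = footprint_mib(bpp)
    print(f"{name}: {bpp} B/px -> {mib:.1f} MiB ({mib / ESRAM_MIB:.0%} of eSRAM)")
```

    Even the tight layout exceeds the 16 MB half that Gubbi suggests reserving for texture tiles, so on a split-pool design the G-buffer format directly competes with everything else that wants to live in eSRAM.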

    How could things like order-independent transparency affect the memory footprint in real-time graphics?
    I'm a bit unclear on what A-buffers (accumulation?) are, but the same question applies.

    Overall I'm less concerned by the difference in raw throughput than by the "level" of freedom offered by both systems (not a statement, more a question).
     
  8. Strange

    Veteran

    Joined:
    May 16, 2007
    Messages:
    1,698
    Likes Received:
    428
    Location:
    Somewhere out there

    Still missing something I believe.

    Assuming you read each entire 32MB dataset 3 times,
    you're only getting 32*10*3 = 960MB of real bandwidth available for each frame.
    Upping that to 5 full reads of each dataset, we get 1600MB/frame
    (if you don't use a dataset multiple times, you might as well NOT copy it into the eSRAM for bandwidth purposes).

    Adding that to the DDR3, which provides 3.08GB/frame after copying 10 32MB datasets into the eSRAM,
    we only have 4.04GB/frame for 3 full reads and 4.68GB/frame for 5 full reads in the best of best situations, which I'm sure is utterly impossible to achieve.

    This also ignores the amount of scheduling you'd have to do to allow each dataset to get 3 to 5 full reads before replacing it with another set.
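    The per-frame totals above, re-run as a sketch. The 3.08 GB/frame DDR3 figure is taken as given from the post rather than derived, and 1 GB = 1000 MB here, which is what makes 3.08 + 0.96 = 4.04.

```python
# Re-running the post's per-frame totals. DDR3_GB_PER_FRAME is the post's
# own figure, not derived here; 1 GB = 1000 MB in this calculation.
DATASET_MB = 32
DATASETS_PER_FRAME = 10
DDR3_GB_PER_FRAME = 3.08


def effective_gb_per_frame(reads_per_dataset: int) -> tuple[float, float]:
    """Return (eSRAM reads in GB/frame, DDR3 + eSRAM in GB/frame)."""
    esram_gb = DATASET_MB * DATASETS_PER_FRAME * reads_per_dataset / 1000
    return esram_gb, DDR3_GB_PER_FRAME + esram_gb


for reads in (3, 5):
    esram, total = effective_gb_per_frame(reads)
    print(f"{reads} reads: eSRAM {esram:.2f} GB/frame, total {total:.2f} GB/frame")
```

    Each extra full read of every dataset adds 0.32 GB/frame, so the total grows linearly with reuse; the scheduling problem mentioned above is what caps the reuse count in practice.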
     
  9. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    Please explain to me why anybody would copy data to ESRAM unless it is a win (i.e. data reuse > 2).

    Or is your basic assumption that MS and all XB1 developers are idiots?

    Cheers
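    A toy byte-counting model of the copy-or-not decision, assuming a copy costs one DDR3 read plus one eSRAM write and every later access is an eSRAM read; bus contention, timing and caches are ignored, so it does not locate the real break-even point. The 16 MB working set echoes the half-eSRAM budget mentioned earlier. It shows both sides of this exchange: a single-use copy moves 3x the bytes, while each extra reuse saves DDR3 traffic.

```python
# Toy traffic model for a working set of `size_mb` touched `reuse` times
# per frame. ASSUMPTION: a copy costs one DDR3 read plus one eSRAM write;
# every later access is an eSRAM read. Contention and caches are ignored.
from typing import NamedTuple


class Traffic(NamedTuple):
    ddr3_mb: float   # bytes crossing the DDR3 interface
    esram_mb: float  # bytes crossing the eSRAM interface
    total_mb: float  # everything moved, both pools


def direct_from_ddr3(size_mb: float, reuse: int) -> Traffic:
    ddr3 = size_mb * reuse
    return Traffic(ddr3, 0, ddr3)


def copy_to_esram(size_mb: float, reuse: int) -> Traffic:
    ddr3 = size_mb                      # one read out of DDR3 for the copy
    esram = size_mb + size_mb * reuse   # one write into eSRAM, then `reuse` reads
    return Traffic(ddr3, esram, ddr3 + esram)


for reuse in (1, 2, 3, 5):
    d = direct_from_ddr3(16, reuse)
    c = copy_to_esram(16, reuse)
    print(f"reuse={reuse}: DDR3 {d.ddr3_mb} -> {c.ddr3_mb} MB, "
          f"total {d.total_mb} -> {c.total_mb} MB")
```

    With reuse=1 the copy moves 48 MB instead of 16 MB for no DDR3 saving; with reuse=3 DDR3 traffic drops from 48 MB to 16 MB, at the cost of extra eSRAM traffic.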
     
  10. Betanumerical

    Veteran

    Joined:
    Aug 20, 2007
    Messages:
    1,763
    Likes Received:
    280
    Location:
    In the land of the drop bears
    All it's meant to show is that it's disingenuous to add up the bandwidth numbers and call it a day, because every time you copy something into the eSRAM it takes away bandwidth that would otherwise be available from the DDR3, and it uses 3x the bandwidth that would be required if reading from the DDR3.

    Of course it's going to be a win, and the data will be read/written repeatedly, but that doesn't stop the copy from taking bandwidth from the DDR3.

    It's not meant to show anything other than that.
     
  11. Qrius

    Newcomer

    Joined:
    Feb 24, 2013
    Messages:
    18
    Likes Received:
    0
    There might be an interesting tidbit in that statement... unless I'm reading too much into it!

    So extra work is required by devs to make the most of the ESRAM? It's not an 'automagic' performance-boosting feature?

    Will MS be in the position Sony was in with the PS3 last generation? An opportunity for developers to make use of an interesting and potentially powerful hardware feature (as with the SPUs, but to a lesser degree), but with the need to spend time on optimisation to get the best from the hardware?

    That approach didn't help Sony much last time round... especially with 3rd party developers.
     
  12. blakjedi

    Veteran

    Joined:
    Nov 20, 2004
    Messages:
    2,985
    Likes Received:
    88
    Location:
    20001
    [Image: vgleaks diagram of the Xbox One memory subsystem]

    Looking at the diagram from vgleaks, I have a hard time squaring Brad's and Beta's descriptions of eSRAM activity and bandwidth usage. The eSRAM is local to the GPU and not sitting within the general memory subsystem paths at all. There is no system bandwidth usage incurred while using the eSRAM. It doesn't have to copy or paste anything through main system RAM.

    The GPU gets 170 GB/s of read bandwidth, read in parallel from eSRAM and DDR3, and 102 GB/s of write bandwidth.

    I'm not seeing what you guys are saying at all, since access to both memory pools is simultaneous, parallel, asynchronous and non-linear. You don't have to copy anything into DDR3 first and then into eSRAM to work on it. You can copy directly to eSRAM, DDR3 or both and read at the full data rate of each path. Can anyone clarify?
     
    #392 blakjedi, May 27, 2013
    Last edited by a moderator: May 27, 2013
  13. Betanumerical

    Veteran

    Joined:
    Aug 20, 2007
    Messages:
    1,763
    Likes Received:
    280
    Location:
    In the land of the drop bears
    You move data from the DDR3 to the eSRAM. As you can see, the only fast bus to the eSRAM from the outside world is the DDR3; it's the only practical way to get data to it.
     
  14. almighty

    Banned

    Joined:
    Dec 17, 2006
    Messages:
    2,469
    Likes Received:
    5
    I think multiplatform ports between these 2 would be very interesting: if the game is made for the PS4's memory layout, I can see it being a complete pain in the ass to get it running at the same speed/quality on Xbox One.

    On the plus side, Xbox One made me laugh out loud... PS4 on the other hand... Now that machine has my attention... At the end of this year I could end up buying my first console since the PS2 was released...
     
  15. blakjedi

    Veteran

    Joined:
    Nov 20, 2004
    Messages:
    2,985
    Likes Received:
    88
    Location:
    20001
    No. Northbridge.
     
  16. Betanumerical

    Veteran

    Joined:
    Aug 20, 2007
    Messages:
    1,763
    Likes Received:
    280
    Location:
    In the land of the drop bears
    The northbridge doesn't have any RAM, so it needs to get its data from somewhere, and where would that be? You have 3 choices: the CPU (thus reducing CPU bandwidth and reducing your memory space to ~8/10MB), the DDR3 (reducing DDR3 bandwidth) or the 9GB/s HDD controller (thus really negating any benefit at all).
     
  17. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    Rendering of shadow buffers is completely GPU<->ESRAM
    Filling lighting/materials/G-buffer reads textures from DDR3 or ESRAM and writes to ESRAM
    Accumulating light (reading materials/G-buffer) is strictly GPU<->ESRAM

    The only time you copy something from DDR3 into ESRAM is when it is a definite win compared to just reading from DDR3.

    Cheers
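    The frame outline above expressed as data, as a sketch: each pass lists the memory pools it reads and writes, following the post (vertex fetch and final scan-out are left out, as in the post). Querying it shows that, in this outline, only the G-buffer fill pass touches DDR3.

```python
# The frame outline from the post as data: which memory pools each pass
# reads and writes. Pass names follow the post; the structure is a sketch.
from dataclasses import dataclass


@dataclass(frozen=True)
class Pass:
    name: str
    reads: frozenset[str]
    writes: frozenset[str]


FRAME = [
    Pass("shadow buffers", frozenset({"ESRAM"}), frozenset({"ESRAM"})),
    # textures may come from either pool; output goes to ESRAM
    Pass("fill G-buffer", frozenset({"DDR3", "ESRAM"}), frozenset({"ESRAM"})),
    Pass("accumulate light", frozenset({"ESRAM"}), frozenset({"ESRAM"})),
]


def passes_touching(pool: str, frame: list[Pass]) -> list[str]:
    """Names of passes that read or write the given memory pool."""
    return [p.name for p in frame if pool in p.reads | p.writes]


print(passes_touching("DDR3", FRAME))   # ['fill G-buffer']
print(passes_touching("ESRAM", FRAME))  # all three passes
```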
     
  18. Betanumerical

    Veteran

    Joined:
    Aug 20, 2007
    Messages:
    1,763
    Likes Received:
    280
    Location:
    In the land of the drop bears
    Yep, that all makes sense, but you can still only read from the DDR3 at 68GB/s when doing that, unless the data is already in the eSRAM, and either way, be it from the GPU ROPs or from the northbridge, it'd have to get there via another bus (the ROPs being relatively free). So the entire eSRAM + DDR3 bandwidth addition is still bunk, even more so when you think about size.
     
  19. blakjedi

    Veteran

    Joined:
    Nov 20, 2004
    Messages:
    2,985
    Likes Received:
    88
    Location:
    20001
    The GPU can read from virtually any cache anywhere on the system. Reading/writing directly to/from the eSRAM is the point. You can do that through the graphics controller and the NB, totally skipping the DDR3. Why are you missing that?

    I'm probably wrong, but the 30 GB/s coherent read/write path between the GPU and the northbridge may be the source of the missing 30 GB/s of bandwidth talked about by MS and alluded to by either Gubbi or sebbbi, I can't remember which.
     
  20. Betanumerical

    Veteran

    Joined:
    Aug 20, 2007
    Messages:
    1,763
    Likes Received:
    280
    Location:
    In the land of the drop bears
    Yes, the GPU can read from virtually any cache (that seems to be a big thing in both systems), but that doesn't mean it's quick. The coherent bus is only 30GB/s. You'd be better off doing what Gubbi said and writing to it from the GPU's output. Reading from the CPU caches you get a max of 20GB/s.
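    To put "not quick" in per-frame terms, a sketch of how long a 32 MB working set takes to move over each path quoted in this thread, at peak rates only (real sustained rates would be lower) and with 1 GB = 1024 MB as in the earlier arithmetic.

```python
# Time to move a 32 MB working set over each path quoted in this thread,
# as a share of a 30 FPS frame. Peak rates only; 1 GB = 1024 MB.
PATHS_GBPS = {
    "DDR3": 68.0,
    "coherent GPU<->northbridge": 30.0,
    "CPU cache reads": 20.0,
}
FRAME_MS = 1000 / 30


def transfer_ms(size_mb: float, gbps: float) -> float:
    """Milliseconds to move size_mb at gbps (GB/s)."""
    return size_mb / 1024 / gbps * 1000


for path, bw in PATHS_GBPS.items():
    ms = transfer_ms(32, bw)
    print(f"{path}: {ms:.2f} ms ({ms / FRAME_MS:.1%} of a 30 FPS frame)")
```

    The slower paths take roughly 2x to 3.4x as long as DDR3 for the same bytes, which is the gap the post is pointing at.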
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.