Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Discussion in 'Console Technology' started by Proelite, Mar 16, 2020.

Thread Status:
Not open for further replies.
  1. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535
    I think NAND used as SLC will have better lifetime too.
     
  2. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,400
    Location:
    Wrong thread
Yeah, I think that makes a good case for keeping the two SEs. Perhaps having a setup that could be shared as much as possible between the two was part of their decision-making (assuming Lockhart actually exists!)

    Interesting figures! Are you including dropping the L2 cache along with the memory controllers?

    With so many fewer CUs per SA in the hypothetical setup, might they also be able to halve L2 cache on the remaining controllers (down to 2MB from the 4MB in RDNA1)? That might save them a bit more area still.

I do wonder about memory though. 6 x GDDR6 would be a lot of bandwidth for a budget device around 4TF, even with 8 Zen 2 cores. Although with the CPU and file IO on the XSX only seeing 336GB/s across the whole 16GB, maybe that's a (tenuous) indicator that Lockhart might have the same....

    Back on the subject of CUs per shader array ... I remembered that the PS5 audio solution is a customised CU. Looking at the RDNA 1 whitepaper, their True Audio Next solution is running on CUs partitioned on a shader array, with ACE managed queues. Supports software ray traced audio and hundreds of sounds and everything.

    So ... maybe Sony's specialised CUs for audio are just bunged on the end of a normal shader array. They won't be counted along with the 36 "regular CUs", so that means that ... maybe ... you don't need equal numbers of CUs per shader engine (maybe).
     
    blakjedi and BRiT like this.
  3. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    22,146
    Likes Received:
    8,533
    Location:
    ಠ_ಠ
    Just the MC portion in the die shot. I was not sure how to identify the GPU L2, so I might be rather conservative unless the cache is in the MC. (I just drew a rectangle around the MCs)

    Probably better to just keep the per-slice L2 the same since that will have some effect on external bandwidth pressure + power consumption related to off-die requests.

    Maybe they could get away with a cheaper bin (e.g. 12Gbps). ;)
     
    DSoup, function and BRiT like this.
  4. blakjedi

    Veteran

    Joined:
    Nov 20, 2004
    Messages:
    2,985
    Likes Received:
    88
    Location:
    20001
    I found this an interesting part of the TweakTown article regarding the controller speed: "Based on this, the Xbox Series X's SSD can come in up to 2TB capacities, and theoretically deliver up to 3.75GB/sec sequential reads and writes..." I take it that that is raw speed and not compressed?

    If so, why does MS advertise 2.4GB/s raw and 4.8GB/s compressed instead? Is that why the HW decompressor chip is rated at 6GB/s, well above MS's listed speeds?

    Very confusing unless the difference is for overhead.
     
  5. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,632
    Location:
    The North
    Because the speed described is guaranteed bandwidth, meaning that's what to expect under all load conditions (heat). MS never gave out optimal speeds; we just assumed they were the same.

    I'm also curious whether this implies random read speed, which would be too fast.
     
  6. MrFox

    MrFox Deludedly Fantastic
    Legend

    Joined:
    Jan 7, 2012
    Messages:
    6,488
    Likes Received:
    5,996
    To get 3750 they need to buy the 1200 MT NAND chips.

    With that specific single-core controller, the overhead from signalling, ECC, etc. gives 3750 (out of a 4800 NAND bus) available to the host on the other side of the controller.

    So they would be using 800 MT parts, which add up to 2500 after removing the same overhead, making 2400 "guaranteed" reasonable. These are widespread and much less expensive than the cream of the crop.

    Sony must be using 533 MT or 667 MT parts, saving more money on the NAND but spending more on the controller.
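    That arithmetic can be sketched in a few lines. To be clear, the efficiency factor below is simply back-derived from the 3750-out-of-4800 figure quoted above, and the 4-channel count is an assumption for illustration, not a confirmed spec:

    ```python
    # Napkin-math sketch of the NAND bus arithmetic above. The efficiency
    # factor is back-derived from the quoted 3750-out-of-4800 figure, and
    # the 4-channel count is an assumption, not a confirmed spec.
    def host_bandwidth_mbps(mt_per_chip, channels=4, efficiency=3750 / 4800):
        raw_bus = mt_per_chip * channels   # raw NAND bus rate in MB/s
        return raw_bus * efficiency        # what's left after signalling/ECC overhead

    print(round(host_bandwidth_mbps(1200)))  # 3750 -- needs top-bin parts
    print(round(host_bandwidth_mbps(800)))   # 2500 -- cheaper, widespread parts
    ```

    Same overhead ratio either way; the bin of the NAND parts is what moves the host-side number.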
     
    chris1515, Pete, Ika and 6 others like this.
  7. blakjedi

    Veteran

    Joined:
    Nov 20, 2004
    Messages:
    2,985
    Likes Received:
    88
    Location:
    20001
    And then use 12 channels compared to 4 to reach the speeds touted in their solution?
     
  8. MrFox

    MrFox Deludedly Fantastic
    Legend

    Joined:
    Jan 7, 2012
    Messages:
    6,488
    Likes Received:
    5,996
    Yep, the bandwidth they wanted seems to be the foundation of the entire design. The flash parts, the custom controller, and the decompression block in the SoC.
     
    blakjedi likes this.
  9. RagnarokFF

    Newcomer

    Joined:
    Mar 22, 2020
    Messages:
    57
    Likes Received:
    146
    Won't happen. The OS needs RAM for background tasks, and you want to reduce writes to the SSD.

    MS chose the RAM setup because developers like such a trade-off when it gives them more bandwidth. Goossen talked about this in Digital Foundry's Inside Xbox Series X article.
     
    PSman1700 likes this.
  10. Jay

    Jay
    Veteran

    Joined:
    Aug 3, 2013
    Messages:
    4,029
    Likes Received:
    3,428
    I think that a lot of people believe it will be handled by the developers, hence the concerns.
    They see all this low-level discussion when it will all sit behind the MMU and OS.
    So all devs need to do is put an allocation in either the slow or fast section of memory.
    Even if MS allows devs to implement their own memory management, that would just be cutting out part of the OS; the MMU will still expose it as a chunk of memory addresses.

    Even the slow memory is actually relatively fast. What is it, 30GB/s faster than the 1X? They could even use it for graphics work, just not for intermediate render targets and the like.
    Fast enough for textures that are read by the GPU at the start of the frame? Although I don't see any reason you'd need to do that, just making a point.
     
    milk and VitaminB6 like this.
  11. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    Very unlikely, IMO. Those will cost a premium, and consoles are very cost sensitive.

    I would be surprised if the MS and Sony solutions don't both use 8 channels; it doubles the number of IOPS your storage device can handle, which will be crucial to how the devices are used.

    The 2.4GB/s figure might be a limit of the decompression block; 4.8GB/s decompressed is quite a lot. In both cases there is plenty of bandwidth.

    Cheers
     
  12. tunafish

    Regular

    Joined:
    Aug 19, 2011
    Messages:
    627
    Likes Received:
    414
    The channel counts for both flash controller chips are known. Sony uses a custom 12-channel design, while Microsoft uses a PS5019-E19T.
     
  13. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    Alright, surprised by this.

    Cheers
     
  14. fehu

    Veteran

    Joined:
    Nov 15, 2006
    Messages:
    2,067
    Likes Received:
    992
    Location:
    Somewhere over the ocean
    What does "CE # Max: 16" mean?
     
  15. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,502
    Likes Received:
    24,397
    That's assuming the linked info is correct. Has it actually been seen in teardowns?
     
  16. MrFox

    MrFox Deludedly Fantastic
    Legend

    Joined:
    Jan 7, 2012
    Messages:
    6,488
    Likes Received:
    5,996
    Chip Enable lines. They allow putting more chips on the same channels to grow capacity, but only one chip per channel can be enabled at a time, exactly like using more DIMMs on a PC above the physical channel count. But here it's limited to 2TB total.

    The 1.6W controller barely needs a heatsink; the simple conducting tabs to the case are starting to make sense.
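    As a rough illustration of how CE lines scale capacity (the channel count and die size below are hypothetical figures, not a confirmed configuration; only the 2TB ceiling comes from the spec discussed above):

    ```python
    # Hypothetical illustration of chip-enable (CE) scaling: more dies per
    # channel grow capacity, but only one die per channel is active at a
    # time. Channel count and die size are assumptions for illustration;
    # the 2TB (2048GB) cap is the controller limit discussed above.
    def max_capacity_gb(channels, ce_per_channel, die_gb, cap_gb=2048):
        # Total addressable flash, clamped to the controller's 2TB limit.
        return min(channels * ce_per_channel * die_gb, cap_gb)

    print(max_capacity_gb(4, 16, 64))  # 2048 -> hits the 2TB ceiling
    print(max_capacity_gb(4, 4, 64))   # 1024 -> a 1TB configuration
    ```

    More CE lines buy capacity headroom, not bandwidth, since the channel count stays the same.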
     
    #1976 MrFox, Apr 21, 2020
    Last edited: Apr 21, 2020
    TheAlSpark and fehu like this.
  17. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    The architect for the Series X gave a figure of >6 GB/s throughput for the decompression block, though the decision not to use that as the official number seems to indicate it's not common.
    https://www.eurogamer.net/articles/digitalfoundry-2020-inside-xbox-series-x-full-specs
     
    BRiT, PSman1700, tinokun and 2 others like this.
  18. zupallinere

    Regular Subscriber

    Joined:
    Sep 8, 2006
    Messages:
    768
    Likes Received:
    109
    From the piece:
    That is pretty neat.
     
    PSman1700 likes this.
  19. MrFox

    MrFox Deludedly Fantastic
    Legend

    Joined:
    Jan 7, 2012
    Messages:
    6,488
    Likes Received:
    5,996
    Within a dataset the compression ratio will vary a lot around the average cited as "typical". I think that's the reason for mentioning the peak throughput: the 1:1 incompressible data gets averaged with 4:1 things like geometry, masks, alphas, or lossy-optimised BCn textures.

    Some examples with an imaginary dataset spread as 25% at 1:1, 50% at 2:1, and 25% at 4:1....
    WARNING: May contain traces of Cheap Napkin Math (tm)

    Reading 100MB uncompressed
    25% : 1:1 @ 5.5GB/s (4.54ms)
    50% : 2:1 @ 11GB/s (4.54ms)
    25% : 4:1 @ 22GB/s (1.14ms)
    Total 1.77x compression average
    56.25MB on disk
    100MB/10.22ms = 9.78GB/s

    Reading 100MB uncompressed
    25% : 1:1 @ 2.4GB/s (10.41ms)
    50% : 2:1 @ 4.8GB/s (10.41ms)
    25% : 4:1 @ 6GB/s (4.16ms)
    Total 1.77x compression average
    56.25MB on disk
    100MB/24.98ms = 4.00GB/s

    And let's suppose there's a 10% better lossless compression on XBSX reaching a 2x average....

    Reading 100MB uncompressed
    25% : 1.1:1 @ 2.64GB/s (9.47ms)
    50% : 2.2:1 @ 5.28GB/s (9.47ms)
    25% : 4.4:1 @ 6GB/s (4.16ms)
    Total 1.96x compression average
    51.13MB on disk
    100MB/23.1ms = 4.33GB/s
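    For what it's worth, the napkin math above can be reproduced with a small helper (same imaginary dataset mix as above; the function and its name are mine):

    ```python
    # Reproduces the napkin math above: effective read throughput for a mix
    # of compression ratios, each capped at a given decompressor output rate.
    # The dataset mix and rate caps are the same imaginary figures as above.
    def effective_throughput(total_mb, mix):
        """mix: list of (fraction, ratio, cap_gbps) tuples.
        Returns (avg_ratio, on_disk_mb, effective_gbps)."""
        time_ms = 0.0
        on_disk_mb = 0.0
        for frac, ratio, cap_gbps in mix:
            out_mb = total_mb * frac        # uncompressed bytes delivered
            time_ms += out_mb / cap_gbps    # MB divided by GB/s gives ms
            on_disk_mb += out_mb / ratio    # bytes actually read from flash
        return total_mb / on_disk_mb, on_disk_mb, total_mb / time_ms

    # First scenario: 5.5GB/s raw, output scaling freely with the ratio.
    print(effective_throughput(100, [(0.25, 1, 5.5), (0.50, 2, 11), (0.25, 4, 22)]))
    # Second scenario: 2.4GB/s raw with a 6GB/s output ceiling.
    print(effective_throughput(100, [(0.25, 1, 2.4), (0.50, 2, 4.8), (0.25, 4, 6.0)]))
    ```

    The second scenario shows where the 6GB/s ceiling bites: the 4:1 slice reads at 6GB/s instead of the 9.6GB/s the raw rate would otherwise imply.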

    Here the problem is that the more they try to raise the compression ratio with BCn optimisers, or BCPack, or RDO, the more useless it becomes if the output is limited to a 6GB/s rate. That doesn't seem to be the case on Sony's platform: they can use RDO to crank up "lossy" repacking and get closer and closer to 22GB/s effective.

    Caveat: What Cerny called "data that compresses particularly well" might not necessarily mean anything that compresses at 4:1; it could be about the amount of processing required to decompress specific data, which may or may not have a linear relationship with the data in/out bandwidths. There might be iterative or recursive operations in the decompression algorithm that vary based on the data. I also have no idea how that works for ASIC implementations, but it seems to matter on CPU.

    Caveat 2: We don't know exactly what the 6GB/s peak represents on XBSX either.

    Caveat 3: Sony have a 3x advantage in IOPS in addition to the bandwidth; block sizes might be allowed to be smaller.
     
    megre, Mitchings and BRiT like this.
  20. tinokun

    Newcomer Subscriber

    Joined:
    Jul 23, 2004
    Messages:
    70
    Likes Received:
    87
    Location:
    Peru
    Nitpick: remember that the peak is "over 6GB/s", not exactly 6.
     
    PSman1700 likes this.