How to understand the 560 GB/s and 336 GB/s memory pools of Series X *spawn*

Discussion in 'Console Industry' started by Metal_Spirit, Apr 10, 2020.

Thread Status:
Not open for further replies.
  1. Metal_Spirit

    Regular Newcomer

    Joined:
    Jan 3, 2007
    Messages:
    587
    Likes Received:
    361
    NOTE: Post corrected changing some parts as doubts are cleared.

    Can someone help me with a doubt, please?

    I'm going to lay the thing out as I see it; feel free to correct me if anything is wrong.

    Although the use of the word pool is not appropriate, I will use it regardless since it helps a lot in typing.

    Xbox Series X has four 1 GB modules and six 2 GB modules. That's 10 modules.
    Each module is connected via 2x16-bit channels, for a total of five 64-bit controllers.
    The four 1 GB modules have their 2x16-bit channels dedicated. They belong to the 10 GB fast RAM pool, and together they have a total channel width of 128 bits, supplying up to 224 GB/s.

    But the remaining six 2 GB modules divide their capacity. The first 1 GB of each adds to the other 4 GB to constitute the 10 GB fast pool. These six chips each have a 2x16-bit channel, so we have a total of 192 bits. Add these to the 128 bits from the four 1 GB modules, and we have the 10 GB, 320-bit pool, with 560 GB/s bandwidth.
    Now, the second, upper 1 GB of these 2 GB modules constitutes the 6 GB slower pool. These do not add to anything, so we have a total channel width of 192 bits, with 336 GB/s maximum bandwidth.
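The arithmetic above can be checked with a short Python sketch. It assumes GDDR6 running at 14 Gbps per pin, a figure not stated in this post but consistent with the 224, 336 and 560 GB/s totals quoted. The function and constant names are illustrative only.

```python
# Peak bandwidth from bus width, assuming 14 Gbps per pin (an assumption,
# chosen because it reproduces the figures quoted in the post).
GBPS_PER_PIN = 14

SMALL_MODULES = 4      # 1 GB each
LARGE_MODULES = 6      # 2 GB each
BITS_PER_MODULE = 32   # two 16-bit channels per module


def peak_bandwidth_gbs(bus_width_bits, gbps_per_pin=GBPS_PER_PIN):
    """Return the theoretical peak bandwidth in GB/s for a bus of this width."""
    return bus_width_bits * gbps_per_pin / 8  # 8 bits per byte


small_bits = SMALL_MODULES * BITS_PER_MODULE   # 128 bits
large_bits = LARGE_MODULES * BITS_PER_MODULE   # 192 bits
total_bits = small_bits + large_bits           # 320 bits

print("four 1 GB modules:", peak_bandwidth_gbs(small_bits), "GB/s")
print("six 2 GB modules: ", peak_bandwidth_gbs(large_bits), "GB/s")
print("all ten modules:  ", peak_bandwidth_gbs(total_bits), "GB/s")
```

Note that the numbers depend only on channel width and pin speed, not on how many GB sit behind each channel.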

    And so far so good!

    The question is: the 2x16-bit channels we have on each of the 2 GB modules cannot be used at the same time on both pools. If both are dedicated to the upper 1 GB on all modules, we do have 336 GB/s on the 6 GB pool, but the 10 GB pool stops receiving any data from these 2 GB modules, since it has no channels dedicated to it. So the fast RAM pool decreases its bandwidth to 224 GB/s.
    Both pools still keep adding up to 560 GB/s, even though the bandwidth of each pool varies a lot.

    But if we look at the upper 6 GB pool we see that we only have 3.5 GB free. The remaining memory is used by the OS.

    Since the demand on this RAM will never be that big (CPU only), there is never a need to dedicate both 16-bit channels to the 6 GB slow pool.
    So we can dedicate only one of these 16-bit channels, and access both pools at the same time.

    In this case we have 392 GB/s on the fast ram pool, and 168 GB/s on the slow ram pool.
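The 392/168 split follows from the hypothesis in this post that channels on the 2 GB modules are statically reserved for one pool or the other. A minimal Python sketch of that hypothesis is below. Whether the hardware actually reserves channels this way is exactly what the post is asking, so treat the model as the poster's assumption and not as a description of the memory controller. The 28 GB/s per 16-bit channel figure and all names are illustrative.

```python
# Static-reservation model: each 2 GB module gives 0, 1 or 2 of its
# 16-bit channels to the slow pool (this is the post's hypothesis).
CHANNEL_GBS = 28            # one 16-bit channel
SMALL_MODULES = 4           # 1 GB modules, always serve the fast pool
LARGE_MODULES = 6           # 2 GB modules, hold both pools
CHANNELS_PER_MODULE = 2


def split_scenario(slow_channels_per_large_module):
    """Return (fast_gbs, slow_gbs) under the static-reservation hypothesis."""
    if not 0 <= slow_channels_per_large_module <= CHANNELS_PER_MODULE:
        raise ValueError("a module only has two channels")
    fast_channels = CHANNELS_PER_MODULE - slow_channels_per_large_module
    slow = LARGE_MODULES * slow_channels_per_large_module * CHANNEL_GBS
    fast = (SMALL_MODULES * CHANNELS_PER_MODULE + LARGE_MODULES * fast_channels) * CHANNEL_GBS
    return fast, slow


for reserved in range(CHANNELS_PER_MODULE + 1):
    fast, slow = split_scenario(reserved)
    print(f"{reserved} channel(s) reserved: fast={fast} GB/s, slow={slow} GB/s")
```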

    As I see it there are no other alternatives, unless we can access the upper part of the 2 GB modules separately, and not on all chips at once. Is this possible?

    Accepting that it is not, and that access to the upper part of the 2 GB modules requires access to all six at once, then 392 GB/s on the fast RAM pool and 168 GB/s on the slow RAM pool is what we can count on!

    We can keep most accesses on the fast RAM, limiting access to the slow RAM, but even if that increases the bandwidth available on the fast RAM, as soon as there is an access to the slow RAM (and they will need to exist), bandwidth will decrease.
    This can lead to uncontrollable stutter in games. And if the CPU is set to work on the slow memory, the bandwidth will vary a lot, and counting on having more than 392 GB/s on the fast memory can lead to sudden performance decreases.

    But if not all of the 168 GB/s is used, the remainder is just lost, not added to the fast pool.

    Now... will the GPU get 392 GB/s?

    Well... no! Not even that! Of course this will depend on the game, but GPU/CPU memory usage can be 70/30 in a more demanding game. This means 3.5 GB will not be enough for the CPU, and it will need to use some memory in the fast pool, stealing extra bandwidth from the GPU.

    Since Xbox Series X has 52 CUs, the end result (492 GB/s) seems to be worse than the 448 GB/s available on the PS5.

    Is there anything wrong with what I'm saying? I see a lot of people saying things like: "If the CPU uses 50 GB/s on the slow RAM, we still have 510 GB/s on the fast RAM for the GPU", but as I see it, this will not happen. As I see it, 68 GB/s are wasted, since those 3.5 GB will not output more than 100 GB/s, but we will need to reserve 168 GB/s due to the usage of the six 16-bit channels. As such, if the CPU is working in the upper 6 GB slow memory, the global bandwidth will be 492 GB/s.
    As I see it, only with very limited usage of those 6 GB can we get more than 392 GB/s on the fast memory pool, effectively approaching the 560 GB/s.

    Please feel free to correct me, giving your explanation.

    Thank you!
     
    #1 Metal_Spirit, Apr 10, 2020
    Last edited: Apr 11, 2020
    abcgamer, blakjedi and Shifty Geezer like this.
  2. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    16,847
    Likes Received:
    16,672
    @Metal_Spirit Yeah, you're making it too complicated and ending up with the wrong conclusions. I believe it was already explained in the DF article or our discussion about it.
     
    PSman1700 likes this.
  3. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,402
    Likes Received:
    4,111
    Location:
    Well within 3d
    Pool is a term others have used. It's fine if you're discussing the memory space, or the overall capacity as a characteristic separate from the bus width.

    The fast pool is a range of memory addresses that when accessed can have a theoretical peak of 560 GB/s bandwidth. Bandwidth depends on channel width and speed, which is independent of the capacity of the chips.
    It would be best not to try assigning modules to a given pool. All the GDDR6 modules contain portions of the address range that makes up the fast pool. If they didn't, then it would exclude their channels' bandwidth from being usable for the fast pool, and it wouldn't be the fast pool.
    The slow pool is the range of storage addresses that are not distributed over all chips because of the capacity difference, and so only the chips that have addresses in that range can supply requests at those addresses.

    Channels don't care what pool they are being asked to access from. It's only a question of whether the chip will make a request over them, and for the slow pool the chip will only generate requests on 6 out of 10 chips.
    A channel could burst data from the fast pool in one transaction, and then burst data from the slow pool immediately afterwards. This isn't a special restriction for the pool setup, just a result of a channel only being able to transmit data for one burst at a time.

    The system should be distributing memory spaces evenly across the channels. Any range of addresses long enough to stretch across multiple chips can satisfy as many parallel transactions as there are channels available.
    Past the requirement that an address range be distributed across the chips at a granularity far smaller than a GB, there's no point in discussing a given amount of capacity only providing an X amount of bandwidth.
    Requesting a stretch of some hundreds of KB from the channels that are associated with addresses in the slow pool will generate enough parallel accesses to give a rate of 448 GB/s. (edit: correction 336, juggled wrong number in calculations)
    Requesting a single memory location in the fast pool will top out at 28GB/s.
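A toy model of address striping illustrates the last two statements. It assumes simple round-robin interleaving with a hypothetical 256-byte stripe, 20 channels for the fast pool and 12 for the slow pool, each 16-bit channel peaking at 28 GB/s. Real memory controllers use more elaborate address mappings, so this is only a sketch of the principle, and every name in it is made up for illustration.

```python
# Toy round-robin striping model. The stripe size is a hypothetical value;
# the channel counts follow from 10 modules x 2 channels (fast pool) and
# 6 modules x 2 channels (slow pool).
CHANNEL_GBS = 28
INTERLEAVE_BYTES = 256      # assumption, not a documented figure
FAST_POOL_CHANNELS = 20
SLOW_POOL_CHANNELS = 12


def channels_touched(start, length, num_channels, stripe=INTERLEAVE_BYTES):
    """Return the set of channels hit by a contiguous byte range."""
    if length < 1:
        raise ValueError("length must be at least one byte")
    first_stripe = start // stripe
    last_stripe = (start + length - 1) // stripe
    # After num_channels consecutive stripes every channel has been hit.
    count = min(last_stripe - first_stripe + 1, num_channels)
    return {(first_stripe + i) % num_channels for i in range(count)}


def peak_rate_gbs(start, length, num_channels):
    """Peak rate if every touched channel can burst in parallel."""
    return len(channels_touched(start, length, num_channels)) * CHANNEL_GBS


print(peak_rate_gbs(0, 256 * 1024, SLOW_POOL_CHANNELS))  # a few hundred KB, slow pool
print(peak_rate_gbs(0, 64, FAST_POOL_CHANNELS))          # one location, fast pool
print(peak_rate_gbs(0, 256 * 1024, FAST_POOL_CHANNELS))  # a few hundred KB, fast pool
```

Under these assumptions the rate is set by how many channels a request spans, not by how much capacity the pool holds.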

    The OS probably doesn't need as much bandwidth, but since the game also has part of the pool you cannot apply that logic to the physical distribution of channels.

    That would limit the OS to a fraction of one GDDR6 chip's capacity. A channel is physically attached to half the storage in a GDDR6 module. It cannot service a request in another chip's address range.
     
    #3 3dilettante, Apr 10, 2020
    Last edited: Apr 10, 2020
    blakjedi, function, VitaminB6 and 8 others like this.
  4. Metal_Spirit

    Regular Newcomer

    Joined:
    Jan 3, 2007
    Messages:
    587
    Likes Received:
    361
  5. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    16,847
    Likes Received:
    16,672
    Wouldn't that be 336 GB/s for the slow pool addresses?
     
  6. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    16,847
    Likes Received:
    16,672
    Just read 3dilettante's latest reply, so you don't have to go searching for anything.
     
    PSman1700 likes this.
  7. Metal_Spirit

    Regular Newcomer

    Joined:
    Jan 3, 2007
    Messages:
    587
    Likes Received:
    361
    If this is like you say, then we are fixing accesses at 392 GB/s on the fast memory and 168 GB/s on the slow memory.
    There is no 560 GB/s on the fast pool and no 336 GB/s on the slow pool, just 560 GB/s on both!
     
  8. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    11,483
    Likes Received:
    12,335
    Location:
    The North
    @3dilettante's reply to you is fairly thorough. I would go back and read it over again.
     
    tinokun and PSman1700 like this.
  9. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    16,847
    Likes Received:
    16,672
    No. Wrong again.
     
    tinokun and PSman1700 like this.
  10. Metal_Spirit

    Regular Newcomer

    Joined:
    Jan 3, 2007
    Messages:
    587
    Likes Received:
    361
    Yes... in this case the first 1 GB of all modules. 56 GB/s per module, over a 32-bit bus, makes the 560 GB/s.

    Fact!

    Since address ranges for the fast memory pool cover the first 1 GB only, the 1 GB modules are excluded from the 6 GB pool, since that's their entire capacity!

    Yes... the capacity over the 1st GB!

    Never said otherwise. But when fetching data from one pool they cannot read at the same time from the other pool. So you cannot count the same channel on both pools at once.

    Fact!

    Each 16-bit channel will top out at 28 GB/s. Sure!
    I did not catch how you reached the 448 GB/s, though! The slow pool cannot provide more than 336 GB/s.


    Of course you cannot access another chip, I never questioned that! It is physically attached. I never said otherwise!
    BTW, I misunderstood this phrase when replying to iroboto!

    But after reading your reply, I seem to have found nothing that contradicts what I wrote! Did I miss your point?
     
  11. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    3,957
    Likes Received:
    2,706
    I think in this case he was referring to a physical address on one of the chips.
     
    BRiT likes this.
  12. Metal_Spirit

    Regular Newcomer

    Joined:
    Jan 3, 2007
    Messages:
    587
    Likes Received:
    361
    Sorry... I misunderstood what was being said.
    Indeed my reply made no sense and was wrong!
    But as for what I wrote in the first message, from what I understood of 3dilettante's answer, he didn't seem to contradict anything I said! Maybe my wording was not the most correct, but I wasn't saying anything different from what 3dilettante said.
     
  13. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,402
    Likes Received:
    4,111
    Location:
    Well within 3d
    Correct, I was running some other comparisons at the time of writing and input the wrong number. I added a note in my post.

    There's 560 GB/s max for the whole memory subsystem.
    Bandwidth is determined by physical channel width and speed, not memory pool.

    The range of addresses for the fast pool is distributed across all the chips in the system. The slow pool is distributed across the additional capacity that 6 of the modules have over the other four.
    Accesses to all locations can total up to 560 GB/s, in theory. Of that mix, up to 336 could be from accesses to the slow pool.
    Both the OS and game can make use of the slow pool, and I'm not entirely sure that all of the fast pool is out of reach of the OS. Even if there's no kernel memory in the fast pool, there could be functions that transfer data into game memory that could be OS-related.
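One way to read this is as a shared budget: slow-pool traffic is capped at 336 GB/s, and whatever it consumes comes out of the 560 GB/s total. The Python sketch below encodes only that best-case bound. It assumes fast-pool requests can keep every otherwise idle channel busy, which real access patterns will not always manage, so the results are upper limits and not predictions. The function name is illustrative.

```python
# Best-case shared-budget model of the two address ranges.
TOTAL_GBS = 560      # all 20 channels
SLOW_MAX_GBS = 336   # the 12 channels that hold slow-pool addresses


def max_fast_gbs(slow_gbs):
    """Upper bound on fast-pool bandwidth given the slow-pool traffic in GB/s."""
    if not 0 <= slow_gbs <= SLOW_MAX_GBS:
        raise ValueError("slow-pool traffic must be between 0 and 336 GB/s")
    # Channels are shared over time, so unused slow-pool bandwidth is not
    # lost: the same channels can serve fast-pool bursts instead.
    return TOTAL_GBS - slow_gbs


for slow in (0, 168, 336):
    print(f"slow={slow} GB/s -> fast up to {max_fast_gbs(slow)} GB/s")
```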
     
    blakjedi, function, VitaminB6 and 6 others like this.
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,402
    Likes Received:
    4,111
    Location:
    Well within 3d
    That doesn't automatically translate into a loss of bandwidth for the game. The game has part of the slow memory pool as part of its allocation, so a game reading from the slow pool at max bandwidth and reading from the remaining modules would be getting peak bandwidth regardless.

    That was a math error on my part, I added a correction.


    That was part of the point of contention I had with the portion of your post concerning:
    "How much bandwidth can these 2.5 GB really output? If the full 2GB module can output 56 GB/s, these 3.5 GB (regardless of being on 2 modules or a piece on all modules) can only output 98 GB/s."
    The last claim, that 3.5 GB of storage can only produce 98 GB/s regardless of whether it is spread over all modules (all 6? of the larger chips), would require that to happen.
     
    function, tinokun, PSman1700 and 2 others like this.
  15. Proelite

    Veteran Regular Subscriber

    Joined:
    Jul 3, 2006
    Messages:
    1,483
    Likes Received:
    865
    Location:
    Redmond
    @3dilettante Is it accurate that the only downside of this "split" pool is loss of flexibility for the dev?

    According to MS, CPU, audio and file IO require no more than 336 GB/s. I wonder what non-graphical work would benefit from 560 GB/s. If there is little or none, then this "split" pool architecture might be the design for consoles going forward?

     
    VitaminB6 and disco_ like this.
  16. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    3,957
    Likes Received:
    2,706
    And that only in that you don't want to exceed 10GB of fast data. Overflowing the slow partition doesn't really hurt anything given you have a finite total amount of memory anyway.
     
    disco_ and PSman1700 like this.
  17. Metal_Spirit

    Regular Newcomer

    Joined:
    Jan 3, 2007
    Messages:
    587
    Likes Received:
    361
    I know... That was a mistake on my part
     
  18. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,402
    Likes Received:
    4,111
    Location:
    Well within 3d
    The primary downside I can see at this point is that devs need to pay attention to where the memory resources for the most bandwidth-intensive functions are being placed, and maybe some kind of capacity pressure if they have a renderer that needs more than 10 GB of data on-hand. However, I haven't seen indications that this is a pressing issue at this stage.
    Unlike with the Xbox One's ESRAM, or the EDRAM for the Xbox 360 or PS2, the ratio of "fast" to total memory is extremely generous. Most memory is fast, the slow bandwidth is still generous, and 10 GB is a lot of memory to need for high-bandwidth functionality. Most of the time, a minority of memory is hit very many times, rather than tens of GB of memory being run through in a row.

    As far as why CPU audio and file I/O don't need more than 336 GB/s, I interpreted the Goossen quote to mean that the CPU and IO blocks have infinity fabric interfaces that have bandwidth on the order of 32B at ~1.8 GHz, which means their bandwidths are lower than even the "slow" value of 336 GB/s. Only a client capable of generating more than 336 GB/s of traffic would know the difference, and per the interview that would be the GPU--which makes sense.
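The fabric figure can be turned into GB/s with one multiplication. The sketch below takes the 32 B per clock and ~1.8 GHz values from the paragraph above at face value and assumes a single link in one direction. Both are this post's interpretation, not published specifications.

```python
# Rough client-side bandwidth for one fabric link, using the post's
# estimate of 32 bytes per clock at about 1.8 GHz (assumed, not official).
SLOW_POOL_GBS = 336


def link_bandwidth_gbs(bytes_per_clock, clock_ghz):
    """Bytes per clock times clocks per nanosecond gives GB/s."""
    return bytes_per_clock * clock_ghz


client_gbs = link_bandwidth_gbs(32, 1.8)
print(f"one link: about {client_gbs:.1f} GB/s")
print("below the slow-pool peak:", client_gbs < SLOW_POOL_GBS)
```

On these assumptions a single such client sits far below 336 GB/s, so it could not tell the two address ranges apart by bandwidth alone.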
     
    function, VitaminB6, DSoup and 6 others like this.
  19. Proelite

    Veteran Regular Subscriber

    Joined:
    Jul 3, 2006
    Messages:
    1,483
    Likes Received:
    865
    Location:
    Redmond
    RIP Xbox One.

    For a mid gen upgrade, a SOC with 2+GB of HBM at 1024+GB/s and unified pool of 32+GB DDR5 8400 at 260+GB/s might give the best perf / dollar cost? This is assuming that the HBM can be added on substrate and not via a silicon interposer.
     
  20. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    3,957
    Likes Received:
    2,706
    I saw you put this forward before and it seems like a reasonable explanation for something I was confused by initially.
     
    blakjedi and function like this.
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.