Next Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Discussion in 'Console Technology' started by Proelite, Mar 16, 2020.

  1. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,141
    Likes Received:
    1,030
    Location:
    Earth
  2. turkey

    Regular Newcomer

    Joined:
    Oct 21, 2014
    Messages:
    921
    Likes Received:
    634
    Location:
    London
    Don't forget the average temp is probably higher than room ambient, as it's in a poorly ventilated AV cabinet.
     
    egoless, PSman1700 and disco_ like this.
  3. dobwal

    Legend Veteran

    Joined:
    Oct 26, 2005
    Messages:
    5,435
    Likes Received:
    1,497
    There is no reason to store the OS in fast RAM. MS explicitly states that the only component that sees 560 GB/s is the GPU. And even if audio or file I/O data ends up in local RAM, it only has access to 336 GB/s at most.

    One thing has always been true about AMD APUs: if you are not the GPU, you don't get full access to the bandwidth offered by GDDR. You have unified memory but three separate memory pools (CPU cacheable, uncacheable, and local), each with its own max bandwidth because the access granularity is different, so only GPU-related data is interleaved in a way that fully exploits GDDR.
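    As a quick sanity check of the 560 vs. 336 figures quoted above, here is an illustrative sketch (not from the post) assuming the Series X's published 14 Gbps GDDR6 pin rate:

```python
# Back-of-envelope check of the two bandwidth figures in the post,
# assuming 14 Gbps GDDR6 pins (illustrative, not an official formula):
# bandwidth (GB/s) = bus width (bits) * pin rate (Gbps) / 8 bits-per-byte
def bandwidth_gbps(bus_width_bits: int, pin_rate_gbps: float) -> float:
    return bus_width_bits * pin_rate_gbps / 8

gpu_optimal = bandwidth_gbps(320, 14)  # all 10 chips  -> 560.0 GB/s
standard    = bandwidth_gbps(192, 14)  # 6 chips only  -> 336.0 GB/s
print(gpu_optimal, standard)  # 560.0 336.0
```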
     
  4. Rockster

    Regular

    Joined:
    Nov 5, 2003
    Messages:
    973
    Likes Received:
    129
    Location:
    On my rock
    You can keep saying it, but that doesn't help it make sense. The "old way" of selecting clock speeds was simply estimating for the worst-case game, i.e. God of War in Cerny's example. So all that is being said is that when the next God of War-type power-usage game shows up, it would struggle to run the CPU and GPU at 3GHz and 2GHz respectively, based on his same comments. But rather than holding all games to that lower bound, they are allowing the clocks to ramp up in lower power-usage scenarios (i.e. all the other games that don't push the system as hard). The lower the activity in the processor, the faster the clock. It doesn't change the fact that at the higher processor activity level, you hit the same clock speed limits as if you were using the "old" method of setting them.
     
    PSman1700 and megre like this.
  5. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,209
    Likes Received:
    5,634
    PSman1700 likes this.
  6. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,078
    Likes Received:
    5,628
    I think the missing information here that is causing some confusion is that no data of significant size (anything that won't fit in the L2, at least) ever goes into one specific memory chip. The memory controllers distribute the data in a RAID0-like manner across all chips to guarantee maximum throughput. In the Series X, the "fast data" gets distributed among the 10 chips and the "slow data" gets distributed across 6 chips.
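    The RAID0-like distribution described above can be sketched like this (stripe size here is hypothetical, chosen only for illustration):

```python
# Toy striping sketch: consecutive stripes of a buffer rotate across all
# chips in a pool, so no sizeable allocation ever sits on one chip.
# The 256-byte stripe size is an assumption for illustration only.
def chip_for_address(addr: int, n_chips: int, stripe: int = 256) -> int:
    """Which chip holds the byte at `addr`, given a round-robin interleave."""
    return (addr // stripe) % n_chips

# "Fast" data rotates over all 10 chips, "slow" data over the 6 big chips:
fast = [chip_for_address(a, n_chips=10) for a in range(0, 10 * 256, 256)]
slow = [chip_for_address(a, n_chips=6) for a in range(0, 6 * 256, 256)]
print(fast)  # [0, 1, ..., 9] -- every chip touched once per 2.5KB
print(slow)  # [0, 1, ..., 5]
```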




    Those 35º probably refer to room temperature, meaning there's probably some headroom for those in hot rooms who still put their consoles inside cabinets and such.
    Still, there are probably very few people living in 35º rooms. I couldn't handle a room temperature over 30º for more than 10 minutes; it's way too hot!



    Cheap (and super air-polluting) AC is widespread even in hot first-world countries.
    Who has money for a PS4 but no money for a cheap AC unit?
     
    see colon and turkey like this.
  7. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,141
    Likes Received:
    1,030
    Location:
    Earth
    Those cheap places that get hot have very poor insulation. The heat comes straight in and the cold goes straight out. It's potentially hundreds of dollars a month in electricity to run AC. It's kind of unbelievable what poor construction quality can be like. Places like LA tend to get cooler at night, so some folks just don't run AC and use fans to pull in cooler air for nighttime (easier to sleep).
     
    Silent_Buddha, PSman1700 and BRiT like this.
  8. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,829
    Likes Received:
    10,870
    Location:
    The North
    Sure, I totally respect that.
    So if I told the GPU that its memory address space is from 0000-16GB,
    And I told the CPU that its memory address space is from 10GB-16GB.

    What would happen in this case? Because this is what it sounds like to me.
     
    #1528 iroboto, Mar 31, 2020
    Last edited: Mar 31, 2020
    PSman1700 and AzBat like this.
  9. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,209
    Likes Received:
    5,634
    So as far as memory addresses go, are you saying the first 2 GB chip wouldn't have sequential addresses 0x00000000 - 0x7FFFFFFF? Those addresses would be spread across multiple chips?
     
    PSman1700, AzBat and BRiT like this.
  10. Globalisateur

    Globalisateur Globby
    Veteran Regular Subscriber

    Joined:
    Nov 6, 2013
    Messages:
    3,496
    Likes Received:
    2,190
    Location:
    France
    PS5's GPU and CPU will have variable clocks depending on total APU power consumption.
    XBSX's main RAM will have variable bandwidth depending on CPU load and accesses to the main RAM.

    The irony!
     
    megre likes this.
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,365
    Likes Received:
    3,955
    Location:
    Well within 3d
    At least for GPU-targeted memory, it should be interleaved. I recall one granularity given some time ago was changing channels after 128B.
    CPU memory can have varying functions for assigning addresses. For example, if there is NUMA involvement there can be per-node or more general interleaving.
    Memory systems sneak in multiple levels of indirection, even for physical addresses. Caches can have hash functions that can be adjusted to give different slices responsibility for a given physical address range, but the memory controllers themselves are given ranges and interleaving patterns either in hardware or when they are initialized.
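    The layers of indirection described above can be sketched as two independent mapping functions; the slice hash below is a made-up stand-in, and the 128B controller interleave follows the granularity mentioned in the post:

```python
# Toy illustration of two separate indirection layers a physical address
# passes through. The XOR-fold slice hash is hypothetical; real hashes
# are undocumented. The 128B interleave granularity is from the post.
def cache_slice(phys_addr: int, n_slices: int = 8) -> int:
    """Pick which cache slice owns a 64B line, via a stand-in hash."""
    line = phys_addr >> 6                          # 64-byte cache line index
    return (line ^ (line >> 7) ^ (line >> 13)) % n_slices

def memory_channel(phys_addr: int, n_channels: int = 20) -> int:
    """Pick the responsible channel, switching every 128 bytes."""
    return (phys_addr >> 7) % n_channels           # 2**7 = 128B granularity

addr = 0x12345680
print(cache_slice(addr), memory_channel(addr))
```

    The point is that the two functions are independent: the slice that caches a line need not correspond to the controller that backs it.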
     
    jgp, TheAlSpark and Scott_Arm like this.
  12. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,209
    Likes Received:
    5,634
    Interesting. So in the case of Series X, if they're saying there is higher bandwidth to 10 GB, is that 10 GB spread across all of the chips, or would virtual addressing basically map it across the chips with the larger bus?
     
    blakjedi likes this.
  13. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    3,947
    Likes Received:
    2,690
    The 10GB comprises 1 GB of the 2GB total on each of the 6 higher-capacity chips, plus the entire 1GB of each of the 4 lower-capacity chips: 32-bit access * 10 total chips = 320-bit. The 6GB of slow memory comprises the remaining 1GB on each of the 6 higher-capacity chips: 6 * 32 = 192-bit.

    Data would be spread across all of the chips in each pool to maximize bandwidth.
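    The breakdown above checks out arithmetically; a minimal sketch of the same figures:

```python
# Sanity-checking the Series X memory layout described in the post:
# 6 x 2GB modules plus 4 x 1GB modules, each on its own 32-bit lane.
big_chips, small_chips = 6, 4
total = big_chips * 2 + small_chips * 1      # 16 GB of GDDR6 total
fast  = big_chips * 1 + small_chips * 1      # first 1GB slice of every chip
slow  = big_chips * 1                        # remainder of the 2GB chips
fast_bus = 32 * (big_chips + small_chips)    # all 10 chips -> 320-bit
slow_bus = 32 * big_chips                    # 6 chips only -> 192-bit
print(total, fast, slow, fast_bus, slow_bus)  # 16 10 6 320 192
```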
     
  14. MrFox

    MrFox Deludedly Fantastic
    Legend Veteran

    Joined:
    Jan 7, 2012
    Messages:
    6,488
    Likes Received:
    5,994
    If we use an extreme case, say the CPU takes 50GB/s, it unbalances the controller distribution and costs the GPU an equivalent 83GB/s off the ideal maximum of 560GB/s, so the combined maximum drops to about 527GB/s on average because of the stalls.

    The reason is that some chips have fewer operations queued (some serve the GPU only, others serve both CPU and GPU). They serve the same proportion of requests as all the others because they cover the same percentage of the GPU memory space, but they don't have the burden of serving the CPU too, so they WILL stall in proportion to the additional CPU requests the others have to serve.

    More reasonably, if it's 25GB/s, which I think is more of a normal 8-core bandwidth, it's 543GB/s average. If MS were just using the whole space randomly without any such partitioning, they'd end up with the average between 560 and 336. It becomes obvious that partitioning the upper part is the best solution and gets close enough to the ideal to be worth the trouble.
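    The numbers in this post can be reproduced with a short sketch, assuming (as the post does) that CPU traffic lands only on the 6 shared chips and the fast pool interleaves evenly over all 10:

```python
# Reproducing the post's arithmetic: the CPU draws from the 336 GB/s
# 6-chip pool; even interleave means the 4 GPU-only chips stall to match
# the loaded chips, so GPU bandwidth is 10/6 of what's left on those six.
def effective_bandwidth(cpu_gbps: float) -> tuple[float, float]:
    gpu = (336 - cpu_gbps) * 10 / 6    # GPU throughput over all 10 chips
    return gpu, gpu + cpu_gbps         # (GPU share, combined total)

gpu50, total50 = effective_bandwidth(50)  # GPU ~476.7, total ~526.7
gpu25, total25 = effective_bandwidth(25)  # GPU ~518.3, total ~543.3
print(round(total50), round(total25))     # 527 543
print(round(560 - gpu50))                 # 83 GB/s of GPU bandwidth lost
```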

    Interestingly, this problem applies equally to the idea of putting mismatched NAND capacities on the PS5's 12 controllers, but they could have done exactly the same thing as MS and put the OS partition on the upper half of the higher-capacity chips; by not accessing it much during gameplay, that would have mitigated the possible drop in performance.
     
  15. Metal_Spirit

    Regular Newcomer

    Joined:
    Jan 3, 2007
    Messages:
    558
    Likes Received:
    341
    As for the part where you quote me:

    There are six 2 GB modules... each connected via a 32-bit lane... 32*6 = 192 bits (hence the 336 GB/s).

    These will occupy 4 of the 64-bit controllers, and will use six of the eight available 32-bit lanes.

    The remaining two lanes from these controllers are connected to two of the 1 GB modules (making 256 bits total). The remaining two 1 GB modules use the fifth controller.

    Now for the rest.

    Yes... you will get 560 GB/s at all times... 224 + 336, as an example. But this is just mathematics. Fact is, you can have a combined total of 560 GB/s, but the access bandwidth will fluctuate on each of the memory parts depending on access.

    This could fluctuate between 560 GB/s + 0 GB/s, 392 GB/s + 168 GB/s or 224 GB/s + 336 GB/s, and a lot of combinations in between.

    So yes... a total of 560 GB/s, but lots of bandwidth fluctuation between the parts. This puts a lot of constraints on the usage you can give to the 6 GB.

    And this is why I was talking about access in alternating cycles. That way you would lock the bandwidths at 392 + 168, and know what to expect from each part.
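    A sketch of where the 392 + 168 split would come from under that alternating-cycle scheme, assuming 56 GB/s per 32-bit chip (14 Gbps pins) and a 50/50 time split on the shared chips:

```python
# Alternating-cycle split: the 6 shared (2GB) chips alternate between the
# fast and slow pools, while the 4 fast-only (1GB) chips serve the fast
# pool every cycle. 56 GB/s per chip assumes 14 Gbps GDDR6 on 32 bits.
per_chip = 32 * 14 / 8          # 56.0 GB/s per chip
fast_only_bw = 4 * per_chip     # 224 GB/s, always on the fast pool
shared_bw    = 6 * per_chip     # 336 GB/s, time-sliced 50/50
fast = fast_only_bw + shared_bw / 2   # 224 + 168 = 392 GB/s
slow = shared_bw / 2                  # 168 GB/s
print(fast, slow, fast + slow)  # 392.0 168.0 560.0
```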
     
  16. KeanuReeves

    Newcomer

    Joined:
    Sep 30, 2017
    Messages:
    59
    Likes Received:
    27
    You are ignoring why they were struggling to keep 2GHz and 3GHz. Remember, they stress-test their systems in extreme environments with extreme/unrealistic workloads.
    Those extra power-hungry workloads? Yeah, those are outlier workloads that disproportionately bog down the system.
    In order to manage those outlier workloads, they test the system using them at peak theoretical capacity, and it was in that scenario that they had trouble keeping the clocks above 2GHz and 3GHz.
    And this was hardly a real problem, because those workloads are seemingly not used extensively anyway. Likely no game is going to use those instructions at peak theoretical rates, so why cater so strongly to that scenario?
    If devs choose to use those workloads in the usual small amounts, it will not bog down performance all that much, just a few frequency points.
    It is much preferable to do this than to lock the clocks at something way lower that ultimately only serves to drag down the performance of typical workloads.
     
    Mitchings likes this.
  17. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    3,947
    Likes Received:
    2,690
    But that would lead to unbalanced wear on the flash memory.
     
  18. MrFox

    MrFox Deludedly Fantastic
    Legend Veteran

    Joined:
    Jan 7, 2012
    Messages:
    6,488
    Likes Received:
    5,994
    Wear is proportional to capacity. Twice the flash size can absorb twice the write volume with wear levelling. So it would be no different than having 4 or 6 additional chips for the OS.
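    The proportionality claim in one line of arithmetic (the P/E cycle count and capacities below are illustrative, not any real drive's rating):

```python
# Total write endurance (TBW) scales with capacity when P/E cycles are
# held constant, so doubling the flash doubles the writable volume.
# 1000 P/E cycles and the capacities used are illustrative assumptions.
def endurance_tbw(capacity_gb: float, pe_cycles: int = 1000) -> float:
    """Terabytes writable over the flash's life under wear levelling."""
    return capacity_gb * pe_cycles / 1000

print(endurance_tbw(825), endurance_tbw(1650))  # 825.0 1650.0
```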
     
  19. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    2,711
    Likes Received:
    904
    Why didn't Sony see this coming? Would it cost that much more in resources to divide the memory?

    Like in the part of the world where many people keep their devices under the TV in a cabinet, with other devices warming the whole thing. I don't have it like that personally, but I know many that do. One has their Pro in a cabinet together with a Switch and a TV box; it gets very hot in there, with the Pro sounding like the jet it already does. Floor heating under the whole setup... And then, even in Sweden it can get close to 40 degrees Celsius in the summer. Last summer we had some days where it was close to 40. Summer 2018 was hot the whole summer.

    AC units are not expensive, but operating them day and night is. And yes, you need to; you can't just turn it on to play. Maybe if you cool just one room. Let's hope AC units are never a requirement for any console; they never have been before :p
     
  20. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,365
    Likes Received:
    3,955
    Location:
    Well within 3d
    Total bandwidth is based on physical channels and their bit rate, and GDDR6 modules have 2 16-bit channels each. The 10 GB needs to be on all the chips to have the peak bandwidth amount.
    Virtual addressing for x86 works at a 4KB granularity at minimum, so there are lower-level details I'm not sure of about where the additional mapping is done. Caches past the L1 tend to be based on physical address, and they themselves might have striping functions in order to handle variable capacity, like the L3s in the ring-bus processors. It might delay the final determination of the responsible controller until packets go out onto the fabric, which should have various routing policies or routing tables to destinations.
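    The channel arithmetic in the first paragraph works out as follows (the 14 Gbps pin rate is the published Series X figure, assumed here):

```python
# GDDR6 exposes 2 independent 16-bit channels per module, so a 10-chip
# layout yields 20 channels on a 320-bit bus. At 14 Gbps per pin that
# gives the 560 GB/s peak -- which is why the fast 10 GB must span all
# ten chips to reach it.
chips, channels_per_chip, bits_per_channel = 10, 2, 16
channels = chips * channels_per_chip             # 20 channels
bus_bits = channels * bits_per_channel           # 320-bit bus
peak = bus_bits * 14 / 8                         # Gbps pins -> GB/s
print(channels, bus_bits, peak)  # 20 320 560.0
```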
     
    BRiT likes this.

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.