Next Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Discussion in 'Console Technology' started by Proelite, Mar 16, 2020.

  1. MrFox

    MrFox Deludedly Fantastic
    Legend Veteran

    Joined:
    Jan 7, 2012
    Messages:
    6,485
    Likes Received:
    5,990
    egoless and ToTTenTranz like this.
  2. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,365
    Likes Received:
    3,955
    Location:
    Well within 3d
    Could you clarify what Sony did not see coming?
    The PS5's memory chips are symmetric, so that part of the conversation isn't relevant.
    If you are talking about the matrix unit discussion, there isn't evidence that it applies to anything but the compute-only Arcturus.
     
  3. Rockster

    Regular

    Joined:
    Nov 5, 2003
    Messages:
    973
    Likes Received:
    129
    Location:
    On my rock
    Not ignoring, and not true. Cerny said they attempt to estimate for the worst-case "game", not some theoretical or unrealistic possibility. And if they underestimate that, it could result not only in an extremely loud system but also in potential overheating and shutdown. You are trying to read into his comments the notion that no application realistically hits that upper power band, which is simply not true, and selectively quoting scenarios such as a map screen or the use of AVX instructions as if they were the only reasons for incurring high power draw. I think the variable clock solution they have come up with is ingenious and an excellent idea, as it provides the best possible performance given their acoustic and power design targets in ALL scenarios. Again, however, this doesn't change the fact that their acoustic and power design targets ARE going to limit overall system performance as games push greater utilization of the hardware and the system reduces clock speeds to compensate, as designed. His expectation is simply that "most" games, whatever the definition of that is, aren't going to push that hard and as such will run at or near the max clocks often.
     
    Silenti, PSman1700 and BRiT like this.
  4. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    3,947
    Likes Received:
    2,690
    The assumption would be that the configuration they chose allowed for adequate bandwidth and adequate capacity to service the needs of their particular design. They didn't divide the RAM because they didn't need to.
     
    egoless, tinokun and PSman1700 like this.
  5. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,209
    Likes Received:
    5,634
    I think I just hadn't remembered correctly about how the memory bandwidth had been specified. It's all making sense now.
     
  6. MrFox

    MrFox Deludedly Fantastic
    Legend Veteran

    Joined:
    Jan 7, 2012
    Messages:
    6,485
    Likes Received:
    5,990
    I think the misunderstanding here is that when Cerny talks about "the previous technique" he doesn't mean the fixed clock itself; he means designing for unknowns, which fixed clocks indirectly force. The new technique doesn't have to design for unknowns, but it requires variable clocks. In turn, he adds that it allows higher peak clocks and a better average.

    If they estimate that 1800 MHz can cause peaks up to 180W even 1% of the time, they have to add a margin just in case some game ever reaches 200W. They end up with a 200W design cost, a typical consumption of 150W, and only 1800 MHz of performance.

    If they cap the power at 200W and vary the clock to keep it at 200W, they end up with 2230 MHz most of the time and 1800 MHz less than 1% of the time. The average becomes ridiculously advantageous for the exact same design cost.
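
    Purely to make the arithmetic concrete, here's a minimal sketch in Python using only the hypothetical figures above (nothing here is an official number, and the 99%/1% split is just the example from this post):

```python
# Toy comparison of the two approaches using the hypothetical numbers above.
# None of these figures are official; the split of time is illustrative only.

FIXED_CLOCK_MHZ = 1800      # clock chosen so rare worst-case peaks stay in budget
VARIABLE_MAX_MHZ = 2230     # clock sustained while under the power cap
POWER_BUDGET_W = 200        # design cost (PSU + cooling) in both cases

# Fixed-clock design: runs at 1800 MHz all the time, budgeted for rare 200 W peaks.
fixed_avg_clock = FIXED_CLOCK_MHZ

# Power-capped design: runs at 2230 MHz ~99% of the time, drops toward
# 1800 MHz in the rare worst case (<1% of the time, per the example above).
variable_avg_clock = 0.99 * VARIABLE_MAX_MHZ + 0.01 * FIXED_CLOCK_MHZ

print(f"fixed-clock average:  {fixed_avg_clock:.0f} MHz")
print(f"power-capped average: {variable_avg_clock:.0f} MHz")
print(f"gain for the same {POWER_BUDGET_W} W design cost: "
      f"{100 * (variable_avg_clock / fixed_avg_clock - 1):.1f}%")
```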
     
    #1546 MrFox, Mar 31, 2020
    Last edited: Mar 31, 2020
  7. Metal_Spirit

    Regular Newcomer

    Joined:
    Jan 3, 2007
    Messages:
    556
    Likes Received:
    341
    Now... try to access data on both at the same time and keep both bandwidths.
    Don't forget the 192 bits on the slow RAM are shared with the 320 bits on the fast RAM. It's not 320+192!
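
    For reference, a quick back-of-the-envelope sketch (in Python) of where those figures come from, assuming the publicly stated 14 Gbps GDDR6; the point being that the 192-bit path is a subset of the 320-bit one, so the two never add:

```python
# Back-of-the-envelope for the Series X bus widths (assuming 14 Gbps GDDR6,
# as publicly stated). The 6 channels behind the "slow" 6 GB are the same
# physical channels that make up part of the 320-bit bus.

GBPS_PER_PIN = 14           # GDDR6 data rate per pin
FULL_BUS_BITS = 320         # all 10 chips (the 10 GB "fast" region)
SLOW_BUS_BITS = 192         # only the six 2 GB chips (the 6 GB "slow" region)

def bandwidth_gbs(bus_bits: int) -> float:
    """Peak bandwidth in GB/s for a given bus width."""
    return bus_bits * GBPS_PER_PIN / 8

print(f"fast region peak: {bandwidth_gbs(FULL_BUS_BITS):.0f} GB/s")   # 560
print(f"slow region peak: {bandwidth_gbs(SLOW_BUS_BITS):.0f} GB/s")   # 336
# While a transaction is served from the slow region, those 6 channels are
# busy, so the remaining 4 channels (128-bit) are all that's left over.
print(f"left over during a slow access: {bandwidth_gbs(128):.0f} GB/s")  # 224
```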
     
  8. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,209
    Likes Received:
    5,634
    So on PS5 if both the CPU and GPU try to access memory at the same time, how much bandwidth does the GPU get?

    Or are you talking about the GPU accessing memory in address ranges that split across both pools?
     
  9. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    3,947
    Likes Received:
    2,690
    You don't. You're either doing a 320-bit access or a 192-bit access depending on which client is asking for that particular segment of data and which pool it exists in.
     
  10. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,076
    Likes Received:
    5,626
    Maybe you can, if the system finds a way to serve some latency-critical GPU data from the memory that isn't connected to the same 6-channel bus the CPU can access, though that leaves you with a 4-channel 128-bit bus.

    I really don't know if modern memory controller units allow that level of granularity though, or if it makes practical sense.
     
    function likes this.
  11. Metal_Spirit

    Regular Newcomer

    Joined:
    Jan 3, 2007
    Messages:
    556
    Likes Received:
    341
    The real problem here is not the bandwidth decrease caused by the CPU or GPU... CPU and GPU usage will decrease bandwidth in any system.
    The problem is that to keep 560 GB/s steady you need to read from both pools, and in the correct proportions, because the more usage you give to the second pool, the more bandwidth on the first decreases.
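
    As a toy illustration of that proportions argument, and only under the simplifying, unconfirmed assumption that while the slow 6GB region is being served its six channels are busy and the other four sit idle, a blended average could look like this:

```python
# Toy model of the "correct percentages" argument. Simplifying assumption
# (not confirmed anywhere official): while a request is served from the
# 6 GB slow region only its 6 channels are busy and the other 4 idle,
# so time spent on the slow region dilutes the 560 GB/s peak.

FAST_PEAK = 560.0   # GB/s, all 10 channels
SLOW_PEAK = 336.0   # GB/s, the 6 shared channels only

def blended_bandwidth(slow_fraction: float) -> float:
    """Average bandwidth when `slow_fraction` of bus time goes to the slow region."""
    return (1.0 - slow_fraction) * FAST_PEAK + slow_fraction * SLOW_PEAK

for f in (0.0, 0.1, 0.25, 0.5):
    print(f"{f:>4.0%} of bus time on the slow region -> {blended_bandwidth(f):.0f} GB/s")
```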
     
  12. zupallinere

    Regular Subscriber

    Joined:
    Sep 8, 2006
    Messages:
    750
    Likes Received:
    94
    Whatever the PS5 cooling situation is, the power supply only has to cover the power limit (or a tad more) by design of the console. Does that at least save much money on the power supply side?
     
  13. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    3,947
    Likes Received:
    2,690
    To get 560 GB/s you would have to exclusively read from the fast pool at full load for an entire second. Where you are getting tripped up is that you are using a metric for capacity to do work over a period of time and trying to apply it to moment-to-moment, cycle-to-cycle usage. If you were to precisely track the amount of data actually transferred over any given second while running a game, I'll bet it would be some lesser number than the theoretical max bandwidth. So whether this theoretical max number goes up or down with any particular usage pattern is irrelevant. What's relevant is whether this particular setup delivers sufficient bandwidth to meet the needs of the system as it needs it.
     
  14. BRiT

    BRiT Verified (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    15,897
    Likes Received:
    14,807
    Location:
    Cleveland
    Just like on the PS5: the more the CPU reads from the 448 GB/s pool, the less bandwidth is left for the GPU.
     
    egoless and PSman1700 like this.
  15. Esrever

    Regular Newcomer

    Joined:
    Feb 6, 2013
    Messages:
    768
    Likes Received:
    532
    There aren't 2 pools. Some chips have 2GB and some have 1GB, but to get the full speed you are reading from all chips. It makes no sense to think that you lose bandwidth from the fast memory when reading the slow memory. If you need to read from the higher address range, it will be slow during that operation. If you are reading the lower range, it will be fast. The GPU will be able to access all 10GB at full speed if the CPU isn't holding up the memory controller/bus, and if it needs the slower memory the average will obviously be lower, but that is where a game dev can potentially design their memory access so this isn't a problem for the accesses that need the bandwidth.
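
    The actual interleaving granularity and address mapping of the memory controller aren't public, but a toy model of that idea (lower 10GB striped across all ten chips, upper 6GB striped across only the six 2GB chips) could look like this:

```python
# Toy model of how addresses could map onto the 10 chips (6 x 2 GB + 4 x 1 GB).
# The real interleaving granularity and mapping are not public; this sketch
# only illustrates "lower 10 GB striped across all 10 chips, upper 6 GB
# striped across the six 2 GB chips".

GB = 1024 ** 3
STRIPE = 256                      # hypothetical interleave granularity in bytes
FAST_REGION_BYTES = 10 * GB       # striped across chips 0..9
SLOW_CHIPS = [0, 1, 2, 3, 4, 5]   # the six 2 GB chips

def chip_for_address(addr: int) -> int:
    """Return which chip a physical address lands on in this toy mapping."""
    if addr < FAST_REGION_BYTES:
        return (addr // STRIPE) % 10          # 10-way stripe -> 320-bit peak
    upper = addr - FAST_REGION_BYTES
    return SLOW_CHIPS[(upper // STRIPE) % 6]  # 6-way stripe -> 192-bit peak

# A burst in the lower region touches all 10 chips; one in the upper region
# only ever touches the six shared chips.
print({chip_for_address(FAST_REGION_BYTES - 10 * STRIPE + i * STRIPE) for i in range(10)})
print({chip_for_address(FAST_REGION_BYTES + i * STRIPE) for i in range(12)})
```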
     
    see colon, PSman1700 and BRiT like this.
  16. Metal_Spirit

    Regular Newcomer

    Joined:
    Jan 3, 2007
    Messages:
    556
    Likes Received:
    341
    On both systems the GPU gets what remains after the CPU usage.
    The question is that on Series X, for the GPU to fully use the bandwidth that remains, it has to be constantly reading from both pools. Otherwise bandwidth usage will be sub-optimal.
     
    KeanuReeves likes this.
  17. BRiT

    BRiT Verified (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    15,897
    Likes Received:
    14,807
    Location:
    Cleveland
    No. The GPU only has to read from the 10GB pool to get the full bandwidth that's left. This is no different from the PS5. Why does everyone think it's any different?
     
    RagnarokFF, egoless, blakjedi and 3 others like this.
  18. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    10,783
    Likes Received:
    10,800
    Location:
    The North
    I think you'd run into these issues anyway whether it was 10x1GB chips, 10x2GB chips, PS5's setup, or XSX's current setup. Both of you are describing a challenge of memory contention between CPU and GPU, not a contention problem between 2GB and 1GB chips. Memory contention by _default_ is sub-optimal. Having to share memory between the CPU and GPU has its pros and cons.
    Cons: you have contention.
    Pros: you waste fewer cycles copying back and forth, you can access the same memory locations, and you can efficiently have 2 processors work on the same memory locations without needing a copy and burdening other buses. You guys were absolutely there when hUMA was the big thing to talk about for PS4, and XBO didn't have it because of split pools of memory. It didn't seem like an issue then, why is it now?

    The advantages outweigh the cons and that's why we do it.

    The only difference here is that with PS5 you run into full memory contention, and we have a graph of how bandwidth drops significantly as the CPU uses more bandwidth on PS4. Which makes sense: while the CPU is getting the data it needs, it's not going to somehow mix the requests and fill the gaps with GPU data. It's going to get its pull, then the GPU gets what it needs, and vice versa.

    On XSX it's just very obvious what is happening: you know exactly which chips will be under contention and which memory won't be.

    It's not about GB/s; that doesn't even make sense. You're never going to maintain 560 GB/s every single second anyway; you're just going to pull data as you request it. It's not like you can keep pulling and pushing 560 GB/s of data back and forth for a full second. That would mean your memory is moving a full 32 bits of data every single clock cycle from every single chip for a full second, no breaks, no imperfect pulls.
    If you can't keep a CPU or GPU at 100% saturation, there's no way your memory would manage it either. You'll always grab an imperfect amount of data. Developers can make things better by sizing textures to exactly the amount that memory would grab from each chip, of course, but there is only so much you can do.

    You're going to have inefficiency somewhere.

    On PS5, when the CPU needs its data, the GPU will be locked out until the CPU gets what it needs, except for any chips the GPU can still pull from.

    On XSX, when the CPU needs its data, the GPU will be locked out until the CPU gets what it needs, except for the chips that are still available to pull from. In this case, there will always be 4 chips dedicated to the GPU that the CPU can't touch.
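
    To illustrate that last distinction, here's a heavily simplified toy arbitration model. The 10% CPU bus-time figure and the assumption that XSX CPU traffic lives entirely on the six 2GB chips are illustrative assumptions, not measurements:

```python
# Toy arbitration model of the point above. Assumption (illustrative only):
# the CPU occupies the channels it needs for some fraction of bus time, and
# the GPU gets everything else. On XSX the CPU's traffic is assumed to live
# on the six 2 GB chips, so the four 1 GB chips (224 GB/s worth) stay
# available to the GPU even during the CPU's turns.

def gpu_share(total_gbs: float, cpu_time_fraction: float, cpu_free_gbs: float) -> float:
    """GPU bandwidth when the CPU holds its channels `cpu_time_fraction`
    of the time and `cpu_free_gbs` worth of channels are never contended."""
    contended = total_gbs - cpu_free_gbs
    return cpu_free_gbs + (1.0 - cpu_time_fraction) * contended

CPU_TIME = 0.10  # CPU holds the bus 10% of the time (arbitrary example)

print(f"PS5 GPU share: {gpu_share(448.0, CPU_TIME, 0.0):.0f} GB/s")    # ~403
print(f"XSX GPU share: {gpu_share(560.0, CPU_TIME, 224.0):.0f} GB/s")  # ~526
```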
     
  19. MrFox

    MrFox Deludedly Fantastic
    Legend Veteran

    Joined:
    Jan 7, 2012
    Messages:
    6,485
    Likes Received:
    5,990
    It all adds up, but probably more on the cooling cost than the PSU. There are almost always standardized designs from PSU manufacturers; they adapt well-tested designs (already passed every regulation worldwide, dirt cheap to make), so there's very little difference in cost between, say, a 300W and a 350W unit.

    The problem I see with cooling is that it can reach a threshold where different materials and assembly become required: going from plain aluminum, to aluminum plus a copper slug, to adding heatpipes, then more heatpipes, until a large vapor chamber is required.

    So maybe Sony decided to stay with two or three heatpipes instead of a vapor chamber?
     
    zupallinere, shiznit and BRiT like this.
  20. Globalisateur

    Globalisateur Globby
    Veteran Regular Subscriber

    Joined:
    Nov 6, 2013
    Messages:
    3,493
    Likes Received:
    2,189
    Location:
    France
    On Xbox Series X there is also going to be the usual memory contention problem, on top of the aforementioned reduced bandwidth when the CPU is accessing the memory.
     