Current Generation Hardware Speculation with a Technical Spin [post launch 2021] [XBSX, PS5]

Discussion in 'Console Technology' started by pjbliverpool, Feb 9, 2021.

Thread Status:
Not open for further replies.
  1. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,405
    Location:
    Wrong thread
    Yeah, I know the 10 GB over the whole 320 bit bus is "GPU optimal" and not solely for the GPU. I should have made that clear, my fault entirely. Likewise the other 6GB can in theory be accessed by the GPU (and perhaps is, for some OS operations).

    Thanks, I'll try my best to understand all of this tomorrow when I'm a bit more with it. I really appreciate your efforts to share what you understand.

    My thought when (trying to) read it was that I'd underestimated the complexity and also flexibility of modern chip power management features. It did seem that within the existing Zen power management platform there was already a huge opportunity to implement the kind of power management that Cerny had talked about.
     
    PSman1700 likes this.
  2. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    13,878
    Likes Received:
    4,724
    Perhaps with the higher-density chips for DDR5 and faster speeds we could see it make a comeback for a PS6 or Xbox whatever: 8 or 16 GB of DDR5, and then on its own bus GDDR of whatever speed is available, perhaps 8-16 GB on that side or more.
     
    PSman1700 likes this.
  3. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,405
    Location:
    Wrong thread
    Yeah, I think this is definitely a brag moment for you. :D

    Just wanted to come back to this to suggest it could again simply be about power. 50% / 33 % less work being done should surely mean a somewhat corresponding drop in power used and heat generated in those areas. It might make Sony's strategy of boosting less likely to see huge drops due to AVX operation.

    There may even be hints in the die shots that this cut happened during development of the PS5 APU. I think that another one of Nemez's tweets perhaps shows this:



    "The full featured Renoir CCXs would only be margin-of-error larger, they would probably fit without major issues or redesigns."

    I think, quite possibly, that PS5 started out with full fat FPUs but moved to these skinnier units later, and the footprint is still there. PS5 was probably deep into development and tons of layout work had already been done at this time.

    Let's say Sony were at the point of trying to balance performance, area and power with a given set of technologies. The cuts are probably nothing to do with area, and they're actually costing performance (in some areas), so that'd mean the gain was in the peak power they could consume. And that could benefit maintaining boost locks across the rest of the system.

    TL;DR - Hypothesis: Sony started out with full fat 256-bit units, reduced them well into development to suit their power / frequency strategy, and the footprint of the original units remains.

    Maybe this is what MS were having a pop at when they talked about having a "server class" Zen 2 implementation, and people were like "u wot?"
     
  4. Allandor

    Regular

    Joined:
    Oct 6, 2013
    Messages:
    842
    Likes Received:
    879
    Sounds plausible.
    My guess would be that they cut out some of the 256-bit units, because they are not often used but draw a lot of power when they are. This way the instructions still work but need more cycles, and at the same time they don't need more power than the CPU's power envelope allows.
    Not much is lost (because those instructions are not used very often in games), but in return they have a more stable power envelope to clock higher or to let the GPU have a bit more power.

    The thing I find really odd is that decompression (in BC games) seems to work even a bit slower than on PS4 Pro (with an SSD). The Xbox does not seem to have that problem. But maybe the cuts hit especially hard in those cases, although for general performance they are irrelevant.
     
    PSman1700 likes this.
  5. Jay

    Jay
    Veteran

    Joined:
    Aug 3, 2013
    Messages:
    4,029
    Likes Received:
    3,428
    Is the CPU speed clocked lower in BC mode?
     
  6. Allandor

    Regular

    Joined:
    Oct 6, 2013
    Messages:
    842
    Likes Received:
    879
    Not for titles after May 2020, if I remember correctly.
    Or titles that got some kind of PS5 update patch.
     
  7. Globalisateur

    Globalisateur Globby
    Veteran Subscriber

    Joined:
    Nov 6, 2013
    Messages:
    4,592
    Likes Received:
    3,411
    Location:
    France
    I think it is exactly this. With 4 FPU ports, too much power used in a short time could cause a drop in frequency (which would impact the whole CPU). So I think the idea is to force developers to do the same job, but slower, using 2 ports, ideally without dropping the frequency. As 3dilettante wrote, the very robust cooling should be enough to take care of heat density.
     
    function likes this.
  8. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,405
    Location:
    Wrong thread
    Well, whatever they're doing, I agree it's got to be because of power. Zen 2 is one 256-bit unit per core, so I don't think they could have cut out any of the FPUs as such, but limiting the ability in some other way would physically guarantee lower power demands. I like the port reduction idea because I don't think it would cause a complete redesign of the entire unit; it would be more like selectively removing duplicated elements. Plus you'd still be left with additional room for any small layout changes (I guess).

    I hadn't picked up on some PS4 BC games having slower decompression on PS5. That's curious, but interesting. Could there be some kind of hardware decompression unit in PS4 that's been removed or bypassed in PS5 due to it being superseded? Some kind of single threaded CPU fallback on PS5?

    Yeah, and I don't think it'd necessarily be just to reduce / prevent CPU clock drops. Power not being used by the CPU is directed to the GPU to sustain high boost rates. Guaranteeing that a chunk of power could no longer be taken by the CPU under any circumstances would mean you can reliably deliver higher lowest and average clocks to the GPU, all while staying in your existing power and cooling capability that you've been planning on.
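
    A minimal sketch of that budget argument, assuming a simple fixed total budget shared by CPU and GPU. Every wattage below is a made-up illustration value, not a measured PS5 figure, and `gpu_power_floor` is a hypothetical helper name:

```python
# Toy model of a shared CPU/GPU power budget. All wattages are made-up
# illustration values, not measured PS5 figures.

def gpu_power_floor(total_budget_w: float, cpu_peak_w: float) -> float:
    """Power the GPU is still guaranteed when the CPU draws its worst-case peak."""
    if cpu_peak_w > total_budget_w:
        raise ValueError("CPU peak exceeds the total budget")
    return total_budget_w - cpu_peak_w

TOTAL_BUDGET_W = 200.0       # hypothetical SoC budget
CPU_PEAK_FULL_FPU_W = 60.0   # hypothetical worst case with a full-width FPU
CPU_PEAK_CUT_FPU_W = 45.0    # hypothetical worst case with a reduced FPU

floor_full = gpu_power_floor(TOTAL_BUDGET_W, CPU_PEAK_FULL_FPU_W)
floor_cut = gpu_power_floor(TOTAL_BUDGET_W, CPU_PEAK_CUT_FPU_W)
print(floor_full, floor_cut)
```

    The point is only that lowering the CPU's worst-case draw raises the GPU's guaranteed minimum by the same amount, whatever the real numbers are.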

    With the move to a potentially less power demanding FPU, perhaps the CPU doesn't have problems maintaining 3 GHz under certain 256-bit loads any more. If that was under an old system and before the current FPU, PS5 might now be in a better position to keep CPU clocks high or at max whatever you throw at it.
     
    PSman1700 likes this.
  9. davis.anthony

    Regular

    Joined:
    Aug 22, 2021
    Messages:
    423
    Likes Received:
    147
    Maybe a latency penalty from having to go through the I/O complex without actually using it?
     
  10. Theeoo

    Newcomer

    Joined:
    Nov 13, 2017
    Messages:
    173
    Likes Received:
    85
    Correct me if I'm wrong but does that mean if developers start making heavy use of AVX instructions in game engines then PS5 will potentially perform worse than XBSS/XBSX/PC?
     
    #430 Theeoo, Oct 1, 2021
    Last edited: Oct 1, 2021
    PSman1700 likes this.
  11. If they started pushing FP256 instructions a lot, then yes it would, because the CPU cores would start throttling down heavily.
    Realistically they won't, because Sony knows how often these instructions come up and that's why they probably used density-optimized transistors on those blocks.
     
    Inuhanyou likes this.
  12. Allandor

    Regular

    Joined:
    Oct 6, 2013
    Messages:
    842
    Likes Received:
    879
    Well, yes it would, but AVX is really only an edge case in games.
    AVX instructions also consume much more power, so that would be another reason it would hurt PS5 performance (more power for the CPU, less for the GPU). But heavy usage of AVX is really not a thing in games so far.
     
    PSman1700 likes this.
  13. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,633
    Location:
    The North
    Games that leverage an entity component system with a Burst compiler, as in Unity, support various types of instructions up to AVX512.

    Even though the CPU will be impacted by it, it's going to be one hell of a game lol. If a game is pushing ECS to high loads, it will be a sight regardless of whether GPU performance takes a hit. Imagine a lot of active stuff moving on the screen at once. Graphics will need to downgrade anyway.
     
    BRiT and PSman1700 like this.
  14. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,090
    So when games do use AVX extensively, what are its main functions? I think BFV MP somehow used it to some extent, which made OCing and temperatures somewhat more unstable/higher. DICE never officially stated BFV uses AVX instructions, though CP2077 certainly does make use of it, since there was a patch to fix issues regarding AVX, altering code so older CPUs lacking AVX could run the game. No idea what concessions were made though.

    https://www.dsogaming.com/mods/cyberpunk-2077-patch-1-3-avx-mod-fixes-the-game-on-older-cpus/

    Found this
    https://www.prowesscorp.com/what-is-intel-avx-512-and-why-does-it-matter/

    ''Intel AVX-512 can accelerate performance for workloads and use cases such as scientific simulations, financial analytics, artificial intelligence (AI)/deep learning, 3D modeling and analysis, image and audio/video processing, cryptography, and data compression.''

    It seems that while AVX(512) hasn't been used in games all that much, it sure can assist in certain tasks that seem applicable to future games.
     
  15. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,633
    Location:
    The North
    AVX instructions are designed to do a lot of things in parallel. And the smaller the element size, the more you can cram into the SIMD unit.
    Typically for games, it would be used to, say, do a collision check on a lot of objects: doors, NPCs moving, etc.
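
    A rough sketch of that batching idea, assuming a sphere-vs-spheres overlap test and a structure-of-arrays layout. This is plain Python standing in for vector instructions (a real engine would use intrinsics or a vectorising compiler), and `lanes` and `sphere_hits` are hypothetical names:

```python
# Plain-Python illustration of SIMD-style batching. Each list operation inside
# the loop stands in for one vector instruction; nothing here issues real AVX.

REGISTER_BITS = 256  # width of one AVX/AVX2 vector register

def lanes(element_bits: int) -> int:
    """How many elements of a given size fit in one 256-bit register."""
    return REGISTER_BITS // element_bits

def sphere_hits(px, py, pz, pr, xs, ys, zs, rs):
    """Test one sphere against many, one register's worth of floats at a time.

    Inputs are structure-of-arrays lists (all x's, all y's, ...), the layout
    that lets a SIMD unit apply the same operation to every lane at once.
    """
    width = lanes(32)  # 8 single-precision floats per 256-bit register
    hits = []
    for start in range(0, len(xs), width):
        end = start + width
        dx = [x - px for x in xs[start:end]]
        dy = [y - py for y in ys[start:end]]
        dz = [z - pz for z in zs[start:end]]
        dist2 = [a * a + b * b + c * c for a, b, c in zip(dx, dy, dz)]
        reach2 = [(r + pr) ** 2 for r in rs[start:end]]
        # Spheres overlap when centre distance <= sum of radii (compared squared).
        hits.extend(d <= m for d, m in zip(dist2, reach2))
    return hits

result = sphere_hits(0.5, 0.0, 0.0, 1.0,
                     [0.0, 5.0, 1.0], [0.0, 0.0, 1.0],
                     [0.0, 0.0, 0.0], [1.0, 1.0, 0.5])
print(lanes(32), lanes(16), result)
```

    With 32-bit floats a 256-bit register holds 8 objects per operation; with 16-bit elements it would hold 16, which is the "smaller size, more you can cram in" point.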
     
    PSman1700 likes this.
  16. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Going by the instruction profiling, it's not just AVX. The FPU is half as effective at 128-bit and 256-bit code, hence the same performance drop in SSE operations.
    Vector loads tend to stress AMD's boost speeds the most on the desktop, so power reduction would seem to be the motivator. However, that this needed such a significant re-plumbing of the FPU points to a very significant constraint, like the GPU leaving an unusually limited amount of power for the CPU section.
    Microsoft didn't resort to this, and promises consistent clocks with the apparently standard Zen2 FPUs even with higher clock speeds.
    If there's ever a salvage SKU for that, perhaps we can get similar profiling to see if it's really that consistent or other less drastic methods were used to limit power, like instruction issue throttling or duty-cycling of the hardware.

    Why something like those measures wouldn't be good enough versus a thinned custom FPU is a point of curiosity for me.
    Perhaps AMD's method isn't consistent enough for a fully-featured vector FPU for what Sony wanted for its model SOC, or that power ceiling is notably constrained even against another console APU.



    Maybe that's the case, since there may have been at least one notable revision in the PS5 validation hardware leak, with no clear indications as to what was changed.
    Another is that Sony may have only paid for a revamping of the FPU, and if AMD kept the rest of the core and CCX with the same layout, there's going to be spare space.

    I'm holding out for more instruction analysis at some point. The cuts are pretty significant even outside the 256-bit realm Cerny mentioned.

    The 50% loss in SSE points to removing whole ports and the ALUs on them. However, doing this would require rebalancing the units on the remaining ports, as I don't think you can cut one or two ports from the Zen2 FPU without needing to put some functionality on other ports that would be lost entirely, or would lose more than 50%.
    The vector division benchmarking so much slower is a sign of potentially other hardware changes in the unit, since AMD's FPUs only have one port for that.

    Which leaves me to wonder how much more generous the Series X power budget is for its Zen2 FPUs, or if they did something else to constrain consumption. They're promising constant and higher clocks without a liquid metal TIM.

    I think Zen2 has more than one 256-bit unit. Depending on the instruction mix, it could go to 4 256-bit operations per clock. A 50% drop from that is still 2 256-bit operations per clock. The 50% drop in SSE points to losing whole units, and probably needing a re-balance of what's left.
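
    A back-of-the-envelope sketch of those peak numbers, assuming each vector operation counts as one floating-point operation per lane (the same convention as the "four per cycle, 32 FLOPs/cycle" figure quoted elsewhere in this thread). The halved case is the hypothesis being discussed, not a confirmed PS5 specification, and `peak_flops_per_cycle` is a hypothetical helper:

```python
def peak_flops_per_cycle(vector_ops_per_clock: int,
                         vector_bits: int = 256,
                         element_bits: int = 32) -> int:
    """Peak FLOPs per cycle per core, counting one FLOP per lane per vector op."""
    lanes_per_op = vector_bits // element_bits
    return vector_ops_per_clock * lanes_per_op

# Standard Zen 2 peak: up to 4 x 256-bit operations per clock.
full_avx = peak_flops_per_cycle(4)             # single precision
# Hypothesised halving: 2 x 256-bit operations per clock.
halved_avx = peak_flops_per_cycle(2)
# Same comparison for 128-bit SSE-width operations.
full_sse = peak_flops_per_cycle(4, vector_bits=128)
halved_sse = peak_flops_per_cycle(2, vector_bits=128)

print(full_avx, halved_avx, full_sse, halved_sse)
```

    Under these assumptions the halved unit still issues two 256-bit operations per clock, and the 128-bit figures drop by the same 50%.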

    The PS5 has a superset of the PS4's compression support. Perhaps a conservative emulation of the low-level functionality or APIs is going through extra steps, or the backwards compatibility leads to a thicker container or worse data layout than native?

    The raw numbers for non-AVX are substantially worse than similar Zen2 CPUs, not going into other things like higher memory latency and smaller L3 cache. Zen 3 is another class entirely in terms of FP performance.
    There are some indications of CPU-limited scenarios where there is sometimes a modest shortfall versus the Series X, but it's not something that shows up as consistently as the FPU numbers would indicate.
    There are other bottlenecks that both consoles would have, but we may need to keep an eye out for later games that could push AVX or non-AVX vector throughput in a way that's more obvious than early titles.

    AVX 512 is unlikely to find much use in games because AMD flat-out doesn't support it and Intel does not consistently implement it in consumer hardware (or even its server hardware for that matter).

    I'm not sure about the density-optimized transistor claim, or rather I'm not sure if there was an additional tier of high-density transistor beyond the HD process AMD utilized for Zen2 already.
    The math shortfall in 256 and 128 bits points to wholesale removal of hardware, which saves in ALU area, wiring for fewer ports, and smaller register cells because they don't need as many bit lines due to the cut in ports.
     
  17. Theeoo

    Newcomer

    Joined:
    Nov 13, 2017
    Messages:
    173
    Likes Received:
    85
    Slightly disappointing that they gimped the CPU like that. I'm guessing this stems from the decision to clock those CUs as high as possible and use fewer compared to XSX as a cost saving exercise.
     
    PSman1700 likes this.
  18. Globalisateur

    Globalisateur Globby
    Veteran Subscriber

    Joined:
    Nov 6, 2013
    Messages:
    4,592
    Likes Received:
    3,411
    Location:
    France
    But why? Do you see any impact on any game's performance? MS designed their machine with a dual purpose in mind: gaming and cloud services (focus on compute and CPU FPU but supposedly less CU efficiency). Sony designed their box as a purely gaming device with a focus on 120 Hz gaming, and for now they have succeeded at that.
     
  19. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,118
    Likes Received:
    3,090
    You won't notice much of that during this cross-gen period anyway, as games do not use many of the more advanced features that new generations bring (be it AVX, ray tracing, mesh shading, etc.).
     
  20. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,405
    Location:
    Wrong thread
    Thanks for pointing this out. I'd got swept up in the AVX thing, but yeah that does seem pretty important. SSE takes a hammering too, and as far as I'm aware that's widely used in games engines.

    I've been trying to have a look for clues about this. As far as I can tell from what's out there, it's just regular Zen 2 ("server class" as MS puzzlingly said). At Hotchips MS simply said of the CPU:

    "2x SIMD FP/ pipes/core: 2 MUL and 2 ADD AVX256 per clock -> 32x SPFP ops/clk"

    Looking on that wikichip place, it says of Zen 2:

    "This improvement doubles the peak throughput of AVX-256 instructions to four per cycle, or in other words, up to 32 FLOPs/cycle in single precision or up to 16 FLOPs/cycle in double precision."

    Which would appear to be the same, unless I'm missing something. At Hotchips MS also reckoned "AVX256 gives 972 GFLOP over CPU" (quoting Anand's live notes on the presentation).

    972 / 8 cores / 3.8 = 31.97 FLOPs / cycle. Or basically the 32 FLOPs / cycle.
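
    The same arithmetic as a small sketch, using only the figures quoted above (972 GFLOP, 8 cores, 3.8 GHz, 32 FLOPs/cycle); the function names are hypothetical:

```python
def flops_per_cycle_per_core(total_gflops: float, cores: int, clock_ghz: float) -> float:
    """Back out per-core FLOPs per cycle from a quoted whole-CPU GFLOP figure."""
    return total_gflops / cores / clock_ghz

def peak_gflops(flops_per_cycle: float, cores: int, clock_ghz: float) -> float:
    """Forward direction: whole-CPU peak GFLOP from per-core FLOPs per cycle."""
    return flops_per_cycle * cores * clock_ghz

implied = flops_per_cycle_per_core(972, 8, 3.8)
forward = peak_gflops(32, 8, 3.8)
print(round(implied, 2), round(forward, 1))
```

    Going forward from 32 FLOPs/cycle gives about 972.8 GFLOP, so the quoted 972 looks like the same figure rounded down.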

    I think they'd have to be engaging in shenanigans if this wasn't basically true (in as much as any peak figures are) over a period of time.

    I've been thinking about the power sharing between GPU and CPU - iirc the balance can be adjusted every 2 ms (can't find source now). Perhaps Sony found that a relatively small use of vector instructions could cause a relatively long (i.e. up to 2 ms) window where the GPU was getting less power than was strictly necessary?

    This whole thing is getting even more interesting now, as the possible implications are quite widespread. I would have thought most games are using at least 128-bit vector operations to accelerate their physics engines.

    Thanks for the correction, I should have checked before posting.
     
    DavidGraham, Allandor and BRiT like this.