Next Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Discussion in 'Console Technology' started by Proelite, Mar 16, 2020.

  1. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,277
    Likes Received:
    2,613
    Location:
    Wrong thread
    It's hopefully not all bad!

    Assuming next gen consoles are based on something like the 4xxx series APUs (Renoir), there's some good news in terms of latencies.

    https://www.anandtech.com/show/1570...k-business-with-the-ryzen-9-4900hs-a-review/2

    Inter-CCX cache accesses are faster for the monolithic APUs than for the chiplet designs. For the chiplet-based desktop processors, inter-CCX access goes off-chip even if the other CCX is on the same physical chiplet, as it's done via IF routed through the big "hub" I/O die containing the memory controller.

    So Renoir takes about 1/4 off the inter-CCX latency. Perhaps, in terms of games, this could claw back some IPC to offset the smaller L3. I suspect the huge L3 on the Ryzen 3xxx desktop parts is down to sharing a common chiplet with server-targeted products, and that for purely desktop and gaming purposes it's possibly not the optimal use of die area (not all workloads benefit equally from cache scaling).

    Anandtech (Ian Cutress writing) also had this to say about the smaller L3 in Renoir:

    It would also be interesting to know how main memory latency in the consoles compares to Renoir and Matisse, particularly under heavy load.

    If Infinity Fabric speed is tied to the memory clock as in Matisse, then 14 Gbps might be quite close to the 3733 MHz "sweet spot" that AMD talked about for that setup...?
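    For reference, the raw bandwidth arithmetic behind those figures can be sketched as below. The bus widths are assumptions for illustration: a 256-bit GDDR6 bus (the rumored PS5 setup) and a dual-channel (128-bit) DDR4 bus for Matisse.

```python
# Sketch of peak-bandwidth arithmetic: per-pin data rate times bus width,
# divided by 8 bits per byte. Bus widths here are assumed, not confirmed.

def bandwidth_gbs(per_pin_gbps, bus_width_bits):
    """Peak bandwidth in GB/s from per-pin rate (Gbps) and bus width (bits)."""
    return per_pin_gbps * bus_width_bits / 8

print(bandwidth_gbs(14, 256))     # GDDR6 @ 14 Gbps, 256-bit -> 448.0 GB/s
print(bandwidth_gbs(3.733, 128))  # DDR4-3733, dual channel  -> ~59.7 GB/s
```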
     
    Pete, chris1515 and Inuhanyou like this.
  2. Barrabas

    Regular Newcomer

    Joined:
    Jul 29, 2005
    Messages:
    316
    Likes Received:
    272
    Location:
    Norway
    It seems that one change we may see more of in games because of SSDs is greater use of unique animations for NPC characters. This is good for making, say, a city feel more real and lifelike, with people doing many different things, in contrast to "robots" marching around and bumping into each other. Maybe we will see more advanced animation sequences; say, a character you pass doing some paintwork on a wall suddenly falls off his ladder, then gets up again to brush off his clothes :wink:. I guess it all comes down to budget and the amount of work put into the games, but at least it seems SSDs open up more possibilities for this.

    "the SSD storage speed means we can offer many unique motion-captured animations"
    http://thisgengaming.com/2020/04/23...realistic-environments-unique-npc-animations/
     
  3. RobertR1

    RobertR1 Pro
    Legend

    Joined:
    Nov 2, 2005
    Messages:
    5,735
    Likes Received:
    926
    Maybe we can use this gen to get rid of the plastic shine on everything made famous by Unreal Engine.
     
  4. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    2,520
    Likes Received:
    782
  5. Barrabas

    Regular Newcomer

    Joined:
    Jul 29, 2005
    Messages:
    316
    Likes Received:
    272
    Location:
    Norway
    Was it not obvious in the article?
    "Whilst this was a discussion focused on the new Series X console, it’s safe to assume this applies to the PS5 as well"
     
    PSman1700 likes this.
  6. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    2,520
    Likes Received:
    782
    Yes, true. I was referring to the tweet below it, nvm :p
     
  7. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    21,578
    Likes Received:
    7,130
    Location:
    ಠ_ಠ
    That's almost entirely up to the artists & dev schedule for tweaking the shader inputs/outputs.

    There are examples of devs using UE3 or UE4 to provide a unique look, but it comes down to budget and direction.
     
  8. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    3,922
    Likes Received:
    2,636
    Being able to target a specific CPU architecture will mitigate the theoretical performance difference with PC parts, though. PC CPUs are designed to make even non-optimal code run fast. These consoles aren't going to benefit as much from a high single or few core turbo frequency, for example, since you'd expect console games to be trying to use all the available threads whenever possible.
     
  9. Mitchings

    Newcomer

    Joined:
    Mar 13, 2013
    Messages:
    113
    Likes Received:
    172
    I recall a lot of rumours regarding a reduction of the CPU's L2 cache to 1/2 or 1/4 of its desktop counterpart.

    With regard to the PS5 at least, would that really be a good idea, given the 448 GB/s that has to feed the CPU, GPU and Tempest while dealing with contention?

    I'd assume a fat cache would help mitigate the CPU's bandwidth requirements. It was my understanding that the larger caches on Zen 2 played a significant part in its performance gains over previous iterations.
     
  10. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    43,525
    Likes Received:
    15,981
    Location:
    Under my bridge
    I don't think so. You still need to read the data into the CPU and write it out. Caches are there to reduce latency, not to help with bandwidth. You populate the cache with a chunk of working data to save direct reads from RAM, and larger caches mean fewer cache misses and fewer stalls, resulting in better performance. If you want to avoid accessing RAM, you need scratchpad memory like EDRAM, which the CPU works from directly, only writing the results of the workload out to RAM.

    Now if modern caches can do that and provide a transparent scratchpad, it would be beneficial, but that'd be news to me.
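    A toy model of that point: a cache saves repeat reads of a re-used working set, but the data still has to cross the bus at least once. The numbers below are hypothetical, and the model ignores writes, associativity and eviction; it's purely illustrative.

```python
# Toy model: bytes read from DRAM when a working set is scanned repeatedly.
# If the set fits in cache, only the first pass (compulsory misses) hits RAM;
# if not, every pass re-reads it. Ignores writes, associativity, eviction.

MB = 2 ** 20

def dram_read_traffic(working_set, passes, cache_size):
    """Bytes read from RAM when scanning `working_set` bytes `passes` times."""
    if working_set <= cache_size:
        return working_set          # later passes hit in cache
    return working_set * passes     # set doesn't fit; re-read every pass

print(dram_read_traffic(4 * MB, 10, 8 * MB) // MB, "MB")  # fits: 4 MB
print(dram_read_traffic(4 * MB, 10, 2 * MB) // MB, "MB")  # thrashes: 40 MB
```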
     
    Mitchings likes this.
  11. Vhatt

    Joined:
    Mar 19, 2020
    Messages:
    8
    Likes Received:
    5
    The discussion on CPU core caching for the XSX made me go back to a little counting I did on my own after the spec reveal. MS indicated that the total on-chip SRAM for the XSX APU was 76 MB. I was curious how that would break down between the CPU & GPU, so I did some counting based on available information and came up with the following:

    CPU: per core we have 64 KB L1 and 512 KB L2 for both desktop and mobile versions of Zen 2. The difference, as has been noted by others, is in the L3, where we have 32 MB for the desktop and 8 MB for the mobile version.
    GPU: potentially per CU we have 32 KB L0 and 128 KB L1. If I understood correctly, a 4 MB L2 is then shared across all the CUs (RDNA 2 hasn't launched yet, so the quoted cache sizes are from RDNA 1). (If I missed any other caches in the GPU, please let me know.)

    Based on the above, we could end up with the following:
    Door #1: desktop CPU (512 KB L1, 4.096 MB L2 and 32 MB L3) + 52 CU GPU (1.66 MB L0, 6.65 MB L1 & maybe 6 MB L2) for a total of 50.918 MB of cache.
    Door #2: mobile CPU (512 KB L1, 4.096 MB L2 and 8 MB L3) + 52 CU GPU (1.66 MB L0, 6.65 MB L1 & maybe 6 MB L2) for a total of 26.918 MB of cache.

    Variables:
    CPU: As was stated both in the DF architecture piece and by members here, the large L3 of the desktop CPU would be reduced. But would they reduce the L3 to the size of the mobile counterpart, or to something in between (16 MB)?
    GPU: For me this is the more interesting area, as RDNA 2 hasn't launched and its cache sizes are as yet unknown. Would doubling the cache sizes per CU increase performance and better manage the addition of RT? What cache type and size might the newly added RT parts of RDNA 2 require? Would MS add more cache for color information (let's say more L0 or L1, to help mitigate the slower I/O performance of their SSD)?

    Also, while this post is XSX-minded in nature, any caching changes for RDNA 2 would also apply to the PS5, as it is likewise RDNA 2 based; so whatever is speculated should apply in most part to each console, less specific choices by each company.
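    The tallies above can be reproduced within rounding as a few lines, using the post's decimal convention (1 MB = 1000 KB). The per-core and per-CU sizes are the Zen 2 / RDNA 1 figures quoted in the post; the 6 MB shared GPU cache for 52 CUs is the post's guess, not a confirmed RDNA 2 number.

```python
# Reproduces the "Door #1" / "Door #2" cache tallies. All sizes in MB,
# using 1 MB = 1000 KB as in the post. Cache sizes per core/CU are the
# quoted Zen 2 / RDNA 1 figures; the 6 MB GPU L2 is the post's guess.

def cpu_cache_mb(cores, l3_mb):
    return cores * 64 / 1000 + cores * 512 / 1000 + l3_mb   # L1 + L2 + L3

def gpu_cache_mb(cus, l2_mb):
    return cus * 32 / 1000 + cus * 128 / 1000 + l2_mb       # L0 + L1 + L2

door1 = cpu_cache_mb(8, 32) + gpu_cache_mb(52, 6)  # desktop-sized L3
door2 = cpu_cache_mb(8, 8) + gpu_cache_mb(52, 6)   # mobile-sized L3
print(f"Door #1: {door1:.3f} MB, Door #2: {door2:.3f} MB")  # ~50.93 / ~26.93
```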
     
  12. BRiT

    BRiT Verified (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    15,573
    Likes Received:
    14,165
    Location:
    Cleveland
    @TheAlSpark did you ever get more refinement on what/where the Cache actually is on SeriesX?
     
  13. Barrabas

    Regular Newcomer

    Joined:
    Jul 29, 2005
    Messages:
    316
    Likes Received:
    272
    Location:
    Norway
    I am under the impression that during the customization process they remove what's unnecessary and keep what's needed for the CPU and GPU. Why is there a reduction in cache in the console APUs? Cost? Not needed?
     
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,363
    Likes Received:
    3,944
    Location:
    Well within 3d
    The L3 has been quartered for Renoir, but the L2 is the same. There's not much to remove from the L2.

    The L3 is a big consumer of die space for Zen 2. Much of the more general Zen to Zen 2 IPC improvement (not related to specialized changes like vector width) could be attributed to cache capacity, although the question faced by a constrained platform is: how much is a small percentage of performance worth, in terms of cost or of potentially lost area for other features?
    The large L3 matters more for server loads, while the workloads consoles experience may not have turned out to depend as significantly on capacity.

    The bandwidth savings vs area cost need to be weighed against what the console vendors expect CPU bandwidth needs to generally be. If a Zen 2 CCD consumes 10 GB/s in a given game, is 10 GB/s (edit: additional) out of 448 GB/s worth the die space?

    Old rule of thumb is that misses tend to fall with the square root of capacity. This affects a subset of all miss types, and there are loads that do not rely on cache much, so there would be diminishing returns.
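    That rule of thumb is easy to sketch: misses scale roughly with one over the square root of capacity (illustrative only; real workloads deviate, and as noted it only covers a subset of miss types).

```python
# Square-root rule of thumb: miss rate ~ 1/sqrt(cache capacity).
# Purely illustrative; the effective exponent varies by workload.
import math

def relative_miss_rate(old_mb, new_mb):
    """Factor by which misses change when capacity goes old -> new."""
    return math.sqrt(old_mb / new_mb)

print(relative_miss_rate(32, 8))   # quartering the L3 roughly doubles misses: 2.0
print(relative_miss_rate(32, 64))  # doubling it only cuts misses by ~29%
```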

    SRAM is a broadly used circuit type, not just for caches. The register files for the GPU are a large contributor, and there are many small buffers, internal caches, internal controllers, and registers throughout the chip. AMD has given SRAM totals for Vega GPUs well in excess of the commonly known register, cache, and LDS figures.
     
    Pete, Inuhanyou, function and 7 others like this.
  15. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    21,578
    Likes Received:
    7,130
    Location:
    ಠ_ಠ
    Not really, no.

    There's probably a bunch associated with infinity fabric / interconnect, the GPU front-end, display controllers/encoders/decoders that doesn't get the spotlight. Maybe a bunch of it is redundancy as well (apart from the disabled CUs).
     
  16. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,162
    Likes Received:
    5,463
    Unreal Engine 4.25 is the release version with support for next-gen consoles. There's a 4.25 Plus stream that will be kept up to date with features for next-gen releases this year. Ray tracing is out of beta.
     
    DSoup, PSman1700 and BRiT like this.
  17. disco_

    Newcomer

    Joined:
    Jan 4, 2020
    Messages:
    215
    Likes Received:
    170
    They use cache in desktop parts to help with the latency caused by the chiplet design. With consoles not using chiplets, I'd assume the latency issues aren't as pronounced and less cache is needed.
     
  18. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,088
    Likes Received:
    2,955
    Location:
    Finland
    AMD's desktop cache size is dictated by Epyc, not because latencies would be suboptimal on desktop. The savings you'd get from cutting the cache in half, or even to a quarter, aren't worth the cost of developing a new chiplet for it.
     
    Dictator likes this.
  19. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    14,162
    Likes Received:
    5,463
    Game consoles shouldn't need large caches like desktops because they're not really multi-tasking like a PC and data accesses should be predictable. As long as devs are thinking about cache alignment of data, and making good use of cache line reads with linear data, a smaller cache should not be a big issue.
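    A quick back-of-envelope on why linear, packed data plays nicely with cache-line reads: count how many 64-byte lines a loop over one hot 4-byte field touches when the fields are packed back to back (structure-of-arrays) versus embedded one per 64-byte entity struct (array-of-structures). The sizes are hypothetical.

```python
# Counting 64-byte cache lines touched by a loop over one hot 4-byte field.
# Entity count and field size are hypothetical, for illustration.
import math

LINE = 64  # bytes per cache line (typical for x86/Zen)

def lines_for_soa(n, field_bytes, line=LINE):
    """Fields packed contiguously: many fields share each line."""
    return math.ceil(n * field_bytes / line)

def lines_for_aos(n):
    """One 64-byte struct per entity: each field drags in a whole line."""
    return n

print(lines_for_soa(4096, 4))  # 256 lines read
print(lines_for_aos(4096))     # 4096 lines read
```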
     
    vjPiedPiper, pharma, DSoup and 7 others like this.
  20. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,277
    Likes Received:
    2,613
    Location:
    Wrong thread
    Yep!

    The sheer range of cache sizes that best suit (bang for buck, proportion of die area) different workloads is really quite crazy.

    Just with different Zen 2 products, on the lower end you have something like Renoir, with what equates to 1MB of L3 per core (4 cores, 4MB per CCX). Reviews show Renoir to be leading edge for performance within its market segments.

    ... but on the other hand you have an absolute L3 belly-buster like the "Large cache" EPYC 7532:

    https://www.anandtech.com/show/15528/amd-expands-epyc-lineup-with-epyc-7662-epyc-7532-cpus

    32 cores and 256 MB of L3. That's 8 MB of L3 per core!!
     
    BRiT likes this.

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.