Next Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Discussion in 'Console Technology' started by Proelite, Mar 16, 2020.

  1. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    2,613
    Likes Received:
    1,674
    One of the reasons I just use the figures provided is that other factors may make a tangible difference, but we don't know what they are.
    But then you get into things like how much difference SFS (Sampler Feedback Streaming) makes, etc.

    Now an MS employee is saying that 4.8 was conservative. What does that mean: 4.81? 5.5? So I'll just stick with 4.8 until we know what's what.
    I just think adding an x that could mean anything muddies the water a lot, because it could be negligible or it could be significant.
     
  2. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,804
    Likes Received:
    1,092
    Location:
    Guess...
    So does this mean data can be DMA'd directly from storage into the GPU cache, bypassing main memory?
     
  3. BRiT

    BRiT Verified (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    15,544
    Likes Received:
    14,095
    Location:
    Cleveland
    People need to stop rounding numbers when actual specifications were provided. Rounding is the work of evil.
     
  4. MrFox

    MrFox Deludedly Fantastic
    Legend Veteran

    Joined:
    Jan 7, 2012
    Messages:
    6,427
    Likes Received:
    5,836
    I would assume it always needs to be written to RAM.
     
  5. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    21,578
    Likes Received:
    7,129
    Location:
    ಠ_ಠ
    hm... indeed. I'm not sure I understand the (tangible) differences that may arise from sending out work to multiple units at a slower rate vs. fewer units at a higher rate.

    Let's say that for a given workload of ten credits, a single extreme unit completes 1 credit in 1/10th of the time. What does that mean for memory transactions at the LLC (i.e. GDDR6)? Would that be 10 memory transactions? Then, when distributing the 10 credits to ten units at 1/10th the speed, would they complete this workload at the same time with a single memory transaction?
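    A toy model of the question above, assuming (hypothetically) that each credit produces exactly one memory request when it finishes and that credits are dealt round-robin to identical units. It only tracks when requests are issued; it says nothing about real caches or the GDDR6 controller:

```python
from fractions import Fraction


def request_times(units: int, speed: int, credits: int) -> list[Fraction]:
    """Times at which each credit's single (hypothetical) memory request is issued.

    Each unit finishes one credit in 1/speed time units and issues its request
    on completion; credits are dealt round-robin across the units.
    """
    per_credit = Fraction(1, speed)
    times = []
    for i in range(credits):
        batch = i // units  # credits this unit already finished before this one
        times.append((batch + 1) * per_credit)
    return sorted(times)


one_fast = request_times(units=1, speed=10, credits=10)  # 0.1, 0.2, ..., 1.0
ten_slow = request_times(units=10, speed=1, credits=10)  # ten requests at t = 1.0
```

    Under these assumptions both configurations finish at t = 1 and issue the same ten requests; what changes is whether those requests arrive spread out over time or as one burst.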

    ----

    Anyway, MS gave the number of intersections, which corresponds to the texel rate, so I assume it has a similar meaning for PS5.
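    The texel-rate correspondence can be checked with simple arithmetic. This sketch assumes 4 texture units per CU (standard for GCN/RDNA) and the Series X's 52 CUs at 1.825 GHz; treating one intersection test per texture unit per clock as the peak is an assumption, not something stated in the post:

```python
def peak_rate_g_per_s(cus: int, units_per_cu: int, clock_ghz: float) -> float:
    """Peak operations per second, in billions: CUs x units per CU x clock (GHz)."""
    return cus * units_per_cu * clock_ghz


# Series X: 52 CUs, 4 texture units per CU, 1.825 GHz
xsx_texel_rate = peak_rate_g_per_s(52, 4, 1.825)  # ~379.6 billion per second
```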

    I don't really understand how Nvidia goes about its RT cores vs. rays calculation either, e.g. 72 RT cores * 1.455GHz = 104.76 Giggidies/sec, but the Ray Rate is 10 GigaRays/sec. There seems to be a similar factor-of-10 difference in other GPUs.

    1 "RT Core" per DCU (WGP)? That might mean (1.825*26/10) ~4.7B rays/sec on the SX, which is close to what a 2060 achieves.
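    The back-of-envelope numbers from the two paragraphs above, written out. The one-"RT core"-per-WGP mapping and the divide-by-10 are the post's own guesses, not published figures:

```python
# Nvidia figures as quoted in the post
rt_cores = 72
clock_ghz = 1.455
quoted_gigarays = 10.0

# Naive "one ray per RT core per clock", in billions per second
naive_rate = rt_cores * clock_ghz        # 104.76
fudge = naive_rate / quoted_gigarays     # roughly 10.5

# Hypothetical: one "RT core" per WGP on Series X (26 WGPs at 1.825 GHz),
# divided by the same rough factor of 10
xsx_estimate = 26 * 1.825 / 10           # ~4.7 billion rays per second
```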

    ¯\_(ツ)_/¯

    Minecraft RT -> 1080p60 (4K DLSS) on a 2080 Ti. 1080p30-60 on SX (no DLSS).
     
    #2585 TheAlSpark, May 25, 2020
    Last edited: May 25, 2020
    iroboto likes this.
  6. BRiT

    BRiT Verified (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    15,544
    Likes Received:
    14,095
    Location:
    Cleveland
    Nope. Those are still not accurate specs; they use truncation on one side and rounding up on the other.

    The first one I see is that the TF numbers for both sides are wrong.
    SeriesX is 12.147 TF, not 12.
    PS5 is 10.28 TF max, not 10.3.
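    For reference, the unrounded peak FP32 figures follow from the usual formula: CUs x 64 lanes x 2 FLOPs per fused multiply-add x clock. This assumes 52 CUs at 1.825 GHz for Series X and 36 CUs at up to 2.23 GHz for PS5:

```python
def peak_fp32_tflops(cus: int, clock_ghz: float, lanes_per_cu: int = 64) -> float:
    """Peak FP32 TFLOPS: each lane retires one FMA (2 FLOPs) per clock."""
    return cus * lanes_per_cu * 2 * clock_ghz / 1000.0


xsx_tf = peak_fp32_tflops(52, 1.825)  # ~12.147
ps5_tf = peak_fp32_tflops(36, 2.23)   # ~10.276 at the maximum clock
print(f"Series X: {xsx_tf:.3f} TF, PS5: {ps5_tf:.3f} TF")
```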
     
    PSman1700 likes this.
  7. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    2,512
    Likes Received:
    776
    Here we go again, rounding numbers ;) Also, the comparison to the GTX 970's VRAM seems a bit off.
     
    John Norum likes this.
  8. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,804
    Likes Received:
    1,092
    Location:
    Guess...
    I'm trying to work out why the cache scrubbers add particular benefit now, and why this is only being done now, given that the speed of the data feed from main memory to cache hasn't changed just because the system has a faster SSD.

    My assumption is that the data in cache is changing much more frequently because the data in VRAM is also changing more frequently. But surely that's always been the case as graphics data has grown in size, and the way that's been dealt with in the past is through larger caches.

    So are we saying cache scrubbers are now needed because cache sizes are disproportionately small compared with the amount of data that's being used per frame as enabled by the new storage designs?

    Are cache scrubbers essentially a hack because RDNA2 hasn't been designed to cope with the amount of data per frame that the PS5 will allow as a result of the huge jump in streaming speed?
     
    DSoup, TheAlSpark and PSman1700 like this.
  9. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    2,512
    Likes Received:
    776
    Maybe as much as the ID buffer from the Pro.
     
  10. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,084
    Likes Received:
    2,952
    Location:
    Finland
    They haven't really explained how the access works, have they? I'm pretty sure it's something more elegant than the 970, where the last 0.5GB has only a fraction of the memory bandwidth.
     
    John Norum and PSman1700 like this.
  11. BRiT

    BRiT Verified (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    15,544
    Likes Received:
    14,095
    Location:
    Cleveland
    John Norum, AzBat, jlippo and 2 others like this.
  12. chris1515

    Veteran Regular

    Joined:
    Jul 24, 2005
    Messages:
    4,667
    Likes Received:
    3,561
    Location:
    Barcelona Spain
    This is not important. This is the memory reserved for the CPU, but you still need to access it, and while you access it the bus is only 192 bits wide. It decreases the overall bandwidth because it is a unified bus.
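    A rough way to put numbers on this, assuming (as the post does) that accesses to the 336 GB/s region and the 560 GB/s region never overlap on the shared bus, so the time spent on each simply adds up. Real memory controllers can overlap some traffic, so treat this as a simplified sketch:

```python
def effective_bandwidth(slow_fraction: float,
                        fast_gbs: float = 560.0,
                        slow_gbs: float = 336.0) -> float:
    """Average GB/s when `slow_fraction` of the bytes come from the 192-bit region.

    Assumes the two kinds of access are serialized, so the time per byte is a
    weighted sum and the resulting bandwidth is a weighted harmonic mean.
    """
    if not 0.0 <= slow_fraction <= 1.0:
        raise ValueError("slow_fraction must be between 0 and 1")
    seconds_per_gb = (1.0 - slow_fraction) / fast_gbs + slow_fraction / slow_gbs
    return 1.0 / seconds_per_gb


print(effective_bandwidth(0.10))  # 10% of bytes from the slow region -> ~525 GB/s
```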
     
  13. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,031
    Likes Received:
    5,576
    There's nothing else to tell. Once they gave the two bandwidth figures, you can work out how the RAM is laid out.

    There are 10 GDDR6 chips with a 32-bit bus each (really 2x16-bit given GDDR6's parameters, but AFAIK you can't really split those).
    6 of those chips have a capacity of 16Gbit / 2GByte (let's call it group A), and 4 of those chips have a capacity of 8Gbit / 1GByte (group B).
    Data is always interleaved among all chips to maximize bandwidth, though you can only do that while all chips have free capacity to receive data (usually you can always do that because all the chips have the same capacity).

    This means you get 10*32bit = a 320-bit bus (560GB/s) when the memory controller is using Group B and the first half (1GByte) of each Group A chip.
    Once Group B gets full (they only have 1GByte each), you can only use the second half of Group A, which is 6*32bit = 192bit (336GB/s).
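    The two figures follow from bus width alone. The per-pin rate of 14 Gbps used here is implied by 560 GB/s over a 320-bit bus rather than stated in the post:

```python
GBPS_PER_PIN = 14  # implied by 560 GB/s / (320 bits / 8)


def bus_bandwidth_gbs(chips: int, bits_per_chip: int = 32,
                      gbps_per_pin: int = GBPS_PER_PIN) -> float:
    """Peak GB/s when data is interleaved across `chips` GDDR6 devices."""
    bus_bits = chips * bits_per_chip
    return bus_bits * gbps_per_pin / 8  # gigabits per second -> gigabytes per second


all_ten = bus_bandwidth_gbs(10)  # 320-bit bus -> 560.0 GB/s
group_a = bus_bandwidth_gbs(6)   # 192-bit bus -> 336.0 GB/s
```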


    It's exactly what Nvidia did with the 660 Ti and the 550 Ti before it, only with different proportions.


    EDIT: The 970 is different because Nvidia cut off the L2 slice in front of one 32-bit channel, which stopped that channel from being interleaved with the others. It was essentially a 224-bit GPU with one extra 32-bit 512MB chip that worked very slowly.
     
    #2593 ToTTenTranz, May 26, 2020
    Last edited: May 26, 2020
    Pete, Kaotik and TheAlSpark like this.
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,363
    Likes Received:
    3,944
    Location:
    Well within 3d
    Cerny's presentation and some past presentations on Sony's compute goals hint at a sensitivity to latency. Latency helped keep the GPU and the DSP from general use for most audio on the PS4, and now there is Tempest. As part of his justification for the high-clock strategy, Cerny gave scenarios where the GPU could not fully utilize its width but could complete smaller tasks faster if the clock speed was raised.
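    A toy model of that argument, with hypothetical numbers: each CU runs one workgroup at a time, every workgroup costs the same number of cycles, and memory is ignored. The CU counts and clocks are the PS5 (36 CUs, up to 2.23 GHz) and Series X (52 CUs, 1.825 GHz) figures, used only for illustration:

```python
import math


def kernel_time_us(workgroups: int, cus: int, clock_ghz: float,
                   cycles_per_group: int = 10_000) -> float:
    """Time in microseconds to finish a dispatch under the toy model."""
    rounds = math.ceil(workgroups / cus)  # passes of work across all CUs
    cycles_per_us = clock_ghz * 1000      # 1 GHz = 1000 cycles per microsecond
    return rounds * cycles_per_group / cycles_per_us


for groups in (30, 520):
    narrow_fast = kernel_time_us(groups, cus=36, clock_ghz=2.23)
    wide_slow = kernel_time_us(groups, cus=52, clock_ghz=1.825)
    print(f"{groups:4d} workgroups: 36 CU @ 2.23 GHz {narrow_fast:6.2f} us, "
          f"52 CU @ 1.825 GHz {wide_slow:6.2f} us")
```

    With 30 workgroups neither GPU is full, so only the clock matters and the narrower, faster configuration finishes first; with 520 workgroups the wider configuration's extra CUs win out.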

    If there is a memory range that may exist in the GPU caches that gets overwritten by a read from the SSD, the old copies in the GPU do not automatically update. RDNA2 is not unique in this, as in almost all situations GPU cache hierarchies are weakly ordered and slow to propagate changes. In fairness, most data freshly read from IO needs additional work to stay consistent, even for CPUs.
    If you don't want the GPU to be using the wrong data, the data in the GPU needs to be cleared out of the caches before a shader tries to read from those addresses. The PS4's volatile flag was a different cache invalidation optimization, so there does seem to be a history of such tweaks in the Cerny era.
    The general cache invalidation process for the GCN/RDNA caches is a long-latency event. It's a pipeline event that blocks most of the graphics pipeline (command processor, CUs, wavefront launch, graphics blocks) until the invalidation process runs its course. This also comes up when CUs read from render targets in GCN, particularly after DCC was introduced and before the ROPs became L2 clients with Vega. These cache flush events are expensive and heavily advised against.

    In the past, an HDD's limited parallelism and long seek times would have eclipsed this process and kept it at a lower frequency.
    If the PS5's design expects to be able to fire off many more accesses and use them in a relatively aggressive time frame, then the scrubbers may reduce the impact by potentially reducing the cost of such operations, or reducing the number of full stalls that need to happen.
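    Sony hasn't detailed how the scrubbers work, so this is only a toy contrast between the two approaches described above: a full invalidation that drops every cached line, and a hypothetical range-targeted scrub that drops only the lines overlapping the address range the SSD just overwrote. The 64-byte line size and the `ToyCache` class are assumptions for illustration:

```python
LINE_BYTES = 64  # assumed cache line size


class ToyCache:
    """Maps line-aligned addresses to cached data; no capacity limit or eviction."""

    def __init__(self) -> None:
        self.lines: dict[int, bytes] = {}

    def fill(self, addr: int, data: bytes) -> None:
        self.lines[addr - addr % LINE_BYTES] = data

    def invalidate_all(self) -> int:
        """Full invalidation: every line goes, whether it was stale or not."""
        dropped = len(self.lines)
        self.lines.clear()
        return dropped

    def scrub(self, start: int, length: int) -> int:
        """Drop only the lines overlapping [start, start + length)."""
        if length <= 0:
            return 0
        first_line = start - start % LINE_BYTES
        victims = [a for a in self.lines if first_line <= a < start + length]
        for a in victims:
            del self.lines[a]
        return len(victims)


cache = ToyCache()
for addr in range(0, 1024, LINE_BYTES):  # 16 warm lines
    cache.fill(addr, b"old")
print(cache.scrub(256, 128))             # only the 2 overwritten lines are dropped
print(len(cache.lines))                  # 14 lines stay warm
```

    In this toy, a full invalidation would have thrown away all 16 lines to fix 2 of them. The real cost on the GPU is the pipeline stall described above, which this model does not capture.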
     
    Pete, jgp, Shifty Geezer and 12 others like this.
  15. BRiT

    BRiT Verified (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    15,544
    Likes Received:
    14,095
    Location:
    Cleveland
    To be pedantic, none of the memory is reserved for CPU-only or GPU-only use, but it would be wise to have the GPU use the 560 GB/s region and the CPU use the 336 GB/s region. The entire memory is fully accessible by either.
     
    John Norum, AzBat, pharma and 4 others like this.
  16. Inuhanyou

    Veteran Regular

    Joined:
    Dec 23, 2012
    Messages:
    1,098
    Likes Received:
    271
    Location:
    New Jersey, USA
    MS's strategy of having two different speeds of RAM at once is weirding me out. Can they both be used by the same tasks at once?
     
  17. BRiT

    BRiT Verified (╯°□°)╯
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    15,544
    Likes Received:
    14,095
    Location:
    Cleveland
    Yes. The only difference is the bandwidth speed.
     
    tinokun and PSman1700 like this.
  18. Inuhanyou

    Veteran Regular

    Joined:
    Dec 23, 2012
    Messages:
    1,098
    Likes Received:
    271
    Location:
    New Jersey, USA
    That's what confuses me. So is the bandwidth 560+336 in these instances, or...? I would have thought MS would have mentioned being able to add them together, but again, I don't know anything.
     
  19. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    21,578
    Likes Received:
    7,129
    Location:
    ಠ_ಠ
    It's two chunks of memory space, not two different speeds of RAM. Data that fits into the first 10GB can be striped across 10 chips, ergo up to 560GB/s. For data that is put into the upper 1GB addresses of the 2GB chips, there can naturally only be striping across 6 chips, hence 336GB/s access.
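    A sketch of the address-to-chip mapping described above. The 4 KiB interleave granularity, the linear layout, and the chip numbering (0-5 are the 2GB chips) are all hypothetical; the actual Series X mapping hasn't been published in this detail:

```python
GiB = 1 << 30
STRIPE = 4096           # hypothetical interleave granularity in bytes
FAST_REGION = 10 * GiB  # striped across all 10 chips
TOTAL = 16 * GiB        # 6 x 2GB + 4 x 1GB
BIG_CHIPS = 6           # chips 0-5 hold 2GB; chips 6-9 hold 1GB


def chip_for(addr: int) -> int:
    """Return the index of the chip holding byte `addr`."""
    if not 0 <= addr < TOTAL:
        raise ValueError("address out of range")
    if addr < FAST_REGION:
        # lower 10 GiB: 1 GiB on each of the 10 chips
        return (addr // STRIPE) % 10
    # upper 6 GiB: only the second half of the six 2GB chips is left
    return ((addr - FAST_REGION) // STRIPE) % BIG_CHIPS


print(sorted({chip_for(i * STRIPE) for i in range(20)}))                # all 10 chips
print(sorted({chip_for(FAST_REGION + i * STRIPE) for i in range(20)}))  # chips 0-5 only
```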
     
    Pete, tinokun, pharma and 3 others like this.
  20. Inuhanyou

    Veteran Regular

    Joined:
    Dec 23, 2012
    Messages:
    1,098
    Likes Received:
    271
    Location:
    New Jersey, USA
    I mean, of course, everyone knows about that, it's first-grade stuff...

    -cough-
     
    zupallinere likes this.

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.