Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Discussion in 'Console Technology' started by Proelite, Mar 16, 2020.

Thread Status:
Not open for further replies.
  1. snc

    snc
    Regular Newcomer

    Joined:
    Mar 6, 2013
    Messages:
    614
    Likes Received:
    411
    OK, thanks for the info. So maybe there is some other ML-oriented improvement in RDNA2 that PS5 is missing.
     
  2. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,707
    Likes Received:
    3,944
    Location:
    Wrong thread
    That seems to be talking about the vector registers, which won't necessarily translate into the ability to do accelerated-rate operations on those formats unless the ALU supports them. For example:

    "Some variants of the dual compute unit expose additional mixed-precision dot-product modes in the ALUs, primarily for accelerating machine learning inference. A mixed-precision FMA dot2 will compute two half-precision multiplications and then add the results to a single-precision accumulator. For even greater throughput, some ALUs will support 8-bit integer dot4 operations and 4-bit dot8 operations, all of which use 32-bit accumulators to avoid any overflows."

    MS talked about their int8 and int4 stuff as being a customisation that they specifically requested. There's been no mention so far that Sony have requested the same, although they've been rather tight-lipped about certain aspects of the GPU.
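To make the quoted dot4/dot8 description concrete, here is a minimal Python sketch of what such an operation computes. It is not AMD's implementation: the function names are hypothetical, signed inputs are assumed, and the accumulator is assumed to wrap at 32 bits (real hardware may saturate instead).

```python
def wrap_int32(value):
    """Wrap a Python int into the signed 32-bit range, like a 32-bit register."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def dot_int(a, b, bits, acc=0):
    """Emulate one packed integer dot product with a 32-bit accumulator.

    bits=8 models dot4 (four int8 lanes per 32-bit register),
    bits=4 models dot8 (eight int4 lanes per 32-bit register).
    Signed lanes and wrap-around accumulation are assumptions.
    """
    lanes = 32 // bits
    low, high = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if len(a) != lanes or len(b) != lanes:
        raise ValueError(f"expected {lanes} lanes for {bits}-bit inputs")
    for v in list(a) + list(b):
        if not low <= v <= high:
            raise ValueError(f"{v} does not fit in a signed {bits}-bit lane")
    # All lane products are summed and added to the wide accumulator in one step.
    return wrap_int32(acc + sum(x * y for x, y in zip(a, b)))


dot4_result = dot_int([1, 2, 3, 4], [5, 6, 7, 8], bits=8, acc=10)
dot8_result = dot_int([-8] * 8, [-8] * 8, bits=4)
```

The point of the sketch is that one instruction consumes four (or eight) narrow values per register, which is where the higher throughput would come from, while the sum lands in a register wide enough that a single operation cannot overflow.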
     
  3. Metal_Spirit

    Regular Newcomer

    Joined:
    Jan 3, 2007
    Messages:
    603
    Likes Received:
    378

    "Some variants of the dual compute unit expose additional mixed-precision dot-product modes in the ALUs, primarily for accelerating machine learning inference. A mixed-precision FMA dot2 will compute two half-precision multiplications and then add the results to a single-precision accumulator. For even greater throughput, some ALUs will support 8-bit integer dot4 operations and 4-bit dot8 operations, all of which use 32-bit accumulators to avoid any overflows."

    Maybe Microsoft requested that ALL ALUs do this... But int8 and int4 are supported!
     
  4. turkey

    Veteran Newcomer

    Joined:
    Oct 21, 2014
    Messages:
    1,092
    Likes Received:
    875
    Location:
    London
    I thought similar but was pointed to the response from Andrew Goossen, who specifically says they added it.

    "We knew that many inference algorithms need only 8-bit and 4-bit integer positions for weights and the math operations involving those weights comprise the bulk of the performance overhead for those algorithms," says Andrew Goossen. "So we added special hardware support for this specific scenario. The result is that Series X offers 49 TOPS for 8-bit integer operations and 97 TOPS for 4-bit integer operations. Note that the weights are integers, so those are TOPS and not TFLOPs. The net result is that Series X offers unparalleled intelligence for machine learning."

    Are there RDNA variations that do or do not include this?
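As a sanity check on the quoted 49 and 97 TOPS figures, the arithmetic below shows they are consistent with every ALU running int8 at 4x and int4 at 8x the FP32 rate. The inputs are assumptions taken from the publicly stated Series X specs rather than from this thread: 52 CUs, 64 FP32 lanes per CU, a 1.825 GHz clock, and an FMA counted as two operations.

```python
# Assumed Series X GPU parameters (public specs, not stated in this thread).
CUS = 52
LANES_PER_CU = 64
CLOCK_GHZ = 1.825
OPS_PER_FMA = 2  # a fused multiply-add is counted as two operations


def peak_tera_ops(values_per_lane):
    """Peak tera-operations per second if each lane handles `values_per_lane` values."""
    giga_ops = CUS * LANES_PER_CU * OPS_PER_FMA * CLOCK_GHZ * values_per_lane
    return giga_ops / 1000.0


fp32_tflops = peak_tera_ops(1)  # one FP32 value per lane
int8_tops = peak_tera_ops(4)    # dot4: four int8 values per 32-bit lane
int4_tops = peak_tera_ops(8)    # dot8: eight int4 values per 32-bit lane

if __name__ == "__main__":
    print(f"FP32: {fp32_tflops:.2f} TFLOPS")
    print(f"INT8: {int8_tops:.2f} TOPS")
    print(f"INT4: {int4_tops:.2f} TOPS")
```

Under those assumptions the int8 and int4 results round to the quoted 49 and 97, which fits the reading that the feature applies to all ALUs rather than a subset.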
     
    tinokun, function, BRiT and 1 other person like this.
  5. ToTTenTranz

    Legend Veteran

    Joined:
    Jul 7, 2008
    Messages:
    11,909
    Likes Received:
    6,851
    What the hell did I just read?

    Looks at bio.

    Oh...
     
  6. Metal_Spirit

    Regular Newcomer

    Joined:
    Jan 3, 2007
    Messages:
    603
    Likes Received:
    378

    Apparently what they did was to add it to every single ALU... while RDNA only did that on some of the ALUs.
     
    VitaminB6 likes this.
  7. function

    function None functional
    Legend Veteran

    Joined:
    Mar 27, 2003
    Messages:
    5,707
    Likes Received:
    3,944
    Location:
    Wrong thread
    Normally you would have a feature on all ALUs on the GPU or on none of them. ALUs tend to be uniform across the entire GPU to simplify scheduling and load balancing (with the possible exception of PS4 Pro, IIRC, which had some probably BC-related asymmetry across the two sides of the GPU).

    So when the RDNA whitepaper talks about "Some variants of the dual compute unit" it's almost certainly talking about variants of the ALUs across different chip designs rather than different ALUs on the same GPU. Different customers and different segments of the market can have different requirements.

    MS will have been talking to AMD for years about DirectML and their vision for inference acceleration. So whatever customisations MS have requested are likely to be absent in PS5. What the difference in capabilities will ultimately be I don't know, but it's likely to be across all ALUs on the respective chips.
     
    turkey, tinokun, TheAlSpark and 2 others like this.
  8. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    12,795
    Likes Received:
    15,309
    Location:
    The North
    This aspect is pretty critical. If you don't support mixed-precision operations, you're bound to run into overflow issues, so either you spend additional cycles to resolve that or your network will fail.
    It's possible that RDNA supports RPM for int8 and int4 by default, but how useful that is in practice is unknown.

    FP16 is more than adequate for ML and will likely represent a majority of weights in a network. Mixed precision is even better, but that doesn't make FP16 useless just because another GPU can do mixed.
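The overflow concern can be illustrated with a small sketch: accumulating int8 products in a narrow accumulator versus a 32-bit one. The values and the 16-bit "narrow" width are hypothetical, chosen only to trigger the problem, and wrap-around is assumed (hardware could saturate instead, which is still wrong, just differently).

```python
def wrap_signed(value, bits):
    """Wrap an integer into the signed range of a `bits`-wide register."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


def dot_with_accumulator(weights, activations, acc_bits):
    """Dot product where the running sum lives in an `acc_bits`-wide register."""
    acc = 0
    for w, a in zip(weights, activations):
        acc = wrap_signed(acc + w * a, acc_bits)
    return acc


# Hypothetical layer slice: 64 int8 weights and activations, all equal to 100.
weights = [100] * 64
activations = [100] * 64

exact = sum(w * a for w, a in zip(weights, activations))
narrow = dot_with_accumulator(weights, activations, acc_bits=16)
wide = dot_with_accumulator(weights, activations, acc_bits=32)
```

Here the 32-bit accumulator matches the exact sum of 640000, while the 16-bit one wraps to a negative value, which is the kind of silent corruption that would make a network fail without either wider accumulation or extra cycles spent rescaling.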
     
    #4408 iroboto, Nov 3, 2020
    Last edited: Nov 3, 2020
  9. Metal_Spirit

    Regular Newcomer

    Joined:
    Jan 3, 2007
    Messages:
    603
    Likes Received:
    378
    Well... I was basing that on the white paper...

    Page 14:

    "For even greater throughput, some ALUs will support 8-bit integer dot4 operations and 4-bit dot8 operations..."
     
  10. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    18,549
    Likes Received:
    20,621
    Yeah, I think that means more "some implementations" and not "3 out of 10 ALUs".
     
    function, tinokun, Jay and 1 other person like this.
  11. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    3,386
    Likes Received:
    2,705
    PS5 may support reduced-precision formats, but it doesn't follow that it supports RPM operations on them.

    On the discussion from a couple of weeks ago about reading straight into the GPU from the SSD:

    I couldn't think of a why, a how, or a reason to do it.
    But I was just thinking that one reason could be ML-compressed textures.
    You could read them straight into the GPU and decompress them into memory for use in the next frame, etc.
    Could SFS work with such compressed textures?
     
  12. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    12,795
    Likes Received:
    15,309
    Location:
    The North
    Wasn't sure where to put this, but this went against what I thought the results would be.
    In nearly all the examples shown, XSX was faster, and in some benchmarks notably faster (like 2x).

    I can't come up with any reasons. Going to need to wait to hear some thoughts from others.
    edit: going to chalk this one up to BC restrictions on the CPU or something for PS5, with XSX and XSS running unlocked all the way for their BC. It's the only thing I can think of for now. The alternative would be much worse.

    The real battleground for load times will be when those 3P games come out this week, but this was an interesting look at things.

    article:
    https://www.gamespot.com/articles/h...ompatibility-load-times-compare/1100-6484057/

    BC load time comparisons between PS5 and XSX


    Compared to itself this is interesting, so I think there is a BC bottleneck somewhere, but I'm not sure what.

    I'm a little perplexed.

     
    #4412 iroboto, Nov 6, 2020
    Last edited: Nov 6, 2020
    Silent_Buddha, AzBat, DSoup and 2 others like this.
  13. AbsoluteBeginner

    Regular Newcomer

    Joined:
    Jun 13, 2019
    Messages:
    960
    Likes Received:
    1,301
    PS5 will have to be faster in loading for sure. Even if MS has a better software stack on top of it, better compression, etc., the SSD in PS5 will load faster. Now, the question is whether it will be noticeable. That is, is the 2x difference going to be 2s vs 4s (which is pretty meaningless IMO), or will it actually be even smaller, since games are not 100% bound by data loading? In either case, I think the SSD will be the least of the differences next gen.

    It will be GPU power and features and the DS controller. If XSX, in the best-case scenario, is outperforming PS5 by more than 20% and has a few additional bells and whistles, I'd say they've got a winner. If it is closer than that, and there is feature parity, I'd say Sony did the better job given the DS.
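The point about games not being 100% bound by data loading is essentially Amdahl's law applied to load screens. A minimal sketch, where the 4s baseline comes from the "2s vs 4s" example and the I/O fractions are hypothetical illustrations, not measurements:

```python
def load_time_after_speedup(total_s, io_fraction, io_speedup):
    """Load time when only the I/O-bound share of the work gets faster.

    total_s:     baseline load time in seconds
    io_fraction: share of the baseline spent waiting on storage (0.0 to 1.0)
    io_speedup:  how much faster the storage is (2.0 means twice as fast)
    """
    io_time = total_s * io_fraction
    other_time = total_s - io_time  # decompression, CPU setup, shader work, etc.
    return other_time + io_time / io_speedup


# Fully I/O bound: a 2x faster SSD turns 4s into 2s.
fully_bound = load_time_after_speedup(4.0, io_fraction=1.0, io_speedup=2.0)

# Only half I/O bound (hypothetical): the same 2x SSD only gets to 3s.
half_bound = load_time_after_speedup(4.0, io_fraction=0.5, io_speedup=2.0)
```

The smaller the share of a load that actually waits on storage, the less of the raw SSD advantage shows up on a stopwatch.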
     
    Johnny Awesome and ThePissartist like this.
  14. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    18,549
    Likes Received:
    20,621
    Maybe inefficiencies in the I/O libraries used by BC titles for PS4/4Pro are showing up here, whereas the newer I/O libraries on PS5 are substantially better? Like you, I can't think of a solid reason for the discrepancies, nor for why SeriesX would ever be faster.

    From what I saw the timings seem to be as follows:

    RDR 2
    • XSX: 1m4s
    • PS5: 1m5s
    FFXV
    • XSX: 48s
    • PS5: 1m10s
    Destiny 2
    • XSX: 42s
    • PS5: 57s
    MH World
    • XSX: 35s
    • PS5: 51s
    Arkham Knight
    • XSX: 58s
    • PS5: 1m7s
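To put the timings listed above on a common scale, this sketch converts them to seconds and computes the gap and ratio per game. The data is copied from the list; the helper names are hypothetical.

```python
import re

# (XSX, PS5) load times exactly as listed in the post above.
TIMES = {
    "RDR 2": ("1m4s", "1m5s"),
    "FFXV": ("48s", "1m10s"),
    "Destiny 2": ("42s", "57s"),
    "MH World": ("35s", "51s"),
    "Arkham Knight": ("58s", "1m7s"),
}


def to_seconds(text):
    """Convert strings like '1m4s' or '48s' to a number of seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(\d+)s", text)
    if match is None:
        raise ValueError(f"unrecognised time format: {text!r}")
    return int(match.group(1) or 0) * 60 + int(match.group(2))


def compare(times):
    """Return {game: (xsx_s, ps5_s, ps5_minus_xsx_s, ps5_over_xsx)}."""
    rows = {}
    for game, (xsx, ps5) in times.items():
        x, p = to_seconds(xsx), to_seconds(ps5)
        rows[game] = (x, p, p - x, p / x)
    return rows


rows = compare(TIMES)

if __name__ == "__main__":
    for game, (x, p, diff, ratio) in rows.items():
        print(f"{game:14s} XSX {x:3d}s  PS5 {p:3d}s  diff {diff:+3d}s  ratio {ratio:.2f}x")
```

On these five titles the gap ranges from 1 second (RDR 2) to 22 seconds (FFXV), with the largest ratio a little under 1.5x.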
     
    AzBat, DSoup, PSman1700 and 1 other person like this.
  15. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    12,795
    Likes Received:
    15,309
    Location:
    The North
    Raw boot times: (going to need more tests of course)
    once again, perplexing.
    I'm wondering for PS5 standby if it's the controller sync that's waiting around to happen first.


    Push Square Times:
    Cold Boot to User Login 18.19 seconds
    Rest Mode to User Login 4.52 seconds
    User Login to Main Menu and Suspended Game 2.87 seconds

    They're going to need to align on standby states to get a better line up if they want to do this.
     
    #4415 iroboto, Nov 6, 2020
    Last edited: Nov 6, 2020
    AzBat, PSman1700, BRiT and 1 other person like this.
  16. Jay

    Jay
    Veteran Regular

    Joined:
    Aug 3, 2013
    Messages:
    3,386
    Likes Received:
    2,705
    This loading stuff is what I would consider a surprise more so than PS5 BC.

    It may not last long, as in next gen titles may behave totally differently.
    It really is a turn up for the books to be fair.

    I wonder if BC imposes timings that directly affect how the SSD is used.

    But general boot times aren't faster either. I wonder if the IO stack is limiting it, and we will only really see its full potential when it's accessed for next gen games via a different IO API stack?

    Edit: I removed a section that sounded a bit too console-warring...
     
    #4416 Jay, Nov 6, 2020
    Last edited: Nov 6, 2020
    PSman1700 and iroboto like this.
  17. goonergaz

    Veteran

    Joined:
    Jun 3, 2005
    Messages:
    4,160
    Likes Received:
    1,486
    Weird that DF didn’t mention an issue; they also talked about the faster loading being a bonus?
     
  18. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    18,549
    Likes Received:
    20,621
    Well both have faster loading within their own family of devices (PS5 faster than PS4/4Pro; SeriesX faster than One/OneS/OneX).
     
    Michellstar, PSman1700 and Jay like this.
  19. mpg1

    Veteran Newcomer

    Joined:
    Mar 5, 2015
    Messages:
    2,250
    Likes Received:
    1,996
    Got the impression they are going to do another video for loading speeds..
     
  20. iroboto

    iroboto Daft Funk
    Legend Regular Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    12,795
    Likes Received:
    15,309
    Location:
    The North
    I thought about this as being an issue. But it doesn't explain how the CPU runs so much faster in game, moving the framerates up to a locked 60fps, but at the same time is barely able to load faster.
    Unless the CPU is locked and the GPU is the reason unlocked modes couldn't get higher. That's just perplexing, but we saw that PS5 was fine in CPU-limited areas.

    It's really confusing. I think it may have to do with how many threads can access I/O at the same time: to reach the full 5.5GB/s, you may need to send a lot of requests all at once. This is my idea right now; perhaps single-threaded I/O is not very fast.
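The idea that many requests must be in flight to reach peak bandwidth follows from Little's law: throughput equals outstanding requests times request size divided by latency. The sketch below uses the 5.5GB/s figure from the post; the 64 KiB request size and 100 microsecond latency are hypothetical values for illustration and say nothing about the actual PS5 I/O path.

```python
import math


def throughput_bytes_per_s(in_flight, request_bytes, latency_s):
    """Sustained throughput with `in_flight` requests always outstanding."""
    return in_flight * request_bytes / latency_s


def required_in_flight(target_bytes_per_s, request_bytes, latency_s):
    """Minimum number of outstanding requests needed to hit a target throughput."""
    return math.ceil(target_bytes_per_s * latency_s / request_bytes)


TARGET = 5.5e9             # 5.5 GB/s, from the post
REQUEST_BYTES = 64 * 1024  # hypothetical request size
LATENCY_S = 100e-6         # hypothetical per-request latency

# One blocking request at a time, as a single-threaded legacy loader might issue.
serial = throughput_bytes_per_s(1, REQUEST_BYTES, LATENCY_S)

# Outstanding requests needed to saturate the drive under these assumptions.
needed = required_in_flight(TARGET, REQUEST_BYTES, LATENCY_S)
```

With these made-up numbers a loader that waits on one request at a time tops out around 0.66 GB/s and needs roughly nine requests in flight to reach the headline figure, so a BC title written around serial reads would see little benefit from a faster drive.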
     
    PSman1700 likes this.
