Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Discussion in 'Console Technology' started by Proelite, Mar 16, 2020.

Thread Status:
Not open for further replies.
  1. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    22,146
    Likes Received:
    8,533
    Location:
    ಠ_ಠ
    There has been research into using INT4 for upscaling, so there may be enough performance on XSX to compete with DLSS, which uses INT8.

    Also recall that MS mentioned resolution scaling in the context of ML at Hot Chips, so it's definitely something they've been looking at.
     
    Silent_Buddha, RagnarokFF and scently like this.
  2. mpg1

    Veteran

    Joined:
    Mar 5, 2015
    Messages:
    2,250
    Likes Received:
    1,996
    They already showed an example over a year ago with their DirectML Super Resolution:

     
  3. Globalisateur

    Globalisateur Globby
    Veteran Subscriber

    Joined:
    Nov 6, 2013
    Messages:
    4,592
    Likes Received:
    3,411
    Location:
    France
    Can they efficiently do Machine Learning with FP16?
     
  4. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,633
    Location:
    The North
    The speed of the solution will be more dependent on the developers of the model than on the hardware itself.
    Being 2x or 4x slower is not a big deal unless you're running an unlocked framerate. The current DLSS solution takes around 2.5 ms or so; 4x that is 10 ms. Again, tight for 16.6 ms but fair game for 33.3 ms.
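As a back-of-envelope check (the ~2.5 ms DLSS cost and the 4x slowdown are the thread's ballpark figures, not measured numbers), the frame-budget arithmetic looks like this:

```python
# Rough frame-budget check: does an ML upscaling pass that is 4x
# slower than DLSS's ~2.5 ms still fit in a frame? Figures are
# ballpark estimates from the discussion, not benchmarks.
DLSS_COST_MS = 2.5   # approximate DLSS pass cost on tensor cores
SLOWDOWN = 4         # assumed penalty for running on shader ALUs

upscale_ms = DLSS_COST_MS * SLOWDOWN   # 10.0 ms

for fps, budget_ms in [(60, 16.6), (30, 33.3)]:
    remaining = budget_ms - upscale_ms
    print(f"{fps} fps: {upscale_ms:.1f} ms pass leaves "
          f"{remaining:.1f} ms for rendering")
```

At 60 fps the pass eats most of the budget; at 30 fps it leaves over 23 ms for rendering, which matches the "fair game for 33.3 ms" call above.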
     
  5. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,236
    Likes Received:
    4,259
    Location:
    Guess...
    That's just DLSS running through DirectML. Note the thanks to Nvidia for "supplying the model". It's also run on Nvidia hardware/tensor cores.
     
    TheAlSpark likes this.
  6. Maybe this is why they're implementing GDDR6X?
    Then again, GA102 seems to be power limited most of all, and I don't know if running Tensor+RT+CUDA in parallel forces a downclock.
     
  7. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,633
    Location:
    The North
    Yes.
     
  8. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,511
    Likes Received:
    24,410
    What's the scaling order of throughput though? Assuming RPM, is the following accurate?

    1 FP32 op ~= 2 FP16 ops ~= 4 INT8 ops ~= 8 INT4 ops
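Plugging in Microsoft's quoted 12.15 TFLOPS FP32 for Series X, that 1:2:4:8 scaling works out as follows (theoretical peak rates only; real kernels won't hit them):

```python
# Peak-rate scaling under RPM / packed math, using the Series X's
# disclosed 12.15 TFLOPS FP32 as the baseline.
FP32_TFLOPS = 12.15

rates = {
    "FP32": FP32_TFLOPS * 1,   # 12.15 TFLOPS
    "FP16": FP32_TFLOPS * 2,   # 24.3  TFLOPS (RPM)
    "INT8": FP32_TFLOPS * 4,   # 48.6  TOPS
    "INT4": FP32_TFLOPS * 8,   # 97.2  TOPS
}
for precision, rate in rates.items():
    print(f"{precision}: {rate:.2f} T(FL)OPS")
```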
     
  9. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,633
    Location:
    The North
    If you're strictly looking at RPM, yeah, but those are fixed-precision calculations.
    You also want to look at mixed precision, since that keeps the quality while increasing the performance. If you drop down to fixed INT4, you're losing quality while gaining performance.
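The quality/performance trade-off is easy to see with a toy symmetric quantizer (a sketch for illustration, not any shipping upscaling pipeline): the same weights rounded to a 4-bit grid carry far more error than an 8-bit grid.

```python
# Toy symmetric quantization: snap values to an n-bit signed grid and
# measure the worst-case error. INT4 offers only 16 levels, so its
# error is roughly 16x larger than INT8's over the same value range.
def quantize(xs, bits):
    qmax = 2 ** (bits - 1) - 1             # 127 for INT8, 7 for INT4
    scale = max(abs(x) for x in xs) / qmax
    return [round(x / scale) * scale for x in xs]

weights = [0.91, -0.44, 0.10, 0.73, -0.05, 0.28]
for bits in (8, 4):
    q = quantize(weights, bits)
    err = max(abs(a - b) for a, b in zip(weights, q))
    print(f"INT{bits}: max abs error = {err:.4f}")
```

Mixed precision keeps the error-sensitive layers at FP16 and quantizes only the tolerant ones, which is why it can gain speed without the across-the-board quality loss of fixed INT4.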
     
    #4389 iroboto, Nov 2, 2020
    Last edited: Nov 2, 2020
  10. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,511
    Likes Received:
    24,410
    Right, mostly wanted to confirm that using FP16 would be half the speed of INT8.
     
  11. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,633
    Location:
    The North
    Yea, which is fine ;)
    I think most models will still largely be FP16, imo. Perhaps I'm old fashioned. I don't know how much of a mixed network will drop as low as INT4. That's like... super low precision.
     
    Alucardx23 and BRiT like this.
  12. Alucardx23

    Regular

    Joined:
    Oct 7, 2009
    Messages:
    549
    Likes Received:
    81
    Yeah, I guess the Tensor cores in Turing are more of an investment in the future. We should see more games using them going forward. I agree with the highlighted part, but some people answer that with "Yeah, but having to run the ML upscaling step on the CUs robs the GPU of cores that could be used for actual traditional rendering tasks."
     
    #4392 Alucardx23, Nov 2, 2020
    Last edited: Nov 2, 2020
  13. QPlayer

    Newcomer

    Joined:
    May 17, 2019
    Messages:
    52
    Likes Received:
    27
    I don't know why some people think that TOPS and INT4/INT8 figures measured on a PC can be directly compared to the closed and more efficient architecture of the consoles. Anyway, MS stated that with minimal silicon modification, 10x more effective ML and 10x more effective raytracing can be achieved on the Series consoles. This is why the super resolution technique can be very efficient, e.g. it may use only 1 CPU core, or maybe it can work efficiently on the GPU alone.
     
  14. RagnarokFF

    Newcomer

    Joined:
    Mar 22, 2020
    Messages:
    57
    Likes Received:
    146
    He's a fanboy, judging by his post history, including this tweet. He literally made a statement that racists gravitate to Xbox. Wtf.


    Another gem:
     
  15. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,406
    Location:
    Wrong thread
  16. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,833
    Likes Received:
    18,633
    Location:
    The North
    That job requirement list is ... lol
    Good luck
     
    RagnarokFF and function like this.
  17. cheapchips

    Veteran

    Joined:
    Feb 23, 2013
    Messages:
    2,493
    Likes Received:
    2,665
    Location:
    UK
    "Do all the ML and then tell everyone else how to also do all the ML" :-D
     
  18. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,406
    Location:
    Wrong thread
    It's just asking for your standard expert specialised ML researcher, game developer, senior game tech lead, Unreal developer, mathematician, passionate about AAA games type person.

    P. S. Must be a team player, and mentor everyone else on the team.

    Don't see the problem. It's not like they're asking for a lot. :nope:
     
  19. Metal_Spirit

    Regular

    Joined:
    Jan 3, 2007
    Messages:
    632
    Likes Received:
    397
    Why would you think that?

    May I quote from the RDNA white paper?

    https://www.amd.com/system/files/documents/rdna-whitepaper.pdf

    "To accommodate the narrower wavefronts, the vector register file has been reorganized. Each vector general purpose register (vGPR) contains 32 lanes that are 32-bits wide, and a SIMD contains a total of 1,024 vGPRs – 4X the number of registers as in GCN. The registers typically hold single-precision (32-bit) floating-point (FP) data, but are also designed for efficiently RDNA Architecture | 13 handling mixed precision. For larger 64-bit (or double precision) FP data, adjacent registers are combined to hold a full wavefront of data. More importantly, the compute unit vector registers natively support packed data including two half-precision (16-bit) FP values, four 8-bit integers, or eight 4-bit integers."
     