Nvidia Turing Architecture [2018]

Discussion in 'Architecture and Products' started by pharma, Sep 13, 2018.

  1. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    404
    Likes Received:
    431
    Yes, but performance remains to be seen, as the XBSX's RDNA2 has no tensor cores and relies on shaders. No idea about the PC RDNA2 version though.
     
  2. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,060
    Likes Received:
    2,929
    Location:
    Finland
    RDNA2 on PC probably won't have tensors either, but what at least the XSX version of RDNA2 does have is support for faster 4- and 8-bit precisions (also included in Vega 20 on PC, but not in RDNA1, for example; RDNA1 with the deep-learning additions then again probably does have them).
    Also, tensors aren't a necessity for performance: for example, Control's version of DLSS runs on CUDA cores, not tensors (until the 26th, when they release the DLSS 2.0 patch for it).
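    For context, the kind of operation those faster 4- and 8-bit paths accelerate is a packed low-precision dot product with a 32-bit accumulator (the DP4A pattern). A minimal numpy illustration of the arithmetic only (made-up values, not hardware code):

        import numpy as np

        # DP4A-style arithmetic: four INT8 products summed into an INT32
        # accumulator; the hardware does all of this in a single instruction.
        a = np.array([ 12, -5, 100,  7], dtype=np.int8)
        b = np.array([  3, 20,  -1, 50], dtype=np.int8)
        acc = np.int32(1000)                                  # running INT32 total
        acc += np.dot(a.astype(np.int32), b.astype(np.int32))
        print(acc)                                            # 1186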
     
    chris1515 likes this.
  3. troyan

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    151
    Likes Received:
    241
    Current DLSS in Control doesn't use DL. It is an improved upscaling filter which doesn't create new information based on a DL network.
     
    ethernity likes this.
  4. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,613
    Likes Received:
    3,672
    Location:
    Pennsylvania
    Wasn't it an early version of what became DLSS 2.0, based on all the training done previously, that just wasn't ready to run on Tensors? Nvidia certainly touted it as a method derived from deep learning.
     
  5. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,060
    Likes Received:
    2,929
    Location:
    Finland
    To my understanding it was just meant to imitate the results, but the computational tasks to get there have nothing to do with AI training, old or new, in any form.
     
    BRiT likes this.
  6. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,613
    Likes Received:
    3,672
    Location:
    Pennsylvania
    Yeah, they do use the phrasing that it imitates the results. Glad they're updating it.
     
  7. w0lfram

    Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    217
    Likes Received:
    38
    Tensors are what Nvidia uses because they are left-over transistors from the hand-me-down enterprise chips. Tensors are not game-related or engineered into chips for games. Again, it's just that Nvidia likes to try to use them for games; otherwise they can't tout, or upsell, their enterprise chips as premium gaming cards.

    Secondly, I said this before, but DLSS exists because Nvidia can't push 4K with Turing. So they are promoting 1440p and upscaling with AI to fake it. No need to be coy about this; it's a fact. Additionally, what is going to happen when people don't want to play their games with DLSS and use native resolution instead...? That has to be considered.
     
  8. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,463
    Likes Received:
    831
    Location:
    France
    Every f***** week...
     
    Lightman, sonen, neckthrough and 6 others like this.
  9. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,579
    Likes Received:
    622
    Location:
    New York
    I think what happens when you turn off DLSS is you have the fastest native 4K performance available today. What do you think happens? The card explodes?
     
    BRiT and Picao84 like this.
  10. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    2,493
    Likes Received:
    773
    C'mon, we need this in these dark times ;)
     
  11. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,463
    Likes Received:
    831
    Location:
    France
    I'm quite tense at the moment; I should relax and not lose my sh** over a forum post :D
     
    sir doris and PSman1700 like this.
  12. Man from Atlantis

    Regular

    Joined:
    Jul 31, 2010
    Messages:
    753
    Likes Received:
    79
    Seeing your post reminds me of a great British dark comedy, The End of the F***ing World.
     
  13. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,463
    Likes Received:
    831
    Location:
    France
    Liked this show. Awesome first season, second was pretty good too.
     
  14. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,528
    Likes Received:
    2,214
    Accelerating WinML and NVIDIA Tensor Cores
    April 3, 2020

    Models that run on Windows Machine Learning (WinML) using ONNX can benefit from Tensor Cores on NVIDIA hardware, but it is not immediately obvious how to make sure that they are in fact used. There is no switch or button labeled Use Tensor Cores and there are certain constraints by which the model and input data must abide.
    ...
    To maximize the throughput and keep all the respective units busy, there is a constraint when working with floating point operations that the input to the Tensor Core be FP16. The A and B operands of the matrix are multiplied together to produce either FP16 or FP32 output. In the latter case, where you produce a 32-bit output, there is a performance penalty. You end up running the operation at half the speed that you could be, if you did not mix precision.

    While it is possible to get other APIs such as cuDNN to consume FP32 into a Tensor Core operation, all that this is really doing is reducing the precision of the input immediately before the Tensor Core operation. In contrast, when you use WinML and ONNX, the input to the model and the model parameters (weights) must be FP16.
    ...
    WinML is a very powerful tool but can be quite abstract. In some respects, this is both a blessing and a curse. On the one hand, WinML with ONNX provides a straightforward solution to move from research to production quickly. On the other hand, to achieve optimum performance, you must take care to make sure that ONNX files are well-generated.

    Checklists are helpful when it comes to the production phase of any project. To leverage NVIDIA hardware effectively and make sure that Tensor Cores effectively execute a model using WinML, use the following checklist:


      • Use FP16 for the model and the input.
        • Avoid mixed precision.
        • Fuse any format conversion with other operations, if you can.

      • Stick to the NHWC layout.
        • Precompute any necessary transposition into the model.
        • Avoid transposes at runtime.

      • Fully use the GPU.
        • Make sure that input/output filter counts are at least a multiple of eight. Ideally, make them a multiple of 32 or more.
    https://devblogs.nvidia.com/accelerating-winml-and-nvidia-tensor-cores/
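    As a minimal sketch of the FP16 conversion step the post describes (assuming the onnxconverter-common package; the model path is a placeholder, not from the article):

        import onnx
        from onnxconverter_common import float16

        # Load an FP32 ONNX model and cast its weights and inputs/outputs to
        # FP16, satisfying the Tensor Core constraints listed above.
        model = onnx.load("model.onnx")           # placeholder path
        model_fp16 = float16.convert_float_to_float16(model)
        onnx.save(model_fp16, "model_fp16.onnx")  # ready for the WinML FP16 path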
     
    PSman1700 likes this.
  15. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,060
    Likes Received:
    2,929
    Location:
    Finland
    So, as we know, NVIDIA added dedicated FP16 units to TU11x in place of tensors. However, it recently came out (via AIDA64) that NVIDIA is planning new GTX 1650s using the TU106 core instead, which doesn't have dedicated FP16 units.
    Now Galax has released the GeForce GTX 1650 Ultra, "Ultra" being just their own branding rather than NVIDIA naming. Supposedly both the RT and Tensor cores are disabled, since it's a GTX product.
    Are there any educated guesses as to how the lack of FP16 units might affect the card compared to the "normal" GTX 1650?

    http://www.szgalaxy.com/__ZH_GB__/Product5/ProductDetail?proID=562
     
  16. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,019
    Likes Received:
    5,558
    So the new GTX 1650 has no 2xFP16 throughput?
     
  17. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,060
    Likes Received:
    2,929
    Location:
    Finland
    Sadly, Galax doesn't seem to go that deep into details, but surely the dedicated FP16 units were there in TU11x for some reason?
     
  18. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,810
    Likes Received:
    211
    Location:
    Taiwan
    According to this, both TU106 and TU116 are able to dual-issue FP32 and FP16 operations (in TU106 it's done through the tensor cores).
    The dedicated FP16 units in TU116 are there because TU116 doesn't have tensor cores like TU106 does. So as long as the "replacement" TU106 still has its tensor cores, it should be fine.
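    A rough way to probe this on a given card, assuming PyTorch with a CUDA build (matmuls may route through tensor cores where present, so treat it as a throughput sanity check rather than a pure shader FP16 test):

        import time
        import torch

        def matmul_tflops(dtype, n=4096, iters=50):
            # Time n x n matrix multiplies and report achieved TFLOP/s.
            a = torch.randn(n, n, device="cuda", dtype=dtype)
            b = torch.randn(n, n, device="cuda", dtype=dtype)
            torch.cuda.synchronize()
            t0 = time.time()
            for _ in range(iters):
                a @ b
            torch.cuda.synchronize()
            return 2 * n ** 3 * iters / (time.time() - t0) / 1e12

        print("FP32:", matmul_tflops(torch.float32))
        print("FP16:", matmul_tflops(torch.float16))  # ~2x FP32 where fast FP16 exists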
     
  19. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,060
    Likes Received:
    2,929
    Location:
    Finland
    Supposedly tensors are disabled in this, or at least TPU claims they are. https://www.techpowerup.com/269191/galax-designs-a-geforce-gtx-1650-ultra-with-tu106-silicon
     