Nvidia Turing Architecture [2018]

Discussion in 'Architecture and Products' started by pharma, Sep 13, 2018.

  1. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    399
    Likes Received:
    413
    Yes, but performance remains to be seen, as XBSX RDNA2 has no tensor cores and relies on shaders. No idea about the PC RDNA2 version, though.
     
  2. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,850
    Likes Received:
    2,772
    Location:
    Finland
    RDNA2 on PC probably won't have tensors either, but what the XSX version of RDNA2 at least has is support for faster 4- and 8-bit precisions (also included in Vega 20 for PC, but not in RDNA1, for example; the RDNA1 variant with the deep-learning additions then again probably does have them).
    Also, tensors aren't a necessity for performance. For example, Control's version of DLSS runs on the CUDA cores, not the tensors (until the 26th, when they release the DLSS 2.0 patch for it).
     
    chris1515 likes this.
  3. troyan

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    149
    Likes Received:
    237
    Current DLSS in Control doesn't use DL. It is an improved upscale filter which doesn't create new information based on a DL network.
     
    ethernity likes this.
  4. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,532
    Likes Received:
    3,572
    Location:
    Pennsylvania
    Wasn't it an early version of what became DLSS 2.0 that was based on all the training done previously? Just wasn't ready for running on Tensors? Nvidia certainly touted it as a method derived from deep learning.
     
  5. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,850
    Likes Received:
    2,772
    Location:
    Finland
    To my understanding it was just meant to imitate the results, but the computational tasks to get there have nothing to do with AI training, old or new, in any form.
     
    BRiT likes this.
  6. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,532
    Likes Received:
    3,572
    Location:
    Pennsylvania
    Yeah, they do use the phrasing that it imitates the results. Glad they're updating it.
     
  7. w0lfram

    Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    216
    Likes Received:
    38
    Tensors are what Nvidia uses because tensor cores are left-over transistors from the hand-me-down enterprise chips. Tensors are not game-related or engineered into chips for games. Again, it's just that Nvidia likes to try to use them for games; otherwise they can't tout or upsell their enterprise chips as premium gaming cards.

    Secondly, I said this before, but DLSS exists because Nvidia can't push 4K with Turing. So they are promoting 1440p and upscaling with AI to fake it. No need to be coy about this; it's a fact. Additionally, what is going to happen when people don't want to play their games with DLSS and use native resolution instead...? That has to be considered.
     
  8. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,436
    Likes Received:
    813
    Location:
    France
    Every f***** week...
     
    Lightman, sonen, neckthrough and 6 others like this.
  9. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,558
    Likes Received:
    599
    Location:
    New York
    I think what happens when you turn off DLSS is you have the fastest native 4K performance available today. What do you think happens? The card explodes?
     
    BRiT and Picao84 like this.
  10. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    2,312
    Likes Received:
    715
    C'mon, we need this in these dark times ;)
     
  11. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,436
    Likes Received:
    813
    Location:
    France
    I'm quite tense at the moment, I should relax and not lose my sh** over a forum post :D
     
    sir doris and PSman1700 like this.
  12. Man from Atlantis

    Regular

    Joined:
    Jul 31, 2010
    Messages:
    747
    Likes Received:
    60
    Seeing your post reminds me of a great British dark comedy, The End of the F***ing World.
     
  13. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,436
    Likes Received:
    813
    Location:
    France
    Liked this show. Awesome first season, second was pretty good too.
     
  14. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,412
    Likes Received:
    2,070
    Accelerating WinML and NVIDIA Tensor Cores
    April 3, 2020

    Models that run on Windows Machine Learning (WinML) using ONNX can benefit from Tensor Cores on NVIDIA hardware, but it is not immediately obvious how to make sure that they are in fact used. There is no switch or button labeled Use Tensor Cores and there are certain constraints by which the model and input data must abide.
    ...
    To maximize the throughput and keep all the respective units busy, there is a constraint when working with floating point operations that the input to the Tensor Core be FP16. The A and B operands of the matrix are multiplied together to produce either FP16 or FP32 output. In the latter case, where you produce a 32-bit output, there is a performance penalty. You end up running the operation at half the speed that you could be, if you did not mix precision.

    While it is possible to get other APIs such as cuDNN to consume FP32 into a Tensor Core operation, all that this is really doing is reducing the precision of the input immediately before the Tensor Core operation. In contrast, when you use WinML and ONNX, the input to the model and the model parameters (weights) must be FP16.
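    (For illustration, a minimal sketch of what that FP16 requirement means for an exported model, assuming the onnx and onnxconverter-common Python packages are available; the file names are placeholders, not anything from the article:)

      # Sketch: convert an FP32 ONNX model to FP16 so WinML can dispatch its
      # convolutions/GEMMs to Tensor Cores. "model.onnx" / "model_fp16.onnx"
      # are placeholder file names.
      import onnx
      from onnxconverter_common import float16

      model = onnx.load("model.onnx")                        # FP32 model from the exporter
      model_fp16 = float16.convert_float_to_float16(model)   # weights and I/O become FP16
      onnx.save(model_fp16, "model_fp16.onnx")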
    ...
    WinML is a very powerful tool but can be quite abstract. In some respects, this is both a blessing and a curse. On the one hand, WinML with ONNX provides a straightforward solution to move from research to production quickly. On the other hand, to achieve optimum performance, you must take care to make sure that ONNX files are well-generated.

    Checklists are helpful when it comes to the production phase of any project. To leverage NVIDIA hardware effectively and make sure that Tensor Cores effectively execute a model using WinML, use the following checklist:


      • Use FP16 for the model and the input.
        • Avoid mixed precision.
        • Fuse any format conversion with other operations, if you can.

      • Stick to the NHWC layout. Precompute any necessary transposition into the model.
        • Avoid transposes at runtime.

      • Fully use the GPU.
        • Make sure that input/output filter counts are at least a multiple of eight. Ideally, make them a multiple of 32 or more.
    https://devblogs.nvidia.com/accelerating-winml-and-nvidia-tensor-cores/
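
    A rough sketch of checking a model against that checklist, again assuming the onnx Python package; the file name is a placeholder and the multiple-of-eight rule is taken from the checklist above:

      # Sketch: flag initializers that would keep Tensor Cores from being used efficiently.
      import onnx

      model = onnx.load("model_fp16.onnx")   # placeholder file name

      for init in model.graph.initializer:
          # Conv weights in ONNX are stored as (out_channels, in_channels, kH, kW).
          if len(init.dims) == 4:
              out_ch, in_ch = init.dims[0], init.dims[1]
              if out_ch % 8 or in_ch % 8:
                  print(f"{init.name}: filter counts {out_ch}/{in_ch} are not multiples of 8")
          # Any tensor still stored as 32-bit float forces the mixed-precision path.
          if init.data_type == onnx.TensorProto.FLOAT:
              print(f"{init.name} is still FP32")

    Anything a check like this flags would likely either fall back to FP32 paths or hit the mixed-precision penalty described above.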
     
    PSman1700 likes this.