Nvidia Turing Architecture [2018]

This specific implementation is NVIDIA specific, but Microsoft has demonstrated similar super-resolution tech on DirectML which would run on any compatible hardware, including XSX.
Yes, but performance remains to be seen, as XBSX RDNA2 has no tensor cores and relies on shaders. No idea about the PC RDNA2 version, though.
 
RDNA2 on PC probably won't have tensors either, but at least the XSX version of RDNA2 supports faster 4- and 8-bit precisions (also included in Vega 20 on PC, but not in RDNA1, for example; the RDNA1 w/ DeepLearning variant probably does have them, though).
Also, tensors aren't a necessity for performance; for example, Control's version of DLSS runs on CUDA cores, not tensors (until the 26th, when they release the DLSS 2.0 patch for it).
 
The current DLSS in Control doesn't use DL. It's an improved upscale filter which doesn't create new information based on a DL network.
Wasn't it an early version of what became DLSS 2.0 that was based on all the training done previously? Just wasn't ready for running on Tensors? Nvidia certainly touted it as a method derived from deep learning.
 
To my understanding it was just meant to imitate the results, but the computational tasks used to get there have nothing to do with AI training, old or new, in any form.
 
Yeah, they do use the phrasing that it imitates the results. Glad they're updating it.
 
Tensors are what Nvidia uses because tensors are left-over transistors from the hand-me-down enterprise chips. Tensors are not game related or engineered into chips for games. Again, it's just that Nvidia likes to try to use them for games; otherwise they can't tout or upsell their enterprise chips as premium gaming cards.

Secondly, I said this before, but DLSS exists because Nvidia can't push 4K with Turing. So they are promoting 1440p and upscaling with AI to fake it. No need to be coy about this, it's a fact. Additionally, what is going to happen when people don't want to play their games with DLSS and use native resolution instead...? That has to be considered.
 

I think what happens when you turn off DLSS is you have the fastest native 4K performance available today. What do you think happens? The card explodes?
 
Accelerating WinML and NVIDIA Tensor Cores
April 3, 2020

Models that run on Windows Machine Learning (WinML) using ONNX can benefit from Tensor Cores on NVIDIA hardware, but it is not immediately obvious how to make sure that they are in fact used. There is no switch or button labeled Use Tensor Cores and there are certain constraints by which the model and input data must abide.
...
To maximize the throughput and keep all the respective units busy, there is a constraint when working with floating point operations that the input to the Tensor Core be FP16. The A and B operands of the matrix are multiplied together to produce either FP16 or FP32 output. In the latter case, where you produce a 32-bit output, there is a performance penalty. You end up running the operation at half the speed that you could be, if you did not mix precision.

While it is possible to get other APIs such as cuDNN to consume FP32 into a Tensor Core operation, all that this is really doing is reducing the precision of the input immediately before the Tensor Core operation. In contrast, when you use WinML and ONNX, the input to the model and the model parameters (weights) must be FP16.
...
WinML is a very powerful tool but can be quite abstract. In some respects, this is both a blessing and a curse. On the one hand, WinML with ONNX provides a straightforward solution to move from research to production quickly. On the other hand, to achieve optimum performance, you must take care to make sure that ONNX files are well-generated.

Checklists are helpful when it comes to the production phase of any project. To leverage NVIDIA hardware effectively and make sure that Tensor Cores effectively execute a model using WinML, use the following checklist:


    • Use FP16 for the model and the input.
      • Avoid mixed precision.
      • Fuse any format conversion with other operations, if you can.
    • Stick to the NHWC layout. Precompute any necessary transposition into the model.
      • Avoid transposes at runtime.
    • Fully use the GPU.
      • Make sure that input/output filter counts are at least a multiple of eight. Ideally, make them a multiple of 32 or more.
https://devblogs.nvidia.com/accelerating-winml-and-nvidia-tensor-cores/
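To make the first checklist item concrete, here's a minimal sketch of converting an ONNX model to FP16 before loading it in WinML. It assumes the onnx and onnxconverter-common Python packages and a placeholder "model_fp32.onnx" file; it's not code from the linked article, just one way to do the conversion:

```python
# Sketch only: convert an FP32 ONNX model to FP16 so WinML can feed Tensor
# Cores without mixed precision. "model_fp32.onnx" is a placeholder name.
import onnx
from onnxconverter_common import float16

model = onnx.load("model_fp32.onnx")

# keep_io_types=False also converts the graph inputs/outputs to FP16,
# matching the "FP16 for the model and the input" rule above.
model_fp16 = float16.convert_float_to_float16(model, keep_io_types=False)
onnx.save(model_fp16, "model_fp16.onnx")

# Optional check for the last checklist item: Conv filter counts should be
# multiples of 8 (ideally 32) to keep the Tensor Cores fully fed.
shapes = {init.name: list(init.dims) for init in model_fp16.graph.initializer}
for node in model_fp16.graph.node:
    if node.op_type == "Conv" and len(node.input) > 1 and node.input[1] in shapes:
        out_c, in_c = shapes[node.input[1]][:2]
        if out_c % 8 or in_c % 8:
            print(f"{node.name or node.output[0]}: {out_c}x{in_c} filters not a multiple of 8")
```

Feeding the converted model to WinML with FP16 input tensors should then cover the first two checklist points; whether Tensor Cores actually get used still depends on the layout and filter-count items.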
 
So, as we know, NVIDIA added dedicated FP16 units to TU11x in place of tensors. However, it was recently discovered (via AIDA64) that NVIDIA is planning new GTX 1650s using the TU106 core instead, which doesn't have dedicated FP16 units.
Now Galax has released the GeForce GTX 1650 Ultra, "Ultra" being just their branding rather than NVIDIA naming. Supposedly they've disabled both the RT and Tensor cores, since it's a GTX product.
Are there any educated guesses as to how the lack of FP16 units might affect the card compared to the "normal" GTX 1650?


http://www.szgalaxy.com/__ZH_GB__/Product5/ProductDetail?proID=562
 
So the new GTX 1650 has no 2xFP16 throughput?
 
According to this, both TU106 and TU116 are able to dual-issue FP32 and FP16 operations (it's done through the tensor cores in TU106).
The dedicated FP16 units in TU116 are there because TU116 does not have tensor cores like TU106 does. So as long as the "replacement" TU106 still has its tensor cores, it should be fine.
 
Supposedly tensors are disabled in this, or at least TPU claims they are. https://www.techpowerup.com/269191/galax-designs-a-geforce-gtx-1650-ultra-with-tu106-silicon
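If anyone wants to sanity-check the FP16 question on real hardware once the cards are out, here's a rough micro-benchmark sketch (assuming PyTorch with a CUDA build; the numbers reflect whatever path the driver maps FP16 GEMMs to, whether packed FP16 ALUs or tensor cores), not anything official:

```python
# Rough sketch: compare FP32 vs FP16 matmul throughput on the installed GPU.
# A part with double-rate FP16 (or tensor-core-backed FP16) should show a
# clear gap; a part without either should not.
import time
import torch

def bench(dtype, n=4096, iters=50):
    a = torch.randn(n, n, device="cuda", dtype=dtype)
    b = torch.randn(n, n, device="cuda", dtype=dtype)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    # 2*n^3 multiply-adds per matmul, reported as TFLOPS.
    return 2 * n**3 * iters / (time.perf_counter() - start) / 1e12

print(f"FP32: {bench(torch.float32):.1f} TFLOPS")
print(f"FP16: {bench(torch.float16):.1f} TFLOPS")
```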
 