Nvidia Turing Architecture [2018]

Discussion in 'Architecture and Products' started by pharma, Sep 13, 2018.

  1. xpea

    Regular Newcomer

    Joined:
    Jun 4, 2013
    Messages:
    404
    Likes Received:
    431
    Yes, but performance remains to be seen, as the XBSX's RDNA2 has no tensor cores and relies on shaders. No idea about the PC RDNA2 version though.
     
  2. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,060
    Likes Received:
    2,929
    Location:
    Finland
    RDNA2 on PC probably won't have tensors either, but what at least the XSX version of RDNA2 does have is support for faster 4- and 8-bit precisions (also included in Vega 20 on PC, but not in RDNA1, for example; RDNA1 with the deep-learning additions then again probably does have them).
    Also, tensors aren't a necessity for performance: for example, Control's version of DLSS runs on CUDA cores, not tensors (until the 26th, when they release the DLSS 2.0 patch for it).
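    For context, the kind of operation those faster 4- and 8-bit paths accelerate is a packed low-precision dot product with a 32-bit accumulator (the DP4A pattern). A minimal numpy illustration of the arithmetic only (made-up values, not hardware code):

        import numpy as np

        # DP4A-style arithmetic: four INT8 products summed into an INT32
        # accumulator; the hardware does all of this in a single instruction.
        a = np.array([ 12, -5, 100,  7], dtype=np.int8)
        b = np.array([  3, 20,  -1, 50], dtype=np.int8)
        acc = np.int32(1000)                                  # running INT32 total
        acc += np.dot(a.astype(np.int32), b.astype(np.int32))
        print(acc)                                            # 1186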
     
    chris1515 likes this.
  3. troyan

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    151
    Likes Received:
    241
    Current DLSS in Control doesn't use DL. It is an improved upscaling filter which doesn't create new information based on a DL network.
     
    ethernity likes this.
  4. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,613
    Likes Received:
    3,672
    Location:
    Pennsylvania
    Wasn't it an early version of what became DLSS 2.0, based on all the training done previously, that just wasn't ready to run on Tensors? Nvidia certainly touted it as a method derived from deep learning.
     
  5. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,060
    Likes Received:
    2,929
    Location:
    Finland
    To my understanding it was just meant to imitate the results, but the computational tasks to get there have nothing to do with AI training, old or new, in any form.
     
    BRiT likes this.
  6. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,613
    Likes Received:
    3,672
    Location:
    Pennsylvania
    Yeah, they do use the phrasing that it imitates the results. Glad they're updating it.
     
  7. w0lfram

    Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    217
    Likes Received:
    38
    Tensors are what Nvidia uses because they are left-over transistors from the hand-me-down enterprise chips. Tensors are not game-related or engineered into chips for games. Again, it's just that Nvidia likes to try to use them for games; otherwise they can't tout, or upsell, their enterprise chips as premium gaming cards.

    Secondly, I said this before, but DLSS exists because Nvidia can't push 4K with Turing. So they are promoting 1440p and upscaling with AI to fake it. No need to be coy about this; it's a fact. Additionally, what is going to happen when people don't want to play their games with DLSS and use native resolution instead...? That has to be considered.
     
  8. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,463
    Likes Received:
    831
    Location:
    France
    Every f***** week...
     
    Lightman, sonen, neckthrough and 6 others like this.
  9. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,579
    Likes Received:
    622
    Location:
    New York
    I think what happens when you turn off DLSS is you have the fastest native 4K performance available today. What do you think happens? The card explodes?
     
    BRiT and Picao84 like this.
  10. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    2,493
    Likes Received:
    773
    C'mon, we need this in these dark times ;)
     
  11. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,463
    Likes Received:
    831
    Location:
    France
    I'm quite tense at the moment; I should relax and not lose my sh** over a forum post :D
     
    sir doris and PSman1700 like this.
  12. Man from Atlantis

    Regular

    Joined:
    Jul 31, 2010
    Messages:
    753
    Likes Received:
    79
    Seeing your post reminds me of a great British dark comedy, The End of the F***ing World.
     
  13. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,463
    Likes Received:
    831
    Location:
    France
    Liked this show. Awesome first season, second was pretty good too.
     
  14. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    3,528
    Likes Received:
    2,214
    Accelerating WinML and NVIDIA Tensor Cores
    April 3, 2020

    Models that run on Windows Machine Learning (WinML) using ONNX can benefit from Tensor Cores on NVIDIA hardware, but it is not immediately obvious how to make sure that they are in fact used. There is no switch or button labeled Use Tensor Cores and there are certain constraints by which the model and input data must abide.
    ...
    To maximize the throughput and keep all the respective units busy, there is a constraint when working with floating point operations that the input to the Tensor Core be FP16. The A and B operands of the matrix are multiplied together to produce either FP16 or FP32 output. In the latter case, where you produce a 32-bit output, there is a performance penalty. You end up running the operation at half the speed that you could be, if you did not mix precision.

    While it is possible to get other APIs such as cuDNN to consume FP32 into a Tensor Core operation, all that this is really doing is reducing the precision of the input immediately before the Tensor Core operation. In contrast, when you use WinML and ONNX, the input to the model and the model parameters (weights) must be FP16.
    ...
    WinML is a very powerful tool but can be quite abstract. In some respects, this is both a blessing and a curse. On the one hand, WinML with ONNX provides a straightforward solution to move from research to production quickly. On the other hand, to achieve optimum performance, you must take care to make sure that ONNX files are well-generated.

    Checklists are helpful when it comes to the production phase of any project. To leverage NVIDIA hardware effectively and make sure that Tensor Cores effectively execute a model using WinML, use the following checklist:


      • Use FP16 for the model and the input.
        • Avoid mixed precision.
        • Fuse any format conversion with other operations, if you can.

      • Stick to the NHWC layout.
        • Precompute any necessary transposition into the model.
        • Avoid transposes at runtime.

      • Fully use the GPU.
        • Make sure that input/output filter counts are at least a multiple of eight. Ideally, make them a multiple of 32 or more.
    https://devblogs.nvidia.com/accelerating-winml-and-nvidia-tensor-cores/
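    As a minimal sketch of the FP16 conversion step the post describes (assuming the onnxconverter-common package; the model path is a placeholder, not from the article):

        import onnx
        from onnxconverter_common import float16

        # Load an FP32 ONNX model and cast its weights and inputs/outputs to
        # FP16, satisfying the Tensor Core constraints listed above.
        model = onnx.load("model.onnx")           # placeholder path
        model_fp16 = float16.convert_float_to_float16(model)
        onnx.save(model_fp16, "model_fp16.onnx")  # ready for the WinML FP16 path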
     
    PSman1700 likes this.
  15. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,060
    Likes Received:
    2,929
    Location:
    Finland
    So, as we know, NVIDIA added dedicated FP16 units to TU11x in place of tensors. However, it recently came out (via AIDA64) that NVIDIA is planning new GTX 1650s using the TU106 core instead, which doesn't have dedicated FP16 units.
    Now Galax has released the GeForce GTX 1650 Ultra, "Ultra" being just their own branding rather than NVIDIA naming. Supposedly both the RT and Tensor cores are disabled, since it's a GTX product.
    Are there any educated guesses as to how the lack of FP16 units might affect the card compared to the "normal" GTX 1650?

    http://www.szgalaxy.com/__ZH_GB__/Product5/ProductDetail?proID=562
     
  16. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    11,019
    Likes Received:
    5,558
    So the new GTX 1650 has no 2xFP16 throughput?
     
  17. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,060
    Likes Received:
    2,929
    Location:
    Finland
    Sadly, Galax doesn't seem to go that deep into details, but surely the dedicated FP16 units were there in TU11x for some reason?
     
  18. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,810
    Likes Received:
    211
    Location:
    Taiwan
    According to this, both TU106 and TU116 are able to dual-issue FP32 and FP16 operations (in TU106 it's done through the tensor cores).
    The dedicated FP16 units in TU116 are there because TU116 doesn't have tensor cores like TU106 does. So as long as the "replacement" TU106 still has its tensor cores, it should be fine.
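    A rough way to probe this on a given card, assuming PyTorch with a CUDA build (matmuls may route through tensor cores where present, so treat it as a throughput sanity check rather than a pure shader FP16 test):

        import time
        import torch

        def matmul_tflops(dtype, n=4096, iters=50):
            # Time n x n matrix multiplies and report achieved TFLOP/s.
            a = torch.randn(n, n, device="cuda", dtype=dtype)
            b = torch.randn(n, n, device="cuda", dtype=dtype)
            torch.cuda.synchronize()
            t0 = time.time()
            for _ in range(iters):
                a @ b
            torch.cuda.synchronize()
            return 2 * n ** 3 * iters / (time.time() - t0) / 1e12

        print("FP32:", matmul_tflops(torch.float32))
        print("FP16:", matmul_tflops(torch.float16))  # ~2x FP32 where fast FP16 exists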
     
  19. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,060
    Likes Received:
    2,929
    Location:
    Finland
    Supposedly tensors are disabled in this, or at least TPU claims they are. https://www.techpowerup.com/269191/galax-designs-a-geforce-gtx-1650-ultra-with-tu106-silicon
     