Nvidia Turing Speculation thread [2018]

Just in case you are not being sarcastic: that's just a grossly oversimplified illustration to show how different parts of the calculation can overlap. In reality, the RT and tensor cores are integrated into the individual SMs. Even assuming the chip shot is just an equally oversimplified artist's interpretation that doesn't resemble reality at all, it would take large amounts of energy to move all that data around for a single frame. The 24 bright spots in the upper and lower horizontal middle, for example, are most likely the raster backends/ROPs.
I’m pretty sure he was using the slide to show that RT has been accelerated (the time graphs), not what the real layout looks like.

I learned not to believe anything that Nvidia or anyone affiliated with them says unless I see a proof of concept. That fiasco with the 5xxx series was literally the last nail in the coffin for me when it comes to that company.
If your technical judgement is so clouded by emotion, why bother engaging in this kind of discussion?
 
Yeah it would have to be almost real-time to be used in instant replays.

Anything based on deep learning also has the potential to generate artefacts, particularly in rare cases not seen during training.
If you look at the DL slo-mo footage of the falling ice hockey skater, there are huge tearing artefacts on the skates.
As good as it looks, it might not be good enough to cover everything.
 
That deep slomo network requires at least an order of magnitude more calculations and cannot be done in real time. (At least not on a single GPU.)

It’s a large, deep network.

Here’s the paper: https://arxiv.org/pdf/1712.00080.pdf
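For anyone curious where the cost goes: the expensive part is the stacked networks that estimate the intermediate flows; the final warp-and-blend step is trivial by comparison. A toy numpy sketch of just that last step (my own simplification with made-up names, not the paper's exact formulation):

```python
# Toy warp-and-blend step of flow-based frame interpolation (the cheap part).
# The heavy compute is in the networks that produce flow_0to1 / flow_1to0.
import numpy as np

def backward_warp(frame, flow):
    """Sample `frame` at positions displaced by `flow` (H, W, 2), nearest-neighbour."""
    h, w, _ = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.clip(np.round(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.round(ys + flow[..., 1]).astype(int), 0, h - 1)
    return frame[src_y, src_x]

def interpolate(i0, i1, flow_0to1, flow_1to0, t=0.5):
    """Crudely approximate the time-t flows by scaling the full flows, then blend."""
    warped_from_0 = backward_warp(i0, -t * flow_0to1)
    warped_from_1 = backward_warp(i1, -(1.0 - t) * flow_1to0)
    return (1.0 - t) * warped_from_0 + t * warped_from_1
```

If I remember the paper right, their intermediate-flow approximation is a blend of both flows plus learned visibility maps for occlusions, and that refinement is exactly where the extra network passes (and the compute cost) come in.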
Cue Jensen: "The more you buy, the more you save." :)
I assume it comes down to whether the spanking new large DGX node can be centralised at the broadcasting station, away from the hi-res TV/film camera feed, which would give greater flexibility.
Those hi-res broadcast cameras used for best fidelity in sports are (or at least used to be, I think) shockingly expensive, even before considering high-speed motion capture with slo-mo playback.
 
I am very confused about what part of the denoising is done with the help of AI instead of normal shaders doing the work.

In the Quadro Turing presentation they showed that only global illumination is denoised by an AI-based denoiser, while reflections and all other lighting effects are denoised with compute. But the slide at the Turing event made no mention of AI denoising at all, not even for global illumination.

Are the tensor cores too slow for it in real-time gaming situations?
 
I think the tensor cores are used for what they're calling DLSS (deep-learning super-sampling), which seems to be an upscaler rather than real super sampling.
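To make the distinction concrete, here's a toy numpy contrast (nearest-neighbour standing in for the network, names made up by me; the point is only that upscaling shades far fewer samples than true SSAA):

```python
# Toy contrast between true super sampling and upscaling.
# numpy stand-ins only; the real DLSS network is obviously not a nearest-neighbour filter.
import numpy as np

def supersample(render_at, target_hw, factor=2):
    """Shade factor^2 more pixels than needed, then box-filter down (true SSAA)."""
    h, w = target_hw
    hi = render_at((h * factor, w * factor))                  # the expensive part
    return hi.reshape(h, factor, w, factor).mean(axis=(1, 3))

def upscale(render_at, target_hw, factor=2):
    """Shade factor^2 fewer pixels, then fill in the rest (nearest-neighbour here,
    a trained network in DLSS)."""
    h, w = target_hw
    lo = render_at((h // factor, w // factor))                # the cheap part
    return np.repeat(np.repeat(lo, factor, axis=0), factor, axis=1)
```

Both take any render callable (e.g. render_at = lambda hw: np.random.rand(*hw) for testing); at factor=2 the first pays for four times the shading work, while the second shades a quarter of the pixels and has to reconstruct detail that was never sampled.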
 
I think the tensor cores are used for what they're calling DLSS (deep-learning super-sampling), which seems to be an upscaler rather than real super sampling.
Yes, they are used for that, but Jensen also suggested they are used for raytrace denoising too, or at least could be in some capacity. Nvidia is a bit vague about it.

Maybe it's just something they are still developing, and we will see AI-based denoising of raytracing at a later time.
 
Yes, they are used for that, but Jensen also suggested they are used for raytrace denoising too, or at least could be in some capacity. Nvidia is a bit vague about it.

Maybe it's just something they are still developing, and we will see AI-based denoising of raytracing at a later time.
You seem to imply Tensors couldn't be used for both?
 
The tensor cores are used for denoising in practically all of the RT demos shown (Pica Pica, Star Wars, the Cornell box, etc.). Now, can they do both denoising and DLSS at the same time? We don't know yet.
 
Honestly, that's what I expect from the 7nm TU102 replacement:
  • Increased clockspeed to around 2350-2500 MHz boost, 2050 base
  • 2 RT cores per SM
  • 64 SMs (4096 cc)
  • 16GB HBM2 or 24GB GDDR6
  • Identical tensor core count
That should be a significant leap over Pascal in "normal" FP32 workloads while offering faster RT performance than Turing at a (hopefully) smaller die size and lower cost.
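Back-of-envelope on those numbers (assuming a Turing-style 64 FP32 lanes per SM, which is my assumption, not anything announced):

```python
# Peak FP32 throughput from the speculative spec above.
sms             = 64
fp32_per_sm     = 64        # assumes Turing's SM layout carries over
boost_clock_ghz = 2.4       # middle of the 2350-2500 MHz guess
peak_tflops = sms * fp32_per_sm * 2 * boost_clock_ghz / 1000   # 2 FLOPs per FMA
print(f"{peak_tflops:.1f} TFLOPS FP32")   # ~19.7 TFLOPS

# For scale: GTX 1080 Ti (GP102) is 3584 cc at ~1.58 GHz boost, roughly 11.3 TFLOPS.
```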
 
I think a big limiter will be bandwidth in the 7nm parts. NVIDIA is already using 14 Gbps GDDR6, going to the current max of 18 Gbps is only a modest bump.

They may need to go to HBM in the 102 class GPU to get the bandwidth.
 
That deep slomo network requires at least an order of magnitude more calculations and cannot be done in real time. (At least not on a single GPU.)
Wonder if it's a future feature, since it's included in the NGX SDK. Currently there are only details for DLSS and AI Painting, but the stack also includes placeholders for AI Slow-Mo and AI Res-Up.
https://developer.nvidia.com/rtx/ngx
 
I think a big limiter will be bandwidth in the 7nm parts. NVIDIA is already using 14 Gbps GDDR6, going to the current max of 18 Gbps is only a modest bump.

They may need to go to HBM in the 102 class GPU to get the bandwidth.

A theoretical 18 Gbps/384-bit part would see memory bandwidth go up to 864 GB/s from 616 GB/s, a healthy 40% increase and well within the realm of possibility.
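Quick sanity check on those figures:

```python
# Memory bandwidth in GB/s = (bus width in bits / 8) * Gbps per pin.
def bandwidth_gbs(bus_width_bits, gbps_per_pin):
    return bus_width_bits / 8 * gbps_per_pin

current = bandwidth_gbs(352, 14)    # RTX 2080 Ti: 352-bit GDDR6 at 14 Gbps -> 616 GB/s
future  = bandwidth_gbs(384, 18)    # full 384-bit bus at 18 Gbps           -> 864 GB/s
print(future / current - 1)         # ~0.40, the ~40% increase quoted above
```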
 
Way too much chit-chat, IMO, and not much actual detail on the questions asked.

And this recent attitude of "thanks to RT everything feels real, like you're actually in the game, so everything before this moment was crap" is kind of annoying.
 