Nvidia DLSS 1 and 2 antialiasing discussion *spawn*

His point was: it stands to reason that tensor cores are not necessary to implement DLSS.
You can of course run DLSS on the CPU, like any code. So no, they are not necessary from a theoretical perspective. In practice, it seems you can invest more FLOPS in the upscaling/reconstruction with them than you could without, even if their operation is not completely free in terms of PRF bandwidth, cache footprint etc. If that weren't the case, we'd have DLSS on the shader cores from AMD and Intel.
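To make the "more FLOPS from the same silicon" point concrete, here's a minimal CUDA sketch (purely illustrative, not how DLSS itself is written; the kernel name and tile sizes are just examples): one warp hands a 16x16x16 FP16 matrix multiply-accumulate to the tensor cores via the WMMA intrinsics, work that would otherwise take thousands of individual FMAs on the regular shader ALUs. Note the fragments still travel through the register file, which is exactly where the "not completely free" PRF-bandwidth caveat comes from.

```cuda
#include <mma.h>
#include <cuda_fp16.h>
using namespace nvcuda;

// One warp computes acc = A*B + acc for a 16x16x16 FP16 tile on the tensor
// cores. Doing the same on regular shader ALUs would take 16*16*16 = 4096
// scalar FMAs. (Hypothetical kernel, for illustration only.)
__global__ void tensor_tile_mma(const half* A, const half* B, float* C)
{
    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b;
    wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

    wmma::fill_fragment(acc, 0.0f);
    wmma::load_matrix_sync(a, A, 16);   // fragments are loaded into registers,
    wmma::load_matrix_sync(b, B, 16);   // so PRF bandwidth is still consumed
    wmma::mma_sync(acc, a, b, acc);     // the actual tensor-core instruction
    wmma::store_matrix_sync(C, acc, 16, wmma::mem_row_major);
}
```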
 
Nothing he said was anywhere near a conspiracy or a conspiracy theory.

His point was: it stands to reason that tensor cores are not necessary to implement DLSS. Nvidia says that they’re optimal — and most of us here would believe that.

It's a top-level conspiracy theory to claim NV is lying about DLSS being faster on RTX GPUs due to the inclusion of Tensor hardware.
 
If that weren't the case, we'd have DLSS on the shader cores from AMD and Intel.

I believe it was DF that reported in the early days of DLSS that Nvidia limited which modes certain cards could run. The reasoning being that DLSS was too slow on lower end cards to be worth it. It would be faster to just render at native resolution.

No idea if that’s still the case but it does point to significant compute requirements.

Having said that, it could also be the case that AMD and Intel haven’t cracked the nut yet to come up with their own upscaling ML network.
 
For about the gazillionth time.
Matrix crunchers are not required for DLSS or any other AI work, they weren't designed for DLSS or image scaling and they're not the only way to accelerate AI workloads either.
Neither is RT h/w. What's your point? Can you run CP2077 with all RT options on a Radeon VII?

Current DLSS builds run on matrix crunchers, but we don't know if they're the optimal hardware for it, or even whether they actually accelerate it much; it only needs to be faster than running it on CUDA cores to make sense, because the cores are there anyway for the professional cards built on the same GPUs. It could be several times faster, but it could also not be, we don't know. Yet you and a few others praise them like they're some alien technology dropped from the heavens for this very specific workload (hint: the idea to utilize tensors for DLSS came a lot later than the cores themselves).
We don't know if they would be any faster than regular cores for any other hypothetical future load either. The only thing we can be sure of is that they're dedicated units, which can make them more useful, but IIRC they also steal some other resources, which in the worst case could hobble the rest of the GPU (hypothetical; I don't know if there are actual loads like this at this time).
You're free to make something which would work as good as DLSS without tensor cores.
...Oh, wait, AMD has just released FSR. Do you think they wouldn't have done just that if it was at all possible?
As for "stealing resources" - what does FSR do?
 
It's a top-level conspiracy theory to claim NV is lying about DLSS being faster on RTX GPUs due to the inclusion of Tensor hardware.
No-one claimed anything like that; you need to stop twisting people's words.
You're free to make something which would work as good as DLSS without tensor cores.
Like DLSS 1.9 which was leaps and bounds ahead of earlier DLSS versions and didn't use tensors?
As for "stealing resources" - what does FSR do?
I don't see anyone claiming FSR wouldn't eat resources, do you?
 
It's a top-level conspiracy theory to claim NV is lying about DLSS being faster on RTX GPUs due to the inclusion of Tensor hardware.
You're being ridiculous. Hyperbole in response to logical speculation.
If that weren't the case, we'd have DLSS on the shader cores from AMD and Intel.
The complexity of DLSS is far more likely to be in the training and development over the last few years to achieve what they have today, rather than simply performance requirements.
 
When DLSS 2.0 came out, Nvidia claimed a 2x performance improvement because of optimizations to how the tensor cores were used. There's also the option of running DLSS asynchronously alongside general graphics workloads. I'm sure it's possible DLSS 2.x could run on CUDA cores. The question is at what performance and on which GPUs. On a 2060 or a 3080 Ti, what is the best option? I expect in both cases it's the tensor cores.
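To illustrate the async option, a minimal CUDA sketch (the kernels, names and sizes below are hypothetical and say nothing about how NVIDIA actually schedules DLSS): work launched on separate streams may overlap with the main workload if the GPU has spare execution resources, which is the only sense in which the extra pass can be "free".

```cuda
#include <cuda_runtime.h>

// Stand-ins for "main graphics work" and a post-process/upscale pass.
// (Hypothetical kernels, just to have something to launch.)
__global__ void mainRenderWork(float* frame, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) frame[i] = frame[i] * 0.5f + 0.5f;   // dummy shading math
}

__global__ void upscalePass(const float* lowRes, float* highRes, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) highRes[i] = lowRes[i];              // dummy reconstruction
}

int main()
{
    const int n = 1 << 20;
    float *frame, *lowRes, *highRes;
    cudaMalloc(&frame,   n * sizeof(float));
    cudaMalloc(&lowRes,  n * sizeof(float));
    cudaMalloc(&highRes, n * sizeof(float));

    cudaStream_t gfx, post;
    cudaStreamCreate(&gfx);
    cudaStreamCreate(&post);

    // Kernels on different streams *may* execute concurrently if the GPU has
    // spare SMs/registers/bandwidth; that's the only sense in which the extra
    // work is "free". If they contend, the scheduler serializes them.
    mainRenderWork<<<(n + 255) / 256, 256, 0, gfx>>>(frame, n);
    upscalePass  <<<(n + 255) / 256, 256, 0, post>>>(lowRes, highRes, n);

    cudaDeviceSynchronize();
    cudaStreamDestroy(gfx);
    cudaStreamDestroy(post);
    cudaFree(frame); cudaFree(lowRes); cudaFree(highRes);
    return 0;
}
```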

Overall, until someone releases something comparable that runs on shader cores, we can't really know. By that time, who knows what DLSS will look like. It's improving every year.

FSR “2.0” is going to be temporal, or at least they strongly hinted it will be. We'll see if it's DirectML-based.
 
I believe it was DF that reported in the early days of DLSS that Nvidia limited which modes certain cards could run. The reasoning being that DLSS was too slow on lower end cards to be worth it. It would be faster to just render at native resolution.

No idea if that’s still the case but it does point to significant compute requirements.

Having said that, it could also be the case that AMD and Intel haven’t cracked the nut yet to come up with their own upscaling ML network.
Yep, I remember something along those lines too. And I agree, it's probably pretty math intensive - and if decently designed, rightfully so. FSR is also very much about shader utilization and very little about additional bandwidth requirements, for a reason.

You're being ridiculous. Hyperbole in response to logical speculation.
The complexity of DLSS is far more likely to be in the training and development over the last few years to achieve what they have today, rather than simply performance requirements.
That's probably right and a core concept of ML inferencing. :)
 
Like DLSS 1.9 which was leaps and bounds ahead of earlier DLSS versions and didn't use tensors?
Which wasn't DLSS either. You forget this part. It was just TAAU and was considerably worse than DLSS2 precisely because of that.

I don't see anyone claiming FSR wouldn't eat resources, do you?
I see you claiming that it's somehow an issue with DLSS. By that logic, any async compute has the same issue and shouldn't be used because it gives just some 20% speedup instead of 100%?
 
You're being ridiculous. Hyperbole in response to logical speculation.

There's nothing logical in trying to claim NV is lying about DLSS using the tensor cores, i.e. about needing RTX GPUs to be able to utilize DLSS. It obviously would impact GPU performance too much without them.
 
There's nothing logical in trying to claim NV is lying about DLSS using the tensor cores, i.e. about needing RTX GPUs to be able to utilize DLSS. It obviously would impact GPU performance too much without them.
No-one claimed NVIDIA is lying about DLSS using tensor cores. Stop twisting people's words.
Which wasn't DLSS either. You forget this part. It was just TAAU and was considerably worse than DLSS2 precisely because of that.
Right on. Except if you ask NVIDIA.

I see you claiming that it's somehow an issue with DLSS. By that logic, any async compute has the same issue and shouldn't be used because it gives just some 20% speedup instead of 100%?
I didn't talk about it being an issue, I only questioned it being free on top of everything the GPU does, which in some cases it isn't. I didn't say it shouldn't be used either. You should also stop twisting people's words.
 
No-one claimed NVIDIA is lying about DLSS using tensor cores. Stop twisting people's words.

And they are limiting it to RTX GPUs for good reason: GPUs prior to those lack the tensor hardware blocks, and would thus tank too much in performance if DLSS 2 ran on them (if that's even possible to begin with).
 
Right on. Except if you ask NVIDIA.
I can't be bothered to look for the original post, but IIRC what I wrote was pretty much exactly what Nvidia has said.
They mentioned that they used the approaches found in the ML training done for 2.0 to fine-tune the TAAU shader in DLSS 1.9, but that's about it.

I didn't talk about it being an issue, I only questioned it being free on top of everything the GPU does, which in some cases it isn't.
And in some cases it is. What's your point?
 
Questioning the claim of it being free, like I said already.
It can be "free" in the same sense as any async compute can be "free" - when the h/w idling enough to load it with an async workload without "stealing" too much of resources off the main pipeline.
So if you're saying that it can't be "free" then you're wrong. It can be "free".
 
And they are limiting it to RTX GPUs for good reason: GPUs prior to those lack the tensor hardware blocks, and would thus tank too much in performance if DLSS 2 ran on them (if that's even possible to begin with).
Titan V has tensors but no DLSS, though.
And again, we don't know what the performance would be like without tensors, you just choose to assume it's a world of difference. It could be anything from a few % to hundreds of %, not just the higher end of the spectrum.

It can be "free" in the same sense as any async compute can be "free" - when the h/w idling enough to load it with an async workload without "stealing" too much of resources off the main pipeline.
So if you're saying that it can't be "free" then you're wrong. It can be "free".
My original claim was that with tensor cores sharing some resources with the rest of the GPU, they could be a factor slowing down something else. It was later specified to be register file bandwidth, which apparently gets saturated easily anyway. With tensors eating into bandwidth that gets saturated easily even without them, their addition isn't free, even when the end result is a net gain.
 
Titan V has tensors but no DLSS, though.
And again, we don't know what the performance would be like without tensors, you just choose to assume it's a world of difference. It could be anything from a few % to hundreds of %, not just the higher end of the spectrum.

What kind of tensor cores does the Titan V have, though? It's an older GPU/architecture. It probably could run DLSS 2.x, but at what speed?
No, we don't, perhaps. But if NV claims you need RTX GPUs to be able to use the tech, you can do the math. This then becomes a question of whether NV is making things up to try and sell newer hardware. But again, this isn't the thread for it, and it's way below my expectations for this sub-forum. This ain't the console section, after all.
 
My original claim was that with tensor cores sharing some resources with the rest of the GPU, they could be a factor slowing down something else. It was later specified to be register file bandwidth, which apparently gets saturated easily anyway. With tensors eating into bandwidth that gets saturated easily even without them, their addition isn't free, even when the end result is a net gain.
Which again is true for any type of async compute workload. Doesn't mean that you can't optimize the code to run in a "free" fashion.
Although such optimizations are likely to happen on fixed h/w platforms only (i.e. consoles or HPC nodes), because on PC the number of h/w configurations makes them almost impossible. But it's the same with any async compute, again.
 
we don't know what the performance would be like without tensors
According to NVIDIA you would need an additional 9 TFs of compute to replace the speed you get from Tensor cores, and that's on a 2080 GPU, meaning you need almost 80% more CUDA cores on top of its existing ones to simulate DLSS at the same speed.
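As a rough sanity check of that figure, assuming the RTX 2080's commonly quoted ~10-11 TFLOPS of FP32 shader throughput (the exact baseline depends on boost clocks and is my assumption, not something from the slide):

\[
\frac{9~\text{TFLOPS (extra)}}{\sim 10\text{--}11~\text{TFLOPS (existing FP32)}} \approx 0.8\text{--}0.9
\]

i.e. on the order of 80-90% more FP32 throughput, which is the same ballpark as the "almost 80%" above.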

Tensor cores operate in a burst-like fashion: they power up, do their thing quickly and then power down. With general traditional units you either need to increase their count significantly, requiring more power and die area and stealing resources from rasterization/ray tracing, or let them do the work more slowly, again stealing resources from rasterization/ray tracing.
 