Nvidia DLSS 1 and 2 antialiasing discussion *spawn*

Surprisingly it's not easy to find performance metrics for DLSS. Would have thought with Unreal Engine plugins it would have been profiled to death. Seems to take 0.8-1.2ms at any quality setting on a 2080ti. That's the only number I could find.

Edit: Surprisingly difficult to find inference benchmarks comparing performance with and without tensor cores on an NVIDIA GPU.
 
My original claim was that, with tensor cores sharing some resources with the rest of the GPU, they could be a factor slowing down something else.
That's a funny topic to discuss. TCs can execute 4x the FLOPS of the FP32 units on Ampere.
If the neural net is bound by cache/shared-memory/DRAM bandwidth and we can extract only half of the FLOPS out of the TCs, it means the TCs are underutilized (which should be visible on the Nsight graphs, btw).
Such underutilization would leave a lot of spare cycles for FP, INT, and SFU instructions that can be leveraged via async compute. So we can couple the bandwidth-bound NN kernel with some other math-intensive kernel and still get close to the theoretical 4x speedup via async compute.
If the NN is math bound, the speedup from leveraging TCs is likely close to the theoretical 4x maximum to begin with, so good utilization would actually "slow down" anything sharing resources via async compute.
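A minimal sketch of that coupling idea, assuming PyTorch on a CUDA GPU (sizes and kernels are arbitrary): an FP16 GEMM loop, which the backend maps to tensor cores, is issued on one stream while FP32/SFU-heavy elementwise math runs on another. Whether the two actually overlap is up to the hardware scheduler, so this only illustrates the mechanism, not a guaranteed speedup.

```python
import torch

assert torch.cuda.is_available()

# Tensor-core-friendly work: a large FP16 GEMM (hypothetical stand-in for an NN layer).
a = torch.randn(4096, 4096, device="cuda", dtype=torch.half)
b = torch.randn(4096, 4096, device="cuda", dtype=torch.half)

# FP32/SFU-heavy elementwise work for the regular CUDA cores.
x = torch.randn(1 << 24, device="cuda")

tc_stream = torch.cuda.Stream()   # stream for the tensor-core kernels
fp_stream = torch.cuda.Stream()   # stream for the FP32/SFU kernels

torch.cuda.synchronize()
with torch.cuda.stream(tc_stream):
    for _ in range(20):
        c = a @ b                                      # FP16 matmul -> tensor cores
with torch.cuda.stream(fp_stream):
    for _ in range(20):
        y = torch.sin(x) * torch.rsqrt(x.abs() + 1.0)  # FP32/SFU instructions
torch.cuda.synchronize()
```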
 
Surprisingly it's not easy to find performance metrics for DLSS
NVIDIA did provide some.
[Image: DLSS2_5-scaled.jpg - NVIDIA's DLSS 2.0 execution time chart]

The 2060 Super takes 2.5ms with its 272 Tensor cores; the 2080 Ti doubles the unit count to 544 and shaves off an additional 1ms, down to 1.5ms.

Meaning the 2060 Super, with half the tensor core resources, is ~67% slower than the 2080 Ti.
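Back-of-envelope check of those figures (using the 4K values from the chart and the published Turing tensor core counts):

```python
# DLSS 2.0 execution time at 4K from NVIDIA's chart, in ms per frame.
t_2060s, t_2080ti = 2.5, 1.5
tc_2060s, tc_2080ti = 272, 544                     # Turing tensor core counts

unit_ratio = tc_2080ti / tc_2060s                  # 2x the tensor cores
speedup = t_2060s / t_2080ti                       # ~1.67x
extra_time = speedup - 1                           # 2060 Super takes ~67% longer per frame
print(f"{unit_ratio:.0f}x the units -> {speedup:.2f}x faster; 2060S is {extra_time:.0%} slower")
```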
 
And they are limiting it to RTX GPUs for good reason: GPUs prior to those lack tensor hardware blocks, and performance would tank too much if DLSS 2 were run on them (if that's even possible to begin with).
This is precisely the unknown quantity in question.

Accusing people who ask genuine questions about compute performance and efficiency of peddling conspiracy theories is completely counter-productive.

EDIT: Agreed with the poster above. Can’t we have at least one bloody thread where we can ask questions about the implementation of DLSS without the usual three Nvidia Defence Force members kicking up a fuss?
 
According to NVIDIA you would need an additional 9 TFLOPS of compute to replace the speed you get from the tensor cores, and that's on a 2080 GPU, meaning you would need almost 80% more CUDA cores on top of its existing ones to simulate DLSS at the same speed.
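Rough arithmetic behind that 80% figure; the FP32 baseline assumed for the 2080 below is my own assumption about which throughput number the slide counts against:

```python
# Rough check of the "almost 80% more CUDA cores" reading of NVIDIA's slide.
rtx2080_fp32_tflops = 11.0      # assumed FP32 baseline for an RTX 2080 at typical boost clocks
extra_tflops_needed = 9.0       # NVIDIA's quoted shortfall when not using tensor cores
print(f"~{extra_tflops_needed / rtx2080_fp32_tflops:.0%} more FP32 throughput needed")  # ~82%
```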



Tensor cores operate in a burst-like fashion: they power up, do their thing quickly, and then power down. With general traditional units you either need to increase their count significantly, requiring more power and die area and stealing resources from rasterization/ray tracing, or let them do the work more slowly, again stealing resources from rasterization/ray tracing.

So you want to ask questions about DLSS but are only willing to accept the answers you like? Who's the "defence force" here? Get a mirror.
So you want to ask questions about DLSS but are only willing to accept the answers you like? Who's the "defence force" here? Get a mirror.

They have their answers. Going against hard data and facts remains an uphill battle. A pointless battle anyway, as the PS5 isn't even in the picture for any of these techniques or their future iterations to begin with.
 
They have their answers. Going against hard data and facts remains an uphill battle. A pointless battle anyway, as the PS5 isn't even in the picture for any of these techniques or their future iterations to begin with.

And where did I refute any of this?

In fact, I stated, in response to your ridiculous accusation of conspiracy theories:

His point was: it stands to reason that tensor cores are not necessary to implement DLSS. Nvidia says that they’re optimal — and most of us here would believe that.

So we’ve gone full circle: we take Nvidia’s word that tensor cores are optimal for DLSS. We just want a precise number and some comparisons done across a set of GPU architectures such as Pascal, Turing, and Ampere.

My point and the original question still stands. But thanks for playing your part.
 
The question is not whether tensor cores are needed for DLSS, but 1. whether DLSS is the only option for upscaling, and 2. whether we need upscaling at all, considering 4K screens are still small for that.
For me it's a no to both.
 
Other way round: the 2080 Ti is ~67% faster than the 2060S with 100% more units. So the scaling is sub-linear.
Scaling favors smaller unit counts, and the advantage decreases at higher resolutions. Quite apparently, there is some kind of overhead involved.
[Attached image: upload_2021-6-28_16-42-40.png]
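One toy way to see where that overhead might sit is a fixed-cost-plus-scalable-cost fit to the 4K numbers quoted earlier; purely illustrative, not derived from any actual profile:

```python
# Toy "fixed overhead + scalable part" model: time = overhead + scalable / unit_ratio
t_272, t_544 = 2.5, 1.5                        # ms on 272 vs 544 tensor cores
ratio = 544 / 272                              # 2x the units

scalable = (t_272 - t_544) / (1 - 1 / ratio)   # -> 2.0 ms that scales with unit count
overhead = t_272 - scalable                    # -> 0.5 ms that does not scale
print(f"scalable ~{scalable:.1f} ms, fixed overhead ~{overhead:.1f} ms, "
      f"speedup {t_272 / t_544:.2f}x instead of {ratio:.0f}x")
```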
 
The question is not whether tensor cores are needed for DLSS, but 1. whether DLSS is the only option for upscaling, and 2. whether we need upscaling at all, considering 4K screens are still small for that.
For me it's a no to both.
1 isn't even a question?
2 has been answered many times, and 4K displays aren't even that related to the answer - even if you don't do full-frame upscaling, many things don't need to be rendered at per-pixel quality to look good. Saving performance to use elsewhere isn't anything new, and image reconstruction is just another option here.

Yep, keep going. You’re doing such great PR work for Nvidia with your behaviour on these forums. :rolleyes:


Show me one instance where I declined or refused to accept an answer or explanation?
Why would I show you anything? You've already made the point - anyone who says something you don't like is doing PR work for Nvidia. My suggestion about you getting a mirror is the best I can do.
 
1 isn't even a question?
Well, I'll rephrase... 'Do we need machine learning for temporal reconstruction upscaling, or can we do it without special hardware that so far is useless for any other gaming task?'
Agreed, it's not really a question. The answer is obvious.
 
That isn't the question. ML is not used for temporal reconstruction upscaling. The question should be what image quality is achievable through an ML approach.
 
The question is not whether tensor cores are needed for DLSS, but 1. whether DLSS is the only option for upscaling, and 2. whether we need upscaling at all, considering 4K screens are still small for that.
For me it's a no to both.

I'd say
1. It's obviously not the only way. There are many ways
2. Yes, we need upscaling. Take any current game on console that uses some form of upscaling and imagine what it would look like if it didn't. I don't think Insomniac's games would look nearly as good, for example. They have very good upscaling.
 
Well, I'll rephrase... 'Do we need machine learning for temporal reconstruction upscaling, or can we do it without special hardware that so far is useless for any other gaming task?'
Agreed, it's not really a question. The answer is obvious.

Honestly, I doubt tensor cores are useless. The problem, like anything, is adoption. If every single GPU on the market had tensor cores you'd probably see them used for a lot more. Right now adoption is too low for studios to invest heavily. It's the same thing every time new hardware features come out: people say they're useless until you hit a critical point of hardware support, and suddenly every developer starts using it. The question becomes how many years until they're really leveraged, and whether this generation of GPUs will still be capable at that time.
 
Just to get a base level of understanding of deep learning: you train a model, and then when you are "running" the trained model against a set of data, that's called inferencing, correct? I've reviewed this stuff in the past, but don't remember it. So essentially I could use some large data set to train a model to recognize pictures of cars. After training is complete, I have a model that I can run inference against, feeding it any picture and hoping it correctly identifies whether it's a picture of a car or not.
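For concreteness, here's a minimal PyTorch sketch of those two phases (train, then inference); the car/not-car classifier is made up and random tensors stand in for real data, the point is only the split between the phases:

```python
import torch
import torch.nn as nn

# Hypothetical 2-class "car / not car" classifier on 64x64 RGB thumbnails.
model = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, 128), nn.ReLU(), nn.Linear(128, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Training: weights are updated against labelled examples.
for _ in range(100):
    images = torch.randn(32, 3, 64, 64)      # batch of "photos" (random stand-ins)
    labels = torch.randint(0, 2, (32,))      # 1 = car, 0 = not car
    loss = loss_fn(model(images), labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Inference: frozen weights, no gradients, just a forward pass on new data.
model.eval()
with torch.no_grad():
    new_photo = torch.randn(1, 3, 64, 64)
    predicted_class = model(new_photo).argmax(dim=1)   # 1 -> "car", 0 -> "not car"
```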

Now I expect, like anything else, trained models will have vastly different performance characteristics. What we really need to see is a variety of inferencing benchmarks on Turing and Ampere GPUs, probably related to image processing, that compare performance running inference on CUDA cores vs tensor cores. I don't know if those benchmarks exist. If they do, we might see a range of results where some benchmarks show no gains, some gains, or even losses. That might give some insight into how fast tensor cores could run DLSS vs CUDA cores, though without knowing how it's implemented we wouldn't know where in the range it would fall.
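A crude version of such a comparison can be done with a plain GEMM rather than a real DLSS-like network, so it only bounds the raw throughput gap rather than measuring DLSS itself. On an Ampere-class GPU, PyTorch's torch.backends.cuda.matmul.allow_tf32 flag switches FP32 matmuls between the ordinary CUDA-core path and the TF32 tensor-core path, which gives a like-for-like toggle; the matrix size and iteration count below are arbitrary:

```python
import time
import torch

def bench_fp32_matmul(n=8192, iters=50):
    """Average time (ms) for an n x n FP32 matmul on the current GPU."""
    a = torch.randn(n, n, device="cuda")
    b = torch.randn(n, n, device="cuda")
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(iters):
        a @ b
    torch.cuda.synchronize()
    return (time.perf_counter() - start) / iters * 1e3

# Plain FP32 path: CUDA cores only.
torch.backends.cuda.matmul.allow_tf32 = False
fp32_ms = bench_fp32_matmul()

# TF32 path: routed through the tensor cores (Ampere or newer).
torch.backends.cuda.matmul.allow_tf32 = True
tf32_ms = bench_fp32_matmul()

print(f"CUDA cores: {fp32_ms:.2f} ms, tensor cores (TF32): {tf32_ms:.2f} ms, "
      f"ratio {fp32_ms / tf32_ms:.1f}x")
```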

Unfortunately I don't think comparing Control's DLSS 1.9 vs DLSS 2.0 in a profiler would be useful, because 2.0 is a different approach to the model.
 
Honestly, I doubt tensor cores are useless. The problem, like anything, is adoption. If every single GPU on the market had tensor cores you'd probably see them used for a lot more. Right now adoption is too low for studios to invest heavily. It's the same thing every time new hardware features come out: people say they're useless until you hit a critical point of hardware support, and suddenly every developer starts using it. The question becomes how many years until they're really leveraged, and whether this generation of GPUs will still be capable at that time.
It's very different this time IMO.
An example where your point holds would be tessellation:
We know we want curvy round objects, not hexagons. So there is a need for more triangles to get there.
NV did pioneering work over the decades, but lack of adoption made early support like HW bezier patches a failure. (I used it, and it was really nice and fast - skinned characters looked awesome.)
AMD tried some other things, but had the same problem.
Later we got tessellation shaders, and adoption worked, although it never became a killer feature.

Now what's the situation with tensor cores?
We don't know what we could do with ML in game runtimes; we even struggle to make it useful in offline content creation. And game developers never requested HW-accelerated ML. (I'm not simply against it! I think it's promising, and there has been related discussion about potential applications.)
Nonetheless, NV pushes it to market with a heavily overpriced GPU generation. A marketing campaign about innovation-leading upscaling, necessary to compensate for the other underperforming new feature, helps to convince customers.
Some of them pay the price and are happy about it, but most simply can't afford the new tech innovation. Not really helpful for the PC gaming platform, imho.
And now, with the second GPU generation featuring tensor cores, still nobody aside from NV uses them. And there still is no other application besides upscaling. (Correct me if I'm wrong - there may be some things in the works using DirectML, of course.)

In short: NV pushes a datacenter feature to the gaming market, offloading the related costs (including chip development) to gamers. And all they get is a form of TAAU with all its shortcomings - whether they realize it or not.
In some years, game developers will eventually have found ML applications, and then tensor cores may be justified.
 