Nvidia DLSS 1 and 2 antialiasing discussion *spawn*

So, looks like Tensor Cores will be addressable through the recently released DirectML API, the same way RT Cores are addressable through DXR.






[Attachment: perfchart.png]

https://devblogs.microsoft.com/directx/gaming-with-windows-ml/
Yes. We'll see quite different and varied methods of using ML for reconstruction and the like now that it can be integrated directly into the rendering pipeline.
 
But I don't think the name does that? I think they've always been open about it being lower resolution.
In light of the discussion, I would posit that if we're having this discussion, it's misleading enough to some people. If I didn't work with ML (and there are certainly more senior and better ML practitioners here) I would probably have been lost on this definition as well. The fact that I'm trying to defend it is probably proof enough that it's not clear enough.
 
We could do the same thing in rendering land and have it trained on MSAA, FXAA, SSAA or TAA, and the deep learning would try to emulate whichever antialiasing method it was trained on. In this case, Nvidia chose SSAA to be the output, so I'm unsure whether it's technically misleading that they called it DLSS, as they used deep learning to emulate SSAA.
Okay, I see where you're going with that. It's like calling it Deep Learning Kentucky Fried Chicken if you're setting the AI to learn to make KFC. However, the wording is wrong for that. It'd be 'Deep Learned Super Sampling' - we learned how to do it. Deep Learning is an adjective describing the type of supersampling, but you can't really do that. So, Temporal Supersampling would be supersampling with a temporal element. Stochastic Supersampling would be supersampling with stochastic patterns. Deep Learning Super Sampling would be some form of supersampling (more samples than pixels) that incorporates deep learning. In fact, even Deep Learned Supersampling doesn't work. It's a form of supersampled projection or something or other. But the noun at the end can't be supersampling unless it's actually supersampling. Supersampled Deep Learning I'd have no beef with, although then you're describing the type of learning. I think full transparency would have the word reconstruction or projection upscaling or similar, so Deep Learning Supersampled Reconstruction perhaps.

And I do think we need to be specific, same as not calling all reconstruction techniques 'checkerboarding'. Overloading terms just leads to confusion. What would you call DLSS when applied to a supersampled image, so DLSS AA applied to an 8K image downsampled to 4K? DLSS Super Sampling? ;)
 
So, looks like Tensor Cores will be addressable through the recently released DirectML API, the same way RT Cores are addressable through DXR.

That their compute implementation ended up 2.5 times worse than NVIDIA's FP32 implementation is atrocious though. For a game developer it makes little sense to use DirectML as far as I can see; just have a generic compute code path and a CUDA one for tensor cores. DirectML will do nothing but slow you down.
 
Overloading terms just leads to confusion. What would you call DLSS when applied to a supersampled image, so DLSS AA applied to an 8K image downsampled to 4K? DLSS Super Sampling? ;)

Exactly. Supersampling is rendering at a higher resolution than the output resolution and then downscaling to the output resolution, so 8K -> 4K.
DLSS does the opposite: rendering at a lower resolution and then upscaling it, like 2K -> 4K.

Stretching the idea further, would DLSS 1K -> 4K still be called supersampling?
Upscaling using an NN trained to produce 4K supersampled images based on downsampled 1K input images is still upscaling a subsampled image (with an inevitable general loss of detail).
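As a concrete illustration of the direction of the two operations, here is a tiny numpy sketch; the 8x8 array standing in for an 8K render is purely an assumption for brevity:

Code:
import numpy as np

hi = np.random.rand(8, 8)                        # stand-in for the high-res (e.g. 8K) render
# Supersampling resolve: average each 2x2 block down to the output resolution (8K -> 4K).
lo = hi.reshape(4, 2, 4, 2).mean(axis=(1, 3))
# Upscaling goes the other way (e.g. 2K -> 4K); nearest-neighbour here just to show that
# the extra samples the resolve had access to simply do not exist on this path.
up = np.repeat(np.repeat(lo, 2, axis=0), 2, axis=1)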
See next post.
 
Okay, I see where you're going with that. It's like calling it Deep Learning Kentucky Fried Chicken if you're setting the AI to learn to make KFC. However, the wording is wrong for that. It'd be 'Deep Learned Super Sampling' - we learned how to do it. Deep Learning is an adjective describing the type of supersampling, but you can't really do that. So, Temporal Supersampling would be supersampling with a temporal element. Stochastic Supersampling would be supersampling with stochastic patterns. Deep Learning Super Sampling would be some form of supersampling (more samples than pixels) that incorporates deep learning. In fact, even Deep Learned Supersampling doesn't work. It's a form of supersampled projection or something or other. But the noun at the end can't be supersampling unless it's actually supersampling. Supersampled Deep Learning I'd have no beef with, although then you're describing the type of learning. I think full transparency would have the word reconstruction or projection upscaling or similar, so Deep Learning Supersampled Reconstruction perhaps.

And I do think we need to be specific, same as not calling all reconstruction techniques 'checkerboarding'. Overloading terms just leads to confusion. What would you call DLSS when applied to a supersampled image, so DLSS AA applied to an 8K image downsampled to 4K? DLSS Super Sampling? ;)
It's certainly challenging and I don't envy marketing the task of coming up with names here. As ML becomes more prominent in every bit of technology, we're going to run into these discussions more, not less.

Nvidia probably should have called it DLAA to keep it non-confusing and just specify in the description that it's trained on SSAA.

That being said, we are getting deep into semantics here. And you mentioned earlier how, if rasterization produced an exact duplicate of ray tracing's output, would we still call it ray tracing? That problem is an interesting one with ML.

If we assume the ML can never surpass its source, then at 100% accuracy it would perfectly reproduce the algorithm it was trained on. In this case, if we take the limit -> infinity in terms of training methods and some novel work, we get 99.9999999999999998% accuracy. So let's say DLSS becomes fully indistinguishable from SSAA.

Consider the following scenario:

What do you call it? The function of SSAA is to take more data and run an algorithm that resolves it down to the output pixels.
So f(g) = the 1440p output, where g is the 4K image.
So, if we are training an ML algorithm h to match f(g),

then what we are really asking for is h(x) = f(g).

It no longer matters that the source data originally contained more information that was transmuted down to f(g) for the final output. With enough training examples we have made a new algorithm that produces f(g) given an x, where x is a native aliased image.

So it's not supersampling in the essence of what it does, but the output is exactly the same.
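As a rough, hypothetical sketch of what "training h(x) to match f(g)" could look like in practice (a toy PyTorch loop; the tiny CNN, the random placeholder data and the L1 loss are all assumptions for illustration, not Nvidia's actual pipeline):

Code:
import torch
import torch.nn as nn

# Hypothetical stand-in for the learned reconstruction h(x): maps a native aliased
# frame x towards an approximation of the SSAA-resolved result f(g).
class TinyReconstructionNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1),
        )

    def forward(self, x):
        return self.body(x)

h = TinyReconstructionNet()
opt = torch.optim.Adam(h.parameters(), lr=1e-4)
loss_fn = nn.L1Loss()

# Placeholder dataset of (native aliased frame x, SSAA-resolved target f(g)) pairs.
dataset = [(torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)) for _ in range(4)]

for aliased, target in dataset:
    opt.zero_grad()
    loss = loss_fn(h(aliased), target)    # push h(x) towards f(g)
    loss.backward()
    opt.step()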

Performance-wise we are getting the same result, at least visually. How we achieved the output is different. So is the difference now just a novelty, or is it significant?

This is a bit like diamonds: we can find them or engineer them now. Engineered diamonds are cheap compared to found ones, but no one can tell the difference, because engineered samples are perfect and perfect found diamonds are insanely expensive.

Should the consumers care if it's found (blood/conflict diamonds?) or if it's engineered?

These are questions that will matter to those who care about the novelty of how things are constructed, but not to those who care only about the output. Full circle, then: who is the marketing directed towards? Those who care about how it is constructed, or those who care only about the output?

Considering that if we could achieve 99.999998% accuracy of SSAA using DLSS, you're also going to walk away with a huge performance boost. Would anyone care about SSAA and how it's made? Or just that DLSS is outputting the equivalent of SSAA?

Anyway, I don't have a position, but I suspect there will be many of these issues across many industries in the near future.
 
That their compute implementation ended up 2.5 times worse than NVIDIA's FP32 implementation is atrocious though. For a game developer it makes little sense to use DirectML as far as I can see; just have a generic compute code path and a CUDA one for tensor cores. DirectML will do nothing but slow you down.
Link to what you are referring to?
 
It's certainly challenging and I don't envy marketing the task of coming up with names here.
Not at all. Marketing departments have no real obligations for transparency. They want to sell stuff, and if picking a misleading name will help with that, they'll pick a misleading name. That's their job. ;)

What do you call it? So it's not supersampling in the essence of what it does, but the output is exactly the same.
Again, supersampling is a method, not a result. Xiaolin Wu's antialiased line algorithm produces results indistinguishable from 16x supersampling, but it's not supersampling. It's way better than supersampling for lines.
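For reference, a simplified sketch of the core idea behind Wu's algorithm (restricted, as an assumption to keep it short, to integer endpoints with a gentle slope and x0 < x1): pixel coverage comes analytically from the fractional distance to the line, with no extra samples taken.

Code:
import math

def wu_line(x0, y0, x1, y1, plot):
    """Simplified Wu-style antialiased line; assumes |y1 - y0| <= x1 - x0 and x0 < x1."""
    gradient = (y1 - y0) / (x1 - x0)
    y = float(y0)
    for x in range(x0, x1 + 1):
        frac = y - math.floor(y)                   # fractional distance into the pixel row
        plot(x, int(math.floor(y)), 1.0 - frac)    # analytic coverage of the lower pixel
        plot(x, int(math.floor(y)) + 1, frac)      # ... and of the pixel above it
        y += gradient

pixels = {}                                        # collect coverage instead of drawing
wu_line(0, 0, 10, 3, lambda x, y, c: pixels.update({(x, y): c}))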

Should the consumers care if it's found (blood/conflict diamonds?) or if it's engineered?
In the case of the results, it doesn't matter how they're produced. In fact, it does - you want the best results in the most economical way. If DLSS produces better results than supersampling, we should use it. However, talking rendering tech means talking about techniques, which means classifying them correctly. It's only the engineers who need to care about what DLSS is doing, not the consumers. In that, nVidia's choice of naming for marketing purposes is at odds with engineers talking about different image rendering technologies. Even if DLSS produces the same or better results than supersampling, it shouldn't be called Deep Learning Super Sampling, because supersampling is not the method being used.
 
DLSS does the opposite: rendering at a lower resolution and then upscaling it, like 2K -> 4K.

This is not true. For instance, DLSS2X renders at native resolution. Is it not DLSS anymore? Yes it is; it's the exact same NN doing the AA, AFAIK. It just so happens that the DLSS mode Nvidia chose to call just DLSS also upscales the image, with a second NN. The problem here is the name, not the tech. The truth is that DLSS as a whole is a group of techniques that fall within a common brand name, but the actual main method - the one that "warrants" the name, that makes it an AA method and encompasses both DLSS modes - is the DL AA step, not the upscaling step, which is probably identical to Nvidia "AI Superresolution".

To make the naming even worse, Nvidia further chose to give one of its modes the same name as the group of tech, which is arguably the root of the discussion here. If the mode had been called DLSS1X or DLSSUpscaled instead, there would be less confusion.

Stretching the idea further, would DLSS 1K -> 4K still be called supersampling?

As discussed previously, upsampling would be preferable, but otherwise, yes, it would be 1K upsampled, then upscaled. Let's put it this way: if 1K 4xSSAA is upscaled to 4K, does the technique suddenly stop being supersampling? Is it no longer supersampling being applied?
 
Nvidia probably should have called it DLAA
Even that is pushing it.

It's a combination of anti-aliasing and super-resolution ... so DLAASR?

Actually, since Marco Salvi already named a method TSRAA for super-resolution + anti-aliasing, it's only right to call it DLSRAA (although I have no idea about his actual algorithm ... NVIDIA seems uninterested in internal competition for DLSS).
Considering that if we could achieve 99.999998% accuracy of SSAA using DLSS
We are pushing the texture resolution limits and we have no edge-adaptive upscalers to compare against. Upscaling on PCs is relatively new; until now it was mostly restricted to the more secretive console devs. A variation of SMAA designed for upscaling, together with a bit of sharpening, might easily look as good or better.
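For reference, "a bit of sharpening" here would typically be something like an unsharp mask; a minimal sketch (scipy-based, with placeholder parameters):

Code:
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(img, sigma=1.0, amount=0.5):
    """Classic unsharp mask: add back a fraction of the high-frequency detail."""
    blurred = gaussian_filter(img, sigma)
    return img + amount * (img - blurred)

img = np.random.rand(64, 64)          # placeholder single-channel image
sharpened = unsharp_mask(img)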

Regardless, that zoomed in tower will remain looking like shit. There's a limit to how much data can be stored in a set of weights for a small NN. It's not going to improve in that respect.
 
It's a lie when they lie about it too.

The undersampled wire with gaps in it stays an undersampled wire with gaps in it ... some things can only be fixed by supersampling, but really.
 
If we assume the ML can never surpass its source, then at 100% accuracy it would perfectly reproduce the algorithm it was trained on. In this case, if we take the limit -> infinity in terms of training methods and some novel work, we get 99.9999999999999998% accuracy. So let's say DLSS becomes fully indistinguishable from SSAA.

You seem into NN/ML, but for some of your statements I see a lack of comprehension regarding information transfer and interpretation (no offence! it's just a perception).
Please read this, and combine it with this: "The principle can be used to prove that any lossless compression algorithm, provided it makes some inputs smaller (as the name compression suggests), will also make some other inputs larger." Which is slightly imprecise, because it makes a tiny number of inputs smaller and a humongously larger set of inputs larger (smaller vs. larger over bits gained: 50/50, 25/75, 12.5/87.5, etc.).
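The counting behind that claim fits in a few lines (toy Python, with n chosen arbitrarily):

Code:
n = 16
length_n_inputs = 2 ** n                          # 65536 distinct length-n bitstrings
shorter_outputs = sum(2 ** k for k in range(n))   # 65535 bitstrings of any shorter length
print(length_n_inputs, shorter_outputs)           # an injective (lossless) code cannot shrink them all

# At most 2**(n - b) outputs of length n - b exist, so at most this fraction of the
# length-n inputs can be compressed down to exactly n - b bits:
for b in (1, 2, 3):
    print(b, 2 ** (n - b) / 2 ** n)               # 0.5, 0.25, 0.125 -- the 50/25/12.5 above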

The generalization would state something in the direction of: "a statistical reconstruction network/algorithm will make some small number of samples tend towards ground truth, and a humongously larger set of sample corrections indistinguishable from noise".

Information management is the limiting factor in the whole setup. A context for a sample (say 8 neighbours in the current frame and 9 neighbours in the previous frame) contains 256^17 (2^136) different states (just luminance here); extrapolate this much information to a whole frame.
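Quick sanity check of that figure:

Code:
# 8 current-frame + 9 previous-frame neighbours = 17 samples; at 8 bits of luminance each:
print(256 ** 17 == 2 ** 136)      # True, roughly 8.7e40 distinct context states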

So there are two problems occurring concurrently here: you cannot make the hallucination machine too explicit, because the model would bust your available memory, so you have to compress that information (classification and merging). At the same time you have to build and apply the model on a sub-sampled source (no ground-truth information whatsoever). Which means the context itself is sparse and bijectivity is lost: the same sub-sampled information can, in ground truth, correspond to 256 (just luminance here) statistically completely undecidable/unrankable outcomes. In effect, in the worst case, hallucination is just a probabilistic random number generator. And as I inferred above, the majority (completely perceptually unweighted, mathematically by L2 distance) of hallucinations will actually be random garbage.
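A toy illustration of that loss of bijectivity (numpy; a 2x2 box filter stands in for the sub-sampling, which is an assumption for brevity):

Code:
import numpy as np

# Two different ground-truth 2x2 patches ...
a = np.array([[10, 30], [50, 110]], dtype=np.float32)
b = np.array([[40, 40], [60, 60]], dtype=np.float32)

# ... that collapse to the exact same sub-sampled value under a 2x2 box filter,
# so from the low-res context alone the two ground truths are undecidable.
print(a.mean(), b.mean())         # 50.0 50.0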

This whole thing is nothing but a lossy image compressor, where you have an original source, a reduced-information source, and a reconstruction process with a side-channel (the DLSS profile). You have to put the information content of the side-channel into relation to the amount of information truly recoverable.

Look back at the pigeonhole principle: you cannot "summon" two pieces of information out of one. You can only be lucky that you only need one piece of information, or you accept basic lossy/lossless information channel theory and thus occasional nonsense results.

Nvidia didn't break Shannon's theorems. They also didn't make computing Kolmogorov complexity practical. Lossy NN compression is nothing new. And this is still computer science, in which a 2k neural network is still unable to produce more than 2k of additional information. The trick is, of course, what that information expresses, so you rank information by importance, which is mostly heuristic, or by pseudo-visual metrics.

None of this is related to "feeling" that it looks better; it's all about rigorous evaluation of mathematical metrics. Just be conscious of what is what.
 
What happened to Marco Salvi's TSRAA BTW? (A much more honestly named method.) Was it memory holed to make DLSS look better?
 
This whole thing is nothing but a lossy image compressor, where you have an original source, a reduced-information source, and a reconstruction process with a side-channel (the DLSS profile). You have to put the information content of the side-channel into relation to the amount of information truly recoverable.
I hadn't actually thought of it like that, but now that you mention it, given my weak understanding of these NN solutions, that's exactly what it is - same as video encoding. Only, of course, there's no direct encode, but a ruleset to 'decode' running on an NN.
 
You seem into NN/ML, but for some of your statements I see a lack of comprehension regarding information transfer and interpretation (no offence! it's just a perception).
None taken. I only started a role in ML over the past few months, and my title isn't DS. Even if it were, I wouldn't be deserving of the title. Most of it has been self-taught, and I'm still very much in learning mode, so I appreciate the response - and a couple of others you've given me in the past; I appreciate those insights as well. I'll definitely look over the materials.

This whole thing is nothing but a lossy image compressor, where you have an original source, a reduced-information source, and a reconstruction process with a side-channel (the DLSS profile). You have to put the information content of the side-channel into relation to the amount of information truly recoverable.
Absolutely agree, it's a perfect definition of what's occurring here.

We have a general rule that ML/NN algos shouldn't hit anywhere close to 99% unless we did something completely wrong. But I wanted to make a hyperbolic example of an impossible future where perhaps it could hit it properly, and address whether the naming would still be appropriate. I still have no stance on the naming; I just don't know if in the future we'll see more things named like this or fewer. And in some ways I see how (with the fewest words possible describing what this is) DLSS seems appropriate. But in other circumstances, read another way, it seems completely inappropriate.

Nvidia didn't break Shannon's theorems. They also didn't make computing Kolmogorov complexity practical. Lossy NN compression is nothing new. And this is still computer science, in which a 2k neural network is still unable to produce more than 2k of additional information. The trick is, of course, what that information expresses, so you rank information by importance, which is mostly heuristic, or by pseudo-visual metrics.
Agreed, nothing new; the challenge for Nvidia is engineering the model to run in < 7 ms IIRC, which I suspect changes the way they approach the problem.
 
I believe motion vectors are an added input for DLSS

Yeah, you're right, the Atomic Heart devs said so.

Anyway, it seems engines could use some alternative to SRAA ... 1080p/1440p is good enough for most geometry/texture detail, so playing the upscaling game like the consoles makes sense on PC too. DLSS is pretty heavy on the GPU for what it is, though, and without the tensor cores it's going to run like a dog. Use some type of VQ classifier to select interpolation kernels instead of using CNNs - generally lighter weight, and it avoids the software patent on DLSS AFAICS. Although maybe a purely analytical approach would work as well; TAAU looks okay.
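Roughly what that VQ idea could look like (a hypothetical numpy sketch under my own assumptions - random placeholder codebook and kernels, 3x3 neighbourhoods - not any shipping implementation):

Code:
import numpy as np

K = 8                                         # codebook size -- an assumption
codebook = np.random.rand(K, 9)               # placeholder 3x3 patch prototypes (would be learned)
kernels = np.random.rand(K, 9)                # one 3x3 interpolation kernel per class (placeholder)
kernels /= kernels.sum(axis=1, keepdims=True)

def reconstruct_sample(patch3x3):
    """Classify the local neighbourhood against the codebook, then apply that class's kernel."""
    v = patch3x3.reshape(-1)
    cls = np.argmin(((codebook - v) ** 2).sum(axis=1))   # nearest-centroid VQ classification
    return float(kernels[cls] @ v)                       # cheap per-class weighted interpolation

print(reconstruct_sample(np.random.rand(3, 3)))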
 