Nvidia DLSS 1 and 2 antialiasing discussion *spawn*

Keep in mind that DLSS has a cost, and that cost is fixed.

The fewer tensor cores you have, the longer that fixed cost takes, so it doesn't really matter how little or how much is happening on screen; the game has little control over that portion. To meet 16.6ms you're going to need the rest of the frame to come in significantly under that to make up for the DLSS time.

If we assume DLSS has a fixed cost of 5-6ms (it is likely less), the rest of the frame time needs to be below or around 9ms to make it.

I'm just not sure the 2060 is the card to do that; it's nearly like asking it to hit 100+ fps, which is already tough for most cards in the 6TF range.
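Back-of-the-envelope version of that budget math (the 5-6ms DLSS cost and the 60fps target are the assumptions from the post above, not measurements; the ~9ms figure just leaves some extra headroom on top of this):

# Frame-budget math: with a fixed DLSS cost per frame, how fast does the
# rest of the frame have to be to hit a target frame rate?
# The DLSS costs below are the assumed figures from the post, not measured.

def required_base_frametime(target_fps: float, dlss_cost_ms: float) -> float:
    """Frame time (ms) left for the game after paying the fixed DLSS cost."""
    budget_ms = 1000.0 / target_fps
    return budget_ms - dlss_cost_ms

for dlss_cost in (5.0, 6.0):
    base = required_base_frametime(60.0, dlss_cost)
    print(f"DLSS {dlss_cost:.0f} ms -> rest of frame must be <= {base:.1f} ms "
          f"(~{1000.0 / base:.0f} fps at the internal resolution)")

# DLSS 5 ms -> rest of frame must be <= 11.7 ms (~86 fps at the internal resolution)
# DLSS 6 ms -> rest of frame must be <= 10.7 ms (~94 fps at the internal resolution)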

You would have to add Tensor Core TFs on top of that compute when running DLSS, right?

This shows that compute runs all the time, but raytracing and DLSS run concurrently on top of that... adding performance on top of the normal compute performance:
[attached slide: Turing concurrent execution of compute, ray tracing and DLSS]


Pure FLOPS were always a "rubber" metric... but now it just got a whole lot worse
 
DLSS in its 2.0 iteration is a bit different to think about - when I say it is rendering internally at XXXX by XXXX resolution in the video, I am simplifying it a bit. Post-processing like motion blur, depth of field, bloom, colour correction, etc. is actually all done at native resolution in DLSS 2.0, so it is more expensive by default than just rendering at the internal resolution. Then you have the DLSS run time on top, which differs between GPUs and also between Ampere and Turing (Ampere has a level of concurrency with its Tensor cores, I believe, that Turing does not have).
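A rough cost model of that pipeline, just to show the shape of the math (all stage timings, and the assumption that the main pass scales with pixel count, are illustrative, not measured):

# Illustrative DLSS 2.0 cost model based on the description above:
# main rendering at the internal resolution, post-processing at native
# resolution, plus a fixed DLSS pass. All numbers are made-up placeholders.

def dlss2_frame_ms(native_render_ms: float, native_post_ms: float,
                   scale: float, dlss_cost_ms: float) -> float:
    """Main pass assumed to scale with pixel count (scale^2); post-processing
    stays at native res; DLSS is a fixed add-on."""
    return native_render_ms * scale ** 2 + native_post_ms + dlss_cost_ms

native      = 14.0 + 3.0                            # hypothetical 14ms render + 3ms post at native
quality     = dlss2_frame_ms(14.0, 3.0, 0.667, 2.0) # Quality mode, 66.7% per axis
performance = dlss2_frame_ms(14.0, 3.0, 0.5, 2.0)   # Performance mode, 50% per axis
print(f"native {native:.1f} ms, quality {quality:.1f} ms, performance {performance:.1f} ms")
# -> native 17.0 ms, quality 11.2 ms, performance 8.5 ms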
 

I used a slide for Turing, this one is for Ampere:
[attached slide: Ampere concurrent execution, from the NVIDIA RTX 30 tech session]
 
Here we are battling a 2018 RTX 2060, a 6TF-range GPU which at the time was the lowest ray-tracing-capable Turing GPU of the 2000 series.
And it still fares quite well in normal rendering against the PS5 in the latest DF video. Throw some RT into the mix and the paltry 2060 will be really competitive.

There's good reason why we see this 2060 comparison happening though: it's basically the only GPU the PS5 has a noticeable advantage over. It should, just by glancing at where a 5700/XT sits in comparison to a vanilla 2060. Of course, once DLSS and RT both enter the picture, that 2060 will again be quite competitive, if not outright offer the best experience, especially for ports done somewhat better (which Nioh is clearly not the best example of).

Let's see with other titles like CP2077 which use next-gen technologies.
 
I now wonder what the concurrent performance ceiling of Ampere is.
If it can do compute + raytracing + DLSS concurrently (where Turing was only able to do compute + raytracing or compute + DLSS concurrently), I assume power/TDP becomes the limiting factor and pure compute FLOPS do not tell the whole story.
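A toy timing model of what that overlap could mean for the DLSS cost discussed earlier (stage durations are invented purely for illustration):

# Serial vs. overlapped DLSS execution, to show why concurrency can make
# the DLSS cost effectively disappear. Durations are invented for illustration.

graphics_ms = 12.0   # shading + ray tracing work for one frame
dlss_ms     = 1.5    # tensor-core DLSS pass for that frame

# Fully serial: DLSS runs after the graphics work, the costs add up.
serial_frame = graphics_ms + dlss_ms

# Idealized overlap: DLSS for frame N runs alongside the graphics work of
# frame N+1, so in steady state the frame time is bounded by the longer of
# the two and the DLSS cost is hidden (which would show up as ~0.00ms).
overlapped_frame = max(graphics_ms, dlss_ms)

print(f"serial: {serial_frame:.1f} ms/frame, overlapped: {overlapped_frame:.1f} ms/frame")
# -> serial: 13.5 ms/frame, overlapped: 12.0 ms/frame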
 
Btw, Wolfenstein is the only game where this Ampere concurrency is actually in use afaik.

It's possible to see the DLSS processing cost using the SDK version of nvngx_dlss.dll and enabling an on-screen indicator from Windows registry. Looks like this:

[screenshot: DLSS on-screen indicator overlay]

Wolfenstein is the only game where I've seen this show 0.00ms.

Maybe the concurrency is difficult to implement or something, otherwise you'd think it would be in UE4 etc already.
 

It only works in combination with raytracing. Can you try Call of Duty Cold War? Performance impact is very small for DLSS.
 
It should work in combination with any rendering. But I assume that the renderer must be crafted in a way which allows this, since it means that DLSS and everything after it must run asynchronously with graphics.
 
Post processing like motion blur, depth of field, bloom, colour correction, etc. is all done at native resolution actually in DLSS 2.0
That's not always the case, that's just a recommendation for DLSS from NVIDIA, but there are many games which don't follow it - Control, Death Stranding, CP2077 and even Nioh. This usually shows up either as aliased edges during motion blur in CP2077 or Control, or as DOF bokeh shapes flickering as in Death Stranding and Nioh. I wish there was at least a depth buffer rendered at full resolution (should be virtually free since rasterization speed is 4x for depth-only anyway), so that devs could do depth-aware upsampling for such effects as MB, DOF, etc. That would eliminate the resolution loss due to low-res post-processing, but would probably require additional implementation effort from game devs (and we know that they even forget to set LOD bias levels on a regular basis). Luckily, this resolution loss due to low-res PP is nowhere near as severe as the fucked up jittering or resolve in GoW - https://forum.beyond3d.com/threads/god-of-war-ps4.58133/page-38#post-2193708
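A minimal sketch of the kind of depth-aware upsampling being suggested, in nearest-depth style (each full-res pixel takes the low-res sample whose depth best matches its own full-res depth; the 2x scale and the buffers are illustrative only):

import numpy as np

# Nearest-depth upsampling sketch: for every full-res pixel, pick the low-res
# post-processing sample whose depth is closest to the full-res depth instead
# of blindly filtering across depth discontinuities.

def nearest_depth_upsample(low_color, low_depth, full_depth, scale=2):
    h, w = full_depth.shape
    out = np.zeros((h, w, low_color.shape[2]), dtype=low_color.dtype)
    for y in range(h):
        for x in range(w):
            ly, lx = y // scale, x // scale
            best, best_err = None, np.inf
            for dy in (0, 1):                 # check the 2x2 low-res neighbourhood
                for dx in (0, 1):
                    ny = min(ly + dy, low_depth.shape[0] - 1)
                    nx = min(lx + dx, low_depth.shape[1] - 1)
                    err = abs(low_depth[ny, nx] - full_depth[y, x])
                    if err < best_err:
                        best, best_err = (ny, nx), err
            out[y, x] = low_color[best]
    return out

# Tiny example: 2x2 low-res colour/depth upsampled to 4x4 with a full-res depth buffer.
low_color  = np.random.rand(2, 2, 3).astype(np.float32)
low_depth  = np.random.rand(2, 2).astype(np.float32)
full_depth = np.repeat(np.repeat(low_depth, 2, axis=0), 2, axis=1)
print(nearest_depth_upsample(low_color, low_depth, full_depth).shape)  # (4, 4, 3)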

Maybe the concurrency is difficult to implement or something, otherwise you'd think it would be in UE4 etc already.
In order to implement this concurrency, additional profiling is required, and inter-frame async execution is likely a requirement, so yes, it does make things harder.
 
Can you try Call of Duty Cold War?
Nope, don't happen to own that game.

These standalone caustics demos have the SDK nvngx_dlss.dll version packaged in them:
https://drive.google.com/drive/folders/10MRz-_jcL5pxotvJAXD46Cik6Bm2bo9g

Saving this into a .reg file and running it enables the on-screen indicator:

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\NVIDIA Corporation\Global\NGXCore]
"FullPath"="C:\\Program Files\\NVIDIA Corporation\\NVIDIA NGX"
"ShowDlssIndicator"=dword:00000001
 
Any idea what happens when you use this OSD and Marvels Avengers + DLSS? Does it show the internal DRS res constantly switching I wonder...
 
I guess the comparison to the 2060 stopped the hopes of comparing consoles to Ampere in performance... the bottom Turing RTX card is now showing itself to be on par with the consoles (disregarding the dynamic resolution used on consoles here)... I wonder how much more DLSS can achieve as NVIDIA updates it (I have zero interest in DLSS 2.1's "dynamic resolution" on PC).
 
Nioh 2 isn't a good comparison point anyway, because the game is rather badly optimized for NV h/w as it is, and DLSS in this case is just a band-aid slapped on top of the badly optimized code to make things somewhat better.

So we have a kinda worst case here, no biggie.
 
Any idea what happens when you use this OSD and Marvels Avengers + DLSS? Does it show the internal DRS res constantly switching I wonder...
Don't have that game either. I would assume the render resolution indicator updates in realtime as render resolution changes.

The new UE 4.26 editor DLSS plugin has a slider which allows you to use any render resolution between 50% and 66%. The resolution indicator updates in realtime when dragging this slider back and forth.
 
The new UE 4.26 editor DLSS plugin has a slider which allows you to use any render resolution between 50% and 66%.

Am I correct to assume this is the per-axis (horizontal/vertical) resolution scale?
Meaning that setting the slider to 50% actually means rendering at 25% of the native pixel count, and setting it to 66% means ~44% of the native pixel count.

I get the lower threshold at 50%, but the 66% upper threshold seems low. I wonder if the cases where DLSS 2 isn't working so well could gain from a higher resolution (at the cost of lower performance, of course).
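Spelling that arithmetic out (the 3840x2160 output resolution is just an example):

# Per-axis DLSS scale factor vs. actual fraction of output pixels rendered,
# using a 3840x2160 output as an example.
out_w, out_h = 3840, 2160

for scale in (0.50, 0.66):
    in_w, in_h = int(out_w * scale), int(out_h * scale)
    pixel_fraction = (in_w * in_h) / (out_w * out_h)
    print(f"{scale:.0%} per axis -> {in_w}x{in_h} internal "
          f"({pixel_fraction:.0%} of the output pixels)")

# 50% per axis -> 1920x1080 internal (25% of the output pixels)
# 66% per axis -> 2534x1425 internal (44% of the output pixels)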
 