Would AMD really use FP32 for the compute shader? I thought they talked about FP16...
If I had a 2080 I would love to double check that.
Unfortunately it seems like there is only one GPU that was tested at both 4K and 1440p resolutions, which was the RTX 2080. So this is the only sample in which we can compare the scaling overhead against native.
RTX 2080 avg Frame times in milliseconds -
1440p Native - 13 (77 fps)
FSR 2.0 Quality 4K - 18.5
DLSS Quality 4K - 16.7
This means the avg frame time cost for FSR 2.0 is 5.5ms, DLSS is 3.7ms.
RTX 2080 1% Frame times in milliseconds -
1440p Native - 18.2 (55 fps)
FSR 2.0 Quality 4K - 23.3
DLSS Quality 4K - 21.7
This means the 1% frame time cost for FSR 2.0 is 5.1ms, DLSS is 3.5ms.
So at least for the RTX 2080 in this test case, the overhead for FSR 2.0 Quality 4K is about 50% higher than for DLSS Quality 4K; put another way, the DLSS pass is 50% faster, or the FSR 2.0 pass is 33% slower, depending on perspective.
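For anyone who wants to redo this arithmetic against other reviews' numbers, the whole calculation is just frame-time subtraction; a quick Python sketch (the only inputs are the frame times above, and the function and variable names are mine):

# Reproduce the overhead arithmetic from the RTX 2080 average frame times above (ms).
def overhead_ms(upscaled_ms, native_internal_ms):
    # Extra per-frame cost of upscaling to 4K vs. rendering the 1440p internal res natively.
    return upscaled_ms - native_internal_ms

native_1440p = 13.0       # 1440p native average
fsr2_quality_4k = 18.5    # FSR 2.0 Quality, 4K output (1440p internal)
dlss_quality_4k = 16.7    # DLSS Quality, 4K output (1440p internal)

fsr_cost = overhead_ms(fsr2_quality_4k, native_1440p)    # 5.5 ms
dlss_cost = overhead_ms(dlss_quality_4k, native_1440p)   # 3.7 ms
print(f"FSR 2.0: {fsr_cost:.1f} ms, DLSS: {dlss_cost:.1f} ms")
print(f"FSR 2.0 overhead is {(fsr_cost / dlss_cost - 1) * 100:.0f}% higher")   # ~49%
print(f"DLSS overhead is {(1 - dlss_cost / fsr_cost) * 100:.0f}% lower")       # ~33%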
However, at least my rough gauge of the overall data gives me the impression that FSR 2.0 might be most dependent on the available FP32 resources relative to everything else. This would mean that Turing, and by extension the RTX 2080, might be "weaker" (well, we don't really have a baseline point) with respect to FSR 2.0 performance. However, this would also mean the interesting irony that FSR 2.0 performs better on Ampere than on RDNA2, which at least the data for the 6800 XT and RTX 3080 in this test would corroborate.
It's difficult to guess what the bottleneck is because scaling between GPUs is all over the place. Pascal processes the FSR portion of the frame faster than GCN despite the FP32 and FP16 deficit.
Would AMD really use FP32 for the compute shader? I thought they talked about FP16...
If I had a 2080 I would love to double check that.
After my vid is done here - I am not sure why I am measuring FSR 2.0 as usually being more expensive on AMD than on NV, but that is what is being measured. But one cannot forget it was not completely consistent: the 580, in my measurements, executed FSR faster than the 1060 at 1080p, apparently.
Edit: If I have time (which I do not), I would love to profile the cost of FSR 2.0 more, and more accurately. Even using the native internal res as a base is not ideal, as you are missing out on the UI costs and anything else that changes when the output res is higher (mips, geometry LODs, etc.). And even just toggling FSR 2.0 is completely opaque as to what it is doing: it could also be changing the costs of other things and not just adding the FSR 2.0 reconstruction.
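To spell out what that simple subtraction captures (my rough framing, nothing documented):

measured frame-time delta ≈ FSR 2.0 pass cost + UI-at-output-res delta + mip / geometry LOD delta + whatever else the toggle changes

so the per-frame numbers quoted above are really totals for the whole toggle, not a clean profiler measurement of the reconstruction pass alone.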
These are the wrong numbers, since the 1440p scene here is very likely CPU bound; you need a GPU utilization counter to make sure the scene is not CPU bound, which is available on the next page - https://www.techpowerup.com/review/amd-fidelity-fx-fsr-20/3.html
I parsed TPU's numbers for their RTX 3060 (https://www.techpowerup.com/review/amd-fidelity-fx-fsr-20/2.html) using the first 2 comparison FPS numbers -
Alex tested a 1060 vs a 580. Despite the 580 having almost 50% more FP32 compute and 800% more FP16 compute, the 1060 runs the algorithm faster.
What's giving you that impression?
The Vega 64, for instance, has a bigger perf gain with FSR2 than the 1070 Ti. The 1070 Ti's relative perf against the Vega 64 increases as resolution is lowered. We don't have a 1706x960 test, but at 1080p the Vega 64 gets 65 fps and the 1070 Ti 68 fps, whereas at 1440p FSR2 Quality the Vega 64 gets 60 fps and the 1070 Ti 59 fps. This at least to me suggests the Vega 64 is spending less time on FSR2 on average.
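In frame-time terms, a back-of-the-envelope from those FPS numbers (note that 1440p FSR2 Quality renders internally at ~1706x960 rather than 1080p, so only the relative comparison between the two cards really means anything):

def ms(fps):
    return 1000.0 / fps

# Going from 1080p native to 1440p FSR2 Quality (1706x960 internal + reconstruction):
vega64_delta    = ms(60) - ms(65)   # ~ +1.3 ms
gtx1070ti_delta = ms(59) - ms(68)   # ~ +2.2 ms
print(f"Vega 64: +{vega64_delta:.1f} ms, 1070 Ti: +{gtx1070ti_delta:.1f} ms")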
Turn down texture quality and see how it does. I am also curious how DLSS performs versus FSR 2.0 on a power-limited GPU such as a laptop 2060; maybe the tensor cores are able to make a bigger difference in performance there.
On my 3090 with a 225W instead of a 375W power limit, DLSS performs much better:
Quality: +16%
Performance: +20%
Tried it on my 80W RTX 2060 in the notebook, but it looks like 6GB VRAM is not enough in this game...
That's data formats, but there's no indication of what math they are using to process all these buffers? Unless I'm not seeing it.
Agreed, and it will be welcome to see how FSR 2.0 scales outside the Deathloop universe.
More importantly, I'd wish to see more than just one game that was truly optimized for the first public showing. How does it perform in Horizon, CP2077, God of War...?
AMD’s FSR 2.0 debut, while limited, has upscaled our GPU hopes | Ars Technica
At this point, the real question is exactly how well FSR will scale outside of Deathloop's specific traits: its first-person perspective; its small, tight cities; and its high-contrast, saturated-color art style. Will FSR 2.0's rendering wizardry look as good while driving a virtual car or when seen in third-person perspective, across a foliage-lined landscape, or pumped full of more organic colors and constructions? Or is this where the FSR 2.0 model will begin coughing and stuttering as it tries to intelligently process wooden bridges, prairies full of swaying grass, or level-of-detail pop-in of elements on the horizon?
On my 3090 with a 225W instead of 375W power limit DLSS performs much better:
Quality: +16%
Performance: +20%
Tried it on my 80W RTX 2060 in the notebook, but it looks like 6GB VRAM is not enough in this game...
Just turn down texture quality or disable raytracing. There are plenty of ways to test that.
Shame that I don't have a better notebook. Would be really interesting to see how both upscaling methods behave in a tight power budget scenario.
Smart test.
Did the opposite test - how much less power DLSS needs to achieve the same performance:
350W - FSR Performance in 4K: 100FPS
280W - DLSS Performance in 4K: 100FPS
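(That works out to 280 W / 350 W = 0.8, i.e. DLSS hitting the same 100 FPS with roughly 20% less board power in this scene.)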
This one lacks description. Which one is FSR / DLSS? Thanks.