AMD FSR antialiasing discussion

Discussion in 'Architecture and Products' started by Deleted member 90741, May 20, 2021.

  1. troyan

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    610
    Likes Received:
    1,148
    Would AMD really use FP32 for the compute shader? I thought they talked about FP16...

    It would be interesting to see FSR 2.0 on an OLED display. The fast pixel response time should make the movement artefacts more visible...
     
    PSman1700 likes this.
  2. arandomguy

    Regular Newcomer

    Joined:
    Jul 27, 2020
    Messages:
    263
    Likes Received:
    366
    It's just guesswork on my part based on HUB's data.

    I think the 6800 XT has something like 35% more FP16 throughput than the RTX 3080 on paper, but I don't know offhand how real-world tests of the two compare in that respect.

    At least by my rough estimation (I haven't gone through the data finely, and human pattern recognition can of course be pretty faulty), the strongest commonality to me does seem to be FP32 relative to everything else. To be more precise, I don't mean just total FP32 throughput, but how much FP32 is available to the GPU relative to where it performs in gaming.

    The other issue is that this could just be correlation, with something else on the GPUs that scales with FP32 resources accounting for the difference. There's also the limited sample size we have here.
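
    As a quick sanity check of that "on paper" claim, here's a minimal sketch using the FP32 figures and FP16 rate ratios quoted later in this thread (TPU-style specs, not fresh measurements); it lands at roughly +39%, in the same ballpark as my ~35% guess:

```python
# Paper FP16 throughput = FP32 TFLOPS x FP16:FP32 rate ratio.
# Spec values are the ones quoted elsewhere in this thread (TPU database).
specs = {
    "RX 6800 XT": {"fp32_tflops": 20.7, "fp16_ratio": 2.0},  # RDNA2: 2:1 FP16
    "RTX 3080":   {"fp32_tflops": 29.8, "fp16_ratio": 1.0},  # Ampere: 1:1 FP16
}

fp16 = {name: s["fp32_tflops"] * s["fp16_ratio"] for name, s in specs.items()}
for name, tflops in fp16.items():
    print(f"{name}: {tflops:.1f} FP16 TFLOPS")

advantage = fp16["RX 6800 XT"] / fp16["RTX 3080"] - 1.0
print(f"6800 XT paper FP16 advantage: {advantage:.0%}")  # ~39%
```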
     
    PSman1700 likes this.
  3. arandomguy

    Regular Newcomer

    Joined:
    Jul 27, 2020
    Messages:
    263
    Likes Received:
    366
    Just a follow-up: I decided to parse some numbers. Ideally I would have liked actual overhead numbers (as I used in an earlier post), but I'm making do with perf gain instead. The perf gain number is how much higher the average FPS is with FSR 2.0 Quality than at native. FP32 numbers and FP16 ratios are pulled from TPU's database. I used "vs." because the cards being compared were the closest to the same performance tier for this game based on the numbers used, and they had the same settings. (A small sketch of this arithmetic follows the comparisons below.)

    RX 570 vs. 1650 Super at 1440p medium -

    RX 570 - 10% perf gain. 1650 Super - 6%.
    RX 570 - 5.1 TFLOPS, 1:1 FP16. GTX 1650 Super - 4.4 TFLOPS, 2:1 FP16.

    The 1650 Super, however, has a faster baseline and therefore less frame time per frame to work with for speedups.

    Vega 64 vs. 1070 Ti at 1440p ultra -
    Vega 64 - 25%, 1070 Ti - 20%
    Vega 64 - 12.7 TFLOPS, 2:1 FP16. GTX 1070 Ti - 8.2 TFLOPS, 1:64 FP16.

    Both had essentially the same perf, so frame time per frame would be very close.

    5700 XT vs. RTX 2060 at 1440p ultra, except RTX 2060 used low textures
    5700 XT - 21%, RTX 2060 - 18%
    5700 XT - 9.75 TFLOPS, 2:1 FP16. RTX 2060 - 6.5 TFLOPS, 2:1 FP16.

    6800 XT vs. RTX 3080 at 4K Ultra
    6800 XT - 28%, RTX 3080 - 38%
    6800 XT - 20.7 TFLOPS, 2:1 FP16. RTX 3080 - 29.8 TFLOPS, 1:1 FP16.

    The 6800 XT was actually faster at native but ended up slower with FSR 2.0 scaling regardless of quality setting.
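
    To make the arithmetic explicit, here's a minimal sketch of the perf-gain metric and the frame-time point above (the FPS inputs are hypothetical placeholders, not HUB's actual data):

```python
# Perf gain = fractional FPS uplift from enabling FSR 2.0 Quality vs. native.
# The FPS inputs below are hypothetical placeholders, not HUB's data.

def perf_gain(native_fps: float, fsr2_fps: float) -> float:
    """Fractional FPS uplift from enabling FSR 2.0 Quality."""
    return fsr2_fps / native_fps - 1.0

def frame_time_ms(fps: float) -> float:
    """Per-frame budget; a faster baseline leaves less time to win back."""
    return 1000.0 / fps

native_fps, fsr2_fps = 50.0, 55.0  # hypothetical example card
print(f"perf gain: {perf_gain(native_fps, fsr2_fps):.0%}")       # 10%
print(f"native frame time: {frame_time_ms(native_fps):.1f} ms")  # 20.0 ms
```

    This is also why the 1650 Super caveat matters: at a higher baseline FPS, the same fixed upscaler cost is a larger share of a smaller frame-time budget, so the measurable gain shrinks.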

    I'm going to throw this possibility out there as well: if AMD anticipated that performance comparisons against DLSS would likely be done on Ampere, might they have put some priority on optimizing for Ampere?

    As another aside, especially with XeSS also coming up and some speculation on how it's going to approach things, I wonder if we need to test upscaler image quality to see whether the final result is consistent across different architectures and/or GPUs.
     
    #1443 arandomguy, May 14, 2022
    Last edited: May 14, 2022
    PSman1700 likes this.
  4. techuse

    Veteran

    Joined:
    Feb 19, 2013
    Messages:
    1,436
    Likes Received:
    911
    It's difficult to guess what the bottleneck is because scaling between GPUs is all over the place. Pascal processes the FSR portion of the frame faster than GCN despite the FP32 and FP16 deficit.
     
    T2098 likes this.
  5. Dictator

    Regular

    Joined:
    Feb 11, 2011
    Messages:
    683
    Likes Received:
    3,976
    If I had a 2080 I would love to double-check that.
    After my vid here is done, I am not sure why I am measuring FSR 2.0 as usually being more expensive on AMD than NV, but that is what is being measured. One cannot forget it was not completely consistent, though: by my measurement the 580 executed FSR faster than the 1060 at 1080p, apparently.

    Edit: If I have time (which I do not), I would love to profile the cost of FSR 2.0 more, and more accurately. Even using native internal res as a base is not the best, as you are missing out on the UI costs and anything else that changes when output res is higher (mips, geometry LODs, etc.). And even just toggling FSR 2.0 is completely opaque as to what it is doing; it could also be changing the costs of other things, not just adding in FSR 2.0 reconstruction.
     
    #1445 Dictator, May 14, 2022
    Last edited: May 14, 2022
    PSman1700 likes this.
  6. arandomguy

    Regular Newcomer

    Joined:
    Jul 27, 2020
    Messages:
    263
    Likes Received:
    366
    What's giving you that impression?

    The Vega 64, for instance, has more perf gain with FSR2 than the 1070 Ti. The 1070 Ti's relative perf against the Vega 64 increases as resolution is lowered. We don't have a 1706x960 test, but at 1080p the Vega 64 gets 65 fps and the 1070 Ti 68 fps, whereas at 1440p FSR2 Quality the Vega 64 gets 60 fps and the 1070 Ti 59 fps. This at least suggests to me that the Vega 64 is spending less time on FSR2 on average.
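
    Rough frame-time math behind that reading, as a sketch using the fps figures above (1080p stands in for the ~1706x960 internal resolution we don't have a test for, so treat these deltas as indicative only):

```python
# Estimate FSR2 Quality overhead as the frame-time delta between 1440p FSR2
# Quality output and the nearest lower-res native datapoint (1080p).
# Note: 1080p has more pixels than the ~1706x960 internal res of FSR2 Quality
# at 1440p, so these deltas likely understate the true cost somewhat.
def frame_time_ms(fps: float) -> float:
    return 1000.0 / fps

vega_delta = frame_time_ms(60) - frame_time_ms(65)  # Vega 64
gtx_delta  = frame_time_ms(59) - frame_time_ms(68)  # 1070 Ti
print(f"Vega 64 implied FSR2 cost: {vega_delta:.1f} ms")  # ~1.3 ms
print(f"1070 Ti implied FSR2 cost: {gtx_delta:.1f} ms")   # ~2.2 ms
```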
     
    PSman1700 likes this.
  7. Krteq

    Newcomer

    Joined:
    May 5, 2020
    Messages:
    150
    Likes Received:
    264
    #1447 Krteq, May 14, 2022
    Last edited: May 14, 2022
    Lightman and T2098 like this.
  8. arandomguy

    Regular Newcomer

    Joined:
    Jul 27, 2020
    Messages:
    263
    Likes Received:
    366
    I parsed TPU's numbers for their RTX 3060 (https://www.techpowerup.com/review/amd-fidelity-fx-fsr-20/2.html) using the first two comparison FPS numbers -

    RTX 3060 frame times -

    1440p - 14.1 ms
    DLSS Q 4K - 18.2 ms
    FSR2 Q 4K - 19.2 ms
    FSR1 Q 4K - 16.7 ms

    Overhead vs. 1440p: DLSS Q 4K - 4.1 ms. FSR2 Q 4K - 5.1 ms. FSR1 Q 4K - 2.6 ms.
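
    For clarity, the same arithmetic as a small sketch (the FPS values are back-derived from the frame times above, so they're approximate):

```python
# Convert TPU's average FPS to frame times, then take the delta against the
# 1440p baseline (the Quality modes at 4K render internally at ~1440p).
# FPS values are back-derived from the frame times quoted above (approx.).
fps = {
    "1440p native": 70.9,
    "DLSS Q 4K":    54.9,
    "FSR2 Q 4K":    52.1,
    "FSR1 Q 4K":    59.9,
}
frame_times = {k: 1000.0 / v for k, v in fps.items()}  # in ms
baseline = frame_times["1440p native"]
for mode in ("DLSS Q 4K", "FSR2 Q 4K", "FSR1 Q 4K"):
    print(f"{mode}: {frame_times[mode] - baseline:.1f} ms overhead vs 1440p")
```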

    Are you seeing it being more expensive on AMD vs. NV in general, or are there specific architecture-to-architecture variances? Based on my very early conjecture, while Ampere seems faster than RDNA2, does that extend to all architectures? At least in the HUB test it doesn't seem like Turing is faster than RDNA1, nor Pascal faster than GCN.

    I agree the overhead time isn't strictly just FSR 2.0 (or DLSS) processing itself, as not everything else is identical. Maybe the better term would be functional overhead or functional cost. If hypothetically one upscaler needs costlier inputs than another, that would still, from a functional standpoint, mean it has a higher cost. The interesting follow-up along this line would be knowing what the other differences are between the upscaling methods aside from the actual scaling process itself.

    The other advantage of having that data is that it better isolates against the issue of low vs. high fps, assuming the upscalers have a relatively fixed cost. If that's true, then the percentage gains from scalers are higher at lower fps, but that also shrinks the apparent performance difference between the scalers, while the reverse holds at higher fps.
     
    PSman1700 likes this.
  9. OlegSH

    Regular

    Joined:
    Jan 10, 2010
    Messages:
    806
    Likes Received:
    1,640
    These are the wrong numbers, since the 1440p scene there is very likely CPU bound. You need a GPU utilization counter to make sure the scene is not CPU bound, which is available on the next page - https://www.techpowerup.com/review/amd-fidelity-fx-fsr-20/3.html
    GPU utilization there is 97% at 1440p, which indicates a GPU-bound scene.
    DLSS Quality on the 3080 takes 1.01 ms for 4K reconstruction vs. no-AA 1440p.
    FSR 2.0 Quality on the 3080 takes 1.66 ms for 4K reconstruction vs. no-AA 1440p.
    You can search for the DLSS integration guide, where execution times are highlighted for different RTX models; the calculations above align with the DLSS execution time numbers in that guide.
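
    A minimal sketch of that methodology, with the CPU-bound check made explicit (the FPS and utilization values here are hypothetical placeholders; the real figures are on TPU's page 3):

```python
# Frame-time-delta upscaler cost, gated on a GPU utilization counter so that
# CPU-bound baselines are rejected. All inputs below are hypothetical.
def upscaler_cost_ms(base_fps: float, upscaled_fps: float,
                     base_gpu_util: float, min_util: float = 0.95) -> float:
    """Cost of reconstruction vs. the internal-res baseline, in ms."""
    if base_gpu_util < min_util:
        raise ValueError("Baseline likely CPU bound; cost estimate invalid.")
    return 1000.0 / upscaled_fps - 1000.0 / base_fps

# Hypothetical example with a 97% GPU-utilization baseline:
print(f"{upscaler_cost_ms(100.0, 90.0, base_gpu_util=0.97):.2f} ms")  # 1.11 ms
```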
     
  10. techuse

    Veteran

    Joined:
    Feb 19, 2013
    Messages:
    1,436
    Likes Received:
    911
    Alex tested a 1060 vs. a 580. Despite the 580 having almost 50% more FP32 compute and vastly more FP16 compute, the 1060 runs the algorithm faster.
     
  11. troyan

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    610
    Likes Received:
    1,148
    On my 3090 with a 225W power limit instead of 375W, DLSS performs much better:
    Quality: +16%
    Performance: +20%

    Tried it on my 80W RTX 2060 in the notebook, but it looks like 6 GB of VRAM is not enough in this game...
     
    PSman1700 and DavidGraham like this.
  12. Dampf

    Regular

    Joined:
    Nov 21, 2020
    Messages:
    297
    Likes Received:
    519
    Turn down texture quality and see how it does. I am also curious how DLSS performs versus FSR 2.0 with a power-limited GPU such as a laptop 2060; maybe the tensor cores are able to make a bigger difference in performance there.
     
  13. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    3,282
    Likes Received:
    3,471
    PSman1700 likes this.
  14. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,901
    Likes Received:
    4,554
    Agreed, and it will be welcome to see how FSR 2.0 scales outside the Deathloop universe.
    AMD’s FSR 2.0 debut, while limited, has upscaled our GPU hopes | Ars Technica
     
    PSman1700 likes this.
  15. troyan

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    610
    Likes Received:
    1,148
    Did the opposite test - how much less power DLSS needs to achieve the same performance:
    350W - FSR Performance at 4K: 100 FPS
    280W - DLSS Performance at 4K: 100 FPS

    A shame that I don't have a better notebook. It would be really interesting to see how both upscaling methods behave in a tight power budget scenario.
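
    Put another way, at matched FPS the power limit translates directly into energy per frame. A small sketch, assuming both runs actually sustain their power limits:

```python
# Energy per frame at matched frame rate = board power / FPS.
# Assumes the GPU actually sustains its power limit in both runs.
def joules_per_frame(watts: float, fps: float) -> float:
    return watts / fps

fsr  = joules_per_frame(350, 100)  # 3.5 J/frame, FSR Performance
dlss = joules_per_frame(280, 100)  # 2.8 J/frame, DLSS Performance
print(f"DLSS uses {1 - dlss / fsr:.0%} less energy per frame")  # 20%
```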
     
    #1455 troyan, May 15, 2022
    Last edited: May 15, 2022
    OlegSH, T2098, DavidGraham and 6 others like this.
  16. Dampf

    Regular

    Joined:
    Nov 21, 2020
    Messages:
    297
    Likes Received:
    519
    Just turn down texture quality or disable Raytracing. There are plenty of ways to test that.
     
  17. Dictator

    Regular

    Joined:
    Feb 11, 2011
    Messages:
    683
    Likes Received:
    3,976
    Smart test
     
  18. troyan

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    610
    Likes Received:
    1,148
  19. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,460
    Likes Received:
    475