Here are a few examples of what I’m referring to from my personal experience. Anyone can claim they’re irrelevant to their own use case and therefore not worth reviewing, but I am a heavy user.
In-home streaming: NVENC on the 3090 is incapable of encoding 4K above roughly 80 fps. I had to discover this on my own through trial and error; I thought my card was broken before I found corroborating experiences online. I can’t think of a single outlet that has investigated encoding or streaming performance in the past four years, and Nvidia certainly doesn’t advertise those limits.
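For what it’s worth, this is roughly how I’d sanity-check that ceiling today: feed NVENC a synthetic 4K stream through ffmpeg and time it. A minimal sketch, assuming an ffmpeg build with NVENC support on PATH; the preset and clip length are arbitrary, and generating the test source on the CPU can itself become the bottleneck, so real footage is a stricter test.

```python
# Quick-and-dirty NVENC throughput check: encode a synthetic 4K clip and time it.
# Assumes an ffmpeg build with NVENC support; the preset and frame count are
# illustrative choices, not a canonical benchmark.
import subprocess
import time

FRAMES = 1800  # 30 seconds of 4K60 synthetic footage

cmd = [
    "ffmpeg", "-y",
    "-f", "lavfi", "-i", "testsrc2=size=3840x2160:rate=60",
    "-frames:v", str(FRAMES),
    "-c:v", "hevc_nvenc", "-preset", "p3",
    "-f", "null", "-",
]

start = time.monotonic()
subprocess.run(cmd, check=True, capture_output=True)
elapsed = time.monotonic() - start

# Note: generating testsrc2 at 4K on the CPU can itself cap the rate, so a
# prerendered raw clip gives a stricter reading of the encoder alone.
print(f"Encoded {FRAMES} frames in {elapsed:.1f}s -> {FRAMES / elapsed:.0f} fps")
```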
DSR/DLDSR: I had to trawl through Reddit posts to get some kind of explanation of how the “smoothness” setting is effectively inverted between DSR and DLDSR. It’s confusing as heck. I recently started playing with RTX HDR and it’s the same story there.
This is what I mean when I say the experience of actually using a graphics card isn’t captured by average-FPS graphs.
Just in case you hadn’t seen it: I happened upon this a while back while digging through Nvidia’s SDK docs for the optical flow accelerator block, trying to see what the performance differences were between the Ampere and Ada generations.
Archive version of the Video Codec SDK docs (docs.nvidia.com)
They only give figures for 1080p 8-bit encoding, but if you take the HEVC p3 preset numbers and scale them down to account for 4K having 4x the pixels to encode, you’re very much in the ballpark of ~80 fps, doubly so if you’re doing a 10-bit encode.
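The scaling itself is trivial arithmetic; here’s the back-of-the-envelope version. The 1080p figure is a placeholder consistent with the ~80 fps ballpark, not a number copied from Nvidia’s table.

```python
# Scale a published 1080p encode rate to 4K by pixel count. The 1080p figure
# below is a placeholder; substitute the real HEVC p3 value from the SDK table.
PIXELS_1080P = 1920 * 1080
PIXELS_4K = 3840 * 2160  # exactly 4x the pixels of 1080p

def estimate_4k_fps(fps_1080p: float, bit_depth_factor: float = 1.0) -> float:
    """Pixel-count scaling, with an optional rough multiplier for 10-bit cost."""
    return fps_1080p * (PIXELS_1080P / PIXELS_4K) * bit_depth_factor

print(estimate_4k_fps(320))        # -> 80.0 fps at 4K, 8-bit
print(estimate_4k_fps(320, 0.8))   # -> 64.0 fps with a guessed 10-bit penalty
```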
Edit: I do have to give Nvidia some credit for transparency in that table, because the numbers aren’t exactly flattering. Not only has encoding performance stayed largely flat overall since Pascal, Ada regresses in a lot of the presets compared to Ampere, and in some of the slower, higher-quality ones even Pascal is significantly faster. They mention in one of the footnotes that performance scales essentially linearly with the video clocks, and then give those same video clocks as examples, with Ada clocked head and shoulders above the rest and yet performing worse.
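One way to see that more starkly is to normalize throughput by video clock, since that footnote implies frames-per-MHz is a rough proxy for how wide the encoder block actually is. Every number below is a made-up placeholder; the point is the calculation, not the values.

```python
# Normalize encoder throughput by video clock to compare per-clock "width"
# across generations. All figures are hypothetical placeholders; plug in the
# SDK table's preset numbers and the quoted video clocks.
cards = {
    # name: (encode_fps_at_some_preset, video_clock_mhz) -- illustrative only
    "Pascal": (300, 1500),
    "Ampere": (320, 1650),
    "Ada":    (310, 2100),
}

for name, (fps, clock) in cards.items():
    print(f"{name}: {fps / clock:.3f} frames per MHz of video clock")

# If Ada's frames-per-MHz comes out lowest despite the highest clock, that's
# consistent with a narrower encoder block leaning on clock speed to keep up.
```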
Like @trinibwoy says, it’d be a very interesting deep dive to try to figure out why encoder performance seems to regress. Do they just remove ‘slices’ or chunks of the encoder block to save die space every generation, knowing that the higher clocks will even everything out and keep performance roughly static?