RT GPU Benchmarking Should Not Be Done Solely With Upscaling For Record Keeping Purposes

Boss

I was scouring the web trying to find RT benchmarks for different games, and while looking through various written reviews, I noticed that upscaling was used in the graphs for several games. On many sites, reviewers didn't even bother to chart the native performance of the GPUs in their RT benchmarks. An example of this is GamersNexus' review of the 5090, where Dying Light, Resident Evil, and Black Myth: Wukong all have upscaling factors baked into their graphs. This is extremely problematic for a variety of reasons.

The first reason is that there is potential variability in upscaling factors. As we've seen with Intel's XeSS, GPU vendors are able to change the upscaling factors between SDK releases, leading to unlike comparisons between GPUs. If a reviewer were to make the mistake of equating DLSS Quality to XeSS Quality, it would be an unequal comparison as far as upscaling factors, and therefore internal render resolution, are concerned.
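
To make the render-resolution gap concrete, here is a minimal Python sketch, assuming the commonly cited per-axis scale factors (DLSS Quality at 1.5x; XeSS Quality at 1.5x before XeSS 1.3 and 1.7x after). Treat the exact factors as illustrative rather than authoritative, since vendors can change them between SDK releases.

# Internal render resolution implied by a per-axis upscaling factor.
# The scale factors below are commonly cited values and are only
# illustrative; vendors can and do change them between SDK releases.
def internal_resolution(output_w, output_h, per_axis_scale):
    return round(output_w / per_axis_scale), round(output_h / per_axis_scale)

modes = {
    "DLSS Quality (1.5x)": 1.5,
    "XeSS pre-1.3 Quality (1.5x, assumed)": 1.5,
    "XeSS 1.3 Quality (1.7x, assumed)": 1.7,
}

for name, scale in modes.items():
    w, h = internal_resolution(3840, 2160, scale)
    print(f"{name}: renders {w}x{h} ({w * h / (3840 * 2160):.0%} of the 4K pixel count)")

Two cards both charted at "4K Quality" can therefore be ray tracing roughly 44% versus 35% of the output pixels, which is not an apples-to-apples RT workload.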

Secondly, there is potential variability in the performance cost of upscaling between the GPUs being compared. In GamersNexus' review, they use FSR to compare all GPUs. However, FSR has a performance cost, and that cost is not necessarily equal across GPU vendors. This adversely impacts the comparison. Furthermore, it may be the case that other vendors have upscaling algorithms with a lower performance cost at an equivalent or superior quality level. In that instance, the data isn't particularly helpful.
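
One way to see the skew is to split the measured frame time into the work done at the internal resolution plus the upscaling pass itself. A minimal sketch with entirely made-up numbers (nothing here is measured data):

# Hypothetical decomposition: measured frame time = render time at the
# internal resolution + the upscaler's own pass cost on that GPU.
# All numbers are made up to illustrate the skew, not measurements.
def measured_fps(render_ms, upscaler_ms):
    return 1000.0 / (render_ms + upscaler_ms)

# Suppose two GPUs ray trace the internal-resolution frame equally fast,
# but the shared upscaler (e.g. FSR) costs more on one architecture.
gpu_a = measured_fps(render_ms=12.0, upscaler_ms=1.0)  # ~76.9 FPS
gpu_b = measured_fps(render_ms=12.0, upscaler_ms=2.5)  # ~69.0 FPS

print(f"GPU A: {gpu_a:.1f} FPS, GPU B: {gpu_b:.1f} FPS")
# The chart shows an ~11% gap even though the RT workload itself ran
# identically; the difference is entirely the upscaler's pass cost.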

Furthermore, we've now seen a new change with DLSS 4, which allows the user to replace the CNN model with the new transformer model in games. With this development, it's quite easy to envision a future where this is done automatically rather than on a game-by-game basis. As we've seen, the transformer model has a higher performance cost. If the model becomes something that is replaced automatically, it renders the benchmark data useless.
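
To put hypothetical numbers on that (made-up figures purely for illustration): if the CNN model's upscaling pass costs around 0.5 ms and the transformer model costs around 1.0 ms on the same card, a game that measured 100 FPS (10 ms per frame) before an automatic model swap would measure roughly 95 FPS (10.5 ms) afterwards, with no change to the hardware or the RT workload being tested.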

Finally, when comparing historical performance between GPUs, things become unnecessarily difficult if the raw RT data is not available. By raw RT data, I mean traditional benchmarks with no upscaling, ray reconstruction, or other software features. While it's helpful to provide upscaling benchmarks to viewers, it should not be done at the expense of the raw data. Raw benchmarks should always be prioritized, and upscaling benchmarks should be treated as an "additive".
 
The first reason is that there is potential variability in upscaling factors. As we've seen with Intel's XeSS, GPU vendors are able to change the upscaling factors between SDK releases, leading to unlike comparisons between GPUs. If a reviewer were to make the mistake of equating DLSS Quality to XeSS Quality, it would be an unequal comparison as far as upscaling factors, and therefore internal render resolution, are concerned.

Agree this is a real problem.

Secondly, there is potential variability in the performance cost of upscaling between the GPUs being compared. Furthermore, it may be the case that other vendors have upscaling algorithms with a lower performance cost at an equivalent or superior quality level. In that instance, the data isn't particularly helpful.

I see what you mean but this is no different to the hundreds of workloads in games that have different performance profiles on different architectures. The fact that AMD, Intel or Nvidia may have their own version of the same thing does add another wrinkle but we’ll have to deal with that until game developers ship their own ML models.

By raw RT data, I mean traditional benchmarks with no upscaling, ray reconstruction, or other software features. While it's helpful to provide upscaling benchmarks to viewers, it should not be done at the expense of the raw data. Raw benchmarks should always be prioritized, and upscaling benchmarks should be treated as an "additive".

There is no reason why upscaled graphics shouldn’t be considered raw data. How is it different to the many other systems that modern games employ? We really need to stop pretending that games run at some pristine “native” configuration. Upscaling and dynamic resolution are only going to see more widespread usage and soon “native” won’t mean anything at all.
 
There is no reason why upscaled graphics shouldn’t be considered raw data. How is it different to the many other systems that modern games employ? We really need to stop pretending that games run at some pristine “native” configuration. Upscaling and dynamic resolution are only going to see more widespread usage and soon “native” won’t mean anything at all.
The reason I made this thread is that I was tracking the historical performance of GPUs over time. As you can imagine, it becomes very troublesome to do so when upscaling is used. It becomes particularly troublesome since we're in an ever-changing landscape of different machine learning models, model performance costs, etc. RT benchmarks relying solely on upscaling are akin to building a house on a very unstable foundation. Upscaling skews the data, and how much it skews the data depends on the ML model used.

I have no problem with upscaling data being provided along with reviews, but it should not be the only data point when benchmarking RT. RT benchmarking with upscaling is not a measure of how well the GPU runs RT but of how well it does so with the aid of an upscaling model. That model may or may not be accelerated via hardware. If it's accelerated via hardware, each new iteration of the model carries its own additional cost, so you're attempting to benchmark two things at the same time. Perhaps there should also be separate benchmark data showing how well each GPU runs each model? For example, a DLSS 2, DLSS 3, DLSS 4, FSR, and XeSS performance cost comparison?
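
A rough sketch of what such a cost comparison could look like, assuming the reviewer can capture frame time with the upscaler enabled and frame time rendering natively at the upscaler's internal resolution. The GPU names and all figures below are placeholders, not measurements.

# Hypothetical sketch: isolate an upscaler's pass cost on a given GPU by
# comparing frame time with the upscaler enabled against frame time when
# rendering natively at the upscaler's internal resolution.
# All figures are placeholders, not measurements.
def upscaler_cost_ms(ft_upscaled_ms, ft_native_internal_ms):
    return ft_upscaled_ms - ft_native_internal_ms

samples = {
    # (frame time with upscaler at 4K output, frame time native at the internal res)
    ("GPU A", "DLSS 4"): (11.8, 10.6),
    ("GPU A", "FSR"): (11.2, 10.6),
    ("GPU B", "FSR"): (12.1, 10.9),
}

for (gpu, upscaler), (up_ms, native_ms) in samples.items():
    print(f"{gpu} / {upscaler}: ~{upscaler_cost_ms(up_ms, native_ms):.1f} ms per frame")

It's only an approximation (native rendering at the internal resolution isn't a perfect stand-in for the pre-upscale frame), but it would at least separate the upscaler's cost from the RT workload itself.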
 
The reason I made this thread is that I was tracking the historical performance of GPUs over time. As you can imagine, it becomes very troublesome to do so when upscaling is used. It becomes particularly troublesome since we're in an ever-changing landscape of different machine learning models, model performance costs, etc. RT benchmarks relying solely on upscaling are akin to building a house on a very unstable foundation. Upscaling skews the data, and how much it skews the data depends on the ML model used.

Ok, but why is upscaling different to all of the other systems in a game? I think the concern about inconsistent workloads across the various upscaling methods is valid; however, upscaling itself is just another thing running on the GPU that's part of generating the frame on screen. When UE updates a shadow map every 4 frames instead of every frame, do you count that as upscaling? When an intermediate buffer is rendered at lower than full-screen resolution, how are you accounting for that in your analysis? All of those things are also "unstable foundations".

I have no problem with upscaling data being provided along with reviews, but it should not be the only data point when benchmarking RT. RT benchmarking with upscaling is not a measure of how well the GPU runs RT but of how well it does so with the aid of an upscaling model.

A GPU runs RT with the aid of many optimizations. The only reason you're focused on upscaling is that it's a toggle you have control over as an end user. You don't have control over a ton of other things that games do to make RT run well. If you want to track historical GPU performance the only thing you need to control for is that all GPUs are running the same workload for each benchmark you choose.
 