B3D IQ Spectrum Analysis Thread [2021-07]

You are comparing all at the same resolution though, I take it?
Or is the balanced VRS shot taken at a higher resolution (its FPS is lower than VRS off)?
For VRS, yes the source images are all the same resolution. I may go back and do the other Doom Eternal ones again with the VRS system.
 
Gears 5 VRS Analysis

*I'll need to come back and fix the way the images are shown; the PNGs are transparent, which is making them really messed up and the overlay hard to see.

Original images courtesy of a NeoGAF user (will come back later to get the URL)
Native | Balanced | Performance


VRS Comparisons

Legend

Blue square means VRS outperformed the native image
Green square means identical or imperceptible difference from native
Yellow means slightly degraded; difficult to spot any degradation with the naked eye
Red means degraded and can be seen with the naked eye

In terms of VRS, more green and yellow is good. Blue is likely not possible without some form of resolution advantage.

VRS Balanced vs Native



VRS Performance vs Native




Conclusion
This is a crude comparison method and may require additional tweaking or tolerance changes over time, but my experience with it has been that it is fairly in line with looking over an image without needing to zoom in 400% to see the differences.
I disagree with your rating of the green and yellow squares. If it has a lower perceptible resolution, just state it as such (same if it looks slightly sharper). Maybe you should use a percentage compared to native resolution so people could form their own opinion, e.g. green = 95% of perceptible resolution vs native.

In my experience we can almost always say whether it looks sharper or blurrier, and it should almost always look softer with VRS (except when some parts are affected by dynamic lighting and such).

Because here you are simply repeating the manufacturer PR (they want to sell their product) that VRS gives you only upsides and no downsides, but that is not the case. Saying 95% of native res is as good as native res is an opinion (some like blur in games), and I would say many would notice the difference even in 100% (no zoom) pics.

Particularly when we see how little performance is gained with the balanced and quality settings of VRS. If you gain 5% performance but lose 5% of perceivable resolution, I'd say overall there is no upside. It's just a trade-off.
 
I disagree with your rating of the green and yellow squares. If it has a lower perceptible resolution, just state it as such (same if it looks slightly sharper). Maybe you should use a percentage compared to native resolution so people could form their own opinion, e.g. green = 95% of perceptible resolution vs native.
I'm pretty sure I've provided the whole picture, however. It's showing degradation across the image in the yellow and red tiles. VRS doesn't impact the whole image: the filter the developer uses determines whether there is too much detail in an area, in which case it must use the 1x1 shading rate, or little enough detail that it can apply 1x2, 2x1, or 2x2. Which is more or less what my filter is attempting to do.

I'm working with 20*log(value) and the mean of those values within a tile, so a percentage wouldn't work: a difference of 2 on that scale is dramatic, and I'm bucketing variation at 0.35, for instance, so percentages are unfortunately out of reach.
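To make that concrete, here is a minimal sketch of the kind of per-tile score being described (assuming numpy and a grayscale tile; the masking radius and the name tile_score are illustrative, not the actual tool):

```python
import numpy as np

def tile_score(tile, low_freq_radius=4):
    """Mean of 20*log10(|FFT|) over one tile, with the lowest frequencies masked out.
    Sketch only: the real tool's low-noise filtering and tile handling differ."""
    spec = np.fft.fftshift(np.fft.fft2(tile))
    mag_db = 20 * np.log10(np.abs(spec) + 1e-12)       # the 20*log(value) scale described above
    cy, cx = np.array(mag_db.shape) // 2
    mag_db[cy - low_freq_radius:cy + low_freq_radius,
           cx - low_freq_radius:cx + low_freq_radius] = np.nan   # crude low-frequency removal
    return float(np.nanmean(mag_db))                   # scores are then bucketed (e.g. 0.35 wide)
```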
Most of these IQ tests are really user A/B tests. I know it sounds stupid, but that's generally how image quality assessments work: you gather a bunch of strangers and ask them to rate the quality of an image out of 5 or something versus the reference image.

I know it sounds fluffy and non-empirical, but measuring IQ at a level that is not perceptible isn't necessarily fair either; it may be tempting to treat it like hearing, as though something is providing more frequency response. But even hearing tests are done in buckets of 1000 Hz. If we bucketed to 1 Hz changes, you'd be asking a user whether they can hear the difference between 11001 and 11005 Hz. This is plainly the same thing: they can't possibly hear the difference until enough range has been covered.

But the second reason I can't do "95% of native" is that I still have to account for noise. This isn't a pure difference subtraction between 2 images, so I need larger bands to accommodate that. Not to mention differences in pixel setups: the 2 images are never 100% pixel aligned. The thing is, even to my own eye, green is very likely the same between native and VRS. But because we're working with spectrums and alignment issues, you'll end up with different numbers due to the way the tiles cut across different pixels for comparison.

I have to go back and change the original comparisons of PS5 and XSX because of this. It's unfair to just use a < or > symbol to determine which tile is better. There needs to be a bucket or threshold, and if both land in the same bucket, I need to mark it as such. The goal is to move the discussion to IQ and away from resolution. This should provide additional wins here for PS5, which is ideal imo; resolution is not the only indicator of IQ.
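As a sketch of that bucketed comparison (the 0.35 tie width is taken from the earlier post; the function name and return values are illustrative):

```python
def compare_tiles(score_a, score_b, tie_width=0.35):
    """Declare a tie when two per-tile scores land within one bucket of each other,
    instead of awarding a win on a raw < or > comparison."""
    if abs(score_a - score_b) < tie_width:
        return "tie"
    return "a" if score_a > score_b else "b"
```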

**Doom Eternal updated**
 
I have a question (maybe I missed the explanation) -- what drives the size of the tiles in your figures? My intuition says this tile size is too coarse to show the whole picture -- I would expect certain severe artifacts (such as 4x4 stairstepping at the edge between a VRS and non-VRS object), and I'm wondering if your tool could output at a finer resolution to confirm or deny that.
 
I have a question (maybe I missed the explanation) -- what drives the size of the tiles in your figures? My intuition says this tile size is too coarse to show the whole picture -- I would expect certain severe artifacts (such as 4x4 stairstepping at the edge between a VRS and non-VRS object), and I'm wondering if your tool could output at a finer resolution to confirm or deny that.
I drive the tile sizes. I have to manually select a tile size that should be representative of an area. I generally do this by looking at the alignment of the images and what is on display (volumetric fog, particle effects, stuff that shouldn't be there). If there is a lot of difference between the two pictures, the tile sizes need to increase; if there is no variation, we can reduce the tile sizes to be very small (but still not that small).

If the tiles get incredibly small, you're basically subtracting 2 pixels at the lowest level, and obviously any dynamic noise, compression, etc., will show large differences, which is why we don't do an absolute difference. I can take the average of the entire spectrum and come up with a final number, but that final number won't mean anything to us; it's just a number. The tile colouring here is to visualize the differences in IQ.
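For reference, the tiling step itself is simple; a minimal sketch (assuming numpy and a grayscale frame, with uneven edge pixels simply cropped, which is an assumption on my part):

```python
import numpy as np

def split_into_tiles(frame, tiles_y, tiles_x):
    """Split a 2D frame into a manually chosen grid of tiles (tiles_y rows x tiles_x columns)."""
    h, w = frame.shape
    th, tw = h // tiles_y, w // tiles_x                # pixels per tile; any remainder is cropped
    return [frame[y * th:(y + 1) * th, x * tw:(x + 1) * tw]
            for y in range(tiles_y) for x in range(tiles_x)]

# e.g. scores = [tile_score(t) for t in split_into_tiles(native_gray, 45, 80)]
```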
 
I drive the tile sizes. I have to manually select a tile size that should be representative of an area. I generally do this by looking at the alignment of the images and what is on display (volumetric fog, particle effects, stuff that shouldn't be there).

Makes sense. Hopefully a game comes around where you can turn all of that stuff off with a config file or something and get a super clean side by side test. (I guess that same stuff would probably drive vrs results though so it'll never happen)
 
Makes sense. Hopefully a game comes around where you can turn all of that stuff off with a config file or something and get a super clean side by side test.
Even in that scenario, the aim here is to use feedback from this thread to develop a methodology that works with compression issues as well as image alignment and deviation issues. Ultimately the consoles and PC will not render identically, nor will DF or others be able to achieve 100% alignment on capture, and the captures will all be encoded and compressed. The goal here is to ensure we have an algorithm that works with those challenges in place.

Otherwise, we should just subtract pixels.
 
I wonder what happens if you throw DLSS into the mix. I got some shots from DOOM Eternal with RT enabled:

Scene 1: DLSS Performance, DLSS Quality, Native

Scene 2: DLSS Performance, DLSS Quality, Native

There's some variance because of the environment but I tried to match them as close as possible.

I'd like to see the same, but not from static screens. Swinging the camera around with motion blur off should do the trick.

Same with any temporal accumulation upscaling techniques.
 
4.0 Doom Eternal DLSS Analysis

Thanks to @Clukos for providing some stills. Here is a small, crude analysis of DLSS Performance and Quality vs native.

Original Captures Scene 1 (Native | Performance | Quality)



Original Captures Scene 2 (Native | Performance | Quality)



DCT Analysis
All DCT spectrums show the unique 4-star Doom Eternal pattern that we have come to see in earlier analyses. All spectrums show a full 4K image; there is no indication here of a smaller frame being reconstructed or upscaled to a larger one. Of note, all DLSS spectrums show additional sharpening over native; this can be seen by looking at the length and thickness of the discrete lines in the DCT spectrum vs native.
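For anyone wanting to reproduce this kind of inspection, here is a minimal sketch of a log-scaled 2D DCT spectrum (assuming scipy and a grayscale capture; the function name is illustrative). A capture rendered at full resolution fills the spectrum out to the high frequencies, whereas a plain upscale leaves those regions comparatively empty.

```python
import numpy as np
from scipy.fft import dctn

def dct_spectrum_db(gray_frame):
    """Type-II 2D DCT of a grayscale frame on a log (dB-like) scale,
    so the star pattern and discrete lines stand out when plotted."""
    coeffs = dctn(gray_frame, type=2, norm="ortho")
    return 20 * np.log10(np.abs(coeffs) + 1e-12)

# e.g. plt.imshow(dct_spectrum_db(native_gray), cmap="gray") to eyeball the spectrum
```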

Scene 1 (Native | Performance | Quality)
[Image: doometernal-dlss-s1-quality-dct.png]


Scene 2 (Native | Performance | Quality)



FFT Analysis
Alignment of the images was very well done, allowing us to use smaller tiles and average over a smaller area. The tile settings used here are 45*80, i.e. (16*9)*5. The filter setting for removal of low noise is set to 45.
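For context, that tile setting works out as follows (a small worked example, assuming the captures are 3840x2160):

```python
# (16*9)*5 means a 16:9 grid scaled by 5: 80 columns by 45 rows of tiles.
tiles_x, tiles_y = 16 * 5, 9 * 5        # 80 x 45
tile_w, tile_h = 3840 // tiles_x, 2160 // tiles_y
print(tile_w, tile_h)                    # 48 48 -> each tile covers 48x48 pixels at 4K
```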

Quality DLSS tends to have fewer degraded values than Performance DLSS, as expected. The majority of the wins tend to be around distinct edges, which may be a byproduct of the additional sharpening granting DLSS the win over native. The red squares over textures indicate that details are lost there, however. Edge performance improves as you move from Performance to Quality mode, as indicated by the additional blue tiles in Quality mode in Scene 2.

In Scene 1 there is a lightning bolt pattern in native that does not exist in the DLSS comparisons. It is highlighted by the bright red tile streak on the right side of the pillar. Otherwise it would appear that visibility is better with DLSS in this area, possibly owing to less volumetric fog, or to DLSS cleaning up the image through the fog better; I am unsure which.

Legend
Blue - Details are greater than native
Green - Approximately the same as native
Yellow - Slightly degraded
Red - Heavily degraded

Native (Reference) vs DLSS Performance Scene 1 | Native vs DLSS Quality Scene 1



Native (Reference) vs DLSS Performance Scene 2 | Native vs DLSS Quality Scene 2

 
Nice visualisation. These techniques you're working on are incredible and take away the subjective, making it a quantifiable analysis.

Are you easily able to derive percentage values of the colours for each screenshot?

Edit: the second screenshot appears to show DLSS increasing perceived resolution on straight lines. Exactly the usual places you'd try and pixel count. I wonder how intentional that is.
 
Nice visualisation. These techniques you're working on are incredible and take away the subjective, making it a quantifiable analysis.

Are you easily able to derive percentage values for the colours for each screenshot?
No, perhaps if I had a major in math this might be derivable ;) The Fourier transform is an incredible formula that uses just about every single operation we know in math, including all domains:
  • it uses Euler's number e
  • it uses multiplication
  • it uses exponentials
  • it uses the complex number space as well as the real number space
  • it uses sin, cos, and pi
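For the curious, the standard discrete form shows all of those pieces in one line (a textbook statement of the DFT, not something specific to this tool):

```latex
X_k \;=\; \sum_{n=0}^{N-1} x_n \, e^{-\frac{2\pi i}{N} k n}
    \;=\; \sum_{n=0}^{N-1} x_n \left[ \cos\!\left(\frac{2\pi k n}{N}\right) - i \sin\!\left(\frac{2\pi k n}{N}\right) \right]
```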

Sorry guys, I don't want to say it's impossible, but it's impossible to ask me to do this, lol. I can provide the raw numbers for those tiles, but they aren't interpretable since each is the mean of whatever remains after low-frequency removal. If I just did
(Native - DLSS) / DLSS

or

DLSS / Native

or some version of that, it would likely be wrong as a percentage, if that is what is being asked. I would be hesitant to provide those numbers because they would ultimately be worthless, yet people might be tempted to use them in a debate. If I take the mean of each whole screenshot and compare them with each other, that will probably be closer to what people desire, but it cannot be visualized; subjectively you could say a mean of 34.56 is better than a mean of 32.23, but it doesn't mean anything except that one is better. It will provide the differential you're looking for and, as a metric, be able to explain in broad terms that one image has more clarity than another.

If you want to see how DLSS is affecting what you're seeing compared to native, then you need to visualize it like this, which means we need to account for eye acuity, i.e. we need to line up people and ask whether they can see a difference in quality or not, and how much of a difference they can see.
 
The pictures are a visualisation of that same data though, right? As a sample size, the data you're presenting here are likely a very accurate breakdown of proportional differences between native and DLSS.

What you're showing here is really incredible and definitely an insight into what Nvidia are doing with their technology. Especially the areas coloured blue.

It looks to me that the software concentrates on areas usually pixel counted (or aliased).
 
The pictures are a visualisation of that same data though, right? As a sample size, the data you're presenting here are likely a very accurate breakdown of proportional differences between native and DLSS.

What you're showing here is really incredible and definitely an insight into what Nvidia are doing with their technology. Especially the areas coloured blue.

It looks to me that the software concentrates on areas usually pixel counted (or aliased).
The tiles are bucketed in specific ranges relative to each other. But yes, it's the visualization of the same data.
I didn't think it would be fair to say that 5.239823293 vs 5.22828283 is a win, for instance. And I couldn't represent that as a meaningful percentage. But it is something I would declare green because the difference is so small; they are so close, and this could be down to a variety of factors like noise, alignment issues, fog, etc.

To get a blue square, you would have to end up with a negative difference greater than 0.15, so it couldn't just be slightly better due to noise; it had to be a full bucket larger than that. This would be 5.00 vs 5.15+.
A yellow square is a bigger drop-off, a difference of 0.35. This would be 5.239823293 vs roughly 4.85.
Red is a large drop-off: essentially, any time we get a differential of 2 or more, it's red. This would be like 5 vs 3.
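Put together as a sketch (thresholds taken from the description above; the function name and sign convention are illustrative):

```python
def classify_tile(native_score, other_score,
                  blue_margin=0.15, yellow_drop=0.35, red_drop=2.0):
    """Map a per-tile score difference to the legend colours described above."""
    diff = native_score - other_score   # positive = the other image lost detail vs native
    if diff <= -blue_margin:
        return "blue"                   # e.g. 5.00 native vs 5.15+ other
    if diff >= red_drop:
        return "red"                    # e.g. 5 vs 3, visible to the naked eye
    if diff >= yellow_drop:
        return "yellow"                 # e.g. 5.24 vs ~4.85, slight degradation
    return "green"                      # within noise/alignment tolerance of native
```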

lol, I suspect this will be a consistent ask;
but do you want me to provide the mean of the screenshots?
 
No, perhaps if I had a major in math this might be derivable ;) The Fourier transform is an incredible formula that uses just about every single operation we know in math, including all domains:
  • it uses Euler's number e
  • it uses multiplication
  • it uses exponentials
  • it uses the complex number space as well as the real number space
  • it uses sin, cos, and pi

Sorry guys, I don't want to say it's impossible, but it's impossible to ask me to do this, lol. I can provide the raw numbers for those tiles, but they aren't interpretable since each is the mean of whatever remains after low-frequency removal. If I just did
(Native - DLSS) / DLSS

or

DLSS / Native

or some version of that, it would likely be wrong as a percentage, if that is what is being asked. I would be hesitant to provide those numbers because they would ultimately be worthless, yet people might be tempted to use them in a debate. If I take the mean of each whole screenshot and compare them with each other, that will probably be closer to what people desire, but it cannot be visualized; subjectively you could say a mean of 34.56 is better than a mean of 32.23, but it doesn't mean anything except that one is better and it will provide the differential you're looking for.

If you want to see how DLSS is affecting what you're seeing compared to native, then you need to visualize it like this, which means we need to account for eye acuity, i.e. we need to line up people and ask whether they can see a difference in quality or not, and how much of a difference they can see.
But it is not a fair comparison. Manufacturers (who want to sell their products with RDNA2 GPUs) are touting 5 to 15% performance improvements (well, they actually say 15% when it should be more like 5-15%, but whatever) as impressive or great stuff, so they are using objective data. But then they (and you) are using subjective (marketing-ready) adjectives to completely brush aside any minimal loss of perceivable resolution, because a 5-10% loss is insignificant in the whole game experience. Which it is. But that's not the problem.

The problem is: why should a 10% resolution loss be judged insignificant (so we should not even write down the numbers because it could manipulate the candid souls) while a 10% performance improvement is shouted from the rooftops in order to sell the product to those same naive souls?
 
But it is not a fair comparison. Manufacturers (who want to sell their products with RDNA2 GPUs) are touting 5 to 15% performance improvements (well, they actually say 15% when it should be more like 5-15%, but whatever) as impressive or great stuff, so they are using objective data. But then they (and you) are using subjective (marketing-ready) adjectives to completely brush aside any minimal loss of perceivable resolution, because a 5-10% loss is insignificant in the whole game experience. Which it is. But that's not the problem.

The problem is: why should a 10% resolution loss be judged insignificant (so we should not even write down the numbers because it could manipulate the candid souls) while a 10% performance improvement is shouted from the rooftops in order to sell the product to those same naive souls?
Performance and image quality are 2 completely separate metrics, and it's important to maintain that separation even though GPU performance is likely to drive IQ. They aren't measuring the same thing. For example: if I put an 8K photograph against a 4K photograph, what would the observer score them as? There are so many things that could affect the photograph, like artifacts, noise, focus, etc. They could give the 8K one a subjective 4.8 and the 4K one a 4.7, with no idea that going from 4K to 8K means rendering 4x the pixels. Would it be fair to suggest that 4K is 4.7/4.8, or roughly 98%, of 8K? That's not technically right either. We haven't even taken frame rate into account here, since we're just looking at static images.
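Putting rough numbers on that mismatch (a small worked check using the illustrative scores above and UHD pixel counts, about a 2% subjective gap against a fourfold pixel-count gap):

```latex
\frac{4.7}{4.8} \approx 0.979
\qquad \text{vs.} \qquad
\frac{7680 \times 4320}{3840 \times 2160} = 4
```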

Image quality assessment is about measuring something as close to our human eyes as possible, i.e. we have always accomplished image quality assessment using human observers. Therefore image quality has always been a highly subjective topic.

The goal of this spectrum analysis is to quantify or emulate the likelihood of a human observer being able to notice a difference; or, in other words, the goal is to represent, qualitatively, a human's perception of what quality is.

There is some light reading here on the topic:
https://towardsdatascience.com/deep-image-quality-assessment-30ad71641fac

The goal here for me is to encourage the positive reinforcement and the research and development of upscaling/reconstruction techniques and novel methods of obtaining higher IQ scores using less GPU performance. We can't do that if I link IQ scores to GPU performance.

imo, the public will unfairly hamstring developers for using such techniques in favour of representing GPU/console performance, when ultimately our eyes do the interpretation of image quality. If developers weren't burdened with chasing resolution to appease public audiences, perhaps we'd see better use of their resources in actually improving graphics/framerate performance for titles.

That being said, I've noted your criticism; I just don't know yet how to address it. The answer may come in time, either through more discussion or debate, but at the moment I have no ideas for addressing the relationship between IQ and GPU performance.

A lower resolution with 16xAF may yield better results than a higher resolution with much lower AF, for instance. So would that be unfair? I can't answer what I don't know. Human observation is still a critical component of IQ. Higher-resolution textures can generate more detail, and further LOD will generate more detail. Those GPU differences can go anywhere, and if someone wants to score higher on this metric, they would change the settings to increase the detail in each tile as much as possible; resolution may not necessarily be the best way to do that once a specific resolution threshold is hit.

TLDR; you can go back over 20-30 years of benchmarking. They have all addressed the technical element of things, and those technical journalists are very adept at it. I am addressing the human element of things.
 
The problem is: why should a 10% resolution loss be judged insignificant (so we should not even write down the numbers because it could manipulate the candid souls) while a 10% performance improvement is shouted from the rooftops in order to sell the product to those same naive souls?

10% resolution loss (your numbers) in select areas of the screen, not the whole screen.

Versus.

Either X% reduction in resolution across the entire screen (DRS or just a full permanent reduction in overall resolution) or X% performance loss (no DRS).

Sure, alternatively the user (PC) or developer (consoles) can choose whether they want to increase overall performance instead of resolution. Gears 5 actually has options for whether you want to use DRS or static resolution.

So, this is basically what VRS allows:
  • Increased framerate with select areas having lower accuracy/quality/resolution. Good developers will place this where the quality loss is virtually imperceptible during gameplay.
  • Increased overall resolution with select areas having lower accuracy/quality/resolution. Again caveat of good developers.
  • Bad developer. Framerate capped below where VRS provides a performance increase and extra performance gain isn't used for resolution OR framerate. Basically, VRS for no benefit.
The alternative without VRS is:
  • Without DRS, lower overall resolution or lower performance. Your choice.
  • With DRS, larger resolution drops, may or may not have similar peak resolution.
There is no free lunch with or without VRS. Thankfully, on PC, you can choose to enable or disable the effect.
  • People that want a lower overall resolution or lower performance can choose to turn VRS off. Native Rendering without VRS.
  • People that want higher resolution or higher performance while knowing that some areas of the screen won't be shaded at that higher resolution can turn it on. Rendering with VRS.
Also, did you complain as much about checkerboard rendering on the PlayStation consoles last generation? Because that introduces far more reduction in quality across the entire screen than VRS does. :p Sure, the greater quality loss = greater resolution increase versus VRS, but unlike VRS the quality loss is uncontrollable and applied to the entire screen. So, much like VRS, it's an overall benefit with some trade-offs, with the trade-offs of checkerboard rendering being more easily seen than the quality loss in the select areas of the screen that VRS affects when implemented well.

Regards,
SB
 