B3D IQ Spectrum Analysis Thread [2021-07]

iroboto

Daft Funk
Legend
Supporter
Requisitions for Images! (should be in .png)
1. DLSS comparison shots: variety of scenes and modes and titles vs native
2. FSR comparison shots: variety of scenes (games other than Riftbreaker) vs native

Analysis Hot Links
1.1 Doom Eternal RT PS5 vs XSX
1.2 Doom Eternal Balanced PS5 vs XSX
1.3 Doom Eternal Balanced 2 PS5 vs XSX
1.4 Doom Eternal RT 2 PS5 vs XSX
2.0 Gears 5 VRS
3.0 GDC Alpha Point Marketing Material Analysis
4.0 Doom Eternal DLSS Analysis

Preface

I would like to present a basic analysis of game image quality, as we continue to enter a period of graphics where it is critical to save processing power while keeping image quality high enough to match native TV resolutions.

In this case, I have been looking at and working with some newer basic tools to examine resolution, AF, VRS, and upscaling technologies, to show gamers what they are getting back from the graphics in terms of usable visual feedback. It is a basic analysis of course, full of holes, easily exploitable, etc. But it is certainly an improvement over having no tools at all.

Traditionally the industry has leveraged resolution and framerate as the two most important metrics for image quality (one measuring static resolution, the other motion resolution), but current techniques such as upscaling, DRS, DLSS, FSR, and VRS can now alter the image in ways that static resolution alone can no longer represent successfully. We are now at a stage where we must examine image quality directly, and to do so, we must break static images down into their finer parts and examine them tile by tile.

This basic analysis uses Fourier and discrete cosine transform algorithms to isolate edge quality while reducing as much noise as possible. Where there is more discrete detail available, the algorithms award a higher metric; where there is less (the Vaseline effect), they award less. The system is imperfect for obvious reasons: volumetric fog and similar effects lower the score rather than improve it, even though they are more taxing on the GPU and higher-quality versions of them are more desirable, not less. Conversely, over-sharpening your image would inflate its score, as the current setup is not designed to take this into account (a Butterworth filter would be required). UI and other elements are not removed, so they influence the analysis; it was not desirable to have just the average of the frame as a single number representing what the user will receive.
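As a concrete sketch of the Butterworth idea mentioned above: a smooth frequency-domain roll-off, instead of a hard cut, would make the metric less easy to game with over-sharpening halos. The function below is my own illustration; the cut-off and order values are guesses, not anything taken from the actual tool.

```python
import numpy as np

def butterworth_highpass(shape, cutoff=0.1, order=2):
    """Smooth high-pass window for an fftshift-ed spectrum.

    ~0 near the centre (low frequencies), ~1 towards the corners,
    with a gradual Butterworth roll-off in between.
    """
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Normalised distance of each bin from the spectrum centre
    # (where DC sits after fftshift).
    d = np.hypot((yy - h / 2) / (h / 2), (xx - w / 2) / (w / 2))
    return 1.0 / (1.0 + (cutoff / np.maximum(d, 1e-12)) ** (2 * order))
```

Multiplying the shifted spectrum by this window, rather than a binary mask, attenuates the low frequencies gradually, so a sharpening filter that piles energy just above the cut-off gains less of an artificial advantage.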

The Fourier transform covers the edge and frequency analysis, the bulk of the IQ analysis. We transform the image into the frequency-amplitude domain and remove the low-frequency data, leaving only high-frequency detail. We then average the image block from there; images with more high-frequency data will score higher. High-frequency data is data in which you can visually see a difference: a transition from black to black has no frequency, but black to white is high frequency. This is one basic method of edge detection (there are many) to gauge what the eye can perceive in clusters of pixels.
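The steps above — transform, drop the low frequencies, average what remains — can be sketched as follows. This is a minimal illustration with my own names and an arbitrary cut-off, not the tool's actual code:

```python
import numpy as np

def edge_energy(tile: np.ndarray, cutoff: float = 0.1) -> float:
    """Score a tile by its high-frequency content.

    FFT the tile, zero out a low-frequency disc around DC, and average
    the remaining magnitudes: more perceptible edge detail = higher score.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(tile))
    h, w = tile.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Distance of each bin from the spectrum centre, normalised so the
    # nearest spectrum edge sits at 1.0.
    dist = np.hypot((yy - h / 2) / (h / 2), (xx - w / 2) / (w / 2))
    highpass = np.abs(spectrum) * (dist > cutoff)  # drop low frequencies
    return float(highpass.mean())

# A flat tile (black-to-black) has essentially no high-frequency energy;
# a checkerboard (black-to-white transitions) has a lot.
flat = np.full((64, 64), 0.5)
checker = np.indices((64, 64)).sum(axis=0) % 2.0
```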

With the Fourier transform we can get an idea of what is happening in the picture as we look for uniformity in the spectrum. Below is a picture of native rendering. The center of the picture represents low-frequency data points in the image, and the 4 corners represent high frequency. Here we can see a uniform overall brightness across the spectrum.


Source image for Riftbreaker courtesy of @cho
https://forum.beyond3d.com/posts/2211513/


When we move to TAA, we can see that the 4 corners are darker in appearance, indicating a lack of high-frequency detail during its reconstruction to 4K from 1440p. The example shot of A Plague Tale below shows this.


Source Image courtesy of VG Tech (A Plague Tale Innocence PS5 vs Xbox Series X|S Frame Rate Comparison - YouTube)


When we look at FSR, we see that the corners and edges of the spectrum are filled in, the middle is filled in, but there is still a similar dark halo between the low and high frequencies. What is interesting is that at the poles N-S-E-W, there are additional values filled in, in the form of lines. The algorithm appears to be applying some form of additional sharpening here compared to the native image.



Source image for Riftbreaker courtesy of @cho
https://forum.beyond3d.com/posts/2211513/

The discrete cosine transform analysis is based on a paper on detecting deep fakes. Using DCT, one can identify upscaling artifacts in an image, and these artifacts are easily visible. Once detected, we know that upscaling was leveraged somewhere in the image, which is useful for separating deep fakes from real movies/images as ML-generated deep fakes become more common. Coincidentally, something I discovered while doing this analysis is that at times the artifacts provide enough information to determine the resolution of the image before it was upscaled, as non-scaled images have a uniform DCT. This is remarkably effective for pixel counting without having to do it manually; however, it is not applicable to all upscaling methods, and I continue to research this area to find a way to perform automatic resolution counting for upscaled images.
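For the DCT side, the spectrum itself is cheap to produce. The sketch below uses SciPy's `dctn`, on the assumption that a 2-D type-II DCT with a log scale for inspection is roughly what the paper's method calls for:

```python
import numpy as np
from scipy.fft import dctn

def dct_spectrum(img: np.ndarray) -> np.ndarray:
    """Log-magnitude 2-D DCT of a grayscale image, for visual inspection.

    Natively rendered content tends to fall off smoothly across the
    spectrum; upscaling tends to leave periodic peaks ("star" artifacts)
    whose position hints at the pre-upscale resolution.
    """
    coeffs = dctn(img.astype(np.float64), norm="ortho")
    return np.log1p(np.abs(coeffs))
```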

The example listed below is a 1080p image of Dark Souls 3 on XSS, upscaled to 1080p from 900p.

Source Image courtesy of VG Tech (Dark Souls 3 Xbox Series S Frame Rate Test (FPS Boost | Backwards Compatibility) - YouTube)



Another example: a 4K image of Doom Eternal upscaled to 4K. The arrow points to 83% of the full frame, which is approximately 1800p at 4K output.

Source Image courtesy of VG Tech: (Doom Eternal PS5 vs Xbox Series X|S Frame Rate Comparison - YouTube)

Of the two algorithms, DCT is less consistent, so it is critical to pixel count to ensure these values are accurate, at least until we can confirm these artifacts represent what I believe them to, which may change for each title.
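The resolution read-out itself is simple arithmetic: the artifact's position, as a fraction of the full spectrum extent, scales the output resolution. A sketch of that step (the function name is mine):

```python
def internal_resolution(artifact_frac: float, output_height: int = 2160) -> int:
    """Estimate the pre-upscale vertical resolution from the fractional
    position of the DCT upscaling artifact (0..1 across the spectrum)."""
    return round(artifact_frac * output_height)

print(internal_resolution(0.90))  # → 1944, the 90%-of-4K case
print(internal_resolution(0.83))  # → 1793, roughly the ~1800p case above
```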


Conclusion:

If you have any feedback, ideas, or criticisms of my methodology, please reply to these posts. If you have questions about the analyses (posts below), please reply to those posts directly. This is far from perfect of course, but it is meant to provide some idea of what is happening behind the scenes that you may not be able to perceive, and an inner look into what happens when these images are reconstructed, shaded at different rates, and so on.

Finally, these analyses are fun distractions from my everyday life. There are other projects I work on (should be working on), but it is sometimes irresistible to do these things, especially when the results spark controversy when discussed in other threads. Recent encouragement to see this project progress, and interest in how it works, has nudged me to release at least the bare bits of how it operates. The project is far from complete and requires a significant amount of pressure testing to be of any significance. A lot still must be learned in this whole process, and I haven't found many resources to assist here; any help is always appreciated.


Future Work:
I am working on breaking the tiles down and bucketing them into clarity buckets. This way we no longer need to compare native against reconstruction, or native against VRS: we can just look at how clear each tile is and easily compare any two configurations by the difference in their tile buckets. I am also looking into using some of this for full movies, but processing time is much too slow; 60fps and above is murder right now without a dedicated CUDA script to do this from start to finish.
 
Last edited:

iroboto

Daft Funk
Legend
Supporter
Analysis 1: Doom Eternal XSX vs PS5 VRS + DRS vs DRS

In this analysis I look at near-identical images taken from VG Tech: Doom Eternal PS5 vs Xbox Series X|S Frame Rate Comparison - YouTube

1.1 PS5 RT vs XSX RT

The most important item to address here is to baseline the internal rendering resolutions of the images so that we get an idea of what we are working with. We will begin with a DCT analysis to see what differences there are in this image that our eyes cannot spot. I am not particularly great at pixel counting, and as such, I leave it to other readers to provide the actual pixel count from pixel-counting methods. My DCT analysis will provide its own pixel-count reading here.

Left: PS5 DCT Spectrum | Right: XSX DCT Spectrum


From what I can see in the DCT, both internal rendering resolutions are approximately the same. What is interesting is that the upscaling artifacts form a different pattern, suggesting that not everything is equal in terms of rendering here. I am not entirely sure whether this is a VRS thing or something else, as it is impossible to tell from the spectrum alone what is happening on the image; but the artifacts are indeed different, suggesting there are some differences that may not be accounted for.

Proceeding to the image quality section, I break identical images up into tiles, perform edge/detail detection per tile, and compare each against the equivalent tile. Doing this I can determine which part of the image has higher IQ between two versions of the same screenshot.

PS5 Fourier Transform



XSX Fourier Transform



Though it may not be obvious zoomed out, if you look at the PS5 spectrum here compared to XSX, you see lined artifacts crossing the entire spectrum, indicating some form of different reconstruction/sharpening method.

Using the Fourier transform I will tile the images and compare the exact tiles like for like to account for detail the eye can perceive. Once again, this is a flawed analysis as it does not remove certain conditions, but you will be able to visualize what the tool is comparing, as opposed to a single metric number representing the image quality in full.

FFT Comparison (Left: Original | Right: FFT Comparison Updated (July 14))
Updated:

In this comparison I make PS5 the reference image and XSX the challenger image. The reason is that PS5 is considerably more static owing to its lack of VRS, and so we can conclude that it is running a 1x1 shading rate across the image.

Legend
1. Blue - XSX has better quality for the tile
2. Green - PS5 and XSX have the same quality
3. Yellow - XSX is slightly degraded and could be visible to the naked eye
4. Red - XSX is significantly degraded and is visible to the naked eye
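To make the colour-bucket idea concrete, here is a rough sketch of how such a per-tile comparison could be driven: score each tile of the reference (PS5) and challenger (XSX) with a high-pass FFT metric and bucket the ratio into the four colours. All names and thresholds below are illustrative guesses, not the tool's actual tolerances:

```python
import numpy as np

def highpass_score(tile):
    # High-pass FFT "edge energy" of one grayscale tile.
    spec = np.fft.fftshift(np.fft.fft2(tile))
    h, w = tile.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot((yy - h / 2) / (h / 2), (xx - w / 2) / (w / 2))
    return float((np.abs(spec) * (dist > 0.1)).mean())

def bucket(reference, challenger, tile=64):
    """Return a grid of 'blue'/'green'/'yellow'/'red' labels per tile."""
    h, w = reference.shape
    labels = []
    for y in range(0, h - tile + 1, tile):
        row = []
        for x in range(0, w - tile + 1, tile):
            r = highpass_score(reference[y:y + tile, x:x + tile])
            c = highpass_score(challenger[y:y + tile, x:x + tile])
            ratio = c / r if r else 1.0
            if ratio > 1.05:
                row.append("blue")    # challenger better
            elif ratio > 0.95:
                row.append("green")   # equivalent
            elif ratio > 0.80:
                row.append("yellow")  # slightly degraded
            else:
                row.append("red")     # visibly degraded
        labels.append(row)
    return labels
```

Comparing an image against itself should bucket every tile green, which makes a handy sanity check for the thresholds.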
 
Last edited:

iroboto

Daft Funk
Legend
Supporter
1.2 Doom Eternal PS5 Balanced vs XSX Balanced
Sourced Images: Doom Eternal PS5 vs Xbox Series X|S Frame Rate Comparison - YouTube

Starting with DCT, we can use the artifacts to determine the size of the original image. I have marked the 4 star artifacts on the PS5 image, and again on the XSX image. You can see from the positions on the XSX graph that XSX is running a full 4K frame while PS5 is running 90% of 4K here, or about 1944p.

Left: PS5 DCT Spectrum | Right: XSX DCT Spectrum


To determine what the cross-stitch effect was, I attempted to over-sharpen the XSX image; however, the effect was not the same. I will need to conduct more experimentation to determine how that cross-stitch was made, but there is likely some setting causing that artifact formation. I do not believe it is an upscaling artifact, as we can see PS5 hit a full frame in one DCT analysis while it continues to show the cross-stitch. TL;DR: it would be a needle in a haystack to find a filter that could create that type of artifact in post, or someone would have to know how to create that effect after staring at enough DCT analyses. Any ideas here would be appreciated.

The Fourier analysis confirms what we see in the DCT analysis. The resolution difference and the uniformity of the image at full frame grant the XSX a win in every tile. Overall, however, the metrics are not far apart; the edge strength for PS5 appears strong in competing with the XSX.

FFT Comparison (Original Left | FFT Comparison Right)
Updated:

In this comparison I make PS5 the reference image and XSX the challenger image. The reason is that PS5 is considerably more static owing to its lack of VRS, and so we can conclude that it is running a 1x1 shading rate across the image.

Legend
1. Blue - XSX has better quality for the tile
2. Green - PS5 and XSX have the same quality
3. Yellow - XSX is slightly degraded and could be visible to the naked eye
4. Red - XSX is significantly degraded and is visible to the naked eye
 
Last edited:

iroboto

Daft Funk
Legend
Supporter
1.3 PS5 Balanced vs XSX Balanced 2

Sourced Images: Doom Eternal PS5 vs Xbox Series X|S Frame Rate Comparison - YouTube


DCT analysis here shows both XSX and PS5 running a full frame at 4K. PS5 continues to show the cross-stitch artifact, though less of it now, as it enters the spectrum later in the image. XSX shows nothing of note.

Left: PS5 DCT Analysis | Right: XSX DCT Analysis


The Fourier analysis grants a larger win to PS5 here. With both platforms at the same resolution, VRS is playing a role in degrading the quality of XSX compared to a native PS5. This leads me to believe that VRS is not necessarily running in tandem with the DRS system and may have its own trigger points for how much VRS is applied, independently of frame rate/resolution performance.

Updated:
In this comparison I make PS5 the reference image and XSX the challenger image. The reason is that PS5 is considerably more static owing to its lack of VRS, and so we can conclude that it is running a 1x1 shading rate across the image. Volumetric fog in this level may be a penalty, so increasing the tile size will mitigate some of the inaccuracies here. I have left the tile size as is to showcase how sensitive the filter can be; increasing the tile size would shift slightly more red into yellow and yellow into green.

FFT Comparison (Original Left | FFT Comparison Right)
Legend

1. Blue - XSX has better quality for the tile
2. Green - PS5 and XSX have the same quality
3. Yellow - XSX is slightly degraded and could be visible to the naked eye
4. Red - XSX is significantly degraded and is visible to the naked eye
 
Last edited:

iroboto

Daft Funk
Legend
Supporter
1.4 Doom Eternal RT Analysis 2 PS5 and XSX
Sourced Images: Doom Eternal PS5 vs Xbox Series X|S Frame Rate Comparison - YouTube

I left the DCT analysis unmarked so that readers can get an idea of what the spectrum looks like before I mark it up. Of note, this is the first comparison shot where the PS5 does not exhibit the cross-stitch on the spectrum; I have to wonder whether this is a setting that was enabled or disabled at some point during capture. I do not know. You can see the difference in position between the PS5 and XSX artifacts. To illustrate this further, I did a differential between the 2 images; here you can clearly see their artifacts displaced with respect to each other. In this differential, PS5 measures 1270x715 of the 1640x922px, giving it approximately 77.4% of 4K, or an internal rendering resolution of 1671p. XSX measures 1369x770 of the 1640x922, giving it approximately 83.4% of 4K, or an internal rendering resolution of 1800p.
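The differential and the read-out above amount to a per-bin subtraction plus ratio arithmetic. A minimal sketch, using the measured artifact spans quoted above (the spans are from the analysis; the code itself is my illustration):

```python
import numpy as np

def spectrum_differential(spec_a: np.ndarray, spec_b: np.ndarray) -> np.ndarray:
    """Per-bin difference of two spectra: shared content cancels out,
    while displaced artifact peaks stand out."""
    return np.abs(spec_a - spec_b)

# Reading internal resolutions off the measured artifact spans:
ps5_p = round(1270 / 1640 * 2160)   # ≈ 1673, close to the quoted ~1671p
xsx_p = round(1369 / 1640 * 2160)   # ≈ 1803, close to the quoted ~1800p
```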

DCT XSX - PS5 Differential (PS5 DCT Analysis | XSX DCT Analysis)


Fourier analysis shows the XSX winning many of the tiles here, likely owing to the higher internal rendering resolution. This screenshot has a lot of dynamic noise in the form of rain streaks, which may be skewing the results: for everything right of the gun, which platform scores better is likely random, depending on whether rain happens to be present in the tile.

FFT Comparison (Left: Original | Right: FFT Comparison)
Updated:

In this comparison I make PS5 the reference image and XSX the challenger image. The reason is that PS5 is considerably more static owing to its lack of VRS, and so we can conclude that it is running a 1x1 shading rate across the image. This image is incredibly noisy, so it was difficult to get good tile averaging. As the images drift out of alignment, I will increase the tile size to average over a larger area and mitigate the inaccuracy of misalignment. For this particular image I should be increasing the tile size by 2x to 4x to account for the random variation in rain, but for now I've left it as is; increasing the tile size to 4x would result in a completely blue picture.

Legend

1. Blue - XSX has better quality for the tile
2. Green - PS5 and XSX have the same quality
3. Yellow - XSX is slightly degraded and could be visible to the naked eye
4. Red - XSX is significantly degraded and is visible to the naked eye

 
Last edited:

iroboto

Daft Funk
Legend
Supporter
Gears 5 VRS Analysis

*I'll need to come back and fix the way the images are shown; the PNGs are transparent, which is making them really messed up. The overlay is hard to see, I think.

Original Images courtesy of a NeoGAF user (will come back later to get URL)
Native | Balanced | Performance


VRS Comparisons

Legend

Blue square means VRS outperformed the native image
Green square means identical or imperceptible difference from native
Yellow means slightly degraded; difficult to spot any degradation with the naked eye
Red means degraded and can be seen with naked eye

In terms of VRS, more green and yellow is good. Blue is likely not possible without some form of resolution advantage in this particular comparison, as this is the exact same configuration.

VRS Balanced vs Native



VRS Performance vs Native




Conclusion
This is a crude comparison method and may require additional tweaking or tolerance changes over time, but my examination has been that it is fairly in line with looking over an image without needing to zoom in 400% to see differentials.
 
Last edited:
It's a neat idea, and I love it as a beginning. Though it doesn't really take any art direction into account, or temporal information. How stable is the image from frame to frame, and how would you even measure that like this? What about depth of field, do they apply chromatic aberration, is the art direction supposed to be sharp, or is it supposed to be smooth? And really, wouldn't the standard be to compare PSNR against a "reference" render, something like a supersampled frame, and then compare everything to that reference?

I don't know, I'm just typing out loud, I love the idea and the effort.
 

zed

Legend
Nice one mate, though I hope you're looking at getting your images later from elsewhere than youtube with its compression
 

Globalisateur

Globby
Veteran
Supporter
This is exactly what this forum needed. It was about time someone analyses perceptual resolution instead of geometrical resolution. I applaud your effort and the time you spend in that work.

Now as you already noted, the main problem with that method will be dynamic weather effects (like lighting differences, smoke, and rain). But yes, it's very interesting to see PS5 winning when both resolutions are the same, and trading blows (with some parts of the image sharper) even when XSX is "officially" outputting at 16% higher res.

It's also interesting to see a good means of finding approximate resolutions. Thanks to this automated process, you would be able to give us an average resolution and average perceptual resolution for a whole level. :yep2:
 

iroboto

Daft Funk
Legend
Supporter
It's a neat idea, and I love it as a beginning. Though it doesn't really take any art direction into account, or temporal information. How stable is the image from frame to frame, and how would you even measure that like this? What about depth of field, do they apply chromatic aberration, is the art direction supposed to be sharp, or is it supposed to be smooth? And really, wouldn't the standard be to compare PSNR against a "reference" render, something like a supersampled frame, and then compare everything to that reference?

I don't know, I'm just typing out loud, I love the idea and the effort.
Indeed, I’ve wrestled with many of these questions. SSIM is an algorithm that could likely solve a lot of those issues: by comparing something to a reference you get an idea of deviation. I may try this in tile format. Typically it’s used for movie or image compression, comparing the uncompressed vs the compressed, so it’s very sensitive compared to the spectrum analysis, where I generally remove that type of noise. With respect to the setups, SSIM will require PNGs or near-0 compression on everything that is compared; this will give us accurate results on what is shown on screen. But typically, the way DF does its console comparisons, they pull the screens from a capture device that turns them into a movie, and it becomes compressed. SSIM won't be useful there as a method because it's going to pick up on the differences in compression as well.

The hardest part about it is lining up two perfect stills/movies and ensuring that one of these images is actually a perfect reference.

Perhaps I can run that for the FSR and VRS analysis for PC. Since I have stills that range from reference to various levels of performance. But yea great ideas, I will think more on this.
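For what it's worth, a tiled SSIM pass could be sketched with scikit-image's implementation, assuming lossless and perfectly aligned captures (which, as noted above, is exactly the hard part). The function name and tile size are my own choices:

```python
import numpy as np
from skimage.metrics import structural_similarity

def tile_ssim(reference: np.ndarray, challenger: np.ndarray,
              tile: int = 64) -> np.ndarray:
    """Mean SSIM per tile for two grayscale float images in [0, 1].

    1.0 means the tile is identical to the reference; lower values
    indicate more deviation (compression, VRS, reconstruction, ...).
    """
    h, w = reference.shape
    out = []
    for y in range(0, h - tile + 1, tile):
        row = [structural_similarity(reference[y:y + tile, x:x + tile],
                                     challenger[y:y + tile, x:x + tile],
                                     data_range=1.0)
               for x in range(0, w - tile + 1, tile)]
        out.append(row)
    return np.array(out)
```

The resulting grid could then be colour-bucketed the same way as the FFT tiles, giving a reference-based view alongside the spectrum-based one.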
 
Last edited:

iroboto

Daft Funk
Legend
Supporter
It's also interesting to see a good means of finding approximate resolutions. Thanks to this automated process, you would be able to give us an average resolution and average perceptual resolution for a whole level
I think it's really lucky it worked on Doom Eternal. I can't say it will work every time, it just comes down to how they upscaled it.
 
D

Deleted member 86764

Guest
Would be interested in seeing analysis of DLSS using your tools, especially to see if there are differences between static and dynamic images on the same hardware.
 

iroboto

Daft Funk
Legend
Supporter
Nice one mate, though I hope you're looking at getting your images later from elsewhere than youtube with its compression
yea =P I linked VG Tech's videos so that they get some hits and ad revenue. Inside their YouTube descriptions there is a hot link to the PNGs used for pixel counting, hosted on their Google Drive. But as they are a big part of this community and it takes labour to line up shots like that, I wanted to ensure they were receiving the hits on their YouTube vids.

But also, the algorithm must be flexible enough to work with compression, as eventually if I move to movies, all movies are compressed.
 

iroboto

Daft Funk
Legend
Supporter
GDC Alpha Point Zero Analysis

I thought it would be fun to do this one since it just came out. I will check whether VRS is enabled here at a later time, but so far my spectrum analysis concludes that this is an upscaled image: approximately 87.6% of whatever the final frame buffer was meant to be. The image is output at 1080p, but I can't know whether they downsampled it for showcasing on the internet for Twitter (likely). I also can't say it was originally a 4K image. So to note, it is 87.6% of whatever the output buffer was.

The spectrums are extremely clean, so I don't think they used any reconstruction methods here, or at least I can't spot any evidence of an upscaling technique. I suspect this image was rendered plainly and then upscaled using a basic method, or downsampled for promotional reasons.

Alpha Point Zero Original (July 20th! for full presentation)


DCT Analysis
Here I identify some basic upscaling artifacts, indicated by the blue lines. It was very difficult to read the artifacts here as the lines were not solid; I needed to make guidelines to find them. I cannot determine the original size of the image as it may have been downscaled. Overall I determine this to be an 87.6% internal rendering resolution of whatever they upscaled to. If this is a captured PNG (later converted to JPG) using the Xbox Series X controller method, then I suspect the output would be 4K downsampled to 1080p. If so, this image is rendering at 1892p.


FFT Analysis
Very clean output here. As you can see, the corners of this output darken, indicating a lack of reproduced high-frequency detail. This is typical for an image upscaled with a simple upscaling algorithm.
 
Last edited:

zed

Legend
yea =P I linked VG Tech's videos so that they get some hits and ad revenue. Inside their YouTube descriptions there is a hot link to the PNGs used for pixel counting, hosted on their Google Drive. But as they are a big part of this community and it takes labour to line up shots like that, I wanted to ensure they were receiving the hits on their YouTube vids.

But also, the algorithm must be flexible enough to work with compression, as eventually if I move to movies, all movies are compressed.
Yeah, you're a smart guy; I assumed you weren't using screenshots off YouTube for testing.
Oh OK, so there were uncompressed images, nice.
I didn't realize it was hard to get in-game shots. Movies excepted, don't games normally have some way to record a clip that can then be used as a benchmark? This is how it was many years ago.
 

iroboto

Daft Funk
Legend
Supporter
Yeah, you're a smart guy; I assumed you weren't using screenshots off YouTube for testing.
Oh OK, so there were uncompressed images, nice.
I didn't realize it was hard to get in-game shots. Movies excepted, don't games normally have some way to record a clip that can then be used as a benchmark? This is how it was many years ago.
Yea, surprisingly there is very little material to work with for this type of comparison =P
The shots have to be exceptionally close in setup between the 2 consoles; it takes an incredible amount of work positioning, pointing at the same place, and getting to the same place, as you have to play the consoles separately since there are no save-game transfers etc. (cross progression may change this). On PC it's significantly easier.
 
Last edited:

zed

Legend
Blue is likely not possible without some form of resolution advantage.
you are comparing all at the same resolution though, I take it
or is the balanced VRS shot taken at a higher resolution (its FPS is lower than with VRS off)
 