Intel XeSS anti-aliasing discussion

Comparing XeSS vs FSR2 vs DLSS2 in Death Stranding on a 3080, DLSS2 is the winner in IQ, followed by XeSS, then FSR2. Performance-wise, though, DLSS2 is 5% faster than FSR2 and 15% faster than XeSS, since XeSS is slower than even FSR2 on the RTX 3080!

 

This is expected, as it runs on shader units while being computationally expensive.
 
Ok, I grabbed the laptop, a Dell with a Core i5-1135G7 @ 2.40 GHz, 16 GB of RAM, and Iris Xe Graphics using eight gigabytes of shared memory, running Windows 10 20H2. Dell has not updated the driver as of today, so you will need to go to the Intel website to download driver version 31.0.101.3420 for games to recognize XeSS on the GPU. I downloaded Death Stranding and did not mess with the settings at all, leaving shadow quality at medium. Here are my findings playing around with settings in Death Stranding.

The game does not perform well at 1080p with medium shadow settings using regular TAA: it runs at 12 FPS. While I could mess around and try to get it playable, I'm just looking at the FPS difference, and there is one. At Ultra Quality, XeSS performs like native, maybe one frame faster. At Quality, it performs about two frames faster than native. At Balanced, it performs about two frames better than Quality, and Performance runs about two frames better than Balanced. Out of all the resolution scaling options available, FSR 1.0 has the best performance, followed by FSR 2.0 and then XeSS. XeSS has the best image quality of the resolution scalers.
Here is a video with no commentary, just a plain video of me going through all the settings. Warning: the video shows the game running in the teens. Also, since I ran into a problem and needed to encode using software, you should give each setting an extra two frames, because software encoding seems to lower the framerate by about two frames.
Be sure you are viewing in 1080p to see the framerate counter.
 
Thanks.

Does it look better than native at all?
The way I see something like this being for the XSS is less about performance and more about IQ. Same performance with better IQ is a win.
And possibly benefits where something like bandwidth/ROPs may have been the issue, so lower resolution could help.

Order is as expected. How much faster were FSR 1 & 2? If it's in the video, sorry, I'm watching on a phone and couldn't make out the fps etc. (or much else).
 
Does it look better than native at all?
The way I see something like this being for the XSS is less about performance and more about IQ. Same performance with better IQ is a win.
Only in Ultra Quality would I say it was comparable; everything else gets blurrier, but I'm not quite ready to make that call. A sharpening filter might be able to fix that. Sharpening is disabled, by the way. Also, I could not combine FSR 1.0 and XeSS. I thought this GPU would be a great test to see how the XSS could perform, but I think the jury is still out on the XSS. I know the game has no chance of ever being on the Xbox, but I'm sure the XSS could have run the game at least at 30fps.
Order is as expected. How much faster were FSR 1 & 2? If it's in the video, sorry, I'm watching on a phone and couldn't make out the fps etc. (or much else).
FSR 2.0 did not run that much better, and its Balanced and Performance modes looked worse IMO, at least for this game. FSR 1 did have a boost in both performance and image quality running in Ultra Quality mode, but I think that has something to do with sharpening.
 
A sharpening filter might be able to fix that.
I've seen this observation about XeSS a lot. Can't remember the title, but I saw someone say running XeSS & FSR1 for the sharpening made a big difference.
You would think it would be an easy addition, just include AMD's CAS slider.
 
So both the DSOgaming and TPU articles (linked in comments above) suggest that XeSS is doing a better job at anti-aliasing compared to FSR2, but I think the situation is more complex than that. That is because it is also widely noticed that XeSS produces a "softer" or "smoother" image. If the entire image is "softer", that suggests XeSS is producing a lower resolution (resolving power, not # of pixels), "blurrier" image, and some of the better anti-aliasing may be due to "blurring" edges as a consequence of "blurring" the entire image. If you look at the screenshots in the DSOgaming article, it seems pretty clear to me that FSR2 and DLSS show more detail, especially in the text on the backpack and the texture of the boulders.

This is a difficult property to quantify, but here is my attempt (putting into practice a method I've been thinking about for a while). Image entropy is a measure of how much "information" an image contains. You can think of this information as how unique and unpredictable the value of each pixel is compared to its neighbors. An image with complex high-frequency detail, like fully resolved foliage with complex lighting, will have more entropy than one of a nearly uniform and smoothly varying blue sky. A 4k image produced by upscaling a 1080p image through repeating each pixel 4 times will have identical entropy to the 1080p image, because the repeated pixels provide no new information. However, if an upscaler accumulates data over time, it can recover genuine data about the scene, thereby matching or even exceeding the entropy/information of the "native" image. I used the screenshots from the TPU article and calculated the image entropy for each of the rendering methods they used. These are the results, normalized to the entropy of the natively rendered 1080p image to aid interpretability:
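For anyone who wants to try this, here is a minimal sketch of histogram-based Shannon entropy; the helper names and the total-information normalization are my own choices, not necessarily how the numbers below were produced:

```python
import numpy as np

def image_entropy(gray, bins=256):
    """Shannon entropy in bits per pixel, from the grayscale histogram."""
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                      # skip empty bins (0 * log 0 = 0)
    return float(-np.sum(p * np.log2(p)))

def total_information(gray):
    """Total bits in the image: per-pixel entropy times pixel count."""
    return image_entropy(gray) * gray.size

# A flat image carries no information; uniform noise is near-maximal (~8 bits/px).
flat = np.full((64, 64), 128, dtype=np.uint8)
noise = np.random.default_rng(0).integers(0, 256, (256, 256), dtype=np.uint8)
assert image_entropy(flat) == 0.0
assert image_entropy(noise) > 7.9
```

Under the assumption that the comparison is of total bits rather than bits per pixel, a "443% of 1080p native" figure would then be `total_information(img_4k) / total_information(img_1080p)`.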

results.png

The 4k native image contains 443% more information than the natively rendered 1080p image, and 4k DLSS quality (444%), FSR2 quality (446%), and XeSS ultra quality (444%) are able to exceed that. As you would expect, the balanced and performance modes of all methods have relatively less information and do not reach that of native 4k. Interestingly, and seemingly confirming my suspicion from above, XeSS quality mode is only able to match the amount of information recovered by DLSS performance and FSR2 balanced. Normalizing the amount of information to the 4k native image, it is as if XeSS quality were producing a 2112p image. At 1440p, all reconstruction methods are able to produce more information than native except the FSR2 and XeSS performance modes, and all DLSS modes produce more information than the best FSR2 and XeSS modes, putting numbers to many prior qualitative observations!

It's worth mentioning now that image entropy should not be taken as a measure of image quality. The highest-entropy image possible at a given bit rate is pure white noise! I want to be clear that this is a preliminary analysis of exactly 1 scene in 1 game on 1 GPU with 1 graphics configuration and is not intended to imply broad, general, or in any way confident conclusions!
 
Latest DF Direct has a segment dedicated to Intel Arc XeSS vs. Non-Intel XeSS btw showing the qualitative differences.
It was informative as usual, and even with those differences XeSS holds up well.
Intel have done a good job right out of the gate with XeSS; they can't be expected to take care of performance issues on other vendors' cards.
 
So both the DSOgaming and TPU articles (linked in comments above) suggest that XeSS is doing a better job at anti-aliasing compared to FSR2, but I think the situation is more complex than that...
I think this is partially due to the manual mip bias option in DLSS 2 and FSR 2, whereas XeSS trains the network to get the suitable "sharpness". I would argue that if XeSS were equipped with a flexible mip bias option, this issue may no longer be pronounced.
And actually, to my eyes, FSR 2.0's "information" comes more from the sharpening effect of the Lanczos resampling scheme than from actual information reconstructed.
 
Latest DF Direct has a segment dedicated to Intel Arc XeSS vs. Non-Intel XeSS btw showing the qualitative differences.

Intel "releasing" XESS for other vendors but intentionally gimping it seems quite dumb. They get almost no points for cross vendor compatibility and get to easily confuse people as to the quality and cost of XESS to begin with, it's a lose/lose.
 

I think this is partially due to the manual mip bias option in DLSS 2 and FSR 2, whereas XeSS trains the network to get the suitable "sharpness". I would argue that if XeSS were equipped with a flexible mip bias option, this issue may no longer be pronounced.
And actually, to my eyes, FSR 2.0's "information" comes more from the sharpening effect of the Lanczos resampling scheme than from actual information reconstructed.
I do also wonder if mip bias is being handled appropriately; both DLSS and FSR2 had issues with that during initial deployment, if I remember correctly. The XeSS SDK developer guide (available on GitHub) gives advice on how to set mip bias appropriately for XeSS, so I don't think it's correct that that is a learned feature.

To be clear, FSR2 is definitely reconstructing genuine information, and this is trivial to show by comparing rendered-res screenshots to the displayed res. Remember, the Lanczos sampling is being applied to subpixel locations accumulated over time, unlike in FSR1 (this is the same general mechanism for super resolution as in DLSS and XeSS, and all would be capable of producing a true 4k image in a static scene given enough samples, to the best of my understanding). See the comparison below: all the newest upscaling methods resolve the true vertical window slats on the dark brown building that are not present in the 1080p render. This is why the 4k methods have greater than 400% extra information in the metric I presented; they have better resolving power for the scene, and so they show more than 4x as much info with only 4x the pixels. Upscaling the 1080p native image using the FSR1 command line tool, to investigate the effects of Lanczos spatial filtering, measured a much lower 423% information gain over 1080p native. This gain comes without resolving the window slats, so we know it is "artificial".

Combined Stacks.png
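The accumulation mechanism is easy to see in a toy 1-D sketch (my own illustration, not any vendor's actual pipeline): with sub-pixel jitter, two low-res frames of a static scene can be interleaved to recover detail that a single un-jittered frame averages away.

```python
import numpy as np

# High-res "ground truth": alternating window slats, one sample wide.
hi = np.tile([0.0, 1.0], 32)             # 64 samples

# One un-jittered 2x-downscaled frame averages each slat pair to 0.5.
lo = hi.reshape(-1, 2).mean(axis=1)      # 32 samples; the slats are gone
assert np.allclose(lo, 0.5)

# Two jittered frames sample the even and odd sub-pixel positions.
frame_a, frame_b = hi[0::2], hi[1::2]

# Temporal accumulation interleaves them back to full resolution.
recon = np.empty_like(hi)
recon[0::2], recon[1::2] = frame_a, frame_b
assert np.array_equal(recon, hi)         # full detail recovered in a static scene
```

Real upscalers obviously must also reject stale history under motion and disocclusion, which is where the methods diverge; this only shows why a static scene can converge on true detail.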

I do consistently find that FSR2 quality images at a given output measure higher than the equivalent DLSS and native outputs, and I agree with you that this excess (the 446% vs 444/443% from my first post) could be due to the specific analytic filter used, instead of the neural filter that DLSS/XeSS are presumably learning. However, I think it is very difficult to tease out how much of this difference is "fake" over-sharpening vs "genuine" high-frequency detail without something like a test pattern, or without seeing artifacts like ringing. Aliasing would be true detail but "incorrect", so that's probably a subjective call depending on what you want to measure.

Anyway, I'm personally glad XeSS is finally out, because I have a 1070 and XeSS gives me another method I can experiment with and learn from. Preliminary results from my own testing in SoTTR with the newest XeSS at 1440p show that all the XeSS modes have more information than native 1440p, similar to the Spider-Man results I presented above. The observation that XeSS may scale worse than the others may also point to a mip bias issue. But the last sentence of my prior post still applies!
 
Intel "releasing" XESS for other vendors but intentionally gimping it seems quite dumb. They get almost no points for cross vendor compatibility and get to easily confuse people as to the quality and cost of XESS to begin with, it's a lose/lose.
Gimping is the wrong way to describe it IMO. RDNA 1 does not accelerate what XeSS uses, so it falls back to a slower path.
I do not think one would complain if Pascal GPUs ran DXR titles poorly with RT on. Same thing.
 
I do also wonder if mip bias is being handled appropriately...
I appreciate your analysis, because I have never taken a very close look at all the reconstruction techniques. I gathered most of the results from DF videos ;)

I would say this is only partially right in saying "the Lanczos sampling is being applied to subpixel locations accumulated over time unlike in FSR1". As far as I understand, Lanczos is used in two places in FSR 2.0: the first is for history buffer resampling, as you have pointed out (functioning the same as the bicubic resampling schemes from other popular TAA/TAAU solutions); but it also serves as a spatial upscaling filter when the history buffer is no longer valid. This is especially the case when FSR 2.0's disocclusion test "kills" potentially ghosting history pixels and they need a fallback, which I personally suspect is why there's fizzling in those areas. And this is actually what I refer to as the over-sharpening. Note that the spatial resampling sharpness is dynamically adjusted based on the amount of history available, so this is not a binary toggle and theoretically could still appear in still shots due to sub-pixel mismatch (though probably not very likely due to AMD's "lock buffer" mechanics).

I'm not really saying "FSR 2 doesn't reconstruct enough info"; I'm just pointing out one of the hardest parts of comparing different upscaling solutions. To be fair, when static, it is not really surprising that FSR 2.0 can reconstruct at least 4x the info, if you consider that most of these temporal solutions require 8 jitters, which ideally means recovering up to 8x, though that's not practically possible ^^
 
So both the DSOgaming and TPU article (linked in comments above) suggest that XeSS is doing a better job at anti-aliasing compared to FSR2
How are they measuring this? A delta from ground truth? You could have a numerically superior upscale that is worse when it comes to subjective human interpretation. Measurements need to factor in the human factor and scale based on perception, as you allude...

This is a difficult property to quantify, but here is my attempt (putting into practice a method I've been thinking about for a while). Image entropy is a measure of how much "information" an image contains...The highest entropy image possible at a given bit rate is that of pure white noise!
That was my immediate concern with the idea of measuring entropy. You don't know if it's useful 'noise', actual high-frequency detail that should be present, or just noise noise. An upscale could eschew some blurring/averaging/smoothing and even smatter a little noise into the results to up the entropy and measure as 'more detail' if interpreted as such. Bang a high-intensity sharpening filter in there and you'll get 'more information', but it'll look like arse. It seems to me, as presented in this number, you're just counting the amount of difference and not qualifying whether that's wanted or unwanted, so I'm not sure it's that useful. So much would come down to interpretation.
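This objection is easy to demonstrate with a toy sketch (using a simple histogram-entropy helper of my own, not anyone's published metric): an unsharp-mask pass adds no real scene detail, yet it widens the histogram and inflates the entropy number.

```python
import numpy as np

def image_entropy(gray, bins=256):
    """Shannon entropy in bits per pixel, from the grayscale histogram."""
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def box_blur(a):
    """Cheap 5-point neighbor average (wrapping edges)."""
    return (a + np.roll(a, 1, 0) + np.roll(a, -1, 0)
              + np.roll(a, 1, 1) + np.roll(a, -1, 1)) / 5

rng = np.random.default_rng(0)
# A soft "image": noise smoothed by repeated neighbor averaging.
img = rng.normal(128, 20, (128, 128))
for _ in range(5):
    img = box_blur(img)

# Unsharp mask: amplify the difference from a further-blurred copy.
sharp = np.clip(img + 2.0 * (img - box_blur(img)), 0, 255)

# No new information about the scene, but the entropy number goes up.
assert image_entropy(sharp.astype(np.uint8)) > image_entropy(img.astype(np.uint8))
```

The sharpened image contains nothing the soft one didn't, yet it measures as having "more detail", which is exactly the failure mode being described.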

Have you seen Iroboto's work on an image analysis tool?
 
That was my immediate concern with the idea of measuring entropy
Agreed, entropy would be a flawed metric for detail analysis, as would other purely mathematical metrics.

We use eye charts to check visual acuity because we can make sense of the symbols in those charts (we either recognize the symbols or we don't). Entropy is not a measure of information that is sensible to us; simple white noise is the image with the highest entropy, yet it carries no useful information for us.
To compare how much meaningful detail can be rendered with different techniques, we simply need to find such details in games (details we know exactly how they should look even when incomplete, text for example) and compare techniques in those scenes. Small text at far distances (so that texels are subpixel) would be a perfect test: the more text we can recognize, the higher the effective resolution, the same principle as the eye charts. Though we already know how DLSS 2 / XeSS and FSR 2 resolve subpixel details thanks to DF, so the problem is kind of solved.
 
How are they measuring this? A delta from Ground Truth...

That was my immediate concern in the idea of measuring entropy. You don't know if it's useful 'noise', actual high frequency detail that should be present, or just noise noise...
I agree that those are important concerns, but I don't think they should discount the idea entirely, especially because we can combine entropy quantification with qualitative assessment and domain knowledge. I analyzed screenshots of the same scene that were described as acquired without additional sharpening, had denoised RT effects, were "static" so that noise was also averaged away by temporal accumulation, and were free from obvious artifacts by comparison to native renders. I "know" there wasn't noise noise because I looked at the image! I also measured the local entropy of hand-picked regions of the scene to ensure that comparisons were as close as possible. Finally, I only applied the analysis to a narrow subject: comparing images produced with the same algorithm but different amounts of rendered input, or different algorithms but the same amount of rendered input.
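For the hand-picked regions, a windowed version of the same histogram entropy works; this is just a sketch (the tile size and helper names are my own choices, not necessarily what was used):

```python
import numpy as np

def image_entropy(gray, bins=256):
    """Shannon entropy in bits per pixel, from the grayscale histogram."""
    hist, _ = np.histogram(gray, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def local_entropy(gray, win=64):
    """Entropy of each non-overlapping win x win tile, for comparing regions."""
    h, w = gray.shape
    tiles = np.zeros((h // win, w // win))
    for i in range(tiles.shape[0]):
        for j in range(tiles.shape[1]):
            tiles[i, j] = image_entropy(gray[i*win:(i+1)*win, j*win:(j+1)*win])
    return tiles

# Example: a half-flat, half-noise image separates cleanly into tiles.
rng = np.random.default_rng(0)
img = np.full((128, 128), 100, dtype=np.uint8)
img[:, 64:] = rng.integers(0, 256, (128, 64), dtype=np.uint8)
tiles = local_entropy(img)            # 2x2 grid of tile entropies
assert np.all(tiles[:, 0] == 0.0)     # flat half: zero entropy
assert np.all(tiles[:, 1] > 7.0)      # noisy half: near-maximal entropy
```

Comparing the same tile across upscaler outputs sidesteps some of the whole-image averaging, since regions of genuinely different content (sky vs foliage) never get pooled together.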

Iroboto's work is actually the reason I am here (after a long absence; I had been a lurker a while back, thanks Iroboto), because I found that exact post when searching for image quality assessment tools. I initially started working with frequency analysis as well, but found it difficult and time-consuming to create whole-image summary measures for very similar images of the type produced by different upscaler modes (i.e. Quality vs Balanced). More direct methods for measuring detail resolution are either unavailable (test patterns*) or complex in practice (16K ground truth images to estimate transfer functions, or finding specific features like text as OlegSH suggests). Entropy is the only descriptive statistic I've been able to find that can consistently resolve the expected differences between upscaler modes and resolutions while being relatively straightforward to measure and present. I think of entropy as a similar and complementary tool, with similar intent, to frequency analysis, and I think many of your critiques of entropy's usefulness apply to an extent to frequency analysis as well. I think Iroboto's work is fantastic and a "gold standard", but it's also complex and time-consuming to implement. In contrast, entropy is much simpler, but less specific, and can require more rigorous qualitative "validation".

*Granath posted a link in the FSR2 thread that actually has some screenshots of a pattern that can be used as a test pattern. I'll post an analysis there, but the takeaway is that FSR2 does not show negative oversharpening effects while stationary or in motion, whereas DLSS2 does show the telltale signs of ringing in motion but not while stationary. XeSS images were not provided.
Agree, entropy would be a flawed metric for details analysis as well as other mathematical metrics...

Small text at further away distances (so that texels are subpixel) would be a perfect test...
I agree that entropy is a flawed metric and that your method would be more meaningful. The significant downside of your method is the time it would take to do the analysis and the significantly fewer scenes we could analyze. Entropy can be applied to any image, but comparisons to other images need to be done very carefully; you are paying for the lack of quantitative specificity with qualitative rigor. Furthermore, I think that entropy analysis over a large N could be more accurate in aggregate, as different perceptual preferences (sharper or smoother) average away. I disagree that understanding upscalers is a solved problem. XeSS uses a stochastically trained NN, and there are no guarantees that it can resolve the higher resolution image the way an analytical method can. Intel has written a paper, A Reduced-Precision Network for Image Reconstruction, describing NN reconstruction with quantized weights to reduce computation, and that is another unstudied mechanism by which a NN approach to image generation can differ in important ways from what we are used to with traditional TAAU-like methods and DLSS. (It also literally reduces the bitrate/entropy of the calculation!)

Thanks both for the comments.
 