Intel XeSS anti-aliasing discussion

The performance benefit of running XeSS varied quite a lot. It's no surprise that cards without DP4a were effectively useless. We tested several AMD cards that showed negative performance deltas — Vega 64, 5600 XT, 5700 XT, and even RX 6500 XT. XeSS in performance mode on the 5700 XT basically matched native performance, while rendering 1/4 as many pixels before upscaling.
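(For context: XeSS performance mode upscales by 2x on each axis, so e.g. a 2160p output is rendered internally at 1080p, roughly a quarter of the pixels; merely matching native performance at that input resolution is a poor result.)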
 
What's going on? It looks like DP4a (8-bit integers) is being emulated via 24-bit integers on architectures that don't natively support it.
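To illustrate what that fallback costs, here is a minimal sketch (hypothetical HLSL, not the XeSS source) of a 4x INT8 dot product done with ordinary integer ops when no DP4a instruction is available; the actual emulation path the compiler emits (reportedly via 24-bit multiplies) will differ in detail.

Code:
// Hypothetical fallback: extract and sign-extend each 8-bit lane by hand,
// then multiply-accumulate with regular 32-bit integer math.
int Dot4AddI8_Fallback(uint aPacked, uint bPacked, int acc)
{
    [unroll]
    for (uint lane = 0; lane < 4; ++lane)
    {
        int a = (int)(aPacked << (24u - 8u * lane)) >> 24; // lane of a
        int b = (int)(bPacked << (24u - 8u * lane)) >> 24; // lane of b
        acc += a * b;
    }
    return acc;
}

With native DP4a the whole loop collapses to one instruction per four lanes, which is where the gap on non-DP4a cards comes from.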

So it's not even using the packed math capability in Vega and RDNA1, it seems.

Those are some awful results all around.

Edit: Is DP4a just 4x INT8? I remember the Vega whitepaper said it was capable of that.
 
Is Vega 64 the 7nm one? I ask because, according to PC Gamer, only the 7nm version has DP4a.
I wonder if AMD would find it in their hearts to try and optimize XeSS on their chips, because I think Intel has already done gamers a great service.
 
It wasn't a surprise that running without DP4a (or other acceleration) wouldn't net anything meaningful. The Xbox consoles do have it, right?
 
It wasn't a surprise that running without DP4a (or other acceleration) wouldn't net anything meaningful. The Xbox consoles do have it, right?
They do, but the performance on the low-end RDNA2 cards is troubling when thinking about the Series S. Though we also have to remember that, with this running on basically everything, I doubt it is optimized for any DP4a GPUs outside of Intel's.
 
They do, but the performance on the low-end RDNA2 cards is troubling when thinking about the Series S. Though we also have to remember that, with this running on basically everything, I doubt it is optimized for any DP4a GPUs outside of Intel's.
We have to see how the non-SM6.4 DP4a kernel performs on Intel iGPUs to get a better indication of how it may perform on the Series S.
The Xbox Series consoles would get that version if Intel and MS work on bringing it to the GDK.
Much like how AMD worked to optimize FSR 2.0 for it.
 
I think all RDNA/RDNA2 GPUs except for Navi10 also have INT8/INT4 support and Radeon VII/Vega20 has it as well.
Sorry to disappoint you...
Let's open the RDNA1 ISA and Vega 7nm (Radeon VII) ISA documents.
[screenshot: dot-product instructions listed in the Vega 7nm (Radeon VII) ISA document]
As you can see, dot instructions are supported on the Radeon VII and are mentioned throughout the file.
But in RDNA1...
[screenshot: no dot-product instructions found in the RDNA1 ISA document]
Whoops
 
Sorry to disappoint you...
Let's open the RDNA1 ISA and Vega 7nm (Radeon VII) ISA documents.
As you can see, dot instructions are supported on the Radeon VII and are mentioned throughout the file.
But in RDNA1...
Whoops
Yeah, that's because it doesn't exist in Navi10, which came out first. I'm guessing it was borked, since Navi10 was a buggy mess in general and both Navi12 (GFX1011) and Navi14 (GFX1012) support it.
https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX1011.html

As for semicustom parts, it's too bad the PS5 doesn't support it. Still, I'm interested if XeSS might perform well on the Xbox.
Oh, I'm sorry. I read it wrong; you meant INT4/8, not dot4.
Then you're right, all current GPU architectures support these instructions.

BUT, XeSS doesn't use "regular" int; it uses dot4 (a packed 4-element integer dot product, https://learn.microsoft.com/en-us/w...lsl-shader-model-6-4-features-for-direct3d-12)
And among AMD GPUs those are only supported by RDNA2 and Vega 7nm.
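For anyone curious what that looks like in practice, here's a minimal SM6.4 sketch (illustrative only, not XeSS's actual kernel; the intrinsic name is from the Microsoft docs linked above):

Code:
// 'a' and 'b' each pack four signed 8-bit lanes into one 32-bit value;
// dot4add_i8packed does the 4-wide multiply-accumulate and maps to a single
// DP4a instruction where the hardware supports it.
int AccumulateDot4(uint a, uint b, int acc)
{
    return dot4add_i8packed(a, b, acc);
}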
I meant dot4 INT8 and dot8 INT4, indeed
 
Do the three current-gen consoles support hardware DP4a and packed INT8 math? I personally feel game consoles tend to cut these units.
 
Honestly, XeSS is already a lot faster than I thought. The last GPGPU-based ML temporal upscaling method I can recall is Facebook's paper from 2020, which needed 24.42 ms to output 1080p on a Titan V, and they didn't even release figures for 4K (though that was for 16x supersampling). At least XeSS is still usable and provides much better results than hand-crafted heuristics like FSR 2.0 (that is, if we ignore the transparency jitter issue).
 

Death Stranding DC is the new champ for upscaling options comparisons. No competition :yes:

XeSS is again the softest here, at least when comparing them all at default sharpness (which you can adjust for DLSS and FSR2). Intel really should've added user-controllable sharpening from the start.

Performance here on a 3080 is about the same as in SoTTR, I'd say.
That is, DLSS > FSR2 > XeSS > Native+TAA.
But none of the upscaling options load the 3080 to 100% per GPU monitoring.
 
Surprised I could not find this earlier. This might need further testing.
Adding another 1060 video
 