At least for high-framerate modes, it might be a fixed-function issue, as lower resolutions lean more on fixed-function hardware than on the CUs. Main memory bandwidth still imposes a limit, but perhaps the ROPs are seeing better effective throughput from bandwidth compression.
If we assume both have 64 ROPs for colour writes, then XSX only needs 467GB/s (1.825GHz * 64 writes * 32bpp per write), so it has more bandwidth than it needs for that. PS5 would need 570GB/s at 2.23GHz, above its 448GB/s raw bus. Bandwidth compression may provide up to an extra ~30% on average (448GB/s + 30% = 582GB/s), which would let PS5 actually sustain its higher raw fill rate.
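The napkin math above can be sketched out like this (the 64-ROP count is the assumption stated above; clocks and the ~30% average compression gain are the figures from this post, not measured values):

```python
# Rough ROP colour-write bandwidth needed (GB/s), assuming 64 ROPs on both
# GPUs and 32bpp (4-byte) render targets, per the assumptions in this post.
def fill_bandwidth_gbs(clock_ghz, rops=64, bytes_per_pixel=4):
    # GHz (cycles/ns) * writes per cycle * bytes per write == GB/s
    return clock_ghz * rops * bytes_per_pixel

xsx_need = fill_bandwidth_gbs(1.825)  # ~467 GB/s needed
ps5_need = fill_bandwidth_gbs(2.23)   # ~571 GB/s needed
ps5_effective = 448 * 1.30            # ~582 GB/s with ~30% avg compression
```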
Things may look a little different on blending rates, but there's evidence that transparency-heavy scenes are also at a disadvantage on XSX, so something funny is going on there, and there shouldn't be an API issue blocking such simple operations. Normally you'd expect blend rates to saturate the available bandwidth, in which case XSX's wider bus should have automagically won, but that's seemingly not being observed.
Things may look different again where 4xFP16 render targets (64bpp, not pixel shader precision) are used for certain render passes, but the use of alternative 32bpp formats may skew things back again as developers trade pixel quality for performance.
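To see why blending should be bandwidth-bound: blending reads the destination pixel back before writing, roughly doubling per-pixel traffic, and a 4xFP16 target doubles the bytes per pixel again. A sketch under the same 64-ROP assumption (ignoring caches and compression, so these are upper bounds on raw traffic):

```python
# Peak bandwidth (GB/s) a full-rate blend pass would demand, assuming 64 ROPs
# and that each blended pixel costs one read plus one write of the target.
def blend_bandwidth_gbs(clock_ghz, rops=64, bytes_per_pixel=4):
    return clock_ghz * rops * bytes_per_pixel * 2  # read + write

blend_bandwidth_gbs(1.825)                    # ~934 GB/s, 32bpp blend on XSX clocks
blend_bandwidth_gbs(2.23)                     # ~1142 GB/s, 32bpp blend on PS5 clocks
blend_bandwidth_gbs(1.825, bytes_per_pixel=8) # ~1869 GB/s, 4xFP16 (64bpp) blend
```

All of these far exceed either console's bus, which is why blend-heavy passes should saturate bandwidth and the wider bus should win on paper.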
I'll leave it at that. :V