You are not paying attention here. Each CU needs access to the L1 and L2 caches to work. On PS5, each CU has about 0.111 MB of L2 cache to work with; on XSX, each CU has about 0.096 MB. So if you feed both sets of CUs the same amount of work (ideally), the XSX CUs each have less L2 cache, so there will be more L2 misses, and hence more accesses to GDDR6 memory.
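Those per-CU figures appear to come from dividing total L2 by CU count. A quick sanity check of the arithmetic, assuming the commonly cited totals of 4 MB L2 across 36 CUs for PS5 and 5 MB across 52 CUs for XSX (these totals are my assumption, not official per-CU specs):

```python
# Rough per-CU share of L2, assuming PS5: 4 MB / 36 CUs, XSX: 5 MB / 52 CUs.
# The cache totals are assumed from public RDNA 2 reporting.
ps5_l2_per_cu = 4 / 36  # ~0.111 MB per CU
xsx_l2_per_cu = 5 / 52  # ~0.096 MB per CU

print(f"PS5: {ps5_l2_per_cu:.3f} MB L2 per CU")
print(f"XSX: {xsx_l2_per_cu:.3f} MB L2 per CU")
```

Under those assumptions the 0.111 MB and 0.096 MB figures check out.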
There are significantly more CUs operating simultaneously as a result, however.
You're still looking at direct comparisons between the 2 systems, which is probably why you're hung up on the numbers there.
You should be looking at how much work they need to process, which is what the architectures are built around.
You can easily show a CPU to be significantly better than any GPU per instruction, per branch, per core, per L1, per L2, per system-memory access.
But the point stands that there are at most 32 of those cores versus _thousands_ of processors on the GPU.
The GPU has significant overhead to get started, but once it gets started, its ability to consume massive amounts of work, thanks to its massive number of processors, is what allows it to bolt ahead.
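That overhead-vs-throughput tradeoff can be sketched with a toy latency model: fixed startup cost plus items divided by throughput. The numbers below are made up purely for illustration, not real CPU/GPU measurements:

```python
# Toy model: total time = startup overhead + items / throughput.
# All numbers are illustrative assumptions, not benchmarks.
def total_time(items, startup_us, items_per_us):
    return startup_us + items / items_per_us

def cpu(items):  # low overhead, modest throughput
    return total_time(items, startup_us=1, items_per_us=10)

def gpu(items):  # big startup overhead, huge throughput
    return total_time(items, startup_us=100, items_per_us=1000)

print(cpu(100), gpu(100))                  # small job: CPU finishes first
print(cpu(10_000_000), gpu(10_000_000))    # huge job: GPU bolts ahead
```

The crossover point depends entirely on the made-up constants, but the shape of the result is the point: below some workload size the low-overhead processor wins, above it the wide one does.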
We can point at individual metrics all day long, but at the end of the day there are 44% more processors in the XSX. Hyper-tuning hardware helps, and it helps that the PS5 is fairly general-purpose, flexible, and adaptable because of its clock-speed setup. But at a certain scale of load, having more compute is going to matter more than all the hyper-tuning you can do.
Smaller loads will definitely benefit more from the higher-performance processors. That's not being debated. But each of these metrics conveniently leaves out how many more processors are working at the same time.
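Peak compute scales with unit count times clock, not clock alone, which is where the wider part counts. A rough check using the commonly reported specs (PS5: 36 CUs at up to 2.23 GHz, XSX: 52 CUs at 1.825 GHz; the 64-lane, 2-ops-per-clock FP32 figures are standard for RDNA 2 but assumed here):

```python
# Peak FP32 TFLOPS = CUs * clock (GHz) * SIMD lanes * ops per clock / 1000.
# CU counts and clocks are the commonly reported console specs, assumed here.
def tflops(cus, clock_ghz, lanes=64, ops_per_clock=2):
    return cus * clock_ghz * lanes * ops_per_clock / 1000

print(f"PS5: {tflops(36, 2.23):.2f} TFLOPS")   # ~10.28
print(f"XSX: {tflops(52, 1.825):.2f} TFLOPS")  # ~12.15
print(f"CU advantage: {52 / 36 - 1:.0%}")      # ~44% more CUs
```

Even with the PS5's clock advantage, the XSX's extra CUs give it the larger peak figure, which is the whole argument above in one line of arithmetic.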
tldr; the 2080 probably has similarly more favorable per-unit cache and memory-bandwidth characteristics vs the 3080. And in some workloads, yes, the 3080 is poorly leveraged against the 2080. But in others, which are the workloads we're heading into, it's 2x the performance.
If cache were the greatest bottleneck to getting more compute, we would have made caches the focus. But each generation, the compute gets larger and wider, the caches increase to support it, and the clock speeds generally improve only slightly.
I don't mind looking at these things, but I don't want to over-attribute something that every architecture suffers from. It's not like the PS5 solved cache hit/miss problems by clocking high and keeping the processor count low. We need more compute because we need more calculations going into next generation.