The thing about something like Unreal Engine is that it's a hugely complex bit of kit, and it's easy to use features in a way that hurts frame rates or causes performance drops at particular moments. There's a certain amount of setup work involved with a scene (including per frame), and while multiple cameras in a scene is common, multiple scenes isn't, and I imagine you could easily lose a lot of performance. Things like the order in which you do the different steps of the different scenes, how you manage LOD as you change the balance between scenes, how GPUs are naturally less efficient at lower resolutions (multiple scenes = lower resolution per scene), potentially doubling the draw calls, being stalled at the front end of the GPU more ... I dunno, stuff like that.
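To put the "multiple scenes" worry in rough numbers, here's a deliberately crude toy model. Every figure in it is made up for illustration (the per-scene setup cost, the per-draw-call cost, the per-megapixel cost), and it ignores CPU/GPU overlap entirely. The only point is that per-scene overhead doubles when you go from one scene to two, even though the total pixel count on screen stays the same.

```python
def frame_time_ms(scenes, draw_calls_per_scene, total_pixels,
                  setup_ms_per_scene=0.5,    # hypothetical fixed per-scene cost
                  ms_per_draw_call=0.001,    # hypothetical submission cost
                  ms_per_megapixel=1.0):     # hypothetical shading cost
    """Crude estimate: per-scene overhead plus pixel work, simply summed."""
    # Overhead that is paid once per scene, every frame.
    per_scene = setup_ms_per_scene + draw_calls_per_scene * ms_per_draw_call
    # Pixel work depends on the total output resolution, not the scene count.
    pixel_work = (total_pixels / 1e6) * ms_per_megapixel
    return scenes * per_scene + pixel_work


FOUR_K = 3840 * 2160

one_scene = frame_time_ms(scenes=1, draw_calls_per_scene=2000, total_pixels=FOUR_K)
two_scenes = frame_time_ms(scenes=2, draw_calls_per_scene=2000, total_pixels=FOUR_K)

print(f"one scene:  {one_scene:.2f} ms")
print(f"two scenes: {two_scenes:.2f} ms")
```

A real engine won't behave this neatly (and the lower-resolution inefficiency isn't modelled at all), but it shows why the second scene isn't free even when both share the same screen.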
Optimisation isn't just about a dude looking at a screen of code and juggling some operations around - in an engine like Unreal there are a lot of performance implications hidden behind a tick box or a slider or linking this effect to that effect or whatever. It's going to be tough for a small team pushing a complex engine in an uncommon way.
Even the mighty Square Enix ran into massive texture streaming issues with UE for FFVII Remake at first! And they were a huge team with a huge budget.
It's quite hard to parachute people into a team to fix things. They need to develop some familiarity with the project, be able to work with the people already on the team, not interrupt the workflow of people who are crunching for release, and also have permission from the team to do so. The team may feel a delicate balance would be disrupted at a difficult time. A lot of the time simply having access to people who can answer questions is the most important thing - but you have to have worked out what it is you want to ask!
They'll surely improve all-round. With the XSX the issue is that the compute units are somewhat less well utilised than on PS5, despite the architectural similarity. The question is whether better tools (which allow for insight and fine tuning) will let this improve, relatively speaking, on XSX. It always takes more work to effectively use something that's wider - but there's a point at which either you can't or it's not worth it.
In some areas PS5 will always have an advantage though. For example, fillrate where you're not bound by main memory bandwidth. No amount of squeezing more maths out of the compute units is going to change that. Unless maybe you moved from e.g. a rasterised particle system to a tiled, compute-based one (no, I don't know how you'd do that, I just read about it from someone like sebbbi).
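For anyone who wants the back-of-envelope arithmetic behind that, here's a quick sketch. It assumes the publicly quoted specs (36 CUs at up to 2.23 GHz for PS5, 52 CUs at 1.825 GHz for XSX), the widely reported figure of 64 ROPs on both, one pixel per ROP per clock, and PS5 actually holding its peak clock, which it won't always do. The `Gpu` class is just something made up for this post. These are theoretical peaks, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    name: str
    compute_units: int
    rops: int          # assumed: 64 on both consoles, as widely reported
    clock_ghz: float   # PS5 value is its peak; the real clock is variable

    def tflops(self) -> float:
        # 64 lanes per CU, 2 FLOPs per lane per clock (fused multiply-add).
        return self.compute_units * 64 * 2 * self.clock_ghz / 1000

    def gpixels_per_s(self) -> float:
        # Peak fillrate, assuming 1 pixel per ROP per clock
        # and no memory bandwidth limit.
        return self.rops * self.clock_ghz


PS5 = Gpu("PS5", compute_units=36, rops=64, clock_ghz=2.23)
XSX = Gpu("XSX", compute_units=52, rops=64, clock_ghz=1.825)

for gpu in (PS5, XSX):
    print(f"{gpu.name}: {gpu.tflops():.2f} TFLOPS peak, "
          f"{gpu.gpixels_per_s():.1f} Gpixels/s peak fillrate")
```

Same ROP count, higher clock: that's the whole fillrate advantage, and it sits outside the compute units, which is why better utilisation of the wider GPU doesn't touch it.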