Woah, is that how it's done?? Devs have to manage CUs like CPU cores and load them up with work? I thought work was dispatched to the GPU and the schedulers allocated resources, distributed work around the available CUs, interleaved work (async compute) etc. That way you can take the same PC code, run it on a 28 CU part or a 56 CU part, and it'll distribute across the available resources automatically. IIRC Naughty Dog used specific CU reservation on PS4, but then they could, working on closed hardware. Otherwise I thought you just throw work at the GPU and let it handle the management, optimising the work you throw at it to make best use of its scheduling.
What level of GPU-resource-distribution management is there and what level is down to the developer's code?
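For what it's worth, here's a toy sketch of that mental model in C++ - the numbers, the simulateDispatch helper and the one-wavefront-per-free-CU policy are all made up for illustration, not how the real hardware scheduler works - but it shows the idea that a single dispatch spreads itself over a 28 CU or a 56 CU part without any per-CU code from the developer:

```cpp
// Toy model only: mimics the idea that the front end carves one dispatch
// into wavefronts and hands them to whichever CU is free, so the same
// "code" fills 28 or 56 CUs automatically. All values are illustrative.
#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

// Spread `numWavefronts` wavefronts (each taking `waveCost` time units)
// across `numCUs` compute units; return the total completion time.
static double simulateDispatch(int numWavefronts, double waveCost, int numCUs) {
    // Min-heap of "time at which each CU becomes free".
    std::priority_queue<double, std::vector<double>, std::greater<double>> cuFree;
    for (int i = 0; i < numCUs; ++i) cuFree.push(0.0);

    double finish = 0.0;
    for (int w = 0; w < numWavefronts; ++w) {
        double start = cuFree.top();   // earliest-free CU gets the next wavefront
        cuFree.pop();
        double end = start + waveCost;
        cuFree.push(end);
        if (end > finish) finish = end;
    }
    return finish;
}

int main() {
    const int waves = 10000;   // one big dispatch, hardware-scheduled
    const double cost = 1.0;   // arbitrary cost per wavefront

    // Same dispatch, two hypothetical GPU widths - no per-CU code changes.
    std::printf("28 CUs: finishes at t=%.0f\n", simulateDispatch(waves, cost, 28));
    std::printf("56 CUs: finishes at t=%.0f\n", simulateDispatch(waves, cost, 56));
}
```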
As @iroboto says, it's not the developers who have direct control over the CUs, it's the GPU itself that manages that (I should have chosen my words better) - but the workloads developers give the GPU will affect utilisation, as you point out.
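To put a rough number on "workloads affect utilisation", here's a back-of-envelope sketch - the 1024-thread dispatch is an arbitrary example, and it ignores wave32 and per-SIMD occupancy limits - showing why a chain of small, serialised dispatches can't fill a wide GPU no matter how clever the scheduler is:

```cpp
// Back-of-envelope sketch with assumed workload numbers. A small dispatch
// simply cannot produce enough wavefronts to cover a wide GPU, so if such
// dispatches are serialised behind barriers, most CUs sit idle regardless
// of how the hardware schedules them.
#include <cstdio>

int main() {
    const int threadsPerWave = 64;   // RDNA wave64 (wave32 also exists)
    const int cus            = 52;   // Series X active CU count
    const int simdsPerCU     = 2;    // RDNA: two SIMD32 units per CU

    const int smallDispatchThreads = 1024;                    // e.g. a 32x32 tile
    const int waves = smallDispatchThreads / threadsPerWave;  // = 16

    // 16 wavefronts can occupy at most 16 of the 104 SIMDs even once,
    // let alone the several waves per SIMD needed to hide latency.
    std::printf("wavefronts from the dispatch: %d\n", waves);
    std::printf("SIMDs available:              %d\n", cus * simdsPerCU);
}
```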
Whatever the wording, what's the actual loss in efficiency with more CUs at lower clocks versus fewer CUs at higher clocks? Is there a very tangible difference behind Cerny's choice, or is it negligible in the real world? This isn't really about XBSX vs PS5 so much as GPU design and where the best balance lies, but AFAICS among IHVs, narrower-and-faster isn't a strategy any of them are chasing. That might be because clock speed tops out at a certain point beyond which power consumption skyrockets, and the only way to get more performance then is going wider, which in turn requires lower clocks.
There's no inherent loss in efficiency at lower frequencies for a given GPU - if anything, memory access latency shrinks relative to the clock (fewer cycles of latency to hide), which might help a little bit.
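Quick arithmetic on that latency point, with an assumed 300 ns round trip (the real figure varies): the absolute latency doesn't change with GPU clock, but the number of cycles the shader has to hide does:

```cpp
// The 300 ns memory round trip is an assumed figure for illustration only.
#include <cstdio>

int main() {
    const double latencyNs = 300.0;   // assumed memory round-trip latency
    const double ps5Clock  = 2.23e9;  // PS5 peak GPU clock (Hz)
    const double xsxClock  = 1.825e9; // Series X GPU clock (Hz)

    // Cycles of latency the shader scheduler must cover with other work.
    std::printf("cycles to hide at 2.23 GHz:  %.0f\n", latencyNs * 1e-9 * ps5Clock);
    std::printf("cycles to hide at 1.825 GHz: %.0f\n", latencyNs * 1e-9 * xsxClock);
}
```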
What I think Cerny is alluding to is width relative to the stuff that supports the CUs. Both PS5 and SX have two shader engines, but the SX has more CUs per shader engine for more compute. PS5 has fewer CUs but runs at a higher frequency, so all else being equal its Graphics Command Processor and Asynchronous Compute Engines have fewer CUs to manage and should - I expect - be a little better at keeping them busy (I assume they have a maximum rate at which they can dispatch and retire shaders). Likewise, if the rasteriser is the same in both, the PS5 will probably be better at pushing shitty small triangles through the 3D pipeline. I expect the ROPs are also less likely to be a bottleneck on PS5, so long as you aren't main memory BW limited.
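A rough way to picture the "support hardware per CU" argument - assuming, purely for illustration, that each shader engine's front end can launch one wavefront per clock (the real rate isn't public) - is that launch capacity scales with clock while the number of CUs it has to feed does not:

```cpp
// The one-launch-per-clock rate is an assumption, not a real spec; only the
// ratio between the two configurations is the point of this sketch.
#include <cstdio>

static double launchesPerCUPerSecond(double clockHz, int cusPerFrontEnd) {
    const double launchesPerClock = 1.0;   // assumed front-end issue rate
    return launchesPerClock * clockHz / cusPerFrontEnd;
}

int main() {
    // PS5: 36 active CUs over 2 shader engines; Series X: 52 over 2.
    std::printf("PS5 front end, per CU:      %.3g launches/s\n",
                launchesPerCUPerSecond(2.23e9, 36 / 2));
    std::printf("Series X front end, per CU: %.3g launches/s\n",
                launchesPerCUPerSecond(1.825e9, 52 / 2));
}
```

Under that (made-up) assumption each PS5 CU gets noticeably more front-end issue capacity per second, which is the flavour of advantage I think Cerny was pointing at.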
PC GPUs do go wider and wider on high-end models as you say, but they also scale up more than just the ALUs. PS5 is laid out with 10 CUs per shader array (two arrays per shader engine), the same as RDNA1 Navi 10 and RDNA2 Navi 21 and 22 iirc. AMD actually dropped this to 8 per array on RDNA 3, again iirc.
Series X has 14 per array. I don't know if the GCP or ACEs or whatever were modified to account for this, but it does seem logical that under some circumstances it may be harder to keep those CUs as busy. You probably could have added a third shader engine instead, but that would have cost quite a bit of silicon and power, and probably brought less benefit than widening the two shader engines.
It will be interesting to see how UE5 progresses on the SX. Micropolys there are software-rasterised, which hopefully will make good use of all that compute.
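For anyone curious what "software rasterising micropolys" looks like in principle, here's a minimal sketch - plain C++, edge functions and a depth compare per pixel, loosely in the spirit of a compute rasteriser rather than UE5's actual code (on the GPU one lane would handle one pixel or one micro-triangle, and the depth write would be an atomic):

```cpp
// Minimal single-triangle software rasteriser: coverage via edge functions,
// barycentric depth interpolation, depth test per pixel. Values are made up.
#include <algorithm>
#include <cstdio>
#include <vector>

struct Vec2 { float x, y; };

// Twice the signed area of (a,b,c); also the edge function of c vs edge ab.
static float edge(Vec2 a, Vec2 b, Vec2 c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

int main() {
    const int W = 8, H = 8;                      // tiny tile of pixels
    std::vector<float> depth(W * H, 1e30f);      // "far" initial depth

    // One micro-triangle, a couple of pixels across, with per-vertex depth.
    Vec2  v0{2.2f, 1.8f}, v1{4.6f, 2.4f}, v2{3.1f, 4.9f};
    float z0 = 0.50f,     z1 = 0.55f,     z2 = 0.60f;

    float area = edge(v0, v1, v2);
    if (area <= 0.0f) return 0;                  // backfacing/degenerate: cull

    // Loop over the triangle's bounding box, clamped to the tile.
    int minX = std::max(0, (int)std::min({v0.x, v1.x, v2.x}));
    int maxX = std::min(W - 1, (int)std::max({v0.x, v1.x, v2.x}));
    int minY = std::max(0, (int)std::min({v0.y, v1.y, v2.y}));
    int maxY = std::min(H - 1, (int)std::max({v0.y, v1.y, v2.y}));

    int covered = 0;
    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            Vec2 p{x + 0.5f, y + 0.5f};          // pixel centre
            float w0 = edge(v1, v2, p);          // unnormalised barycentrics
            float w1 = edge(v2, v0, p);
            float w2 = edge(v0, v1, p);
            if (w0 < 0 || w1 < 0 || w2 < 0) continue;   // outside triangle

            float z = (w0 * z0 + w1 * z1 + w2 * z2) / area;
            if (z < depth[y * W + x]) {          // GPU version: atomic min/max
                depth[y * W + x] = z;
                ++covered;
            }
        }
    }
    std::printf("micro-triangle covered %d pixels\n", covered);
}
```

The appeal for pixel-sized triangles is that this is pure ALU and memory work, so it scales with however many CUs you have rather than with fixed-function rasteriser throughput.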