I'm really curious whether the 36 CUs are purely for backwards compatibility or also some degree of forward planning for a "Pro": extract the best performance possible from the smallest possible chip, with a view to doubling that chip, relatively economically, within 3-5 years.
They already have the engineering work from the PS4 Pro's "butterfly" design, and 72 CUs on 5nm might not be all that much bigger than the XSX's 360 mm² beast. I need to go and check the kind of area reduction 5nm will bring though...
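For what it's worth, a back-of-the-envelope check. All figures here are assumptions, not sourced: ~308 mm² for the PS5 SoC on N7, the GPU taking ~60% of the die, and a blended ~1.5x N7-to-N5 shrink (SRAM and IO scale far worse than TSMC's quoted ~1.8x logic density gain):

```c
/* Back-of-the-envelope die area check, not a real floorplan model.
 * Assumptions (all rough, none from this thread):
 *   - PS5 SoC on N7 is ~308 mm^2
 *   - the GPU portion that scales with CUs is ~60% of the die
 *   - blended N7->N5 density gain of ~1.5x (logic shrinks ~1.8x,
 *     but SRAM and analog/IO shrink much less) */
#include <stdio.h>

int main(void) {
    double ps5_n7_mm2     = 308.0;  /* assumed PS5 die area on N7 */
    double gpu_fraction   = 0.60;   /* assumed CU-scaling share of die */
    double blended_shrink = 1.5;    /* assumed blended N7->N5 gain */

    /* Double only the GPU portion, keep the rest, then shrink it all. */
    double doubled_n7 = ps5_n7_mm2 * (gpu_fraction * 2.0 + (1.0 - gpu_fraction));
    double doubled_n5 = doubled_n7 / blended_shrink;

    printf("72 CU design, N7 equivalent: ~%.0f mm^2\n", doubled_n7);  /* ~493 */
    printf("72 CU design, N5 estimate:   ~%.0f mm^2\n", doubled_n5);  /* ~329 */
    return 0;
}
```

So under those assumptions a doubled GPU on 5nm lands around ~330 mm², i.e. smaller than the XSX die is today.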
Given that they've gone with 14 Gbps GDDR6 *sigh*, a jump to 16 Gbps would make it relatively cheap to bump up the memory bandwidth too. I'd love some HBM and 1 TB/s of bandwidth there though.
36 CUs would be easier to double, though there's less headroom below: the Series X would sit at an intermediate position between the PS5 and a doubled Pro. Sony may need to consider whether doubling is enough, especially if there were to be a Pro variant of the Series X.
Going from 14 to 16 Gbps would be a scant upgrade, and proportionally weaker than the PS4-to-Pro transition: a ~14.3% bandwidth improvement stretched over 2x the CUs. Perhaps there would be an even faster interface speed, or a change in bus width, such as at least matching the Series X's 320 bits, if not going wider.
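Just to put numbers on that, it's plain GDDR6 math (bandwidth = bus width / 8 × per-pin Gbps); the candidate widths and speeds below are mine, not rumoured specs:

```c
/* GDDR6 bandwidth = (bus width in bits / 8) * per-pin rate in Gbps.
 * The widths and pin speeds below are candidates, not leaks. */
#include <stdio.h>

int main(void) {
    int    widths[] = {256, 320, 384};       /* bus width in bits */
    double rates[]  = {14.0, 16.0, 18.0};    /* Gbps per pin */

    for (int w = 0; w < 3; w++)
        for (int r = 0; r < 3; r++)
            printf("%d-bit @ %.0f Gbps -> %5.0f GB/s\n",
                   widths[w], rates[r], widths[w] / 8.0 * rates[r]);

    /* PS5 today: 256-bit @ 14 Gbps = 448 GB/s; at 16 Gbps that's
     * 512 GB/s, i.e. only ~14.3% more for 2x the CUs. */
    return 0;
}
```

Even 18 Gbps on a 256-bit bus only reaches 576 GB/s, so a width change looks more plausible than pin speed alone for a 2x-CU part.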
Sony's variable-clock solution might have some kind of impact on a future Pro, since we'd assume Sony wouldn't want to drop the clock. Raising the clocks could be interesting, though the current clocks are described as already being in an inefficient region. 72 or more CUs could make for an interesting comparison against a competing xPro if both are much larger in CU count but one is still striving for constant clocks; there may be load scenarios where it's costlier or more difficult to hold a constant clock with many more active units.
What else could be scaled with a Pro console, like the CPU, might be an interesting question. Zen 2 seems to be a more successful initial implementation than Jaguar was, so the current clocks aren't artificially low. A 33% jump (the same ratio as the PS4's 1.6 GHz to the Pro's 2.13 GHz) would give ~4.7 GHz, and node jumps at that clock range often threaten clock regressions. I don't know if they'd try for a clock bump, or if a non-standard number of additional cores could be an option.
Maybe the restriction is not a product of some issue with the actual configuration of the CUs, but rather that RDNA's CUs poorly mimic the performance of GCN's CUs in some form or fashion, and the frequency of an RDNA CU must be boosted to compensate.
In other words, 2.23 GHz isn't some consequence of having just 36 CUs but the other way around: BC requires RDNA CUs running at high frequency to perform adequately across the board, and that frequency is high enough to limit the number of CUs that Sony can readily use in its design.
There is a Sony patent about varying clocks on the fly so that a new unit can emulate an older unit's performance, but with a true clock that is potentially faster and a spoof clock that the legacy software perceives as the original fixed clock.
https://patents.google.com/patent/US9760113B2/en
A mildly higher true clock could paper over any higher internal latencies, so that by the time the spoofed clock reaches what the older code expects for forward progress, the emulated operation is done. 2.23 GHz versus 800 MHz or 911 MHz could be too much of a gap, but that might be why there are BC modes, whose clocks may still vary somewhat above the advertised base clock depending on the characteristics of what is running.
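A minimal sketch of how I read the spoof-clock idea; the names and the fixed ratio here are mine, and the patent actually describes varying the true clock on the fly, which this doesn't capture:

```c
/* Sketch of the "spoof clock" idea as I read the patent, not its
 * actual mechanism. The hardware runs at a true frequency, but
 * timestamp reads by legacy code are rescaled so the software sees
 * its original fixed clock. */
#include <stdio.h>
#include <stdint.h>

#define TRUE_HZ   2230000000ull   /* assumed true GPU clock, 2.23 GHz */
#define LEGACY_HZ  800000000ull   /* PS4-era clock the old code expects */

/* Convert a true-clock cycle count into what the legacy title sees. */
static uint64_t spoof_ticks(uint64_t true_ticks) {
    /* Scale down so elapsed "cycles" match the 800 MHz the software
     * was written against. Plain 64-bit math is fine here while
     * true_ticks stays well below 2^64 / LEGACY_HZ. */
    return true_ticks * LEGACY_HZ / TRUE_HZ;
}

int main(void) {
    uint64_t true_ticks = 2230000000ull;  /* one real second of cycles */
    printf("legacy code sees %llu ticks (one 800 MHz second)\n",
           (unsigned long long)spoof_ticks(true_ticks));
    return 0;
}
```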
The lack of abstraction seems to mean the PS5 was limited to 36 CUs. How long will that limit persist? The PS5 needs a decent abstraction layer so devs can't rely on specific CUs or register files, and have to access the GPU through a level of indirection that allows the hardware to be replaced. The old idea of hitting the console hardware directly is dead; for a platform with longevity in its library, abstraction is pretty essential.
The ISA and hardware have their own abstraction. The architecture promises certain outcomes or responses to various inputs, but whether those responses accurately depict what is happening internally is not information the software requires. Many values, like the wave or CU ID, are accessed with operations that read from system registers or privileged locations. The hardware can give an answer that is valid in terms of what is possible for the legacy software, even if the true answer for the modern implementation is different.
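Something like this, purely illustratively (none of it from AMD documentation, just the shape of the idea):

```c
/* Illustrative only: how a wider GPU could answer a legacy CU-ID
 * query with a value that is still valid for 36-CU software. */
#include <stdio.h>
#include <stdint.h>

#define LEGACY_CU_COUNT 36u

/* A hypothetical 72-CU part folds its physical CU index into the ID
 * space the legacy title was written against. The answer is "wrong"
 * physically, but valid for anything the old code can do with it,
 * e.g. hashing work across what it believes are 36 CUs. */
static uint32_t legacy_cu_id(uint32_t physical_cu_id) {
    return physical_cu_id % LEGACY_CU_COUNT;
}

int main(void) {
    printf("physical CU 40 reports as CU %u\n", legacy_cu_id(40)); /* 4 */
    printf("physical CU 12 reports as CU %u\n", legacy_cu_id(12)); /* 12 */
    return 0;
}
```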
CPUs running a VM can trap guest requests for CPU or system information, where the hypervisor, or a storage location that tracks the host-vs-guest relationship, can patch in values appropriate for the guest.
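The CPU-side analogue, sketched with invented structures rather than any real hypervisor's API; actual hypervisors (KVM, Hyper-V, etc.) do the equivalent through their own exit-handling interfaces:

```c
/* Rough analogy: a hypervisor trapping CPUID and patching the
 * guest-visible answer. Structures and names here are invented
 * for illustration. */
#include <stdio.h>
#include <stdint.h>

struct cpuid_result { uint32_t eax, ebx, ecx, edx; };

/* Called when the guest's CPUID instruction causes a VM exit. */
static void handle_cpuid_exit(uint32_t leaf, struct cpuid_result *r) {
    if (leaf == 0x1) {
        /* Report a fixed family/model/stepping the guest was validated
         * against, regardless of host silicon (value is invented). */
        r->eax = 0x00800F11u;
    }
    /* Other leaves could mask out host features the guest must not see. */
}

int main(void) {
    struct cpuid_result r = {0, 0, 0, 0};
    handle_cpuid_exit(0x1, &r);
    printf("guest sees CPUID leaf 1 EAX = 0x%08X\n", r.eax);
    return 0;
}
```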
We don't, but if it were targeting price point first, why go chasing TFLOPs that may require extensive cooling solutions?
It's possible Sony's design had more pessimistic projections for the cost of 7nm wafers at the time the decision was made, so a bit less die area may have made more economic sense. It might have been considered easier to dial back an over-engineered cooler with the next console hardware revision than to eat the cost of a die that would need to wait until 5nm for its next adjustment.