Maybe half of the Pro was plain ole PS4 CUs and the other half Vega-based.
The PS4 Pro's advertised 2xFP16 throughput wouldn't be reachable if half the GPU didn't support rapid packed math.
If we take some of the data from the GitHub leak as valid, there are Vega-era instructions that the BC modes for both PS4 and PS4 Pro explicitly list as unsupported.
It would seem odd to make half the CUs support features that can never be used because the other half lacks them.
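As a sanity check on the 2xFP16 claim, here's the back-of-envelope arithmetic as a minimal C++ sketch, using the public figures (36 active CUs, 64 ALU lanes per CU, an FMA counted as 2 ops, ~911 MHz). Sony's quoted 8.4 TFLOPS FP16 only falls out if every active CU does packed math:

```cpp
#include <cstdio>

int main() {
    // Public PS4 Pro figures: 36 active CUs, 64 lanes/CU, FMA = 2 ops, 911 MHz.
    const double cus = 36, lanes = 64, fma_ops = 2, clock_ghz = 0.911;
    double fp32_tflops = cus * lanes * fma_ops * clock_ghz / 1000.0;
    // Rapid packed math: each 32-bit lane retires two FP16 ops per cycle.
    double fp16_tflops = 2.0 * fp32_tflops;
    printf("FP32: %.2f TFLOPS, FP16 packed: %.2f TFLOPS\n",
           fp32_tflops, fp16_tflops);
}
```

If only 18 of the 36 CUs could dual-issue FP16, the packed figure would top out around 6.3 TFLOPS, well short of the advertised number.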
But not all chips will have random defects. In fact, most won't have any defects at all.
In a chip with no defects, which CUs do you disable?
There can still be patterning faults outside of the wafer-level random defects, as well as dies discarded for parametric yield.
It's possible they have enough granularity in testing to know the power consumption of individual CUs or blocks of them, and functional tests would show whether one or more CUs fail at the required clocks. Error-prone CUs would be targets for deactivation, and the most power-hungry CUs in each SE could be targets if fusing them off can get the chip under TDP.
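A hypothetical harvesting pass along those lines, strictly as a sketch -- the field names, the one-spare-per-SE assumption (the Pro ships 36 of 40 physical CUs), and the policy ordering are all my invention; it just encodes the idea above of fusing off failing CUs first, then the hungriest CU in each SE:

```cpp
#include <algorithm>
#include <vector>

struct CuReport {
    int    se;              // shader engine this CU belongs to
    bool   passes;          // functional tests pass at required clocks?
    double watts;           // measured power draw under load
    bool   enabled = true;  // fuse state we're deciding
};

// Disable failing CUs, then the most power-hungry survivor in any SE
// that hasn't already lost one, so each SE ends up one CU short.
void harvest(std::vector<CuReport>& cus, int num_ses) {
    for (auto& cu : cus)
        if (!cu.passes) cu.enabled = false;  // error-prone CUs go first

    for (int se = 0; se < num_ses; ++se) {
        bool already_down = std::any_of(cus.begin(), cus.end(),
            [&](const CuReport& c) { return c.se == se && !c.enabled; });
        if (already_down) continue;  // this SE already lost its spare

        // Target the hungriest still-enabled CU in this SE.
        auto key = [&](const CuReport& c) {
            return (c.se == se && c.enabled) ? c.watts : -1.0;
        };
        auto worst = std::max_element(cus.begin(), cus.end(),
            [&](const CuReport& a, const CuReport& b) { return key(a) < key(b); });
        worst->enabled = false;
    }
    // A real flow would then re-measure and reject the die if total power
    // still exceeds TDP, or if any SE lost more CUs than the SKU allows.
}
```

Knocking out exactly one CU per SE also keeps the active array balanced, which dovetails with the symmetry point further down.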
Isn't the front-end usually placed in the middle, between groups of CUs? Wasn't that the case with Liverpool?
My interpretation of the die shots of the 2013 consoles is that the CU arrays are a single block.
https://www.extremetech.com/gaming/...ered-reveals-sram-as-the-reason-for-small-gpu
For the PS4, I think there's some logic on the side of the CU arrays nearest the CPUs and uncore that has some symmetry with the two halves of the CU section, which makes me think the SE front-end logic is there.
For the Xbox One, I think the placement is similar, in part because the ESRAM on the other side seems to dominate that region with its size, interface, and ROPs.
The PS4 has a cluster of SRAM along the mid-line of the CU block on the far side, which I think is part or all of the L2, and the Pro has something similar. I'm speculating on this point, but there may be some benefit to deactivating CUs such that whatever portion of the GPU remains active is symmetric relative to the data paths coming out of the L2.
Generally, I think there's some choice in whether the CU arrays are split up, based on the overall die size of the solution. The tendency seems to be to keep things symmetric on both sides of the front ends when there's an even number of SEs, but there can apparently be a choice in whether two individual SEs are neighbors or separated: the 4-SE GPUs split their SEs so that two adjacent SEs sit in one half of the strip and two in the other.
I'm not sure I've seen die shots of the smallest GCN GPUs to know how they are laid out.
Limited die area gives most APUs one SE and so only one CU block. The console APUs have 2 or 4 SEs, with the 2-SE consoles having one block. This might be encouraged by the layout challenge of fitting in the CPU and uncore blocks, as stretching the GPU further along that dimension may make it difficult to fill in the die or fit the rectangle efficiently onto the wafer.
At 4 SEs (33+ CUs), I think there's less of an option, since there need to be 4 SEs and GCN seems to prefer that their front ends be reasonably close--possibly because they need to swap geometry data.
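To put a number on that, a minimal sketch assuming the commonly cited GCN cap of 16 CUs per SE and the 1/2/4-SE configurations GCN actually shipped in. This gives the minimum SE count -- a floor, not a predictor, since e.g. Tonga used 4 SEs at only 32 CUs:

```cpp
#include <cstdio>

// Minimum shader engines for a CU count, given GCN's ~16 CUs/SE ceiling.
int min_shader_engines(int cus) {
    const int max_cus_per_se = 16;
    for (int se : {1, 2, 4})
        if (cus <= se * max_cus_per_se) return se;
    return -1;  // beyond GCN's 64-CU ceiling
}

int main() {
    // 20 = Liverpool, 36 = the Pro's active count, 44 = Hawaii, 64 = Fiji
    for (int cus : {20, 36, 44, 64})
        printf("%2d CUs -> %d SE(s) minimum\n", cus, min_shader_engines(cus));
}
```

By this rule of thumb, anything past 32 CUs has to jump straight to 4 SEs, which is consistent with the Pro's 4-SE arrangement.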
For RDNA, the RX 5500 has one SE and reverts to having one block of CUs and the front end off to the side.