@3dilettante Would more memory controllers have any benefit for the APU setup? Was just thinking how GDDR6 is organized as 16 bits per channel (two channels, 32 bits per device), whereas GDDR5 is 32 bits per channel, organized as a 64-bit MC for Radeons.
I know that increased channel count has been cited as being beneficial, while the number of separate controllers hasn't been discussed as much. In the CPU space, the ability to operate the 64-bit channels in unganged mode has usually provided better utilization in more irregular workloads.
For GPUs, the 64-bit controller seems to have been preferable since GPUs frequently stripe data at a multiple of DRAM burst length, and the ratio of cycles for commands vs data transfers was apparently low enough that a controller could juggle multiple channels. I believe there was some discussion that HBM might have had more channels per controller, given the sheer number of them and their modest speed, but I wasn't able to find a clear reference.
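To make the striping point concrete, here is a rough sketch of how an address could be routed when data is striped across channels at a multiple of the burst size. The 32-byte burst follows from a 32-bit GDDR5 channel with burst length 8; the stripe size, channel count, and the function names are assumptions picked for illustration, not the mapping of any particular GPU.

```python
# Hypothetical interleave parameters, chosen only for illustration.
BURST_BYTES = 32              # 32-bit GDDR5 channel x burst length 8
STRIPE_BYTES = 256            # assumed stripe: 8 bursts before switching channel
NUM_CHANNELS = 8              # assumed 256-bit bus as 8 x 32-bit channels
CHANNELS_PER_CONTROLLER = 2   # a 64-bit controller owns two 32-bit channels


def route(addr):
    """Map a byte address to (controller, channel, burst index within the stripe)."""
    stripe = addr // STRIPE_BYTES
    channel = stripe % NUM_CHANNELS
    controller = channel // CHANNELS_PER_CONTROLLER
    burst_in_stripe = (addr % STRIPE_BYTES) // BURST_BYTES
    return controller, channel, burst_in_stripe


def channels_touched(start, length):
    """List the channels hit by a contiguous access of `length` bytes."""
    first_burst = start - (start % BURST_BYTES)  # align down to a burst boundary
    return sorted({route(a)[1] for a in range(first_burst, start + length, BURST_BYTES)})
```

Because the stripe is a whole number of bursts, a contiguous access turns into runs of back-to-back bursts on each channel, which is the kind of regular traffic that would let one controller keep two channels fed.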
GDDR6 would seem to provide less slack for the controller to juggle multiple GDDR devices, since the time available before a channel needs more command input would be smaller as the clock increases. That might create a need for one controller per DRAM, or some hierarchy of DRAM management. A 1:1 controller/channel arrangement might be too simple, since the GDDR6 device itself would have more global needs that would introduce a linkage between controllers.
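As a back-of-the-envelope check on the slack argument, the sketch below works out how long one burst occupies a channel's data pins and how often a whole device finishes a burst. Channel widths and burst lengths are the standard ones (GDDR5: 32-bit, BL8; GDDR6: 2 x 16-bit, BL16). The 8 Gbps and 14 Gbps per-pin rates are assumed example speeds, and the model ignores command clock ratios, bank timings, and refresh.

```python
def burst_window_ns(burst_length, gbps_per_pin):
    """Time one burst occupies a channel's data pins: bits per pin / rate per pin."""
    return burst_length / gbps_per_pin


# Per-pin data rates are assumed example speeds, not a claim about any product.
CONFIGS = {
    "GDDR5": dict(channel_bits=32, burst_length=8, gbps=8.0, channels_per_device=1),
    "GDDR6": dict(channel_bits=16, burst_length=16, gbps=14.0, channels_per_device=2),
}


def summarize(cfg):
    """Return burst size, per-channel burst window, and average per-device burst spacing."""
    window = burst_window_ns(cfg["burst_length"], cfg["gbps"])
    return {
        "bytes_per_burst": cfg["channel_bits"] * cfg["burst_length"] // 8,
        "window_ns_per_channel": window,
        # With all channels of a device busy, a burst completes this often on average.
        "avg_ns_between_bursts_per_device": window / cfg["channels_per_device"],
    }


if __name__ == "__main__":
    for name, cfg in CONFIGS.items():
        print(name, summarize(cfg))
```

Under these assumptions both types move 32 bytes per burst, and the per-channel window is similar (1.0 ns vs about 1.14 ns). The squeeze shows up per device: with two independent channels, a GDDR6 device completes a burst roughly every 0.57 ns on average, so a controller responsible for whole devices has about half the time per scheduling decision.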
More controllers mean more flexibility and more scheduling resources for accesses. When trying to balance GPU and CPU traffic, that also means more buffer capacity and more options for allocating memory in ways that cater to the different priorities of the two processor types.
For an APU with coherent accesses, AMD's protocols also rely on the controller or its associated logic for broadcasting snoops and determining the global order of accesses to the DRAM it manages. Having more controllers would mean being able to sustain more concurrent traffic. Other features in the CPU space, like memory encryption, would scale with the controllers. I'm still not entirely sure where the GPU's compression logic lies, so that might also be tied to controller count.
The downside would be that this scales up the total amount of controller hardware and the interconnect cost. The CPU and GPU don't want the same level of complexity per controller or the same client count, which may be an area of conflict.
AMD's allegedly leaked HPC "APU" slides sidestepped this with separate CPUs with DDR4 and a GPU with HBM.
A single-chip console APU might not have that option, though it could dispense with some of the high-end features of a server-bound chip. High-latency memory is already a feature of current console APUs, so plugging into a Vega-like mesh might be an option. Alternatively, APU designs have shown a willingness to use more complex controller layouts, which may produce a hierarchy or an intermediate fabric topology somewhere between the full crossbar of Zen and Vega's mesh.