For there to be additional bandwidth overhead from GPU and CPU memory traffic, the CPU's access patterns don't need to be random, just different from the GPU's. If the CPU isn't hitting the same arrays, or happens to be writing while the GPU is mostly reading, some additional cycles would be lost.

It can. There is no free lunch. This means most of the "latency-sensitive" work the CPU does should be limited to data that sits in its cache.
A CPU that spams the memory bus with lots of random requests will end up underutilized in any case.
So essentially, the CPU access pattern should be: prefetch, work in cache, write, with all of the latency savings coming from the "work in cache" phase.
Any workload that needs to "scan" a lot of memory should go to the GPU. It's not negotiable.
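The "prefetch, work in cache, write" pattern can be sketched roughly as below. This is a minimal illustration under stated assumptions, not code from any console SDK: the block size (32 KiB of floats), the 64-byte line size, and the `process_blocked` kernel are all hypothetical. `__builtin_prefetch` is a real GCC/Clang builtin, but it is only a hint the hardware may ignore.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tuning parameters: 8192 floats = 32 KiB per block, small
   enough to stay within Zen 2's 512 KiB per-core L2 alongside the
   scratch buffer. A 64-byte cache line holds 16 floats. */
#define BLOCK_ELEMS 8192
#define LINE_ELEMS 16

/* Phase 1 (prefetch): one software prefetch hint per cache line. */
static void prefetch_block(const float *p, size_t len) {
#if defined(__GNUC__) || defined(__clang__)
    for (size_t i = 0; i < len; i += LINE_ELEMS)
        __builtin_prefetch(&p[i], 0 /* read */, 3 /* high temporal locality */);
#else
    (void)p;
    (void)len;
#endif
}

static size_t min_size(size_t a, size_t b) { return a < b ? a : b; }

/* Hypothetical kernel: out[i] = in[i] accumulated `passes` times. The
   repeated passes stand in for latency-sensitive work that re-reads the
   same block while it is cache resident. */
void process_blocked(const float *in, float *out, size_t n, int passes) {
    assert(passes > 0);
    float acc[BLOCK_ELEMS]; /* scratch for the current block */

    if (n > 0)
        prefetch_block(in, min_size(n, BLOCK_ELEMS));

    for (size_t start = 0; start < n; start += BLOCK_ELEMS) {
        size_t len = min_size(n - start, BLOCK_ELEMS);

        /* Hint the next block so its loads overlap with work on this one. */
        size_t next = start + len;
        if (next < n)
            prefetch_block(in + next, min_size(n - next, BLOCK_ELEMS));

        /* Phase 2 (work in cache): passes after the first re-read data
           that should already be cached. */
        for (size_t j = 0; j < len; j++)
            acc[j] = 0.0f;
        for (int p = 0; p < passes; p++)
            for (size_t j = 0; j < len; j++)
                acc[j] += in[start + j];

        /* Phase 3 (write): results leave as one sequential stream. */
        for (size_t j = 0; j < len; j++)
            out[start + j] = acc[j];
    }
}
```

Issuing all of the next block's prefetch hints up front is a simplification; real code would usually spread them across the work loop so they don't compete with the current block's misses.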
The idea that CPU bandwidth consumption should be minimized when the GPU is bandwidth-constrained was the point of the PS4 slide when it was first released, although the zero-bandwidth case doesn't seem all that practical.
As far as "prefetch, work in cache, write" goes, I'm not sure how broadly I should interpret your wording. While it's preferable for a working set to fit in cache, the CPU doesn't have full control over when the cache hierarchy writes back to memory, since a cache isn't a local store. There are hardware prefetchers and software prefetch instructions, but for most workloads there are practical limits to how far ahead they can run before bandwidth spent on unnecessary reads becomes counterproductive, or before the cache starts evicting parts of the working set. Zen 2 has decently sized caches (it's not clear what the capacities are for the consoles), but high-performance cores will quickly exhaust them in many cases.
This is separate from cases where the CPUs or DMA controllers are expected to move data into system memory, which would add overhead of its own.
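One rough way to see why prefetch distance is bounded is Little's law: to sustain a given bandwidth at a given memory latency, the core needs roughly latency × bandwidth bytes in flight. The 100 ns latency and 20 GB/s per-core target below are illustrative assumptions, not measured Zen 2 or console figures.

```latex
\text{bytes in flight} = \text{latency} \times \text{bandwidth}
  = (100 \times 10^{-9}\,\text{s}) \times (20 \times 10^{9}\,\text{B/s})
  = 2000\,\text{B} \approx 31 \text{ lines of } 64\,\text{B}
```

Each of those lines has to be requested before it is needed and then held in cache until it is consumed, so the required distance grows with both latency and the bandwidth being chased. Any mispredicted lines are reads that cost bandwidth for nothing.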
There is peer-to-peer DMA functionality, and there were somewhat recent Linux changes mentioning it for Zen. Perhaps if the drive supports that, it could avoid a trip to main memory.

Would it be possible for SSG to work in concert with an external PCIe 4.0 NVMe SSD in a Zen 2 based system? As I understand it, the SSD would communicate directly with the IO die on the CPU, but I'm unclear whether it could pass the data straight through from there to GPU memory (over the PCIe 4.0 x16 link) without going through system memory and the CPU, which I understand is the case for the XSX (and presumably the PS5).