I'm trying to work out why cache scrubbers add particular benefit now, and why this is only being done now, given that the speed of the data feed from main memory to cache hasn't changed just because the system has a faster SSD.
My assumption is that the data in cache is changing much more frequently because the data in VRAM is also changing more frequently. Surely that's always been the case as graphics data has increased in size, though, and the way that's been dealt with in the past is through larger caches.
So are we saying cache scrubbers are now needed because cache sizes are disproportionately small compared with the amount of data that's being used per frame as enabled by the new storage designs?
Are cache scrubbers essentially a hack because RDNA2 hasn't been designed to cope with the amount of data per frame that the PS5 will allow as a result of the huge jump in streaming speed?
Cerny's presentation, and some of the past presentations on Sony's compute goals, hint at a sensitivity to latency. Latency helped defeat general use of the GPU and DSP for most audio on the PS4, and now there is Tempest. As part of his justification for the high-clock strategy, Cerny gave scenarios where the GPU could not fully utilize its width, but could complete smaller tasks faster if the clock speed was raised.
If a memory range that may have copies in the GPU caches gets overwritten by a read from the SSD, the old copies in the GPU do not automatically update. RDNA2 is not unique in this, as in almost all situations GPU cache hierarchies are weakly ordered and slow to propagate changes. In fairness, most data read freshly from IO needs additional work to stay consistent, even for CPUs.
If you don't want the GPU to be using the wrong data, the data in the GPU needs to be cleared out of the caches before a shader tries to read from those addresses. The PS4's volatile flag was a different cache invalidation optimization, so there does seem to be a history of such tweaks in the Cerny era.
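To make the stale-data problem concrete, here is a toy sketch in Python. It is only a model of the idea, not how the hardware works: ToyCache, the address 0x100, and the string payloads are all made up for illustration. The cache sits in front of memory and does not notice when something else (standing in for an SSD transfer) rewrites memory behind it.

```python
class ToyCache:
    """Hypothetical read cache that does not snoop external writes to memory."""

    def __init__(self, memory):
        self.memory = memory  # backing store, shared with the "IO" side
        self.lines = {}       # address -> cached copy

    def read(self, addr):
        # On a miss, fill from memory; on a hit, return the cached copy as-is.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def invalidate(self, addr):
        # Drop the cached copy so the next read goes back to memory.
        self.lines.pop(addr, None)


memory = {0x100: "old texture"}
gpu_cache = ToyCache(memory)

first_read = gpu_cache.read(0x100)   # fills the cache with the old data
memory[0x100] = "new texture"        # stand-in for the SSD overwriting that range
stale_read = gpu_cache.read(0x100)   # cache hit: still the old copy

gpu_cache.invalidate(0x100)
fresh_read = gpu_cache.read(0x100)   # miss: refetched from memory
```

Until the invalidate happens, the second read keeps returning the old copy, which is the situation a shader must never be allowed to hit.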
The general cache invalidation process for the GCN/RDNA caches is a long-latency event. It's a pipeline event that blocks most of the graphics pipeline (command processor, CUs, wavefront launch, graphics blocks) until the invalidation process runs its course. This also comes up when CUs read from render targets in GCN, particularly after DCC was introduced and prior to the ROPs becoming L2 clients with Vega. The cache flush events are expensive and advised against heavily.
In the past, an HDD's limited parallelism and long seek times would have eclipsed the cost of this process and kept it infrequent.
If the PS5's design expects to be able to fire off many more accesses and use them in a relatively aggressive time frame, then the scrubbers may reduce the impact, either by lowering the cost of such operations or by reducing the number of full stalls that need to happen.
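A rough way to picture the difference, again as a toy Python model rather than a description of the real scrubber hardware (whose details have not been made public as far as I know): compare throwing away everything in the cache with evicting only the addresses that were overwritten. The function names, the 64-byte line size, and the working-set size are assumptions picked for the example.

```python
LINE = 64  # assumed cache line size in bytes, for illustration only


def full_invalidate(cached):
    """Drop every cached address, as a whole-cache invalidation would."""
    return set()


def scrub_range(cached, start, end):
    """Drop only cached addresses in [start, end), as a targeted scrub might."""
    return {addr for addr in cached if not (start <= addr < end)}


def refetches(cached_after, working_set):
    """Count lines the GPU must reload to get its working set back."""
    return len(working_set - cached_after)


working_set = {i * LINE for i in range(1024)}  # 1024 lines currently in use
cached = set(working_set)                      # all of them are resident
overwritten = (0, 16 * LINE)                   # SSD load replaced the first 16 lines

full_cost = refetches(full_invalidate(cached), working_set)
scrub_cost = refetches(scrub_range(cached, *overwritten), working_set)
```

In this model the full invalidation forces all 1024 lines to be refetched, while the targeted scrub costs only the 16 lines that actually changed. It ignores the pipeline stall itself, which is the other half of the cost described above.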