Great, thanks, I think I understand that now. So GPUs without some form of cache invalidation optimization are going to suffer a penalty in terms of increased cache flushes when used in concert with very fast storage solutions, which will presumably become far more prevalent next generation. So essentially the cache scrubbers are a GPU performance aid as opposed to something that actually speeds up the IO, from the sounds of it?
It seems to be a performance measure for the GPU. Data being read in will be written to its destination regardless of what state the GPU's execution is in.
It isn't clear how the notification process works, or what sort of synchronization operations are needed with the scrubbers.
Sony may be trying to reduce the time and cache thrashing related to the global stall and cache invalidates, meaning the events are cheaper but are still used in the same fashion as the normal invalidates.
If there's some kind of non-standard way of synchronizing with that hardware, maybe some workloads can use a custom localized barrier that might allow some parts of the GPU to be excluded from the stall--but that may be a more extensive change than what has been described.
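To make that contrast concrete, here's a rough C++ sketch of the two models. None of these functions are real PS5, AMD, or GNM APIs; they're made-up stand-ins for driver/hardware operations, and the stub bodies just print what the hardware would conceptually do.

```cpp
// Conceptual sketch only: every function here is a hypothetical stand-in for
// a driver/hardware operation, not a real PS5 or AMD API.
#include <cstddef>
#include <cstdint>
#include <cstdio>

static void gpu_wait_idle()             { std::puts("global stall: drain in-flight work"); }
static void gpu_invalidate_all_caches() { std::puts("invalidate every cache line"); }
static void gpu_scrub_range(std::uintptr_t addr, std::size_t bytes) {
    std::printf("evict only lines overlapping [%#zx, %#zx)\n",
                static_cast<std::size_t>(addr),
                static_cast<std::size_t>(addr) + bytes);
}

// Coarse model: before the SSD/DMA engine overwrites GPU-visible memory,
// stall the whole GPU and drop everything, including cache lines that have
// nothing to do with the incoming data.
static void coarse_invalidate_before_stream() {
    gpu_wait_idle();
    gpu_invalidate_all_caches();
}

// Scrubber-style model (as described in the Road to PS5 talk): only the lines
// whose addresses overlap the incoming write are evicted, so unrelated
// shaders keep their cached data and, ideally, keep running.
static void scrub_before_stream(void* dst, std::size_t bytes) {
    gpu_scrub_range(reinterpret_cast<std::uintptr_t>(dst), bytes);
}

int main() {
    alignas(64) static unsigned char streaming_dest[4096];
    coarse_invalidate_before_stream();
    scrub_before_stream(streaming_dest, sizeof(streaming_dest));
}
```

The open question from the posts above is what the equivalent of gpu_scrub_range actually looks like to software: whether it's driven automatically from the IO path or needs explicit synchronization from the title.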
Does XSX have something similar, or is its GPU going to suffer relatively compared with the PS5 in this regard? Or is its storage system simply not fast enough for this to matter that much?
It hasn't been mentioned by MS, and the PS5 presentation indicated the scrubbers were a customization for Sony, one that AMD did not find compelling enough to include in its own IP despite it being available.
This seems to be an optimization for one category of methods that is probably uncommon now and might be a subset of many other implementations. Sometimes these are optimizations that are nice to have, but don't find sufficient use or benefit in the broader market.
The PS4's volatile flag may have helped make GPU compute more capable of working alongside graphics, but the concept didn't catch on anywhere else and nobody's indicated that the other GPUs suffered significantly for the lack of it.
The PS4 had a form of triangle sieve that might have been a forerunner to the culling-focused primitive shaders in Vega, so the idea might make sense. However, the PS4's implementation in particular has only really been mentioned in the pre-launch articles in 2013, and I don't recall it being mentioned since.
The PS4 Pro's ID buffer and checkerboard optimizations have had an unclear amount of adoption. Many of the leading engines found something other than checkerboard relatively quickly.
There may be other areas that the XSX has emphasized, like sampler feedback customizations or other tweaks that might provide different benefits.
Cache scrubbers sound like they'd be sensible in the PC space too, unless the programming model on the PC makes this impractical. Assuming not, though, I wonder if this is one of the PS5 enhancements Cerny mentioned that we might see in future AMD GPUs. If not RDNA2 then perhaps RDNA3 (which may better match the timescales of very fast IO solutions becoming prevalent in the PC space).
The PC space has a wider range of hardware and has to worry about a broader legacy base that might not have the IO capability that would justify them. If there are PS5-specific ways for shaders or GPU hardware to utilize SSD data that interface with the scrubbers in a non-standard way, that may make them less likely to be adopted.
Discrete products have a PCIe bus to transfer over, and until there's more unified memory those explicit transfers may be heavyweight enough to exceed the savings from scrubbing.
APUs might be better-positioned due to the single memory pool, but then we'd need one with more of a performance focus.
I don't really understand that though, as it only matters if the data that changes is currently in the caches, and how likely is that? Like, you've got a load of geometry and textures present, drawing some scenery, and then a character. New scenery is loaded. Now, by the time you go to draw that scenery, the caches are filled with character info, so they'd naturally reload the scenery data with the latest copy from RAM.
It's only an issue if the GPU is drawing scenery, that scenery data is cached, and new scenery data is loaded into those same addresses. That seems a rare occurrence, for the caches to still be holding the same data.
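To pin down what I mean (purely illustrative, no real APIs, all names made up): the only problematic case is when the streamed write overlaps an address range the GPU might still have cached, something like:

```cpp
// Illustrative sketch of the case being described: staleness is only a
// problem when streamed data lands in an address range the GPU still has
// cached from earlier work. All names and sizes here are made up.
#include <cstddef>
#include <cstdint>
#include <cstdio>

struct CachedRange {
    std::uintptr_t base;
    std::size_t    bytes;
};

// True only when a streamed write overlaps a range some GPU cache may still
// hold; that's when stale lines could be served instead of the new data.
static bool needs_invalidate(const CachedRange& cached,
                             std::uintptr_t write_base, std::size_t write_bytes) {
    return write_base < cached.base + cached.bytes &&
           cached.base < write_base + write_bytes;
}

int main() {
    // The scenery example: the slot was cached while drawing, then new
    // scenery is streamed in. Invalidation only matters if the new data
    // reuses the same addresses the old scenery occupied.
    alignas(64) static unsigned char scenery_slot[64 * 1024];
    CachedRange maybe_cached{reinterpret_cast<std::uintptr_t>(scenery_slot),
                             sizeof(scenery_slot)};

    std::printf("reusing the slot needs invalidation: %s\n",
                needs_invalidate(maybe_cached,
                                 reinterpret_cast<std::uintptr_t>(scenery_slot),
                                 sizeof(scenery_slot)) ? "yes" : "no");
}
```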
Perhaps this is an optimization with a certain class of workloads in mind, such as virtual texturing like in the later Trials games? A virtual texturing cache is a range of memory addresses that may be updated by data from different disk locations or different assets based on how the GPU/renderer chooses to update it. Couple that with some of the ideas about how the latest Unreal demo may be virtualizing its geometry, and there could be objects or subsets of them at different levels of detail being read into or switched out of a limited working set.
Assigning specific ranges within the virtual asset caches may see benefit from the scrubbers, since they could be used to clean out a given allocation without thrashing other in-progress objects, allowing a new object to take it over. However, that may require a level of interaction between the scrubbers and shaders that might not match reality, more fine-grained synchronization than actually exists, and an unclear level of optimism with regards to SSD latency.
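As a rough sketch of how that per-range usage could look, assuming some interface to the scrubbers exists at all (gpu_scrub_range below is entirely made up, and the tile sizes are placeholders):

```cpp
// Hypothetical sketch: a virtual-texture tile pool whose slots map to fixed
// GPU-visible address ranges. gpu_scrub_range() is a made-up stand-in for
// whatever interface the scrubbers might expose; the point is only that
// replacing one tile would need just its own range evicted, not a full flush.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

constexpr std::size_t kTileBytes = 64 * 1024; // placeholder tile size

static void gpu_scrub_range(std::uintptr_t addr, std::size_t bytes) {
    std::printf("scrub cache lines in [%#zx, %#zx)\n",
                static_cast<std::size_t>(addr),
                static_cast<std::size_t>(addr) + bytes);
}

struct TilePool {
    std::vector<std::uint8_t> backing; // GPU-visible pool of resident tiles
    explicit TilePool(std::size_t tiles) : backing(tiles * kTileBytes) {}

    // Replace the tile in `slot` with data streamed from the SSD. Only that
    // slot's address range is scrubbed, so tiles other shaders are actively
    // sampling keep their cached lines and in-flight work isn't stalled.
    std::uint8_t* begin_tile_replace(std::size_t slot) {
        std::uint8_t* dst = backing.data() + slot * kTileBytes;
        gpu_scrub_range(reinterpret_cast<std::uintptr_t>(dst), kTileBytes);
        return dst; // the streamed read would land here next
    }
};

int main() {
    TilePool pool(256);            // 16 MiB of resident tiles, purely illustrative
    pool.begin_tile_replace(42);   // evict and refill a single tile slot
}
```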