Explore GPU advancements in M3 and A17 Pro - Tech Talks - Videos - Apple Developer
Learn how Dynamic Caching, the next-generation shader core, hardware-accelerated ray tracing, and hardware-accelerated mesh shading of...
developer.apple.com
Sounds like the intent is to make large shaders more efficient. Very interesting feature. Just watching now.
My understanding, before fully watching this, is that if you write an "uber shader" the GPU allocates registers and other resources for the worst case across all of the branches in the shader, so you can easily run into register pressure that tanks occupancy and GPU utilization. This feature should allow more efficient execution of uber shaders, so you can minimize the total number of shader variants and still keep utilization up. Rough sketch of the problem below.
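To make the register-pressure point concrete, here's a made-up uber shader in Metal Shading Language. The material IDs and branch bodies are purely illustrative, not from the talk; the point is that a conventional allocator has to reserve registers for the most expensive branch for every thread, even when most pixels only ever take the cheap path.

```cpp
#include <metal_stdlib>
using namespace metal;

// Hypothetical material IDs -- purely for illustration.
enum MaterialType : uint {
    MaterialDiffuse   = 0,
    MaterialClearCoat = 1,
};

struct FragmentIn {
    float4 position [[position]];
    float3 normal;
    float2 uv;
};

// "Uber shader": one fragment function covering every material.
// A traditional allocator sizes registers for the worst-case branch,
// so even pixels that only run the cheap diffuse path pay the
// occupancy cost of the clear-coat path.
fragment float4 uberFragment(FragmentIn in            [[stage_in]],
                             constant uint &material  [[buffer(0)]],
                             texture2d<float> albedo  [[texture(0)]],
                             sampler smp              [[sampler(0)]])
{
    float4 base = albedo.sample(smp, in.uv);

    if (material == MaterialDiffuse) {
        // Cheap path: only a handful of values live at once.
        float ndotl = saturate(dot(normalize(in.normal), float3(0, 1, 0)));
        return base * ndotl;
    } else {
        // Expensive path: many intermediates live at the same time,
        // which is what drives the worst-case register allocation.
        float3 n = normalize(in.normal);
        float3 v = float3(0, 0, 1);
        float3 l = normalize(float3(0.3, 0.8, 0.5));
        float3 h = normalize(l + v);
        float  roughness = 0.2;
        float  d = pow(saturate(dot(n, h)), 2.0 / (roughness * roughness) - 2.0);
        float3 spec = d * float3(1.0);
        float3 coat = pow(saturate(dot(n, h)), 256.0) * float3(0.04);
        return float4(base.rgb * saturate(dot(n, l)) + spec + coat, base.a);
    }
}
```

In practice the alternative has been compiling one pipeline per material permutation, which trades the register-pressure problem for shader explosion and pipeline switching cost.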
I feel like it's not a coincidence that this feature came at the same time as ray tracing and even mesh shading.
Edit: Okay, so the register file and the threadgroup/tile memory are now caches. Lol, a few more seconds in and it looks like one large cache (per GPU core) serves the register file, the threadgroup/tile memory, and the buffer/stack cache.
So if you spill over from that cache, data goes to the last-level cache, but the SIMD scheduler will adjust occupancy to make sure the running threads fit back into the on-chip cache. Really cool. Basically there's no fixed partitioning of the cache between register data, buffer data, and tile data, so any one of them can take up as much of the cache as it needs, and the scheduler then makes sure the cache isn't constantly spilling to a higher cache level or main memory. Toy model of the idea below.
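Obviously not Apple's actual heuristic, but here's a toy C++ model of the idea as I understand it from the talk: one shared on-chip pool per GPU core that register, threadgroup, and buffer/stack data all draw from, with occupancy dialled down until the combined working set fits instead of spilling. Every number and the policy itself are invented for illustration.

```cpp
#include <cstdio>

// Toy model of dynamic caching -- the sizes and the policy are made up,
// this is not Apple's real scheduler.
struct ShaderFootprint {
    int registerBytesPerThread;   // live registers actually used
    int threadgroupBytesPerGroup; // tile/threadgroup memory per group
    int bufferBytesPerThread;     // stack / buffer-cache working set
    int threadsPerGroup;
};

// Pick the highest occupancy (resident threadgroups) whose combined
// footprint still fits in the shared on-chip cache. Anything beyond
// that would spill to the last-level cache, so back off instead.
int pickOccupancy(const ShaderFootprint &s, long onChipCacheBytes, int maxGroups)
{
    for (int groups = maxGroups; groups >= 1; --groups) {
        long perGroup = (long)s.threadsPerGroup *
                        (s.registerBytesPerThread + s.bufferBytesPerThread) +
                        s.threadgroupBytesPerGroup;
        if (perGroup * groups <= onChipCacheBytes)
            return groups;
    }
    return 1; // always keep at least one group resident
}

int main()
{
    // Cheap branch dominates: small footprint, high occupancy.
    ShaderFootprint cheap { 64, 4096, 32, 256 };
    // Worst-case branch taken: big footprint, occupancy gets lowered.
    ShaderFootprint heavy { 512, 16384, 128, 256 };

    const long cacheBytes = 1 << 20; // pretend 1 MiB on-chip per core

    std::printf("cheap path occupancy: %d groups\n",
                pickOccupancy(cheap, cacheBytes, 32));
    std::printf("heavy path occupancy: %d groups\n",
                pickOccupancy(heavy, cacheBytes, 32));
    return 0;
}
```

The interesting part is that the "budget" is one pool, so a shader that uses no threadgroup memory effectively gets all of it back as register or buffer space, rather than leaving a fixed partition idle.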
Wondering how long it'll be until we see NVIDIA, AMD, and Intel copy this design. For wide GPUs and ray tracing performance I think it could be a big win.