Apple Dynamic Caching on M3 GPU

Scott_Arm




Sounds like the intent is to make large shaders more efficient. Very interesting feature. Just watching now.

My understanding before fully watching this is that if you write an "uber shader", the GPU will allocate registers and other resources for the worst case across all of the branches in the shader, so you can easily run into register pressure etc., which tanks GPU utilization. This feature should allow more efficient execution of uber shaders, so you can minimize the total number of shaders and still keep utilization up.
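To make that concrete, here's roughly the kind of uber shader I mean (toy MSL, all names and numbers made up, purely illustrative): with static allocation, the whole shader gets sized for the expensive branch even when a draw only ever takes the cheap one.

```cpp
#include <metal_stdlib>
using namespace metal;

struct VertexOut {
    float4 position [[position]];
    float2 uv;
};

struct MaterialParams {
    uint  mode;        // which path this draw actually takes
    float roughness;
};

fragment float4 uberFragment(VertexOut in                    [[stage_in]],
                             constant MaterialParams& params [[buffer(0)]],
                             texture2d<float> albedo         [[texture(0)]],
                             texture2d<float> normalMap      [[texture(1)]],
                             sampler smp                     [[sampler(0)]])
{
    float4 color = albedo.sample(smp, in.uv);

    if (params.mode == 0) {
        // Cheap path: only a couple of live values.
        return color;
    } else {
        // Expensive path: lots of live temporaries (fake lighting loop),
        // which drives up the worst-case register footprint of the whole shader.
        float3 n = normalize(normalMap.sample(smp, in.uv).xyz * 2.0f - 1.0f);
        float3 accum = float3(0.0f);
        for (int i = 0; i < 8; ++i) {
            float3 l = normalize(float3(float(i), 1.0f, params.roughness + 1.0f));
            accum += color.rgb * saturate(dot(n, l));
        }
        return float4(accum / 8.0f, color.a);
    }
}
```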

I feel like it's not a coincidence that this feature came at the same time as ray tracing and even mesh shading.

Edit: Okay, so the register file and the threadgroup/tile memory are now caches. Lol, a few more seconds and it looks like one large cache (per GPU core) to serve the register file, threadgroup/tile memory and the buffer/stack cache.


So if you spill over from the cache, data will go to the last level cache, but the SIMD scheduler will adjust occupancy to make sure that your running threads fit back into the on-chip cache. Really cool. Basically there's no fixed segmentation of the cache between the register cache, the buffer cache and the tile cache, so any one of those can take up as much of the cache as needed, and the scheduler will make sure the cache is not spilling over to a higher cache level or main memory.
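Just to illustrate the idea to myself, here's a toy model of the occupancy decision (plain C++, made-up numbers, definitely not how the actual hardware does it): occupancy gets throttled by the footprint of what the threads are actually running, instead of by a fixed worst-case split of the on-chip memory.

```cpp
// Toy model: one per-core pool shared by registers, threadgroup/tile memory
// and buffer/stack data; resident threadgroups are capped so the live
// footprint stays on chip instead of spilling.
#include <algorithm>
#include <cstdio>

struct ThreadgroupFootprint {
    int registerBytes;     // dynamic register demand of the path actually taken
    int threadgroupBytes;  // tile/threadgroup memory it touches
    int stackBytes;        // buffer/stack data it needs
};

int maxResidentThreadgroups(int onChipPoolBytes, ThreadgroupFootprint fp, int hwLimit)
{
    int perGroup = fp.registerBytes + fp.threadgroupBytes + fp.stackBytes;
    // No fixed split between the three uses: fit as many threadgroups as the
    // shared pool allows, capped by the hardware limit.
    return std::min(hwLimit, onChipPoolBytes / perGroup);
}

int main()
{
    // Made-up numbers: a cheap branch vs. an expensive branch of the same shader.
    ThreadgroupFootprint cheap{ 8 * 1024, 4 * 1024, 0 };
    ThreadgroupFootprint heavy{ 48 * 1024, 16 * 1024, 8 * 1024 };
    int pool = 512 * 1024; // hypothetical per-core on-chip pool

    std::printf("cheap path: %d threadgroups\n", maxResidentThreadgroups(pool, cheap, 32));
    std::printf("heavy path: %d threadgroups\n", maxResidentThreadgroups(pool, heavy, 32));
    return 0;
}
```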

Wondering how long it'll be until we see Nvidia, AMD, and Intel copy this design. I think for wide GPUs and ray tracing performance it could be a big win.
 
I see that Apple's latest GPU architecture is now capable of doing dynamic register allocation in hardware, rather than static register allocation by the compiler as is the case on other GPU architectures, but in one of their most recent technical disclosures they still recommend that programmers compile specialized shader variants!
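For anyone who hasn't used them, the specialized-variant route in Metal is typically done with function constants, something like this (toy MSL sketch, names made up): one source file, and each variant is specialized at pipeline-creation time via MTLFunctionConstantValues on the host side, so it only pays for the features it actually uses.

```cpp
#include <metal_stdlib>
using namespace metal;

// Set per variant when the pipeline is created (host side omitted here).
constant bool kUseNormalMap [[function_constant(0)]];
constant bool kUseEmissive  [[function_constant(1)]];

struct VertexOut {
    float4 position [[position]];
    float2 uv;
};

fragment float4 litFragment(VertexOut in               [[stage_in]],
                            texture2d<float> albedo    [[texture(0)]],
                            texture2d<float> normalMap [[texture(1)]],
                            texture2d<float> emissive  [[texture(2)]],
                            sampler smp                [[sampler(0)]])
{
    float4 color = albedo.sample(smp, in.uv);

    // These branches are resolved when the variant is compiled, so each
    // specialized pipeline only carries the code it actually needs.
    if (kUseNormalMap) {
        float3 n = normalize(normalMap.sample(smp, in.uv).xyz * 2.0f - 1.0f);
        color.rgb *= saturate(n.z);
    }
    if (kUseEmissive) {
        color.rgb += emissive.sample(smp, in.uv).rgb;
    }
    return color;
}
```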

 
It's soooo cool. It's a shame that Apple makes products for uh, Apple, otherwise CPU and GPU wise Nvidia, AMD, Intel, etc. would all be doing pretty poorly in the consumer space against such competition.

I do wonder/hope someone will micro-benchmark the new GPU arch; it'd be interesting to see if you really could skip specialized shader variants, and what impact this really has on ray tracing, etc.
 
Awesome. This is a direction we need to head in for a multitude of reasons... but even if they all released products supporting this capability tomorrow, it would still be a while before the market is there.

It's got to happen at some point though, and I'm glad Apple seems to have kicked it off!
 