Apple Dynamic Caching on M3 GPU

Scott_Arm




Sounds like the intent is to make large shaders more efficient. Very interesting feature. Just watching now.

My understanding before fully watching this is that if you write an "uber shader", the GPU will allocate registers and other resources for the worst case across all of the branches in the shader, so you can easily run into register pressure etc., which tanks GPU utilization. This feature should allow more efficient execution of uber shaders, so you can minimize the total number of shaders and still keep utilization up.
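To make that concrete, here's roughly the kind of uber shader I mean (toy MSL, all names and numbers made up, purely illustrative): with static allocation, the whole shader gets sized for the expensive branch even when a draw only ever takes the cheap one.

```cpp
#include <metal_stdlib>
using namespace metal;

struct VertexOut {
    float4 position [[position]];
    float2 uv;
};

struct MaterialParams {
    uint  mode;        // which path this draw actually takes
    float roughness;
};

fragment float4 uberFragment(VertexOut in                    [[stage_in]],
                             constant MaterialParams& params [[buffer(0)]],
                             texture2d<float> albedo         [[texture(0)]],
                             texture2d<float> normalMap      [[texture(1)]],
                             sampler smp                     [[sampler(0)]])
{
    float4 color = albedo.sample(smp, in.uv);

    if (params.mode == 0) {
        // Cheap path: only a couple of live values.
        return color;
    } else {
        // Expensive path: lots of live temporaries (fake lighting loop),
        // which drives up the worst-case register footprint of the whole shader.
        float3 n = normalize(normalMap.sample(smp, in.uv).xyz * 2.0f - 1.0f);
        float3 accum = float3(0.0f);
        for (int i = 0; i < 8; ++i) {
            float3 l = normalize(float3(float(i), 1.0f, params.roughness + 1.0f));
            accum += color.rgb * saturate(dot(n, l));
        }
        return float4(accum / 8.0f, color.a);
    }
}
```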

I feel like it's not a coincidence that this feature came at the same time as ray tracing and even mesh shading.

Edit: Okay, so the register file and the threadgroup/tile memory are now caches. Lol, a few more seconds and it looks like one large cache (per GPU core) to serve the register file, threadgroup/tile memory and the buffer/stack cache.


So if you spill over from the cache, data will go to the last level cache, but the SIMD scheduler will adjust occupancy to make sure that your running threads fit back into the on-chip cache. Really cool. Basically there's no fixed segmentation of the cache between the register cache, the buffer cache and the tile cache, so any one of those can take up as much of the cache as needed, and the scheduler will make sure the cache is not spilling over to a higher cache level or main memory.
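Just to illustrate the idea to myself, here's a toy model of the occupancy decision (plain C++, made-up numbers, definitely not how the actual hardware does it): occupancy gets throttled by the footprint of what the threads are actually running, instead of by a fixed worst-case split of the on-chip memory.

```cpp
// Toy model: one per-core pool shared by registers, threadgroup/tile memory
// and buffer/stack data; resident threadgroups are capped so the live
// footprint stays on chip instead of spilling.
#include <algorithm>
#include <cstdio>

struct ThreadgroupFootprint {
    int registerBytes;     // dynamic register demand of the path actually taken
    int threadgroupBytes;  // tile/threadgroup memory it touches
    int stackBytes;        // buffer/stack data it needs
};

int maxResidentThreadgroups(int onChipPoolBytes, ThreadgroupFootprint fp, int hwLimit)
{
    int perGroup = fp.registerBytes + fp.threadgroupBytes + fp.stackBytes;
    // No fixed split between the three uses: fit as many threadgroups as the
    // shared pool allows, capped by the hardware limit.
    return std::min(hwLimit, onChipPoolBytes / perGroup);
}

int main()
{
    // Made-up numbers: a cheap branch vs. an expensive branch of the same shader.
    ThreadgroupFootprint cheap{ 8 * 1024, 4 * 1024, 0 };
    ThreadgroupFootprint heavy{ 48 * 1024, 16 * 1024, 8 * 1024 };
    int pool = 512 * 1024; // hypothetical per-core on-chip pool

    std::printf("cheap path: %d threadgroups\n", maxResidentThreadgroups(pool, cheap, 32));
    std::printf("heavy path: %d threadgroups\n", maxResidentThreadgroups(pool, heavy, 32));
    return 0;
}
```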

Wondering how long it'll be until we see Nvidia, AMD, and Intel copy this design. I think for wide GPUs and ray tracing performance it could be a big win.
 
I see that Apple's latest GPU architecture is now capable of doing dynamic register allocation in hardware, rather than static register allocation by the compiler as is the case on other GPU architectures, but in one of their most recent technical disclosures they still recommend that programmers compile specialized shader variants!
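For anyone who hasn't used them, the specialized-variant route in Metal is typically done with function constants, something like this (toy MSL sketch, names made up): one source file, and each variant is specialized at pipeline-creation time via MTLFunctionConstantValues on the host side, so it only pays for the features it actually uses.

```cpp
#include <metal_stdlib>
using namespace metal;

// Set per variant when the pipeline is created (host side omitted here).
constant bool kUseNormalMap [[function_constant(0)]];
constant bool kUseEmissive  [[function_constant(1)]];

struct VertexOut {
    float4 position [[position]];
    float2 uv;
};

fragment float4 litFragment(VertexOut in               [[stage_in]],
                            texture2d<float> albedo    [[texture(0)]],
                            texture2d<float> normalMap [[texture(1)]],
                            texture2d<float> emissive  [[texture(2)]],
                            sampler smp                [[sampler(0)]])
{
    float4 color = albedo.sample(smp, in.uv);

    // These branches are resolved when the variant is compiled, so each
    // specialized pipeline only carries the code it actually needs.
    if (kUseNormalMap) {
        float3 n = normalize(normalMap.sample(smp, in.uv).xyz * 2.0f - 1.0f);
        color.rgb *= saturate(n.z);
    }
    if (kUseEmissive) {
        color.rgb += emissive.sample(smp, in.uv).rgb;
    }
    return color;
}
```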

 
It's soooo cool. It's a shame that Apple makes products for uh, Apple, otherwise CPU and GPU wise Nvidia, AMD, Intel, etc. would all be doing pretty poorly in the consumer space against such competition.

I do wonder/hope someone will micro-benchmark the new GPU arch; it'd be interesting to see if you really could skip specialized shader variants, and what impact this really has on ray tracing, etc.
 
Awesome. This is a direction we need to head in for a multitude of reasons... but even if they all released products supporting this capability tomorrow, it would still be a while before the market is there.

It's got to happen at some point though, and I'm glad Apple seems to have kicked it off!
 