With new manufacturing processes, SRAM is scaling poorly, so a big on-die cache is expensive and getting more expensive. And making the L2 bigger also makes it slower.
The only way to get a relatively cheap big cache is to put it on a separate die made on an older manufacturing process (what AMD did in RDNA3). But that increases latency and power.
One reasonable direction might be to keep the outer cache level on a separate die but minimize the overhead of the die-to-die traffic, for example by stacking the dies vertically, with the cache die below or above the logic die. Something similar to what AMD did with 3D V-Cache on Zen 3.
Apple has fast access to their DRAM because the memory controllers are on the same die (which is costly on new mfg processes) and also because they use the LPDDR line of memory, which is more latency-optimized and less bandwidth-optimized than the GDDR line.
And Apple can afford the cost of having the memory controllers on-die because they sell their products at very high prices and have good margins anyway. Consumer GPUs carry much thinner margins than Apple products, so AMD has to save more on mfg costs.
On the flip side, resolutions aren't really getting higher. So do you need bigger SRAM caches for Infinity Cache? Wouldn't the original 128 MB and under still be large enough for their purposes?
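As a rough sanity check on that question, here is a back-of-envelope sketch of how big full-screen render targets are at common resolutions. The idea that the cache working set is roughly "a handful of full-screen buffers" is my own simplifying assumption for illustration, not how AMD actually sizes Infinity Cache (real hit rates depend on the whole frame's access pattern, not just framebuffer size):

```python
# Back-of-envelope: full-screen render-target footprint at common resolutions.
# Assumption (mine, for illustration): 4 bytes per pixel (e.g. RGBA8 or a
# 32-bit depth buffer) and a crude working set of ~5 full-screen targets.

def framebuffer_mib(width: int, height: int, bytes_per_pixel: int = 4) -> float:
    """Size of one full-screen buffer in MiB."""
    return width * height * bytes_per_pixel / (1024 ** 2)

resolutions = {"1080p": (1920, 1080), "1440p": (2560, 1440), "4K": (3840, 2160)}

for name, (w, h) in resolutions.items():
    one = framebuffer_mib(w, h)
    print(f"{name}: one buffer ~{one:.1f} MiB, 5 buffers ~{5 * one:.1f} MiB")
```

A single 4K buffer is only about 32 MiB, so even several full-screen targets fit in the neighborhood of 128 MB, which is consistent with the intuition that, as long as resolutions stay put, the original Infinity Cache size remains in the right ballpark.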