Silent_Buddha
Only if Nvidia breaks from their two-level cache hierarchy. My understanding is that Nvidia really tries to avoid the chiplet/tile route and keep the cache subsystem streamlined, even at the cost of compensating with faster and more expensive GDDR memory. Power consumption might be one of the motivations, since any high-speed interface going off die will hurt power/perf and add signal latency. The Infinity Cache implementation in Navi31, while providing more throughput than the previous generation, takes a significant hit in latency.
The question that must be asked, IMO, is at what point (if there is one) it becomes more power-efficient to have a relatively small (relative to main memory) pool of off-chip (chiplet) SRAM serving as a last-level cache in front of slower main memory, versus a significantly larger pool of faster main memory that would consume more power than the slower pool.
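To put rough numbers on that crossover, here's a toy back-of-envelope sketch. Every figure in it (the pJ/bit energies, the static power charged to the extra cache dies and fanout links, the bandwidth demand) is an illustrative assumption made up for the example, not a measured number for any real GPU:

```python
# Toy break-even model for the question above. All numbers are
# illustrative placeholders, not figures for any real GPU.
#
# Design A: off-die SRAM last-level cache (chiplet) + slower main memory.
# Design B: no off-die cache, faster (more power-hungry) main memory.

E_LLC = 1.5         # assumed pJ/bit: SRAM hit served over an on-package link
E_SLOW_DRAM = 7.0   # assumed pJ/bit: slower, cheaper DRAM serving a miss
E_FAST_DRAM = 10.0  # assumed pJ/bit: DRAM pushed to higher data rates
P_STATIC = 20.0     # assumed watts: fixed cost of cache dies + fanout links
BW = 500e9 * 8      # assumed demand bandwidth in bits/s (500 GB/s)

def power_chiplet(hit_rate: float) -> float:
    """Total watts for design A at the given LLC hit rate."""
    pj_per_bit = hit_rate * E_LLC + (1.0 - hit_rate) * E_SLOW_DRAM
    return P_STATIC + BW * pj_per_bit * 1e-12

def power_monolithic() -> float:
    """Total watts for design B (all traffic goes to the faster DRAM)."""
    return BW * E_FAST_DRAM * 1e-12

# Solve power_chiplet(h) == power_monolithic() for the break-even hit rate:
#   h = (E_SLOW - E_FAST + P_STATIC expressed in pJ/bit) / (E_SLOW - E_LLC)
static_pj_per_bit = P_STATIC / (BW * 1e-12)
breakeven = (E_SLOW_DRAM - E_FAST_DRAM + static_pj_per_bit) / (E_SLOW_DRAM - E_LLC)
print(f"break-even LLC hit rate: {breakeven:.1%}")

for h in (0.25, 0.50, 0.75):
    print(f"hit {h:.0%}: chiplet {power_chiplet(h):.1f} W "
          f"vs monolithic {power_monolithic():.1f} W")
```

With those made-up inputs the chiplet design breaks even at roughly a 36% hit rate; plug in different energy or static-power assumptions and the crossover moves accordingly, which is exactly why the answer isn't obvious from the outside.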
Is NV remaining monolithic and relying on faster memory (and thus on faster, more power-hungry traces, plus memory that itself consumes more power) a design choice about saving board power, or a design choice about simplifying the design and the monetary cost of the GPU?
It's not like we have an apples-to-apples comparison in the wild to say that monolithic with the fastest main memory possible is necessarily a win over chiplet with slower main memory. We can't exactly compare NV to AMD here, as their GPUs are built around two different design philosophies with some overlap. It's possible that NV keeping memory amounts low compared to competing AMD products isn't purely about monetary cost but about power consumption as well.
Regards,
SB