Will off-chip L2-4 cache be viable again?

I've been thinking: the Wii U's L2 cache is eDRAM at 45nm, and backward compatibility with Wii games/software (plus homebrew that runs GameCube games on Wii U) has shown that the latency of the Wii U CPU's eDRAM L2 cache is at least comparable to SRAM.

When 14nm matures enough for full-scale mass production and eDRAM yields are acceptable or good, would it make sense to have an off-die L2 DRAM cache? At the 14nm node, the latency could be acceptable for an off-die L2 cache in eDRAM, in more expensive SRAM, or possibly in 1T-SRAM if someone chooses to use it.

Would it somehow make financial sense if someone made a viable design for mobile/handheld and even home consoles?

I am aware that Intel APUs with the top-of-the-line Iris graphics chip have, if I remember correctly, 128 MB of L4 cache that isn't embedded in the CPU die but sits on the same package (an MCM?) and is used as L4 cache for the CPU and as the GPU's VRAM(?)...
 
With HBM you don't need any more DRAM as cache, only some SRAM caches for really quick accesses. The bandwidth HBM memory provides is enough. L3 caches are only needed if they are much quicker than main memory, but as HBM is already on the package, you don't need them any more.

Or, since HBM memory is somewhat limited in size, you can see it as additional cache memory, because you still need an external pool of slower memory (e.g. DDR3/4).
 
Normally you'll want your cache to be fast (i.e. low latency). High-bandwidth cache helps in some cases (e.g. graphics), but generally that's the exception.
 
I am aware that Intel APUs with the top-of-the-line Iris graphics chip have, if I remember correctly, 128 MB of L4 cache that isn't embedded in the CPU die but sits on the same package (an MCM?) and is used as L4 cache for the CPU and as the GPU's VRAM(?)...
Yes. Intel has 128 MB of off-die L4 cache. Its value in a console is debatable (indeed, see the "value of eDRAM" discussion in this forum for that debate!). But with stacked memory, it's a certainty that we'll have a pool of fast, low-latency RAM alongside the major storage RAM. Hence this off-die L3/L4 cache is going to happen, and already has with Intel. ;) (Depending on what one calls off-die. Alternatively, this cache will be on die, negating the need for an off-die cache.)
 
The physical distance of cache (memory location) from the logic that processes that info is directly related to the latency. Registers are the fastest and lowest-latency form of storage. Then comes the L1 SRAM cache, which is the next closest in proximity, and so on. Off-chip vs. off-die are two different things: off-chip would almost certainly have higher latency than most on-die and off-die (but on-package) solutions. Off-chip would certainly be fine for GPU functionality, as GPU computational processes are massively parallel and latency tolerant. But for purely CPU functionality, the cost vs. benefit of an off-chip solution to consumers would probably be minimal. Yes, it would be faster than accessing main memory (DDR3/DDR4), but the increase in CPU performance would probably be negligible.

A large off-die or L4 cache would certainly help an integrated GPU or an APU's GPU, like the Intel Iris 5200 you mentioned. Xbox One's 32 MB eSRAM sits at the third level of GPU memory, since GCN already has an L2, so in a way it could be referred to as an L3 cache.
 
A large off-die or L4 cache would certainly help an integrated GPU or an APU's GPU, like the Intel Iris 5200 you mentioned. Xbox One's 32 MB eSRAM sits at the third level of GPU memory, since GCN already has an L2, so in a way it could be referred to as an L3 cache.

It's more of a separate RAM space than a cache, to be clear.
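That distinction matters to software. A cache is transparent to the program, while a separate RAM space like the eSRAM is a software-managed scratchpad: the program has to stage data in and out explicitly, tile by tile. A rough sketch of the pattern (the `scratchpad` array here just stands in for the fast on-package memory; all names and sizes are hypothetical):

```c
/* Tile-staging pattern for a software-managed scratchpad (as with a
 * separate fast RAM space like Xbox One's eSRAM): copy a tile into
 * fast memory, work on it there, copy the result back out. The
 * "scratchpad" below is an ordinary array standing in for fast RAM. */
#include <string.h>
#include <stddef.h>

#define SCRATCH_BYTES (32 * 1024)               /* pretend fast-RAM budget */
static unsigned char scratchpad[SCRATCH_BYTES]; /* stands in for eSRAM */

/* Brighten a large pixel buffer, processed tile by tile through the
 * scratchpad, saturating at 255. */
void brighten(unsigned char *pixels, size_t n)
{
    for (size_t off = 0; off < n; off += SCRATCH_BYTES) {
        size_t tile = n - off < SCRATCH_BYTES ? n - off : SCRATCH_BYTES;
        memcpy(scratchpad, pixels + off, tile);       /* stage in          */
        for (size_t i = 0; i < tile; i++)             /* work in fast RAM  */
            scratchpad[i] = scratchpad[i] < 245 ? scratchpad[i] + 10 : 255;
        memcpy(pixels + off, scratchpad, tile);       /* stage out         */
    }
}
```

With a transparent cache, none of that copying appears in the code; the hardware decides what lives in the fast pool. That is why "32 MB of eSRAM" and "32 MB of L3 cache" are genuinely different design points even at the same size and latency.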
 