I was not talking about a GPU accessing unified DDR4 system memory. I was talking about unified graphics memory (GDDR5 or HBM2) shared between two GPUs. No paging, obviously; direct cache-line-granularity access by both GPUs to the same memory.
But then you are back to a single structure used by all GPUs.
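For a rough sense of what that looks like from the software side, here is a minimal CUDA sketch of today's closest analogue: peer-to-peer access between two GPUs over NVLink/PCIe. This is not the cache-line-coherent shared GDDR5/HBM2 pool I'm describing, just an approximation of the programming model; device IDs and sizes are made up.

// Minimal CUDA sketch: GPU 1 reads/writes a buffer that lives in GPU 0's memory.
// Ordinary peer access over NVLink/PCIe, not true shared graphics memory, but it
// shows the model of "both GPUs touch the same allocation".

#include <cuda_runtime.h>
#include <cstdio>

__global__ void touch(float *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] += 1.0f;              // remote GPU writes directly into the buffer
}

int main() {
    const int n = 1 << 20;
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 1, 0);   // can device 1 reach device 0's memory?
    if (!canAccess) { printf("no peer access\n"); return 1; }

    float *buf = nullptr;
    cudaSetDevice(0);
    cudaMalloc(&buf, n * sizeof(float));         // allocation lives in GPU 0's memory

    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);            // let GPU 1 dereference GPU 0's pointers
    touch<<<(n + 255) / 256, 256>>>(buf, n);     // GPU 1 operates on GPU 0's memory
    cudaDeviceSynchronize();

    cudaSetDevice(0);
    cudaFree(buf);
    return 0;
}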
The motivation for moving to multi-die GPUs ("multi-GPU GPU" sounds wrong) is twofold:
1. One optimized design for all markets: low end, mid-range, and high end.
2. To circumvent the reticle limit for silicon ICs.
Regarding 1): For the low end you'd have a single die with attached memory. For the mid-range, two dies, each with its own memory attached. For the high end, four dies plus memory. Your GPU is now a NUMA multiprocessor system, which means you need a high-bandwidth interconnect to glue the dies together, something that is very feasible with silicon interposers.
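To make the NUMA point concrete, here's a hedged CUDA sketch of how software might treat such a part if each die were exposed as its own device (whether a real multi-die product would show up as one device or several is an assumption on my part): each die gets its own slice of the data in its locally attached memory stack, and kernels work on the local slice, so the interposer links only carry boundary traffic.

// Sketch: NUMA-style partitioning across the dies of a hypothetical multi-die GPU,
// assuming each die appears as its own CUDA device with locally attached memory.
// Each die works on the slice resident in its own memory; only results or halo
// data (not shown) would cross the die-to-die interconnect.

#include <cuda_runtime.h>
#include <vector>

__global__ void scale(float *slice, int n, float k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) slice[i] *= k;                // purely local work, no cross-die traffic
}

int main() {
    int numDies = 0;
    cudaGetDeviceCount(&numDies);            // 1 (low end), 2 (mid-range) or 4 (high end)
    if (numDies == 0) return 1;

    const int total = 1 << 24;
    const int perDie = total / numDies;
    std::vector<float*> slices(numDies);

    for (int d = 0; d < numDies; ++d) {
        cudaSetDevice(d);
        cudaMalloc(&slices[d], perDie * sizeof(float));   // lands in die d's local HBM/GDDR
        scale<<<(perDie + 255) / 256, 256>>>(slices[d], perDie, 2.0f);
    }
    for (int d = 0; d < numDies; ++d) {
        cudaSetDevice(d);
        cudaDeviceSynchronize();
        cudaFree(slices[d]);
    }
    return 0;
}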
Regarding 2): Nvidia just built a >800mm^2 behemoth. They are at the reticle limit and can't go any bigger. The market for these behemoths is small in unit terms but big in dollar terms. That implies high risk (if they had any real competition at the high end). If you could build the same system out of four 200mm^2 dies, you'd reduce the risk by a lot. You'd also get an economies-of-scale cost reduction.
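To put rough numbers on the yield side of that (purely illustrative, using a made-up defect density of 0.1 defects/cm^2 and the simple Poisson model Y = exp(-A*D0)): an 800mm^2 (8 cm^2) die yields exp(-0.8) ≈ 45%, while a 200mm^2 die yields exp(-0.2) ≈ 82%. That works out to roughly 800/0.45 ≈ 1780mm^2 of silicon per good big die versus 4 * 200/0.82 ≈ 980mm^2 per good set of four small dies, i.e. roughly 45% less silicon per working product, before counting the flexibility of binning and selling partial configurations.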
Also, silicon scaling is coming to an end. However, Moore's law (ever lower $/transistor) won't come to a crashing stop. As the time between process node advances increases, the time to amortize the capital expenditure of equipping a fab also increases, lowering the cost per mm^2 of Si. That means there will be a lot of silicon in GPUs in the future.
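A toy example of that amortization effect (all numbers invented for illustration): a fab with $10B of equipment running 600k wafer starts a year books about $10B / (4 * 600k) ≈ $4,200 of depreciation per wafer if the node is superseded after 4 years, but only about $10B / (7 * 600k) ≈ $2,400 per wafer if it stays in service for 7 years, so longer node lifetimes push the cost per wafer, and hence per mm^2 of silicon, down even without further shrinks.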
Cheers