Why would you need any special-purpose RAM? Just page on-demand from the DDR4 main system RAM. 16 GB of DDR4 is common in new gaming computers (a 2x 8 GB DDR4 memory kit = $80). By the time games become complex enough to use 32 GB of system RAM, the price will already have halved.
It would still offer a performance advantage, as it wouldn't be limited by the bandwidth of the PCIe slot or by contention for system resources. While I'd agree it's overkill for most users, there is still an advantage that likely makes sense for an enthusiast or prosumer product. The server benefits for scaling are obvious.
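For a rough sense of scale (back-of-the-envelope figures on my part, not anything from AMD's material): PCIe 3.0 x16 tops out around 16 GB/s per direction, while a Vega-class HBM2 pool is commonly quoted in the hundreds of GB/s, so anything serviced over the slot sits well over an order of magnitude below local memory.

```python
# Back-of-the-envelope bandwidth comparison (assumed figures, not measurements).
# PCIe 3.0 x16: 8 GT/s per lane * 16 lanes * 128b/130b encoding, per direction.
pcie3_x16_gb_s = 8e9 * 16 * (128 / 130) / 8 / 1e9   # ~15.8 GB/s

# Vega-class HBM2 (2048-bit bus) is commonly quoted around 484 GB/s.
hbm2_gb_s = 484.0

print(f"PCIe 3.0 x16 : {pcie3_x16_gb_s:6.1f} GB/s")
print(f"Local HBM2   : {hbm2_gb_s:6.1f} GB/s")
print(f"Ratio        : {hbm2_gb_s / pcie3_x16_gb_s:6.1f}x")
```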
That still required NVLink, application-side changes (excluding IBM's Power8 CPUs, to my understanding), and effective prefetching to overcome some of the latency. If brought to consumer products in its current form it should work well for most needs, but it still has some limitations.
AMD's APU design theoretically allowed the GPU to utilize ALL available system memory bandwidth, as opposed to just that of the PCIe link. That's about as unified as you can get. Discrete cards would still have the PCIe bottleneck. The separate pool, as mentioned above, works around that limitation. While likely not an issue for most gamers (someone will likely do this anyway), scaling to many (say 8-16) GPUs would create significant contention. That's the very reason "Network Storage" likely showed up on the Vega slides, as opposed to going through a host network. Costs aside, the separate pool is technically superior in the same way that having all resources in VRAM should be superior to paging anything.
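To put a rough number on that contention point (assumed figures on my part: a dual-channel DDR4-2400 host and PCIe 3.0 x16 per card), splitting the host's memory bandwidth across 8-16 GPUs leaves each one with far less than even its own slot could carry:

```python
# Rough contention estimate for many GPUs paging out of one host's system RAM.
# Assumed figures: dual-channel DDR4-2400 host memory, PCIe 3.0 x16 per card.
ddr4_dual_channel_gb_s = 2 * (64 / 8) * 2400e6 / 1e9   # 38.4 GB/s total
pcie3_x16_gb_s = 15.75                                 # per direction, per card

for gpus in (1, 8, 16):
    share = ddr4_dual_channel_gb_s / gpus
    bottleneck = "host RAM" if share < pcie3_x16_gb_s else "PCIe slot"
    print(f"{gpus:2d} GPU(s): {share:5.1f} GB/s each -> bottleneck is the {bottleneck}")
```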
A game's data set (at 60 fps) changes very little from one frame to the next. Just take two consecutive frame screenshots and you'll notice that most texture surfaces are identical and the visible geometry (including LOD level) is mostly the same. My experience (with custom virtual texturing) shows that only about 2% of the texture data set changes per frame in the common case (out of a 256 MB active data set cache). I'd say that automated on-demand paging (with pre-fetch hints) from system RAM should work very well for games.
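For scale, 2% of a 256 MB cache at 60 fps works out to roughly 300 MB/s of paging traffic, a small fraction of a PCIe 3.0 x16 link. A quick sketch of that arithmetic (the cache size and change rate are from the post above; the link bandwidth is my own assumption):

```python
# Streaming budget for on-demand texture paging. The 256 MB cache and ~2%
# per-frame change rate come from the post above; the PCIe figure is an
# assumed PCIe 3.0 x16 ceiling, not a measurement.
cache_mb          = 256      # active virtual-texture cache
changed_per_frame = 0.02     # ~2% of the cache changes each frame
fps               = 60

paging_mb_s    = cache_mb * changed_per_frame * fps    # ~307 MB/s
pcie3_x16_mb_s = 15.75 * 1024                          # ~16 GB/s expressed in MB/s

print(f"Paging traffic  : {paging_mb_s:8.1f} MB/s")
print(f"PCIe 3.0 x16    : {pcie3_x16_mb_s:8.1f} MB/s")
print(f"Link utilization: {100 * paging_mb_s / pcie3_x16_mb_s:.1f}%")
```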
I'm not suggesting the technique won't work well, but that one implementation will be superior to the other and probably better than the current method, at the very least from the standpoint of making developers' lives easier. The cost of that implementation is a separate matter, but it would still be marketable. It should also make the actual GPU the primary component of performance, similar to how DX12/Vulkan reduced reliance on the CPU.
Is Vega another 7970 that will take years before it gets competitive?
Really? Haven't they learned anything?
It will probably be competitive, but features like "Primitive Shaders", which to my knowledge aren't exposed in any of the APIs or supported by the competition, will likely limit their use a bit. That statement seems to be more about it taking time for new techniques to really take hold. It's simply forward-looking hardware with more capabilities than are currently practical.