Crystalwell would be the L4 variant, which puts the EDRAM inside the cache coherence hierarchy. For EPYC, that might exceed the probe filter/directory's ability to track lines. Earlier versions of HyperTransport Assist could not fully cover a whole 8-socket system if all caches hit the same home node. EPYC has doubtless increased the directory capacity to more closely match the combined L2 and L3 capacity of the 8 dies in a system, but a coherent EDRAM die could easily exceed the total capacity of all the other caches in the system, and that is with only one such die present system-wide.
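To put rough numbers on the tracking problem, here is a minimal back-of-the-envelope sketch. Everything in it is an assumption for illustration: the 64-byte line size, the one-entry-per-line directory model, the single system-wide pool (real probe filters are split per home node), and the placeholder capacities, none of which are published figures for any EPYC part. The function name `directory_shortfall` is made up for this example.

```python
LINE_BYTES = 64  # assumed cache-line size


def lines(mib: float) -> int:
    """Number of cache lines that fit in a capacity given in MiB."""
    return int(mib * 1024 * 1024) // LINE_BYTES


def directory_shortfall(dies: int, cache_mib_per_die: float,
                        edram_mib: float, tracked_lines: int) -> int:
    """Cacheable lines the directory cannot track (0 if it covers everything).

    Simplified model: one directory entry per cached line, one shared pool.
    """
    cacheable = dies * lines(cache_mib_per_die) + lines(edram_mib)
    return max(0, cacheable - tracked_lines)


# Placeholder figures, chosen only to illustrate the arithmetic.
dies = 8
cache_mib_per_die = 20   # assumed combined L2 + L3 per die
edram_mib = 256          # hypothetical coherent EDRAM capacity

# Assume the directory was sized to cover exactly the SRAM caches.
sram_lines = dies * lines(cache_mib_per_die)

print(directory_shortfall(dies, cache_mib_per_die, 0, sram_lines))
print(directory_shortfall(dies, cache_mib_per_die, edram_mib, sram_lines))
```

Under these assumptions the directory covers the SRAM caches with nothing to spare, and every line of a coherent EDRAM lands outside what it can track. A memory-side EDRAM never enters this sum at all.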
The Skylake version, where the EDRAM is memory-side, would avoid this, since it would sit transparently between a memory request and the actual DRAM access at the home node. But other than maybe halving DRAM latency on a hit, it would leave AMD's CCX, package, and socket latencies unchanged. Bandwidth-wise, I suppose it would be counting on an improvement in interposer or fan-out packaging like Nvidia and Intel are planning. Otherwise, the chips lack the pads, bandwidth, and power efficiency to do better than what they currently have within a socket, much less across 2P.
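The latency point can be sketched the same way. This is a toy model, not a measurement: it assumes total latency is a fixed fabric component (CCX, package, and socket hops) plus a memory access, and that an EDRAM hit costs some fraction of a DRAM access (0.5 here, following the "maybe halving" guess). The function name and all the nanosecond values are hypothetical.

```python
def avg_memory_latency_ns(fabric_ns: float, dram_ns: float,
                          hit_rate: float, hit_fraction: float = 0.5) -> float:
    """Average load-to-use latency with a memory-side EDRAM cache.

    fabric_ns    : fixed on-chip/off-chip hop latency, untouched by the EDRAM
    dram_ns      : latency of the DRAM access itself
    hit_rate     : fraction of requests served from EDRAM (0.0 to 1.0)
    hit_fraction : EDRAM hit cost as a fraction of dram_ns (assumed 0.5)
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    edram_ns = dram_ns * hit_fraction
    return fabric_ns + hit_rate * edram_ns + (1.0 - hit_rate) * dram_ns


# Hypothetical split: 60 ns of fabric hops, 80 ns of DRAM access.
for rate in (0.0, 0.5, 1.0):
    print(rate, avg_memory_latency_ns(60.0, 80.0, rate))
```

Even with a perfect hit rate, the fabric term is paid in full, so the best case in this model only trims the DRAM portion of the total.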
Something would need to be done for the on-chip data fabric as well. Part of Zen's pervasive philosophy of "balance" is that the worst-case, sustained, and best-case figures are rather close together. The on-die fabric as we know it would have to scale up a crossbar whose complexity and delivered bandwidth either match everything else or end up strangling the EDRAM.