And even with their considerable process advantages, the edram pool is only for the most expensive SKUs Intel ships. I would hazard a guess that just the MCM + edram accounts for a significant proportion of the entire die cost for the XB1.
And even with their 1.6 GHz edram, Intel are topping out at less BW from their off-die memory than MS gets from their on-die esram.
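Rough numbers, using the publicly quoted peaks (so take these as approximate rather than measured):

```python
# Rough peak comparison using publicly quoted figures (approximate).
crystalwell_bw  = 50 * 2       # Haswell edram link: ~50 GB/s each way, ~100 GB/s aggregate
esram_bw_oneway = 0.853 * 128  # XB1 esram: 853 MHz x 128 B/cycle ~= 109 GB/s in one direction
esram_bw_peak   = 204          # MS's quoted peak with concurrent read + write
print(crystalwell_bw, round(esram_bw_oneway), esram_bw_peak)  # 100 109 204
```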
I think the pie-in-the-sky "1000 GB/s" PS4 slide has been pretty successful at convincing people that MS's esram sucks. It's easier to make a PowerPoint slide than to engineer a processor.
They like to talk about energy consumption/efficiency being a huge factor in all areas of hardware design, including esram/edram.
If energy consumption was a factor, it was a very, very wise decision to make the power savings of esram over the superior bandwidth and superior size of edram a significant factor in the choice, and in the overall development of the hardware. I don't think gamers anywhere who spend $400-500 on a console, $60 a year for online, and a hundred or more on games a year would tolerate a $2-a-year jump in their energy bill from a console with 10% higher power consumption during gameplay.
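For what it's worth, that $2-a-year figure does fall out of plausible assumptions; the wattage, play time and electricity price below are my guesses, not measured numbers:

```python
# All inputs are illustrative assumptions, not measurements.
console_watts  = 120                    # assumed gameplay power draw
extra_watts    = console_watts * 0.10   # the hypothetical 10% penalty
hours_per_year = 3 * 365                # ~3 hours of gameplay a day
price_per_kwh  = 0.12                   # USD, rough residential rate

extra_kwh  = extra_watts * hours_per_year / 1000
extra_cost = extra_kwh * price_per_kwh
print(round(extra_kwh, 1), round(extra_cost, 2))  # ~13.1 kWh, ~$1.58 a year
```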
The electricity bill is only one factor, as AMD's and Intel's continued focus on processor power draw shows. MS chose a power envelope and engineered a fast solution to fit within it, to be manufactured within their budget constraints.
For better or worse, they wanted a silent console, and they could only spend so much on cooling. The heatsink in the Xbox One is already more expensive than both of the heatsinks in the original 360 combined. I'd wager it's more expensive than the one in the more power-hungry PS4, too.
So can I assert the following:
In general this setup has higher theoretical bandwidth, is harder to program for, harder to master and slower to maximize, but with the pro of a higher ceiling than a simpler architecture. If you were to pinpoint a true weakness, it would be not having monumentally more bandwidth than the competition; instead esram has about 25% more bandwidth (over a simpler competing external architecture). They likely could have gone with higher bandwidth (edram) and more CUs (but it would have been less than 32 MB of working space), which may have been much harder to program for.
How much more performance would MS be able to extract from "monumentally more" esram BW? From within the esram, they already have much more than +25% peak BW per CU, as they have fewer CUs. And there will be many situations where even this doesn't add much.
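A quick per-CU comparison, using the commonly quoted peak figures (so again approximate):

```python
# Peak bandwidth per CU, using commonly quoted figures (approximate).
xb1_esram_peak = 204   # GB/s, concurrent read/write peak
xb1_cus        = 12
ps4_gddr5_peak = 176   # GB/s
ps4_cus        = 18

print(round(xb1_esram_peak / xb1_cus, 1))   # ~17.0 GB/s per CU
print(round(ps4_gddr5_peak / ps4_cus, 1))   # ~9.8 GB/s per CU -> roughly 74% more, not 25%
```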
More bandwidth from an off-die edram pool would have required a very wide off-chip path - much wider than the 360 used (and MS were specifically trying to get away from that design) and wider than even Intel use on their 22 nm Iris-enabled uber processors.
And if you're talking on-chip, then who's going to make that for them...? Intel? Nope. Renesas on their 45 nm node? Nope. IBM on their 32 nm node (at probably a larger die size and goodness knows what engineering cost)?
I would assert that edram was not a realistic option within their constraints: off-die edram would have probably netted them less BW at possibly higher power draw, and on-die edram would have been difficult to source, given that possibly no-one could have manufactured it for them.
The esram's one real weakness is apparently its small size. Even another 16 MB (~40 mm^2) would significantly alter the proposition of using large g-buffers or texturing from it. And at an extra ~$10 that still seems more attractive than current on- or off-die edram.
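The ~$10 looks about right to me, although the cost per mm^2 below is purely my own assumption:

```python
# Die-cost guesstimate; the cost-per-mm^2 figure is an assumption, not a known number.
extra_esram_mb = 16
mm2_per_mb     = 40 / 16   # ~2.5 mm^2 per MB, from the ~40 mm^2 estimate above
cost_per_mm2   = 0.22      # assumed yielded 28 nm silicon cost, USD per mm^2
print(round(extra_esram_mb * mm2_per_mb * cost_per_mm2, 2))  # ~8.8, i.e. in the ~$10 ballpark
```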
DMEs are there to help saturate the bus, much like they would over PCIe.
Using DMEs to saturate the esram would likely also saturate the main memory bus and kill CPU performance through contention.
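To put rough numbers on the contention point (the DME figure is the commonly quoted combined peak, the rest is simple arithmetic):

```python
# Rough contention arithmetic; figures are quoted peaks, not sustained rates.
ddr3_peak = 68.3   # GB/s, XB1 main memory (256-bit DDR3-2133)
dme_copy  = 25.6   # GB/s, quoted combined peak of the four move engines

print(round(dme_copy / ddr3_peak * 100), "% of DDR3 peak eaten by copies alone")  # ~37%
print(round(ddr3_peak - dme_copy, 1), "GB/s left before the CPU and GPU even start")
```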
DMEs are there to allow "processor-free" transfer of data between memory pools, and copies within the same pool (most likely main RAM).