What I still don't understand is this: if even armchair architects can see the major faults/cons of going with ESRAM, how could MS have proceeded with it?
Because the armchair architects are ignorant of most of the considerations and cost projections. They also seem ignorant of their ignorance, and hence vocal and opinionated.
To me the only logical explanation is that they needed the TV stuff, which required 8GB, so badly that they were willing to sacrifice yields, ease of programmability, and graphical power.
ESRAM was decided on before the move to 8GB. SRAM is also pretty defect tolerant (spare rows/columns can be mapped in), so the yield hit is smaller than you might assume.
I know it would have been a much better choice to go with a separate ARM SoC to handle ALL of the TV stuff, including having its own RAM. HDMI passthrough mode with the "xbox turned off" would cost 5-10 watts at most, instead of 70+ watts. For the Kinect stuff, the system could be kicked out of standby to handle voice requests.
A separate SoC would have added cost and board complexity. A separate pool of RAM would not have added to game-accessible BW. The Xbone isn't just doing HDMI pass-through. And you cannot use voice to kick the Kinect into voice recognition mode - that is a contradiction.
Anyway, this way the Xbox could have launched with 4-6GB of GDDR5 main RAM, and the (in that case) useless ESRAM transistors could have been left out, allowing for a GPU competitive with the PS4.
4 GB of RAM would have left the Xbox One with either a 128-bit bus (and hence still requiring ESRAM) or a commitment to 8 x 512MB chips, likely ensuring a high cost per MB of RAM over the platform's lifetime.
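For a rough sense of the bandwidth side of that trade-off, here is a minimal sketch of the peak-bandwidth arithmetic. The signalling rates are my own assumptions (5.5 Gbps GDDR5 as on the PS4, DDR3-2133 as shipped in the Xbox One), not figures from the posts above:

```python
# Peak theoretical bandwidth = (bus width in bits / 8) * per-pin data rate in Gbps.
# All data rates below are assumed, commonly quoted figures.

def peak_bw_gbs(bus_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s for a given memory interface."""
    return bus_bits / 8 * data_rate_gbps

configs = {
    "256-bit GDDR5 @ 5.5 Gbps (PS4-style)":        peak_bw_gbs(256, 5.5),    # ~176 GB/s
    "128-bit GDDR5 @ 5.5 Gbps (hypothetical 4GB)":  peak_bw_gbs(128, 5.5),    # ~88 GB/s
    "256-bit DDR3-2133 (Xbox One as shipped)":      peak_bw_gbs(256, 2.133),  # ~68 GB/s
}

for name, bw in configs.items():
    print(f"{name}: {bw:.0f} GB/s")
```

A 128-bit GDDR5 bus lands at roughly half the PS4's bandwidth, which is why that configuration would still have wanted an on-die buffer.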
There are some synthetic, hypothetical situations in which ESRAM could prove useful, but they are far outweighed and outnumbered by the associated cons. That's my conclusion, at least.
The ESRAM is useful in every Xbox One game. There are even real, in-game situations where it offers performance advantages over a 256-bit GDDR5 configuration.
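For context on that last claim, here's a sketch of the combined-peak-bandwidth argument, using widely reported figures that are my assumptions (roughly 204 GB/s peak read+write for the 32 MB of ESRAM after the 853 MHz upclock, plus ~68 GB/s of DDR3); real sustained rates are lower:

```python
# Hypothetical peak-bandwidth comparison; all figures are assumed, widely reported numbers.
ESRAM_PEAK_GBS = 204.0   # 32 MB ESRAM, simultaneous read+write peak
DDR3_PEAK_GBS  = 68.3    # 256-bit DDR3-2133 main memory
GDDR5_PEAK_GBS = 176.0   # 256-bit GDDR5 @ 5.5 Gbps (PS4-style)

combined = ESRAM_PEAK_GBS + DDR3_PEAK_GBS
print(f"ESRAM + DDR3 combined peak: {combined:.0f} GB/s "
      f"vs 256-bit GDDR5: {GDDR5_PEAK_GBS:.0f} GB/s")
# The catch: only data that fits in the 32 MB (typically render targets) sees the
# ESRAM rate, so whether this beats a flat GDDR5 pool depends on the workload.
```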