Now that the full DF transcript of the interview with Microsoft's designers has been published, I figured I would tidy up some of the earlier discussion about what the eSRAM could be doing.
Per the designers, the eSRAM is divided into four 8MB lanes, each with its own controller.
Internally, each lane is subdivided into 8 modules.
The controller is able to issue accesses to different modules in the same cycle, which allows for simultaneous reads and writes at the level of the lane, if not of the individual SRAM arrays themselves.
The description goes on to indicate that hitting the same areas repeatedly causes a loss of bandwidth, because the individual components themselves cannot handle multiple accesses at once.
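To make that bank-conflict behavior concrete, here's a toy model of a single lane. The 8-module count comes from the interview; the one-read-plus-one-write-per-cycle issue rule and the request format are my own guesses, purely for illustration:

```python
# Toy model of one eSRAM lane: 8 single-ported modules (module count per the
# DF interview); the issue rules here are assumptions for illustration only.

def cycles_needed(requests, num_modules=8):
    """requests: list of (op, module) tuples, where op is 'R' or 'W' and
    module is 0..num_modules-1. Each cycle the lane can issue at most one
    read and one write, and only to different modules, since a single
    module can't service two accesses at once."""
    assert all(0 <= m < num_modules for _, m in requests)
    pending = list(requests)
    cycles = 0
    while pending:
        cycles += 1
        busy_modules = set()
        used_ops = set()
        still_pending = []
        for op, module in pending:
            if op not in used_ops and module not in busy_modules:
                used_ops.add(op)
                busy_modules.add(module)
            else:
                still_pending.append((op, module))
        pending = still_pending
    return cycles

# Reads and writes spread across different modules overlap:
print(cycles_needed([('R', 0), ('W', 1), ('R', 2), ('W', 3)]))  # 2 cycles
# Hammering one module serializes everything:
print(cycles_needed([('R', 0), ('W', 0), ('R', 0), ('W', 0)]))  # 4 cycles
```

Spreading traffic across the modules lets reads and writes proceed together, while concentrating it on one module serializes everything, which lines up with the "hitting the same areas repeatedly" caveat.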
The explanation for why peak bandwidth isn't twice the base bandwidth is that writes introduce bubbles in the access pipeline.
It seems that if there isn't concurrent read activity, the write bubbles can be hidden.
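As a back-of-the-envelope illustration of how bubbles keep the combined figure under 2x, here's some normalized arithmetic; the bubble rate is a number I made up, not anything from the article:

```python
# Normalized, illustrative numbers only; the bubble rate is an assumption.
read_per_cycle = 1.0    # one full-width read slot per cycle
write_per_cycle = 1.0   # one full-width write slot per cycle
bubble_rate = 0.125     # hypothetical: 1 in 8 write slots lost to a bubble

# Either stream running alone reaches the base figure, since the bubbles
# can be hidden when the other path is idle (per the interview).
base = 1.0

# With both streams active, the write path gives up slots to bubbles,
# so the combined peak lands short of a clean 2x.
peak = read_per_cycle + write_per_cycle * (1 - bubble_rate)

print(base, peak)  # 1.0 1.875 -- roughly 1.9x the base, not 2x
```

The exact shortfall depends on how often the bubbles actually occur, which is why the combined figure ends up somewhere between the base and double it.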
Some of my earlier examples of how this can happen would need to be modified to account for the fact that there are 8 modules and that the eSRAM slightly favors reads over writes, which is not an assumption I had made.
The behavior of the eSRAM bears some similarities to how the external memory controllers would handle a heavily banked DRAM device, and the point is made that they sit at the same level of the memory hierarchy, on the other side of a large crossbar.
My interpretation of the story they gave for the "doubled" bandwidth discovery was that the original minimum bandwidth number was decided upon and given as a design parameter for software development and planning before the eSRAM and its control logic were actually designed.
This isn't quite as early as the cocktail napkin stage, but it was well before the design was fleshed out. The description of the design as it now exists shows that the eSRAM is structured to have separate paths for reads and writes, and that it takes advantage of banked accesses to get simultaneous traffic.