Hmmmm...let's carry your idea further.
**Total guesswork below.**
Assume equal bandwidth for read and write ops. You read from A, read from B, then write to C (which is A+B). The next alpha blending operation then starts with a read from D that happens at the same time as the write of C from the previous operation, and so on, like this:
...etc. Note that you can get 12 operations in 9 cycles (columns = cycles here). Without this newly discovered ability to read and write simultaneously, you can only do one or the other in a given cycle, so you'd only ever get 9 ops in 9 cycles at most.
The columns above with X's in them represent cycles where the eSRAM cannot read and write simultaneously; in every other column/cycle it can. Evidently (for reasons unknown) the eSRAM can only read and write simultaneously in 7 out of every 8 cycles. For alpha blending that gets you 12/9 ≈ 1.33 times the baseline throughput (a 33% improvement), which works out to roughly 133GB/s of effective bandwidth if you treat the eSRAM as ~100GB/s (the exact value is 136.5GB/s for the 102.4GB/s eSRAM). Depending on rounding, that matches the alpha blending figure reported in the article.
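To sanity-check the 12-ops-in-9-cycles count, here's a rough Python sketch of the same schedule. It's my own simplified model, not anything from the article: it assumes at most one read and one write per cycle and equal-sized ops, and it ignores the 1-in-8 restriction. `schedule_blends` and `effective_bandwidth` are just made-up helper names.

```python
# Simplified model (assumption, not from the article): at most one read and
# one write can be issued per cycle, every op moves the same amount of data,
# and each alpha blend is two reads (A, B) followed by one write (C = A + B).
# The 1-in-8 restriction is ignored here.
BASE_GBPS = 102.4  # eSRAM bandwidth doing only a read OR a write each cycle


def schedule_blends(n_blends, overlap=True):
    """Return a list of cycles, each a list of the ops issued in that cycle."""
    cycles = []
    pending_write = None  # write of the previous blend, not yet issued
    for i in range(n_blends):
        first_read = [f"R{i}a"]
        if pending_write is not None:
            if overlap:
                # Previous blend's write shares a cycle with this blend's first read.
                first_read.append(pending_write)
            else:
                # No read+write overlap: the write needs a cycle of its own.
                cycles.append([pending_write])
        cycles.append(first_read)
        cycles.append([f"R{i}b"])
        pending_write = f"W{i}"
    if pending_write is not None:
        cycles.append([pending_write])  # final write has nothing to pair with
    return cycles


def effective_bandwidth(cycles, base_gbps=BASE_GBPS):
    """Scale the base bandwidth by ops completed per cycle."""
    ops = sum(len(c) for c in cycles)
    return base_gbps * ops / len(cycles)


overlapped = schedule_blends(4)
serial = schedule_blends(4, overlap=False)
print(sum(map(len, overlapped)), len(overlapped))     # 12 9
print(sum(map(len, serial)), len(serial))             # 12 12
print(f"{effective_bandwidth(overlapped):.1f} GB/s")  # 136.5 GB/s
```

With overlap, 4 blends finish in 9 cycles; without it, the same 12 ops take 12 cycles at one op per cycle, which is just the plain 102.4GB/s.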
Other shader ops will vary in how much overlap there is. The maximum possible overlap would be a shader op set up to read and write on every single cycle (except for that 1 out of every 8 where ya can't, for some reason); i.e., you'd want alternating read/write ops where you simultaneously read one value and write another. That would give full overlap 7/8 of the time, and twice as many ops per cycle as you'd get without this revelation during those 7 of every 8 cycles.
Newly discovered bandwidth = (102.4GB/s)(7/8) = 89.6GB/s
New total eSRAM bandwidth = 89.6GB/s + 102.4GB/s = 192GB/s
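Same idea for the peak number, as a back-of-the-envelope Python sketch under the assumptions in the paragraph above (7 of every 8 cycles can read+write at once, and the shader alternates reads/writes perfectly). Counting ops per 8-cycle window gives 15, i.e. 15/8 = 1.875 times the base rate, which is where 192GB/s comes from. The constant and function names are made up for illustration.

```python
# Assumptions (the post's guesswork, not measured): within every window of
# 8 cycles, 7 cycles can do a read AND a write, 1 cycle can do only one,
# and the shader op alternates reads/writes perfectly so no slot is wasted.
BASE_GBPS = 102.4   # one op (read or write) per cycle
WINDOW = 8          # length of the repeating window, in cycles
DUAL_CYCLES = 7     # cycles per window that allow a simultaneous read + write


def ops_per_window(window=WINDOW, dual=DUAL_CYCLES):
    """Count ops issued per window: 2 in dual-issue cycles, 1 otherwise."""
    return sum(2 if cycle < dual else 1 for cycle in range(window))


extra = BASE_GBPS * DUAL_CYCLES / WINDOW       # bandwidth gained from overlap
peak = BASE_GBPS * ops_per_window() / WINDOW   # total theoretical peak
print(ops_per_window())     # 15
print(f"{extra:.1f} GB/s")  # 89.6 GB/s
print(f"{peak:.1f} GB/s")   # 192.0 GB/s
```

Which cycle in the window is the restricted one doesn't change the count, so the sketch just puts it last.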
That 192GB/s figure is the theoretical max, but you'll never get close to it unless you have a shader op set up to alternate reads and writes the entire time, which probably wouldn't make a lot of sense in practice.
Thoughts?