My basic point really is extremely simple. Either Xenos doesn't need all that bandwidth or RV670 needs more.
Or you're comparing apples to oranges.
I believe it's already been well answered: Xenos can use all that bandwidth, but in most situations doesn't need to, and the same would translate to RV670. Most of the time it's fine, but there are rare occasions when it would be bandwidth limited.
When you stay entirely on-chip, the bandwidth to internal RAMs is not a scarce good. Above a certain size, the area overhead of splitting 1 RAM into 2 is minimal.
So once you've decided to go this route, you can design the rest of the architecture while recklessly spending on bandwidth without worrying about efficiency. A direct consequence is that standard methods of comparison become pretty much meaningless, yet you're still pretending that apples are oranges. They are not.
You only need to read the Xenos article to see that the real bottleneck in the architecture is not in the on-chip bandwidth, but in the interface between the 2 dies.
Let's compare the two GPU architectures for two typical cases.
In one case, you're rendering completely covered 2x2 pixel tiles. In a traditional GPU, this results in compression. In Xenos, the data travels compressed to the ROP die and undergoes an 8-fold(!) bandwidth expansion. Sure, you see impressive eDRAM bandwidth usage, but with no additional benefit over a traditional GPU.
In the other case, when no compression is possible, the eDRAM uses only 1/8 of the theoretical maximum, because now you're limited by the link bandwidth.
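To put the two cases side by side, here is a rough sketch in Python. It assumes that the peak eDRAM bandwidth is exactly the expansion factor times the link bandwidth, which is what the "1/8 of the theoretical maximum" figure implies. The function name and the unit-less link bandwidth of 1.0 are made up for illustration; they are not measured figures.

```python
def edram_utilization(link_bw, expansion=8, compressed=True):
    """Return (eDRAM traffic, fraction of peak eDRAM bandwidth).

    Assumption: peak eDRAM bandwidth = expansion * link_bw.
    link_bw is in arbitrary units; only the ratios matter here.
    """
    peak = link_bw * expansion
    if compressed:
        # Compressed data crosses the link and is expanded on the ROP die.
        traffic = link_bw * expansion
    else:
        # Nothing to expand: the eDRAM only sees what the link can deliver.
        traffic = link_bw
    return traffic, traffic / peak


for label, flag in (("fully covered tiles", True), ("no compression", False)):
    traffic, fraction = edram_utilization(1.0, compressed=flag)
    print(f"{label}: eDRAM traffic = {traffic} x link, {fraction:.3f} of peak")
```

In both cases the amount of data crossing the link is the same; only the eDRAM-side number changes.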
So if you want to compare bandwidth numbers, the meaningful comparison is the bandwidth of a regular GPU's memory interface against (the bandwidth of the system memory + the bandwidth of the inter-die link + the read bandwidth for RMW operations - the bandwidth to copy from eDRAM to system RAM). In this equation, there's no contest: an RV670 will probably beat a Xenos by a factor of two, if not more.
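Written out as code, the equation above looks something like this. It is only a sketch: the function and parameter names are mine, and the numbers in the example call are placeholders in arbitrary units, not real figures for either chip.

```python
def xenos_comparable_bandwidth(system_mem, die_link, rmw_read, resolve_copy):
    """Bandwidth figure to set against a regular GPU's memory interface.

    All arguments must be in the same unit (e.g. GB/s):
      system_mem   - bandwidth of the system memory
      die_link     - bandwidth of the link between the two dies
      rmw_read     - read bandwidth for read-modify-write operations
      resolve_copy - bandwidth spent copying from eDRAM to system RAM
    """
    return system_mem + die_link + rmw_read - resolve_copy


# Placeholder values only, chosen to show the arithmetic.
total = xenos_comparable_bandwidth(
    system_mem=10, die_link=20, rmw_read=5, resolve_copy=3
)
print(total)
```

Plug in the real figures for both chips and compare the result against the regular GPU's memory interface bandwidth.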
The real advantage of the eDRAM arrangement lies not in the bandwidth but in the complexity and area reduction that it allows: no compression logic, much more coherent data streams to the system memory (-> significantly smaller and simpler MC), and drastically lower latency for the eDRAM (-> large area savings due to smaller latency FIFOs).
Xenos has a really neat architecture, but it's just too naive to take one number out of context and build a whole argument around it without looking at the whole picture.