I don't see how he can take the texturing penalties into consideration for RSX and not for Xenos. I know the texture units are decoupled, but texturing is still a limiting factor. I think ATI's statements on this issue are the most telling: they said it would outperform R520 in some cases and lose to it in others.
I'm comfortable thinking Xenos is a little faster than R520. It has major efficiency gains, but we still don't know the differences between Xenos ALUs and traditional ALUs. If I had to guess, I'd say the differences lie in the fixed-function part; my understanding is that when you include that hardware in your flop counts, GPUs cross into the teraflops category. Perhaps some concessions had to be made in fixed function to make unified shaders practical. Each ALU is no longer as specialized, and it would be too expensive to include all the fixed-function logic for both pixel and vertex shading in every ALU. The unknown here is how much die space you can save this way compared with other approaches.
That's the real magic of Xenos: however they did it, they packed a lot of ALUs into very little die space.