I'll answer instead
. Well these diagrams were leaked before already, so there's nothing new to see here.
I don't even know where to start describing all the differences - I don't think I'm alone not having expected that many changes.
1) tmu organization. No longer shared across arrays, each array has its own quad tmu. I'll bet the sampler thread arbiter had to change with that as well.
2) tmu themselves changed. While they always had separate L1 caches (I think - the picture is misleading), now the separate 4 TA and point sampling fetch units are gone (thus one tmu is 4 TA, 16 fetch, 4 TF instead of 8 TA, 20 fetch, 4 TF). Also, early tests indicate they are no longer capable of sampling fp16 at full speed (dropping to half and one quarter at fp32 IIRC).
3) ROPs. They now have 4xZ fill capability (at least in some situations) instead of just 2. The R600 picture indicates a fog/alpha unit which is now gone, though I doubt it really was there in the first place (doesn't make sense should be handled in the shader ALU). The picture also indicate shared color cache for R600, I don't know if this was true however. Could be though (see next item).
4) no more ring bus. Clearly with rv770 ROPs are tied to memory channels (just like nvidia G80 and up), and there are per-memory channel L2 texture caches. Instead of one ring-bus it seems there's now different "busses" or crossbars or whatever for different data (it's got a "hub", it's got some path for texture data etc.)
5) Other stuff like the local data store, read/write cache etc.
That seems like the most important to me, architecture-wise (of course it got 10 shader arrays instead of 4 too...)