I've just finished reading RWT's preview. I had previously thought that they had fully coherent L1 caches per SM. Wishful thinking. L2 is coherent with itself: there is a single L2 for each memory bus ... there can only ever be one copy.
That's not a cache coherency scheme; that is simply caching.