Nonsense. If the data set being pulled through the cache is larger than the cache, then no replacement policy will avoid thrashing that cache.
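A toy simulation makes the point concrete: stream a working set even one line larger than an LRU cache and every single access misses. This is a sketch, not anyone's real cache model; the function name and the sizes are illustrative only.

```python
from collections import OrderedDict

def miss_rate(cache_lines, accesses):
    """Simulate a fully associative LRU cache; return the fraction of misses."""
    cache = OrderedDict()
    misses = 0
    for addr in accesses:
        if addr in cache:
            cache.move_to_end(addr)        # hit: mark as most recently used
        else:
            misses += 1
            cache[addr] = None
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line

    return misses / len(accesses)

# Sequentially scan a working set one line larger than the cache, twice:
# LRU evicts each line just before it is needed again, so nothing ever hits.
print(miss_rate(128, list(range(129)) * 2))   # 1.0 -- total thrash

# A working set that fits is the opposite: the second pass hits entirely.
print(miss_rate(128, list(range(64)) * 2))    # 0.5
```

The pathology is the access pattern, not the policy: random replacement would salvage a few hits on the oversized scan, but the miss rate stays near 100% for any policy once the working set exceeds capacity.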
Of course there is. Lock the segment you want by MVA (modified virtual address) or by MVA block.
Note that these types of caches typically use random or pseudo-LRU replacement policies.
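For reference, tree-based pseudo-LRU for a 4-way set needs only 3 bits per set instead of the full LRU ordering. A minimal sketch (class and method names are mine, and real hardware does this in combinational logic, not code):

```python
class PLRU4Way:
    """Tree-based pseudo-LRU for one 4-way set: three bits pick the victim.

    bits[0] is the root (0 = victim on the left pair, 1 = right pair);
    bits[1] and bits[2] pick within the left and right pairs respectively.
    """

    def __init__(self):
        self.bits = [0, 0, 0]

    def victim(self):
        # Walk the tree toward the "less recently used" side.
        if self.bits[0] == 0:
            return 0 if self.bits[1] == 0 else 1
        return 2 if self.bits[2] == 0 else 3

    def touch(self, way):
        # On access, flip the bits on the path to point *away* from this way.
        if way < 2:
            self.bits[0] = 1
            self.bits[1] = 1 - way
        else:
            self.bits[0] = 0
            self.bits[2] = 1 - (way - 2)

s = PLRU4Way()
for w in (0, 2, 1):
    s.touch(w)
print(s.victim())   # 3 -- here PLRU agrees with true LRU

t = PLRU4Way()
for w in (0, 1, 2):
    t.touch(w)
print(t.victim())   # 0 -- true LRU would pick the untouched way 3
```

The second example shows why it is only *pseudo* LRU: the tree forgets ordering across the two pairs, which is exactly the kind of approximation that makes its eviction behaviour hard to reason about for a shared system cache.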
A typical CPU cache does. If something is to be treated as a system cache, that's usually not what you want.
If that piece of data represents a significant part of the required bandwidth, and other things don't evict it from the cache (which they will), then I'd agree with you, but generally this isn't the case.
As a matter of interest, you'll notice that ARM shows the GPU with its own L2 cache; there are very good reasons why this is a separate cache instead of sharing the L2 with the CPU.
Sure, if both caches are simply pseudo-LRU caches, then you avoid contention and thrashing (and likely the extra read/write latency of a larger array).
But that doesn't mean it has no disadvantages compared to a unified, shared L2. What happens when the GPU isn't working on anything and the CPU has to crunch numbers?