This type of prefetching is strictly in the programmer's hands, so it's safe to shift this burden onto him.
The kind of prefetching I'm talking about is firmly in the programmer's hands too, and it is the dominant form in use.
It's safe to assume the vector load case would not be used as a prefetch, or at least far less often than it otherwise would be.
Loops that prefetch far ahead of the actual use may take on too much overhead if they have to include some kind of bounds check, or if the data region has to be padded far enough that the corner case can't wander a stride past the end. A safety check is also mostly useless on a SIMD architecture, since divergence breaks up the prefetch run there anyway.
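
To make the two costs concrete, here is a rough sketch of both options in C. It assumes GCC or Clang, since `__builtin_prefetch` is a compiler extension rather than standard C. The names `PREFETCH_DIST`, `sum_clamped` and `sum_padded` are made up for illustration, and the distance of 64 elements is an arbitrary placeholder, not a tuned value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance in elements; a real value is machine-specific. */
#define PREFETCH_DIST 64

/* Option 1: bounds-check the prefetch address.
   The clamp runs on every iteration, which is the overhead in question. */
long sum_clamped(const int *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        size_t p = i + PREFETCH_DIST;
        if (p >= n)
            p = n - 1;                    /* keep the prefetch inside the array */
        __builtin_prefetch(&a[p], 0, 3);  /* read access, high temporal locality */
        sum += a[i];
    }
    return sum;
}

/* Option 2: no per-iteration check. The caller must have allocated
   at least n + PREFETCH_DIST elements, so &a[i + PREFETCH_DIST]
   always points inside the buffer. The cost moves into the padding. */
long sum_padded(const int *a, size_t n, size_t capacity)
{
    assert(capacity >= n + PREFETCH_DIST);  /* checked once, not per element */
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        __builtin_prefetch(&a[i + PREFETCH_DIST], 0, 3);
        sum += a[i];
    }
    return sum;
}
```

Both functions compute the same sum; they differ only in where the safety cost lands, either a compare in the loop body or extra elements at the end of the buffer.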