I just came across the _mm_prefetch intrinsic in Intel's SSE reference, and I have a small doubt that I would like clarified.
It says that this intrinsic loads a "cacheline of data" into the caches, and that there are no alignment requirements on the address. My understanding was (and is, until the matter is cleared up) that the CPU fetches data in multiples of 64 bytes (my CPU's cache line size, checked with CPU-Z), aligned to 64 bytes. In view of the statement in the reference, given an address a, either
1) this intrinsic would load the 64 bytes starting at a (i.e. a[0] through a[63]) into the cache, or
2) this intrinsic would throw away the lower 6 bits of a (i.e. align it down to 64 bytes) and then fetch that cache line.
I think the former is the case, but I want the latter. So when I am giving prefetch hints, should I manually align the address to 64 bytes by clearing the lowest 6 bits? I will probably have to, but I thought it better to confirm first.
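To make the question concrete, the manual alignment I have in mind would look roughly like the sketch below. It assumes a 64-byte cache line (true for my CPU, not guaranteed everywhere); the names CACHE_LINE_SIZE, cache_line_base and prefetch_aligned are made up for illustration and are not part of any Intel header. The _mm_prefetch call is only compiled where the compiler reports SSE support.

```c
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64) || defined(_M_IX86)
#include <xmmintrin.h>          /* _mm_prefetch, _MM_HINT_T0 */
#define HAVE_MM_PREFETCH 1
#endif

/* Assumption: 64-byte cache lines, so the low 6 bits are the offset in the line. */
#define CACHE_LINE_SIZE 64u

/* Hypothetical helper: clear the low 6 bits of the address, i.e. round it
   down to the start of the 64-byte block that contains it. */
static inline const void *cache_line_base(const void *p)
{
    return (const void *)((uintptr_t)p & ~(uintptr_t)(CACHE_LINE_SIZE - 1));
}

/* Hypothetical helper: issue a prefetch hint on the rounded-down address. */
static inline void prefetch_aligned(const void *p)
{
#ifdef HAVE_MM_PREFETCH
    _mm_prefetch((const char *)cache_line_base(p), _MM_HINT_T0);
#else
    (void)p;                    /* no SSE prefetch available on this target */
#endif
}
```

So the call site would be prefetch_aligned(a + offset) instead of passing a + offset to _mm_prefetch directly. What I want to know is whether that extra masking step is actually needed.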
Thanks in advance.