While that may be true, the more interesting question is why. It should be relatively straightforward for any cross-platform dev to run a set of tests and determine that.
My guess would be some overhead from the hypervisor, or possibly some weird timing quirk with the bus.
In terms of compiler optimizations, the MS compiler is not particularly stellar; it does some things well, but others not so much.
Optimizing for modern processors is hard, and in some cases less than obvious. I recently saw a case where calling a function to increment a memory-resident variable was faster than just incrementing it in line. The processor attempts the write speculatively, the attempt fails, and the write has to be rescheduled. The function call gives enough dead time for the write to succeed, so it avoids the overhead of the reschedule and runs faster.
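If you want to poke at this kind of effect yourself, something like the following is a rough sketch of how I'd set up the comparison. It is not the code from the case I saw. The names `g_counter`, `bump`, `time_inline`, and `time_call` are made up for illustration, and I'm assuming a `volatile` global is a fair stand-in for a "memory-resident" variable. Whether the call version actually comes out faster will depend on the CPU, the compiler, and the optimization flags.

```cpp
#include <chrono>
#include <cstdint>

// Keep the helper from being inlined, otherwise both loops compile to the same thing.
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// Assumption: volatile forces a real load and store on every increment,
// which stands in for a variable that lives in memory rather than a register.
static volatile std::uint64_t g_counter = 0;

// Hypothetical helper: the same increment, but behind a real function call.
NOINLINE void bump() { g_counter = g_counter + 1; }

// Time `iterations` increments written directly in the loop body.
// Returns elapsed wall-clock time in nanoseconds.
std::int64_t time_inline(std::uint64_t iterations) {
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        g_counter = g_counter + 1;
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Time `iterations` increments done through the non-inlined helper.
// Returns elapsed wall-clock time in nanoseconds.
std::int64_t time_call(std::uint64_t iterations) {
    auto start = std::chrono::steady_clock::now();
    for (std::uint64_t i = 0; i < iterations; ++i) {
        bump();
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}
```

Call both from `main` with a large iteration count, repeat the run several times, and compare the numbers. A single run tells you very little, and it is worth looking at the generated assembly to confirm the two loops really differ only by the call.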