Speaking of thread management, if anyone here has access to a Bulldozer system, I would be grateful for a test run of the little console application attached to this post. It measures the sync latency between all the CPUs and cores (physical/virtual/local/remote) present in the system.
It's quite... slow: 30 ns within the same module and 116 ns across modules. At 3.6 GHz that translates to about 108 cycles within a module and a whopping 417.6 cycles across modules. Now, I don't know how the test is structured (if there are any details, please share), but if it's doing some message passing (Send->RSVP, for example), there'll be some overhead from OS messaging. Still, it looks rather sub-mediocre. This is under Win 8, by the way.