ArchitectureProfessor
Newcomer
That sounds truly fascinating! I wonder, though, how does this compare to lock-free software algorithms? They too don't acquire a real lock; they check for 'collisions' before committing their results, or roll back and try again.
Lock free algorithms are a really good idea in some contexts, and they are the best you can do in many cases without additional hardware support. However, they are fiendishly complicated. For example, you can actually publish an academic paper if you come up with a new lock-free data structure. It is that tricky. (Then you can publish a second paper in which you formally prove it correct, but I digress.)
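To make "lock-free" concrete, here is about the simplest example there is: pushing onto a lock-free stack with a compare-and-swap retry loop. This is a minimal sketch for illustration (the `Node` and `Stack` names are mine, not from any particular library), and even it hides subtleties; the pop side is where things get genuinely hard.

```cpp
#include <atomic>
#include <cassert>

// Minimal sketch of a lock-free stack push (illustrative names, not a
// complete data structure -- pop is much trickier).
struct Node {
    int value;
    Node* next;
};

struct Stack {
    std::atomic<Node*> head{nullptr};

    void push(Node* n) {
        // Read the current head, link the new node in front of it, and
        // retry if some other thread changed head in the meantime. No
        // lock is ever taken; a failed CAS is the "collision" that
        // forces a retry.
        Node* old = head.load(std::memory_order_relaxed);
        do {
            n->next = old;
        } while (!head.compare_exchange_weak(
                     old, n,
                     std::memory_order_release,
                     std::memory_order_relaxed));
    }
};
```

Notice that even this toy forces you to pick explicit memory orders for every atomic operation, which is exactly where the memory ordering headaches come in.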
Recall all the x86 memory ordering issues we were talking about earlier in this thread? Well, if you write lock-free code, you really need to consider all those nasty memory ordering issues. There is also an issue called the "ABA" problem, in which a pointer to an object has been recycled and reused at just the wrong time. In such cases, doing a simple comparison of the "old" and "new" values isn't enough; you need an epoch timestamp as well.
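Here is a sketch of the epoch-counter fix for ABA, assuming a fixed pool of nodes addressed by small indices (all the names here are illustrative). Rather than CAS on a raw pointer, you CAS on a single 64-bit word packing {node index, epoch counter}; the counter is bumped on every successful update, so a recycled node paired with a stale counter no longer compares equal.

```cpp
#include <atomic>
#include <cstdint>
#include <cassert>

// next = pool index; index 0 is reserved as the "null" sentinel.
struct Node { int value; uint32_t next; };

Node pool[16];   // toy node pool, so indices fit in 32 bits

// head packs: high 32 bits = node index, low 32 bits = epoch counter.
// (A 32-bit epoch can wrap in theory; real designs size it carefully.)
std::atomic<uint64_t> head{0};

uint32_t idx(uint64_t h)   { return uint32_t(h >> 32); }
uint32_t epoch(uint64_t h) { return uint32_t(h); }

void push(uint32_t n) {
    uint64_t old = head.load();
    uint64_t desired;
    do {
        pool[n].next = idx(old);
        // Bump the epoch so that {same index, different epoch}
        // still fails the CAS -- this is what defeats ABA.
        desired = (uint64_t(n) << 32) | (epoch(old) + 1);
    } while (!head.compare_exchange_weak(old, desired));
}
```

The same idea is often done with a double-width CAS on a {pointer, counter} pair where the hardware supports it; packing an index into one word is just the portable way to sketch it.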
Also, since the programmers of Larrabee are very experienced, I'm not sure it's necessary to make it software-friendly to that degree (or will it be opened to the public?).
We won't know for sure until more hardware with speculative locking exists, but I bet it would help even experienced programmers. The problem is, if you make locks too fine grained, the lock acquire/release overhead kills you. If you make locks too coarse, the lock contention kills you. In fact, the right granularity of locking likely depends on the number of cores and the relative cost of acquiring a lock (and of cache-to-cache memory sharing). Even for experienced programmers, this is a tough balance to strike.
The promise of speculative locking is to break this fine-grain/coarse-grain dilemma. The hope is that it will give the parallelism of fine-grain locking together with the low overhead, programming simplicity, and maintainability of coarse-grain locking.
Consider a simple binary tree. Sure, you could put a lock on every node and do hand-over-hand locking to get lots of parallelism. In fact, you could use reader/writer locks to allow multiple read-only lookups of the tree to proceed in parallel.
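For the curious, a hand-over-hand ("lock coupling") lookup looks something like the sketch below, with one mutex per node (names are illustrative). Note how every single step down the tree pays a lock acquire and a lock release; that per-node overhead is exactly what makes fine-grain locking expensive.

```cpp
#include <mutex>
#include <cassert>

// Sketch of fine-grain locking on a binary search tree: one mutex per
// node, acquired hand-over-hand during lookup.
struct TreeNode {
    int key;
    TreeNode* left = nullptr;
    TreeNode* right = nullptr;
    std::mutex m;   // fine-grain: a separate lock on every node
};

bool contains(TreeNode* root, int key) {
    if (!root) return false;
    root->m.lock();                 // lock the first node...
    TreeNode* cur = root;
    while (cur) {
        if (key == cur->key) {
            cur->m.unlock();
            return true;
        }
        TreeNode* next = (key < cur->key) ? cur->left : cur->right;
        if (next) next->m.lock();   // ...grab the child before
        cur->m.unlock();            // releasing the parent
        cur = next;
    }
    return false;
}
```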
But, in a system with speculative locking, a simple tree with a single lock will likely be both faster and more concurrent.
Let's just consider the read-only case. Even if you're only grabbing a reader/writer lock for read-only access, you're still writing a shared memory location. Thus, you're going to take misses on these blocks as you obtain write permission to them while you walk the tree. Under SLE, all of the tree locations could be cached in read-only mode by all the processors' caches, allowing each of them in essence to walk its own private copy of the tree. Of course, if an occasional update occurs, speculative locking will still do the right thing.
In the case of mixed updates and lookups, speculative locking has the same effect of eliminating the overhead of actually acquiring all the locks as you traverse down the tree. It also gives more parallelism, because it is unlikely that two threads are modifying the exact same part of the tree at the same time (assuming a big enough tree).