ERP said:
The problem with dynamic hinting in tree traversal is that, as far as I can see, you have to do the same amount of work to issue the hint as you do to take the branch, so I'm not sure I see how it saves you in this case.
Depends on the type (and amount) of work we're doing at each node I guess.
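A rough sketch of what I mean, in C. The node layout, `node_work` and `traverse_sum` are made up for illustration, and since portable C has no way to issue a real branch hint, this uses GCC/Clang's `__builtin_prefetch` as a stand-in for "announce where we're going early". The point is only the ordering: the compare that decides the direction costs the same either way, but if it runs before the per-node work, that work has a chance to cover some of the latency.

```c
#include <stddef.h>

/* Hypothetical node layout, for illustration only. */
typedef struct Node {
    int key;
    int payload;
    struct Node *child[2];   /* child[0]: smaller keys, child[1]: larger keys */
} Node;

/* Stand-in for whatever per-node work the traversal really does. */
static int node_work(const Node *n)
{
    return n->payload * 2 + 1;
}

/* Walk from root toward key, summing node_work over every visited node. */
int traverse_sum(const Node *root, int key)
{
    int sum = 0;
    const Node *n = root;

    while (n != NULL) {
        /* Decide the direction first: this is the same compare the
           branch itself would need, so it is not extra work. */
        int dir = key > n->key;
        const Node *next = n->child[dir];

#if defined(__GNUC__)
        /* Assumption: a software prefetch stands in for the hint.
           Prefetching a null pointer does not fault. */
        __builtin_prefetch(next);
#endif

        /* The per-node work now overlaps with the outstanding fetch. */
        sum += node_work(n);

        if (n->key == key)
            break;
        n = next;
    }
    return sum;
}
```

If `node_work` is only a couple of instructions there is nothing to overlap with and the early decision buys very little, which is the case ERP describes.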
and the L2 cache bails you out.
Even when the L2 latency is more than 2x that of a branch mispredict?
I think we already know from experience what effect latencies of roughly this size have on an in-order CPU if you don't take special care with the code. And that was with much shorter pipelines to boot...