I know the figure quoted was for element-to-element comparisons, but one pair comparison doesn't equal a solution for a given point. And now that I look back at it, I see I was mixing up my figures -- the numbers came out to 510,000 results actually computed per second on a 6800, not 150... which is still total crap, but not as bad as I'd remembered.

That still doesn't change the real problem it had when I tried it (and I haven't really thought about it since, so I'd totally forgotten what sort of throughput I got out of it): it could not maintain that throughput beyond a certain scene complexity. While it supposedly scales O(n log n) in the algorithmic sense, it doesn't scale well at all once that hierarchy gets big and deep, because you end up memory-access bound really fast. You might have noticed that the demonstrations never go beyond scenes of even 20k verts, and that's probably also true of the Fantasy Lab demonstrations.
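For what it's worth, here's a rough sketch of why that kind of thing goes memory-bound -- a plain C++ toy, nothing to do with the actual implementation, and every name and number in it is just mine. The point is that each step of a hierarchical element-to-element refinement is a dependent pointer load into a node the cache probably doesn't have, so once the hierarchy outgrows the cache you're paying full memory latency per node instead of doing useful math:

```cpp
// Toy hierarchical element-to-element refinement, hierarchical-radiosity style.
// Purely illustrative -- not the algorithm being discussed; all names are made up.
#include <cstdio>

struct Element {
    float value;        // placeholder for accumulated energy
    Element* child[2];  // pointer-based tree: every visit is a dependent load
};

bool isLeaf(const Element* e) { return e->child[0] == nullptr; }

// Build a balanced binary hierarchy of the given depth.
Element* build(int depth) {
    Element* e = new Element{1.0f, {nullptr, nullptr}};
    if (depth > 0) {
        e->child[0] = build(depth - 1);
        e->child[1] = build(depth - 1);
    }
    return e;
}

// Recursive element-to-element comparison. The math per pair is trivial;
// the real cost is the chain of dependent pointer dereferences down the
// tree, which is why a big, deep hierarchy ends up bound by memory latency
// rather than by arithmetic throughput.
long comparePairs(Element* a, Element* b) {
    if (isLeaf(a) || isLeaf(b)) {
        a->value += 0.25f * b->value;  // one "result actually computed"
        return 1;
    }
    long n = 0;
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            n += comparePairs(a->child[i], b->child[j]);
    return n;
}

int main() {
    // Even two modest trees (2^11 leaves each) mean ~4 million leaf pairs,
    // and each one is reached through a fresh chain of scattered node loads.
    Element* a = build(11);
    Element* b = build(11);
    std::printf("leaf-pair results computed: %ld\n", comparePairs(a, b));
    return 0;  // nodes are deliberately leaked; this is a throwaway toy
}
```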
The only major problems he claims to have fixed in the newer version are that it no longer needs multiple passes to get multiple bounces, and that it now works in conjunction with the displacement mapping technique.
Thanks for the insight, that's good info. It'll be interesting to see if and when Fantasy Lab ever gets around to demonstrating the game they're supposedly developing, to get an idea of how it scales now.