There is also an upcoming I3D paper about GPU raytracing using optimized kd-tree algorithms and static scenes. (I happen to be one of the authors, and the work is an evolution of the Foley et al. work.) The short answer is that GPUs are faster than a single CPU, but they aren't great at raytracing because of divergence in execution between rays. As the execution traces diverge in the acceleration structure, you end up with a lot of SIMD execution stalls. GPUs also currently have to do a bunch of extra work because there isn't an effective way to keep a stack, so it has to be emulated or worked around via algorithm modifications. Sadly, the G80's 16KB of shared memory between the threads isn't much help, as it's too small to hold a real stack for the number of parallel execution contexts needed to run efficiently; there may be some fruit here, though.

We're currently seeing ~19 Mrays/s on an X1900XTX (conference room scene, shadow rays), and about the same on a G80 with DirectX given the current state of the drivers and shader compilers. With simpler scenes we can run much faster, but those aren't realistic. (All of the currently published fast raytracing numbers also do nothing beyond trivial shading, which is where GPUs obviously do well...) With heavier tuning via CTM/CUDA we might be able to squeeze out a little more, but unless we can regain ray coherence, it's difficult to do leaps and bounds better.
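To make the stack problem concrete, here's a minimal CUDA-style sketch of the kind of stackless workaround I mean, loosely along the lines of the kd-restart idea from the Foley et al. work: instead of pushing far children onto a stack, each ray just advances its entry point past the leaf it finished and restarts from the root. The node layout and the leaf-intersection stub are made up for illustration, and this is nowhere near a tuned implementation; it's only meant to show where the redundant traversal work and the per-ray branching come from.

#include <float.h>

// Hypothetical node layout: internal nodes store a split plane and the index
// of their first child (children stored as a pair); leaves are flagged with
// left < 0 and store a primitive range.
struct KdNode {
    int   left;       // index of first child, or -1 for a leaf
    int   axis;       // split axis (0, 1, 2) for internal nodes
    float split;      // split position along that axis
    int   primStart;  // leaves only: primitive range to test
    int   primCount;
};

// Stand-in for the leaf test: a real tracer would intersect the leaf's
// triangles and return the closest hit in [tMin, tMax], or FLT_MAX.
__device__ float intersectLeafPrims(const KdNode& leaf, float3 o, float3 d,
                                    float tMin, float tMax)
{
    return FLT_MAX;  // placeholder so the sketch compiles
}

__global__ void traceKdRestart(const KdNode* nodes, const float3* rayOrig,
                               const float3* rayDir, float* hitT, int numRays,
                               float sceneTMin, float sceneTMax)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numRays) return;

    float3 o = rayOrig[i];
    float3 d = rayDir[i];
    float  tMin = sceneTMin;
    float  tMax = sceneTMax;
    float  best = FLT_MAX;

    // No per-thread stack: after falling off a leaf without a hit, push the
    // ray segment's start past that leaf and restart from the root.
    while (tMin < tMax) {
        int   node   = 0;        // root
        float segMin = tMin;
        float segMax = tMax;

        // Walk down to the leaf containing the current segment.
        while (nodes[node].left >= 0) {
            KdNode n  = nodes[node];
            float  oc = (&o.x)[n.axis];
            float  dc = (&d.x)[n.axis];
            float  tSplit = (n.split - oc) / dc;   // assumes dc != 0 for brevity
            int nearChild = (oc < n.split) ? n.left     : n.left + 1;
            int farChild  = (oc < n.split) ? n.left + 1 : n.left;

            if (tSplit >= segMax || tSplit < 0.f)
                node = nearChild;          // segment stays on the near side
            else if (tSplit <= segMin)
                node = farChild;           // segment is entirely on the far side
            else {
                node   = nearChild;        // visit the near side now...
                segMax = tSplit;           // ...and the far side after a restart
            }
        }

        // Leaves are visited front to back, so the first in-range hit is the closest.
        float t = intersectLeafPrims(nodes[node], o, d, segMin, segMax);
        if (t < FLT_MAX) { best = t; break; }

        tMin = segMax;                     // advance past this leaf and restart
    }

    hitT[i] = best;
}

Every thread runs this loop for its own ray, so once neighboring rays stop visiting the same nodes, the inner while loop branches differently across the SIMD width and the hardware sits partially idle, which is exactly the stall problem above.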
Cell is actually a raytracing monster compared to other non-custom architectures, at least in certain situations. The Saarland folks (and others, including Stanford) have Cell raytracers running at >60 Mrays/s for primary rays. Multi-core CPUs are also showing great promise, with people reporting >5 Mrays/s per processor for comparable designs (i.e. no tricks that only really work for primary rays), and there is impressive work from Intel on really optimizing the heck out of raytracing on CPUs. My main concern about CPU implementations is their ability to shade quickly. It's going to be interesting to see hybrid CPU/GPU implementations here...
In our I3D paper, we argue that what you would likely do on a GPU is rasterize the primary hits and raytrace secondary effects such as shadows, reflections, and refractions, depending on your ray/frame-rate budget. We have implementations of this hybrid system in Direct3D (Brook + D3D) as well as CTM+GL; it was running in the ATI booth at SIGGRAPH and was shown during the CTM presentations. The paper should go up when it's finalized in late January and will be presented at I3D 2007 by Daniel Horn. For raytracing to really get faster on GPUs, we need a way to deal with the cost of instruction divergence and, perhaps more importantly, ways to actually build complex data structures.
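For a rough picture of what the secondary-ray half of that hybrid might look like, here's a small CUDA-flavored sketch (the buffer names, the single point light, and the occluded() stub are all assumptions for illustration, not what's actually in the paper): the rasterization pass fills per-pixel hit position/normal buffers, and a kernel then spends the ray budget only on shadow rays cast from those primary hit points.

#include <math.h>

// Stand-in for an any-hit traversal (e.g. the kd-restart kernel above, cut
// down to return on the first hit): true if anything blocks the segment.
__device__ bool occluded(float3 origin, float3 dir, float maxT)
{
    return false;  // placeholder so the sketch compiles
}

__global__ void shadowFromRaster(const float3* hitPos,     // filled by the raster pass
                                 const float3* hitNormal,  // filled by the raster pass
                                 float* shadow,            // output: 1 = lit, 0 = shadowed
                                 int width, int height, float3 lightPos)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int idx = y * width + x;

    float3 p = hitPos[idx];
    float3 n = hitNormal[idx];

    // Direction and distance from the primary hit to the light.
    float3 toL  = make_float3(lightPos.x - p.x, lightPos.y - p.y, lightPos.z - p.z);
    float  dist = sqrtf(toL.x * toL.x + toL.y * toL.y + toL.z * toL.z);
    float3 dir  = make_float3(toL.x / dist, toL.y / dist, toL.z / dist);

    // Surfaces facing away from the light don't need a ray at all.
    float nDotL = n.x * dir.x + n.y * dir.y + n.z * dir.z;
    if (nDotL <= 0.f) { shadow[idx] = 0.f; return; }

    // Nudge the origin off the surface to avoid self-shadowing, then trace one
    // shadow ray; this is where the per-frame ray budget actually goes.
    const float eps = 1e-3f;
    float3 origin = make_float3(p.x + n.x * eps, p.y + n.y * eps, p.z + n.z * eps);
    shadow[idx] = occluded(origin, dir, dist) ? 0.f : 1.f;
}

The nice property of this split is that the expensive, perfectly coherent primary visibility stays on the rasterizer, and the raytraced passes can be scaled up or down (more lights, reflections, refractions) to fit whatever ray budget the frame has left.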
Regardless, we are still quite a ways away from the projected 400 Mrays/s needed to approach the performance of current games. (I can't remember who stated this as a rough lower bound, but it was in the SIGGRAPH 2006 raytracing course.) We need a few more years of Moore's Law and a few algorithmic tricks, mostly involving dynamic scenes, and then things will start to get interesting. But rasterization will keep improving at or above Moore's Law as well, and game developers will continue to find tricks and hacks to make things look right, or use partial raytracing solutions like POM.
(Sorry for the long post)