Is the new Ruby demo really ray tracing on the 4800 series in real time? I have heard GPUs (as well as CPUs) suck at ray tracing.
That's like saying toasters suck at making soup - you can do it, it's just not as efficient.
Davros wants to see zsouthboy's toaster making soup.
I think toasters have a lot to offer. They are the unsung hero of modern times.
If it's not efficient, then how did the ATI demo team do it with good frame rates?
GPUs aren't really that bad at ray tracing... what they're bad at is building the acceleration structure, but once that's there, they're actually pretty efficient. They get hurt on ray divergence, but so does *every* architecture; that's just physics.
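To put a rough number on the divergence point, here's a toy back-of-the-envelope program - my own sketch, nothing to do with the actual demo, and the SIMD width and tree depth are made-up constants - that counts node visits for a 16-wide packet walking a binary tree, once with all rays agreeing at every branch and once with them fanning out:

```cpp
#include <algorithm>
#include <cstdio>

constexpr int WIDTH = 16;  // hypothetical SIMD width (made up)
constexpr int DEPTH = 20;  // depth of a toy binary BVH (made up)

int main() {
    // Coherent packet: all 16 rays agree at every branch, so the
    // packet visits exactly one node per level.
    long coherent = DEPTH;

    // Fully divergent packet: rays fan out until every lane walks its
    // own subtree, and the packet must visit every distinct node.
    long divergent = 0;
    int frontier = 1;  // distinct nodes the packet occupies at this level
    for (int level = 0; level < DEPTH; ++level) {
        divergent += frontier;
        frontier = std::min(frontier * 2, WIDTH);
    }

    std::printf("coherent:  %ld node visits\n", coherent);
    std::printf("divergent: %ld node visits (%.1fx, ceiling = SIMD width)\n",
                divergent, double(divergent) / double(coherent));
}
```

The divergent packet's cost climbs toward the SIMD width times the coherent one, which is the penalty being described.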
Cache coherency is painful
Would improved DB performance help here, or will real-time ray tracing (RT RT) on GPUs require a paradigm shift?

Well certainly improved DB performance would "help", but we're at the point where it's only "slow" when it diverges... so it's just a case of losing SIMD coherence. From a hardware point of view, we seem to be settling on SIMD widths in the 4-16 range, so it's probably just something that we're going to have to live with. Now this is where people typically start to think along the lines of "well, this is the only reason rasterization is faster... because we've designed SIMD hardware for it", but I'm not totally convinced. Then again, I'd be happy for the hardware people to go ahead and prove me wrong by making a MIMD design with similar throughput to a SIMD one.

Could a new algorithm solve the performance issues, or is it purely a hardware limitation?

Well, that's a hard question. Certainly any algorithm that recaptures coherence in some set of rays would improve things (on all architectures), and that's really where the only big improvements are going to come at this point. As far as tracing random, totally incoherent rays goes, I doubt we're going to do much better than we can now. The point being that the "general" ray tracing problem throws out all the information that would allow you to make it faster... kd-trees are pretty much optimal if you have *no* information about the rays.

That said, the interesting problem is to find good ways to group and organize large sets of ray queries so that they can be efficiently and coherently evaluated in parallel. Incidentally, one of those ways is called rasterization.
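As a concrete (and entirely hypothetical) illustration of "grouping ray queries", here's a minimal sketch that buckets rays by direction octant before tracing. Rays in the same octant tend to visit kd-tree/BVH nodes in a similar order, so each bucket makes a more coherent packet; none of this is from any shipping tracer:

```cpp
#include <array>
#include <cstdio>
#include <cstdlib>
#include <vector>

struct Ray { float ox, oy, oz, dx, dy, dz; };

// The three sign bits of the direction pick one of 8 octants.
int octant(const Ray& r) {
    return (r.dx < 0 ? 1 : 0) | (r.dy < 0 ? 2 : 0) | (r.dz < 0 ? 4 : 0);
}

int main() {
    // A pile of random, incoherent rays (stand-ins for real queries).
    std::vector<Ray> rays(1000);
    for (auto& r : rays)
        r = {0, 0, 0,
             float(std::rand()) / RAND_MAX - 0.5f,
             float(std::rand()) / RAND_MAX - 0.5f,
             float(std::rand()) / RAND_MAX - 0.5f};

    // The "grouping" step: one bucket per octant.
    std::array<std::vector<Ray>, 8> buckets;
    for (const Ray& r : rays) buckets[octant(r)].push_back(r);

    // Each bucket would then be traced as (more) coherent packets.
    for (int i = 0; i < 8; ++i)
        std::printf("octant %d: %zu rays\n", i, buckets[i].size());
}
```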
There is a specific format I have done some research on that I am starting to ramp back up on for some proof-of-concept work for next-generation technologies. It involves ray tracing into a sparse voxel octree, which is essentially a geometric evolution of the mega-texture technology that we're doing today for uniquely texturing entire worlds.
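For anyone curious what "ray tracing into a sparse voxel octree" might build on, here's a bare-bones guess at the data structure: a node with eight optional children, where absent children encode empty space. It's queried by point rather than by ray to keep it short, and it's my own sketch, not the actual format being researched:

```cpp
#include <array>
#include <cstdio>
#include <memory>

struct SvoNode {
    bool solid = false;                             // leaf payload
    std::array<std::unique_ptr<SvoNode>, 8> child;  // missing child = sparse
};

// Which octant of a cube centered at (cx,cy,cz) holds point (x,y,z)?
int childIndex(float x, float y, float z, float cx, float cy, float cz) {
    return (x > cx) | ((y > cy) << 1) | ((z > cz) << 2);
}

// Descend until we hit a leaf or an absent (empty-space) child.
bool query(const SvoNode* n, float x, float y, float z,
           float cx, float cy, float cz, float half) {
    while (n) {
        int i = childIndex(x, y, z, cx, cy, cz);
        if (!n->child[i]) return n->solid;  // sparse: stop early in empty space
        half *= 0.5f;
        cx += (i & 1 ? half : -half);
        cy += (i & 2 ? half : -half);
        cz += (i & 4 ? half : -half);
        n = n->child[i].get();
    }
    return false;
}

int main() {
    SvoNode root;
    root.child[7] = std::make_unique<SvoNode>();
    root.child[7]->solid = true;  // one solid voxel in the +++ octant
    std::printf("hit: %d\n", query(&root, 0.5f, 0.5f, 0.5f, 0, 0, 0, 1.0f));
}
```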
So is it a case of "too many branches" (for modern hardware) then?

Well, more that as soon as rays diverge in the data structure traversal (i.e. at triangle edges), the SIMD stuff effectively gets scalarized to the point where you wind up with 1/16 throughput. Still not terrible (and still generally faster than a lot of similarly-priced CPUs!), but definitely noticeable. So ironically, you have trouble with scenes with lots of tiny triangles and high-frequency data structures... remind you of any very similar rendering algorithm that you know of?
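A quick way to see where the 1/16 figure comes from, under the simple model that masked SIMD hardware replays one pass per unique control path - a toy model, not any real ISA:

```cpp
#include <cstdio>
#include <set>

int main() {
    const int width = 16;
    // Hypothetical next traversal step wanted by each of the 16 lanes;
    // all different = fully divergent packet.
    int wanted[width] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};

    // One masked pass per unique path, so useful work per pass shrinks:
    // with k unique paths, 16 useful ops take k * 16 lane-slots.
    std::set<int> uniquePaths(wanted, wanted + width);
    double throughput = 1.0 / double(uniquePaths.size());

    std::printf("%zu unique paths -> %.4f of peak throughput\n",
                uniquePaths.size(), throughput);  // prints 1/16 here
}
```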
Thanks for all the great info!

No issues - certainly a lot of this stuff will come to the forefront in the next few years, since we're definitely at the point where it's quite possible to implement an efficient ray tracer on consumer graphics hardware. Again, we're not quite at the point of generating the data structures there (ironically, the best data structure scatter/sort unit on GPUs is currently the rasterizer), but that will improve with time too.
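For what it's worth, the "rasterizer as a scatter/sort unit" remark can be mimicked on the CPU: rendering one point primitive per record drops each record into whatever bin (pixel/tile) you compute for it. A toy software version of that scatter, purely illustrative - a real GPU version would issue point draws into a render target instead:

```cpp
#include <cstdio>
#include <vector>

struct Record { float x, y; int payload; };  // hypothetical data to be binned

int main() {
    const int GRID = 4;  // a 4x4 "render target" of bins
    std::vector<Record> records = {
        {0.1f, 0.1f, 1}, {0.9f, 0.2f, 2}, {0.6f, 0.7f, 3}, {0.12f, 0.08f, 4}};

    std::vector<std::vector<int>> bins(GRID * GRID);
    for (const Record& r : records) {
        int bx = int(r.x * GRID), by = int(r.y * GRID);  // "viewport transform"
        bins[by * GRID + bx].push_back(r.payload);       // the scatter step
    }

    for (int i = 0; i < GRID * GRID; ++i)
        if (!bins[i].empty())
            std::printf("bin %d holds %zu records\n", i, bins[i].size());
}
```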