Raytracing

Forrest (Newcomer)
Is the new Ruby demo really raytracing on the 4800 series in real time? I have heard GPUs (as well as CPUs) suck at ray tracing.
 
I think toasters have a lot to offer. They are the unsung heroes of modern times.

Excuse me? How dare you place toasters on a pedestal and ignore our valiant friend the refrigerator! I am offended and demand an apology immediately. Also some milk.
 
If it's not efficient, then how did the ATI demo team do it with good frame rates?

Consumer GPUs are designed for rasterization, and as such are not best suited to this task. CPUs are even worse, having no specialized hardware for any graphics rendering (unless you care to count SSE units under some loose definition). At least with GPUs you have an array of ALUs that can handle vast amounts of FP math, so it's not as bad as their narrower CPU brethren. It takes a lot of CPUs to handle RT in RT (haha, acronym confusion FTW!)
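To put "vast amounts of FP math" in concrete terms, here's a minimal C++ sketch of a single ray/sphere test (names and structure are mine, purely for illustration): per ray it's essentially one quadratic solve, and a renderer fires millions of these per frame, which is exactly the bulk arithmetic a GPU's ALU array is built for.

```cpp
#include <cmath>

struct Vec3 { float x, y, z; };

float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }
Vec3  sub(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }

// Nearest hit distance along the ray, or -1.0f on a miss.
// Assumes 'dir' is normalized, which reduces the quadratic to t^2 + 2bt + c = 0.
float intersectSphere(const Vec3& orig, const Vec3& dir,
                      const Vec3& center, float radius) {
    Vec3  oc   = sub(orig, center);
    float b    = dot(oc, dir);
    float c    = dot(oc, oc) - radius * radius;
    float disc = b * b - c;                // discriminant of the quadratic
    if (disc < 0.0f) return -1.0f;         // ray misses the sphere entirely
    float t = -b - std::sqrt(disc);        // nearer of the two roots
    return (t > 0.0f) ? t : -1.0f;         // reject hits behind the origin
}
```

Apart from the two miss tests it's pure multiply-adds and one square root, which maps straight onto wide ALU arrays.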
 
GPUs aren't really that bad at raytracing... what they're bad at is building the acceleration structure, but once that's there, they're actually pretty efficient. They get hurt on ray divergence, but so does *every* architecture, that's just physics :).
 
GPUs aren't really that bad at raytracing... what they're bad at is building the acceleration structure, but once that's there, they're actually pretty efficient. They get hurt on ray divergence, but so does *every* architecture, that's just physics :).

Thanks for the correction, Andy. I wondered if I should've mentioned the whole kd-tree traversal issue... Would improved DB perf. help here, or will Real-Time Ray-Tracing (RT RT) on GPUs require a paradigm shift? Could a new algorithm solve the perf. issues? Or is it purely a hardware limitation?
 
Would improved DB perf. help here, or will Real-Time Ray-Tracing (RT RT) on GPUs require a paradigm shift?
Well certainly improved DB performance would "help", but we're at the point where it's only "slow" when it diverges... so it's just a case of losing SIMD coherence. From a hardware point of view, we seem to be settling on 4-16ish SIMD widths, so it's probably just something that we're going to have to live with. Now this is where typically people start to think along the lines of "well this is the only reason rasterization is faster... because we've designed SIMD hardware for it", but I'm not totally convinced. Then again, I'd be happy for the hardware people to go ahead and prove me wrong by making a MIMD design with similar throughput to a SIMD one ;)
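To make "losing SIMD coherence" concrete, here's a toy C++ model (the 16-wide batch and the lane values are completely made up) that counts how many serialized passes a diverged batch costs under predicated execution:

```cpp
#include <array>
#include <cstdio>

constexpr int kSimdWidth = 16;  // hypothetical width, in the 4-16ish range

int main() {
    // Each lane is a ray; 'node' is the tree node that lane wants to visit
    // next. Early on these all match; after a few bounces they look like this:
    std::array<int, kSimdWidth> node = {
        3, 7, 7, 9, 2, 3, 11, 5, 7, 9, 2, 14, 3, 5, 11, 9 };

    // With predicated SIMD, the batch serializes over each distinct node.
    int  passes   = 0;
    bool seen[16] = {};
    for (int n : node)
        if (!seen[n]) { seen[n] = true; ++passes; }

    // 7 distinct nodes here => 7 passes => roughly 1/7 ALU utilization.
    std::printf("distinct paths: %d -> utilization ~%.0f%%\n",
                passes, 100.0 / passes);
}
```

Fully coherent, that loop is one pass at 100% utilization; fully diverged, it's 16 passes at 1/16 throughput, which is the worst case.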

Could a new algorithm solve the perf. issues? Or is it purely a hardware limitation?
Well that's a hard question. Certainly any algorithms that recapture coherence in some set of rays would improve things (on all architectures), which is really where the only big improvements are going to come at this point. As far as tracing random, totally incoherent rays, I doubt we're going to do much better than we can now. The point being that the "general" ray tracing problem throws out all information that would allow you to make it faster... kd trees are pretty much optimal if you have *no* information about the rays.
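For anyone following along at home, the traversal in question looks roughly like this simplified recursive C++ sketch (node layout and names are illustrative, not from any particular implementation):

```cpp
struct KdNode {
    bool    leaf;
    int     axis;       // split axis: 0 = x, 1 = y, 2 = z (interior nodes)
    float   split;      // position of the splitting plane on that axis
    KdNode* child[2];   // child[0] = below the plane, child[1] = above
    // a real leaf would hold its triangle list here
};

struct Ray { float orig[3], dir[3]; };

// Visit nodes front-to-back along the ray, restricted to [tmin, tmax].
void traverse(const KdNode* n, const Ray& r, float tmin, float tmax) {
    if (n->leaf) { /* intersect the leaf's triangles */ return; }

    // Distance along the ray to the splitting plane.
    float t = (n->split - r.orig[n->axis]) / r.dir[n->axis];

    // Which child contains the ray origin (ties broken by direction)?
    bool belowFirst = (r.orig[n->axis] < n->split) ||
                      (r.orig[n->axis] == n->split && r.dir[n->axis] <= 0.0f);
    int nearSide = belowFirst ? 0 : 1;

    if (t > tmax || t <= 0.0f)        // plane behind us or beyond the range:
        traverse(n->child[nearSide], r, tmin, tmax);       // near child only
    else if (t < tmin)                // plane crossed before the range:
        traverse(n->child[1 - nearSide], r, tmin, tmax);   // far child only
    else {                            // plane crossed inside the range:
        traverse(n->child[nearSide],     r, tmin, t);      // near first,
        traverse(n->child[1 - nearSide], r, t,    tmax);   // then far
    }
}
```

The divergence problem is visible right in that if/else: neighbouring rays that disagree on which branch to take are exactly what breaks a SIMD batch.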

That said, the interesting problem is to find good ways to group and organize large sets of ray queries so that they can be efficiently and coherently evaluated in parallel. Incidentally one of those ways is called rasterization ;)
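Just to illustrate one (very naive) way of grouping: you can bucket secondary rays by rough direction before tracing them, so the rays in each SIMD batch tend to walk the same nodes. A toy C++ sketch; octant binning is just one of many possible sort keys, and real systems use much smarter schemes:

```cpp
#include <algorithm>
#include <vector>

struct Ray { float orig[3], dir[3]; };

// Quantize a direction into one of 8 octants from the signs of x, y, z.
int octant(const Ray& r) {
    return (r.dir[0] < 0.0f)
         | ((r.dir[1] < 0.0f) << 1)
         | ((r.dir[2] < 0.0f) << 2);
}

// Sort rays so consecutive ones share direction signs; SIMD batches formed
// from adjacent rays then diverge less during traversal.
void sortForCoherence(std::vector<Ray>& rays) {
    std::sort(rays.begin(), rays.end(),
              [](const Ray& a, const Ray& b) { return octant(a) < octant(b); });
}
```

Rasterization is the extreme version of this: all the primary "rays" share one origin and a regular grid of directions, so the coherence comes for free.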
 
Well certainly improved DB performance would "help", but we're at the point where it's only "slow" when it diverges... so it's just a case of losing SIMD coherence.

So is it a case of "too many branches" (for modern hardware) then?

From a hardware point of view, we seem to be settling on 4-16ish SIMD widths, so it's probably just something that we're going to have to live with. Now this is where typically people start to think along the lines of "well this is the only reason rasterization is faster... because we've designed SIMD hardware for it", but I'm not totally convinced. Then again, I'd be happy for the hardware people to go ahead and prove me wrong by making a MIMD design with similar throughput to a SIMD one ;)

I agree on the SIMD vs. MIMD issue; case in point: R6xx ALU utilization rates.

Well that's a hard question. Certainly any algorithms that recapture coherence in some set of rays would improve things (on all architectures), which is really where the only big improvements are going to come at this point. As far as tracing random, totally incoherent rays, I doubt we're going to do much better than we can now. The point being that the "general" ray tracing problem throws out all information that would allow you to make it faster... kd trees are pretty much optimal if you have *no* information about the rays.

That said, the interesting problem is to find good ways to group and organize large sets of ray queries so that they can be efficiently and coherently evaluated in parallel. Incidentally one of those ways is called rasterization ;)

Thanks for all the great info!
 
So does anyone have an idea of how they implemented it? During the presentation they said something about voxels. I also read an article interviewing John Carmack in which he says something about a voxel octree data structure.

link

There is a specific format I have done some research on that I am starting to ramp back up on for some proof of concept work for next generation technologies. It involves ray tracing into a sparse voxel octree which is essentially a geometric evolution of the mega-texture technologies that we’re doing today for uniquely texturing entire worlds.
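For reference, the "sparse voxel octree" he's describing might look something like the C++ sketch below. To be clear, the field names and packing here are my guesses for illustration only, not id Software's actual format:

```cpp
#include <bit>       // std::popcount (C++20)
#include <cstdint>

struct SvoNode {
    uint8_t  childMask;   // bit i set => child i exists; empty space
                          // simply stores nothing (hence "sparse")
    uint8_t  leafMask;    // bit i set => child i is a leaf voxel
    uint32_t firstChild;  // index of this node's first child in a flat
                          // node pool; existing children are packed
                          // contiguously after it
    uint32_t color;       // leaf payload, e.g. a packed RGBA8 "unique texel"
};

// Find child 'slot' (0-7) by counting the existing children before it:
// the usual popcount trick for sparse, pointerless structures.
inline uint32_t childIndex(const SvoNode& n, int slot) {
    uint32_t before = n.childMask & ((1u << slot) - 1);
    return n.firstChild + std::popcount(before);
}
```

The appeal over triangles plus megatexture, going by the quote, is that geometry and unique surface data live in one structure that you ray trace into directly.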
 
So is it a case of "too many branches" (for modern hardware) then?
Well more that just as soon as rays diverge in the data structure traversal (i.e. at triangle edges), the SIMD stuff effectively gets scalarized to the point where you wind up with 1/16 throughput. Still not terrible (and still generally faster than a lot of similarly-priced CPUs!), but definitely noticeable. So ironically, you have trouble with scenes with lots of tiny triangles and high-frequency data structures... remind you of any very similar rendering algorithm that you know of? ;)

Thanks for all the great info!
No issues - certainly a lot of this stuff will come to the forefront in the next few years since we're definitely at the point where it's quite possible to implement an efficient ray tracer on consumer graphics hardware. Again, we're not quite at the point of generating the data structures there (ironically the best data structure scatter/sort unit on GPUs is currently the rasterizer ;)), but that will improve with time too.
 
Forgive me for such an utter noob question, but I'm trying to conceptualize the inherent problems with raytracing on current hardware.

The way I understand it, there are two issues. First, you are trying to solve how a one-dimensional object (the ray) interacts with a two-dimensional object (a surface) within a three-dimensional space (a scene) over a fourth dimension (time). Since you can't tell in advance which rays are going to be more important for the render (assuming a simple, unbiased renderer), you have to (essentially) build a database that describes every possible (within memory constraints) "bounce" a ray can take until it's no longer visible. If you're lucky, more than one ray from different sources will land on the same point on a surface at the same angle, allowing you to reuse that particular ray "path." However, there's no guarantee that as any ray makes more "bounces" it will coincide with a previous ray, which makes it hard to reduce the number of calculations needed.

The second issue involves having dynamic objects within a scene. You can skip recalculating rays that didn't intersect with an object before it moved, and you can skip recalculating any rays that don't intersect with it in its current position. However, there's no guarantee that any new ray calculated for the moved object will coincide with any previously calculated ray path, again making it difficult to reduce the total number of calculations needed.

I know in the real world there are all sorts of approximations and shortcuts one can make to help reduce the number of calculations. But in general, is the above conceptualization correct, should it be expanded, or should I be riding the short bus?
 