G80/CUDA for raytracing?

ebola

Newcomer
Anyone got any opinions on the applicability of CUDA to raytracing - e.g. as part of a global illumination solution such as photon mapping?

How restrictive is it when traversing the sort of BVH or kd-tree structures a ray tracer requires? Skimming over the specs, I was kind of under the impression that despite being able to calculate arbitrary addresses, reads from memory between multiple threads DO need to have decent coherence, not going through a cache as effectively as textures?

How would a GPU compare to, say, an implementation on the Cell processor (paging in portions of a cache-friendly tree ordering - able to traverse structures as complex as a CPU can), or raytracing on the Ageia (similar principle I imagine... available in some form through the PhysX SDK?)
 

Are you thinking of the Raycasting in the PhysX SDK?
[screenshots of the PhysX SDK raycasting demo]
 
There is also an upcoming I3D paper about GPU raytracing using optimized kd-tree algorithms and static scenes. (I happen to be one of the authors, and the work is an evolution of the Foley et al. work.) The short answer is that GPUs are faster than a single CPU, but they aren't great at raytracing because of divergence in execution between rays. As the execution traces diverge in the acceleration structure, you end up with a lot of SIMD execution stalls. GPUs also currently have to do a bunch of extra work because there isn't an effective way to do a stack, so it has to be emulated or worked around via algorithm modifications. Sadly, the G80's 16KB of shared memory per multiprocessor isn't very helpful, as it's too small to really hold a stack for the number of parallel execution contexts needed to run efficiently; however, there might be fruit here.

We are currently talking ~19 Mray/s on an X1900 XTX (conference room scene, shadow rays), and about the same on a G80 with DirectX and the current state of the drivers and shader compilers. Using simpler scenes, we can execute at much faster rates, but those aren't realistic. (All the current published fast raytracing numbers also do nothing but trivial shading, though shading is obviously where GPUs do well...) With heavier tuning via CTM/CUDA, we might be able to squeeze out a little more, but unless we can regain ray coherence, it's difficult to do leaps and bounds better.
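To give a flavour of why the stack hurts, here's a minimal CUDA-style sketch of per-ray kd-tree traversal with a small fixed-size stack per thread. This is purely illustrative, not the implementation from the paper; the node layout, the stack size, and intersectLeaf() are all made up:

Code:
// Minimal sketch: per-ray kd-tree traversal with a small per-thread stack.
// Node layout, names, and intersectLeaf() are hypothetical.
#include <cuda_runtime.h>
#include <float.h>

struct KdNode {
    float split;        // split plane position (interior nodes)
    int   axis;         // 0/1/2 = x/y/z split axis, 3 = leaf
    int   left, right;  // interior: child indices (a leaf would store prim offsets)
};

struct Ray { float3 o, d; };

#define STACK_SIZE 24   // per thread; a real short-stack scheme restarts from
                        // the root on overflow (omitted here for brevity)

__device__ float intersectLeaf(const KdNode& n, const Ray& r)
{
    // placeholder: test the leaf's triangles, return nearest hit t
    return FLT_MAX;
}

__global__ void traceKernel(const KdNode* nodes, const Ray* rays,
                            float* hitT, int numRays)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numRays) return;

    Ray   r    = rays[i];
    float tMin = 0.0f, tMax = FLT_MAX, best = FLT_MAX;

    int   stackNode[STACK_SIZE];            // spills to (slow) local memory
    float stackMin[STACK_SIZE], stackMax[STACK_SIZE];
    int   sp = 0, node = 0;                 // start at the root

    while (true) {
        KdNode n = nodes[node];
        if (n.axis < 3) {
            // interior node: classify near/far child along the split axis
            float o = (n.axis == 0) ? r.o.x : (n.axis == 1) ? r.o.y : r.o.z;
            float d = (n.axis == 0) ? r.d.x : (n.axis == 1) ? r.d.y : r.d.z;
            float tSplit = (n.split - o) / d;
            int nearC = (o < n.split) ? n.left  : n.right;
            int farC  = (o < n.split) ? n.right : n.left;

            if (tSplit > tMax || tSplit < 0.0f) {
                node = nearC;               // ray only touches the near child
            } else if (tSplit < tMin) {
                node = farC;                // ray only touches the far child
            } else {
                if (sp < STACK_SIZE) {      // push the far child for later
                    stackNode[sp] = farC;
                    stackMin[sp]  = tSplit;
                    stackMax[sp]  = tMax;
                    ++sp;
                }
                node = nearC;
                tMax = tSplit;
            }
        } else {
            best = fminf(best, intersectLeaf(n, r));    // leaf: test primitives
            if (sp == 0) break;                         // traversal finished
            --sp;                                       // pop: neighbouring rays
            node = stackNode[sp];                       // in a warp diverge here
            tMin = stackMin[sp];
            tMax = stackMax[sp];
        }
    }
    hitT[i] = best;
}

Every thread in a warp runs this loop in lockstep, so as soon as rays visit different nodes you pay for both sides of every branch, which is exactly the divergence problem above.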

Cell is actually a raytracing monster, compared to other non-custom architectures, in certain situations. The Saarland folks (and others, including Stanford) have Cell raytracers running at >60 Mrays/s for primary rays. Multi-core CPUs are also showing great promise, with people reporting >5 Mrays/s per processor for comparable designs (i.e. no tricks that only really work for primary rays), and there is impressive work from Intel on really optimizing the heck out of raytracing on CPUs. My main concern about CPU implementations is their ability to shade fast. It's going to be interesting to see hybrid CPU/GPU implementations here...

In our I3D paper, we argue that what you would likely do on a GPU is rasterize the primary hits and raytrace secondary effects such as shadows, reflections, and refractions, depending on your ray/frame-rate budget. We have an implementation in Direct3D (Brook + D3D) as well as CTM+GL that demonstrates this hybrid system, and it was running in the ATI booth at SIGGRAPH and shown during the CTM presentations. The paper should go up when finalized in late January and will be presented at I3D 2007 by Daniel Horn. For raytracing to really get faster on GPUs, we need a way to deal with the cost of instruction divergence, and perhaps more importantly, ways to really build complex data structures.
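Roughly, the hybrid idea looks like this (again a hypothetical CUDA sketch with made-up names, not our actual code): the rasterizer fills a G-buffer with positions and normals, then one shadow ray per pixel is traced against the acceleration structure, with traceShadowRay() standing in for whatever traversal you use:

Code:
// Illustrative only: one shadow ray per pixel from a rasterized G-buffer.
#include <cuda_runtime.h>
#include <math.h>

// Stand-in for a real traversal (e.g. the kd-tree sketch above); returns
// true if anything blocks the segment [origin, origin + dir * maxT].
__device__ bool traceShadowRay(float3 origin, float3 dir, float maxT)
{
    return false;   // placeholder: plug in your acceleration structure here
}

__global__ void shadowPass(const float3* gbufPos,     // world-space positions
                           const float3* gbufNormal,  // world-space normals
                           float* shadowMask,         // 1 = lit, 0 = shadowed
                           int width, int height, float3 lightPos)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int idx = y * width + x;

    float3 p = gbufPos[idx];
    float3 n = gbufNormal[idx];

    // direction and distance to the light
    float3 toL  = make_float3(lightPos.x - p.x, lightPos.y - p.y, lightPos.z - p.z);
    float  dist = sqrtf(toL.x * toL.x + toL.y * toL.y + toL.z * toL.z);
    float3 dir  = make_float3(toL.x / dist, toL.y / dist, toL.z / dist);

    // nudge the origin along the normal to avoid self-intersection
    float3 o = make_float3(p.x + 1e-3f * n.x, p.y + 1e-3f * n.y, p.z + 1e-3f * n.z);

    // the shading pass multiplies this mask into the rasterized direct lighting
    shadowMask[idx] = traceShadowRay(o, dir, dist) ? 0.0f : 1.0f;
}

The nice property is that you only spend rays where they buy you something, and the shading itself stays on the rasterization side where GPUs are already good.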

Regardless, we are still quite a ways away from the projected 400 Mray/s needed to approach the performance of current games. (I can't remember who stated this as a rough lower bound, but it was in the SIGGRAPH 2006 raytracing course.) We need a few more years of Moore's Law and a few algorithm tricks, mostly involving dynamic scenes, and then things will start to get interesting. But rasterization will also improve at or above Moore's Law, and game developers will continue to find tricks and hacks to make things look right, or use partial raytracing solutions like POM.
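As a rough back-of-envelope (my numbers, not the course's exact reasoning): 1024x768 at 60 fps is already ~47M primary rays/s, adding one shadow ray per pixel per light puts you around 94-190 Mray/s with 1-3 lights, and a reflection/refraction bounce or two per pixel gets you to a few hundred Mray/s very quickly.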

(Sorry for the long post)
 
If your posts are as informative as the above, it's fine by me even if they're much longer :thumbs up: ;)
 
Cell is actually a raytracing monster, compared to other non-custom architectures, in certain situations. The Saarland folks (and others, including Stanford) have Cell raytracers running at >60 Mrays/s for primary rays. Multi-core CPUs are also showing great promise, with people reporting >5 Mrays/s per processor for comparable designs (i.e. no tricks that only really work for primary rays), and there is impressive work from Intel on really optimizing the heck out of raytracing on CPUs. My main concern about CPU implementations is their ability to shade fast. It's going to be interesting to see hybrid CPU/GPU implementations here...

Nice to know. :cool:
 
@mhouston:

How about MLT and some nice random sampling? Any thoughts on the effectiveness?


My first guess, just judging from your initial comments re: raycasting/tracing on the GPU, was correct. I had predicted as much over on the Indigo forums (Indigo is a free MLT/Monte Carlo tracer that I use extensively) when someone postulated that the then just-released 8800 GTX could do things to make our current CPUs bite the dust. :)

And make sure you post here when that paper is published!
 
And just to toss it out: during the summer, Jawed posted a neat lecture from Cornell which emphasized sparse sampling for global illumination. On a single core of a Pentium D 2.8GHz they were able to render scenes with GI using ~500k-1.5M polygons in ~60-180 seconds. The technique worked for hard and soft shadows, direct and indirect illumination, HDR, and very high quality anti-aliasing (which was stated to be very cheap). The lecture notes that even as processing technologies get faster, the other major problem (if not a larger one) is memory access. It only lightly touches on speeding up raytracing (noting that others were lecturing on that) and on GPUs. I think this reinforces that there are major gains to be made on the software end as well as the hardware end. This can cut into the "days per frame" rendering... of course, I think offline GI for movies will always be like that, because as soon as you can do it faster, you just add more stuff.
 
I believe the smoke demo from the G80 launch was marketed as "raytraced".

Heh, people market all sorts of half-truths :)
You could class 'virtual displacement mapping' as a form of raytracing.

Thanks for the answer... Yeah, I'd kind of thought the Cell would excel at this sort of thing, and have always had difficulty picturing GPUs handling advanced data structures (though I've been impressed with what's been done).


Regardless, we are still quite a ways away from the projected 400 Mray/s needed to approach the performance of current games. (I can't remember who stated this as a rough lower bound, but it was in the SIGGRAPH 2006 raytracing course.)

The point of the question was more RT for preprocessed global illumination... but I'm also interested in the whole realtime RT thing as a curiosity. I don't think realtime RT will ever do the same job as rasterizers, but I often think - seeing how far GPUs have come from the original OpenGL accelerators - about what future realtime techniques you'd see on something designed for RT. (Cell = RT monster... Ageia like Cell... shame Ageia seems dead in the water :) but interesting that CUDA's come along.)
 
Heh, people market all sorts of half-truths :)
You could class 'virtual displacement mapping' as a form of raytracing.

Not so sure about the half truth...

As explained here by the creator:
The solver also tracks free surfaces (e.g., the interface between air and water). The water images below were generated using a GPU level set ray tracer, which runs alongside the solver in real time. The smoke images were similarly rendered via ray marching.

Wouldn't it be harder to render the smoke (and probably also the water) without ray tracing?
 
It's ray-traced, but not in the traditional sense. I'd assume it goes something like:

Draw a rectangle for each face of the box
For each pixel on the face, you know where you're starting and what angle you're being viewed from. March through the 3D texture (which represents the fog/water) and calculate density (fog) or a surface crossing (water).

So yes, it's ray traced, but it's a single ray shot through a single object with basically everything known. What's always been hard about GPU ray tracing is managing the scene graph and finding ray/object intersections. Once you know the intersection (like this), the rest is trivial.
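Something like this, as a very rough CUDA sketch (my guess at the shape of it, not NVIDIA's actual demo code; the per-pixel entry points/directions and the 3D density texture are assumed inputs):

Code:
// Marches one ray per pixel through a 3D density texture, accumulating
// opacity front-to-back, roughly as described above. Illustration only.
#include <cuda_runtime.h>

__global__ void rayMarchSmoke(cudaTextureObject_t densityTex, // 3D texture
                              float* outAlpha, int width, int height,
                              const float3* entryPos,  // per-pixel entry point, in [0,1]^3
                              const float3* viewDir,   // per-pixel normalized direction
                              float stepSize, int maxSteps)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    int idx = y * width + x;

    float3 p = entryPos[idx];   // where the ray enters the volume
    float3 d = viewDir[idx];

    float alpha = 0.0f;
    for (int i = 0; i < maxSteps && alpha < 0.99f; ++i) {
        // stop once the ray leaves the unit cube holding the volume
        if (p.x < 0.f || p.x > 1.f || p.y < 0.f || p.y > 1.f ||
            p.z < 0.f || p.z > 1.f) break;

        float density = tex3D<float>(densityTex, p.x, p.y, p.z);
        alpha += (1.0f - alpha) * density * stepSize;   // front-to-back blend

        p.x += d.x * stepSize;
        p.y += d.y * stepSize;
        p.z += d.z * stepSize;
    }
    outAlpha[idx] = alpha;      // composited over the rasterized scene afterwards
}

For the water you'd march the same way, but look for the sign change in the level-set value and shade the crossing point instead of accumulating density.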
 
Disable anisotropic filtering and it should work fine.

I did, and it mostly fixed it. But it still has some artifacts, which disappear when I press the "b" key.

However, I am surprised that antialiasing doesn't seem to work (I set it to "Enhance application setting" and 16xQ), and the picture is still ugly:
[screenshot showing the remaining aliasing]


But there is something worse -- I believe I just hit a driver bug:

I got a BSOD on an otherwise perfectly stable machine.

Message is:

Code:
SYSTEM_SERVICE_EXCEPTION

STOP 0x0000003B (0x00000000C0000005, 0xFFFFF97FFF4B17A2, 0xFFFFF97FFF484000)

nv4_disp.dll - Address 0xFFFFF97FFF4B17A2 base 0xFFFFF97FFF484000 datestamp 0x4575BCED

What was I doing? Folding@Home at 100% on one core, and I started the smoke simulation. At one point it stopped rendering, then the system completely hung, and after several seconds I got the above-mentioned BSOD. Looks like some scheduling/threading problem in the OpenGL driver to me.
 
I3D GPU k-D tree raytracing preprint here
Hehe, it's amazing how incomprehensible that seven-word sentence would be to the vast majority of the population :LOL:. Thanks for the link. I'm reading over the paper now, but you should check the text on the website (unless you really meant "eorts" and "xed-size stack"). Also, the text for "Figure 1" is in a different order than the pictures.
 