Re: HW raytracing
GeLeTo said:
1. You need an axis-aligned BSP of the whole scene. When something changes you have to rebuild the affected BSP nodes. This is very slow (especially when the changed geometry spans a root node) and cannot be done efficiently in hardware.
That's not fully true. While the hardware design presented is not compatible with arbitrarily
changing geometry, it can handle hierarchical animation. This is done by subdividing the
scene into objects (much like GL display lists) and then instantiating the objects. A separate
BSP is built for each object, and then a top-level BSP is built over the bounding boxes
of all objects. As this top-level BSP can easily be rebuilt per frame for a reasonable number
of objects (say a few thousand), you can do animations with it. Together with keyframe
animation, you still can't do everything rasterisation can, but it is enough to implement, say, Quake.
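To make the two-level idea a bit more concrete, here is a minimal sketch in Python. It is not the presented hardware design, just an illustration under some assumptions of mine: instances are only translated (no rotation), the per-object BSP is represented by nothing more than its fixed object-space bounding box, and the top-level tree is a simple median split over box centres rather than a real BSP that handles boxes straddling the split plane. The names Instance and build_top_level are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    local_min: tuple  # object-space bounding box; fixed once the per-object BSP exists
    local_max: tuple
    offset: tuple = (0.0, 0.0, 0.0)  # per-frame translation (assumption: no rotation)

    def world_box(self):
        """Bounding box of this instance after applying the current offset."""
        lo = tuple(a + o for a, o in zip(self.local_min, self.offset))
        hi = tuple(b + o for b, o in zip(self.local_max, self.offset))
        return lo, hi


def build_top_level(instances, depth=0, leaf_size=2):
    """Rebuild the top-level tree from scratch (simplified: median split on box centres)."""
    if len(instances) <= leaf_size:
        return {"leaf": [inst.name for inst in instances]}
    axis = depth % 3  # cycle through x, y, z

    def centre(inst):
        lo, hi = inst.world_box()
        return 0.5 * (lo[axis] + hi[axis])

    ordered = sorted(instances, key=centre)
    mid = len(ordered) // 2
    return {
        "axis": axis,
        "split": centre(ordered[mid]),
        "left": build_top_level(ordered[:mid], depth + 1, leaf_size),
        "right": build_top_level(ordered[mid:], depth + 1, leaf_size),
    }


# Four unit-sized objects lined up along x.
scene = [
    Instance(name, (x, 0.0, 0.0), (x + 1.0, 1.0, 1.0))
    for name, x in [("a", 0.0), ("b", 2.0), ("c", 4.0), ("d", 6.0)]
]
tree_before = build_top_level(scene)

# Next frame: object "a" moves. Only the cheap top-level tree is rebuilt;
# the per-object data (here just the local boxes) is left untouched.
scene[0].offset = (10.0, 0.0, 0.0)
tree_after = build_top_level(scene)
```

The point is that the per-frame cost depends on the number of objects, not on the number of triangles inside them.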
GeLeTo said:
3. I don't see how this architecture can be less complex than the classic hardware implementation - they replace the very simple triangle rasterization+depth compare units with a bunch of parallel raytracing units that perform BSP traversal and triangle intersections.
OK, that's a "why" question, and those are usually a little difficult to answer. I'll try anyway.
In computer graphics there is a term called output-sensitive. It means that you
only calculate what you see. Rasterisation is output-sensitive only in a limited, Z-buffer
related way. To improve this, you build a scenegraph layer into your application. This leads to
the situation that your visibility information is split: a high-level part, like a BSP, that lives
in the application, and a per-pixel part that lives on your graphics card. So far no problem. But
now you want to do effects. Due to the non-recursive pipeline approach of rasterisation,
you have to apply multi-pass rendering (if your scene needs a reflection, you have to render
the reflection map in advance). THE problem is that the control flow for this multi-pass
rendering comes from your application, which only has limited visibility information. In other
words: at the point in time when you have to decide whether to calculate a reflection map,
and at what resolution, you don't know yet whether it's going to be visible in the end.
This results in huge overhead and very high memory bandwidth. Furthermore, the design
can't be parallelized well in most stages of the pipeline (shading is an exception, though), so
high throughput is a must. Both together lead to challenging chip design.
In raytracing hardware, the control flow lies fully in the card, which is the right way to do it,
because only there is all the visibility information available. The drawback is that you don't get a
real pipeline but more of a loop, where a shader can generate new rays and then throw
them back into the tracing core.
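A rough sketch of that loop in Python, to show where the control flow sits. Everything here is a toy assumption of mine: SURFACES is a made-up lookup table standing in for the scene, trace stands in for the intersection units, and a "ray" is just a record saying which surface it will hit. What matters is that the reflection ray is only generated after a primary ray has actually hit the mirror, so nothing is computed for reflections nobody sees.

```python
from collections import deque

# Toy scene: surface -> (own brightness, reflectivity, surface seen in the mirror direction)
SURFACES = {
    "wall": (0.8, 0.0, None),
    "mirror": (0.1, 0.5, "lamp"),
    "lamp": (1.0, 0.0, None),
}


def trace(target):
    """Stand-in for the tracing core: returns the record of the surface a ray hits."""
    return SURFACES[target]


def render_with_queue(primary_targets, max_depth=4):
    """Trace/shade loop: shading may push new rays back into the ray queue."""
    pixels = [0.0] * len(primary_targets)
    queue = deque((px, target, 1.0, 0) for px, target in enumerate(primary_targets))
    rays_traced = 0
    while queue:
        px, target, weight, depth = queue.popleft()
        brightness, reflectivity, seen = trace(target)
        rays_traced += 1
        pixels[px] += weight * brightness  # "shading" of the hit point
        if reflectivity > 0.0 and seen is not None and depth < max_depth:
            # The shader spawns a secondary ray and throws it back into the tracing core.
            queue.append((px, seen, weight * reflectivity, depth + 1))
    return pixels, rays_traced


# One pixel sees the wall, one sees the mirror: exactly one extra ray is traced.
image_a, rays_a = render_with_queue(["wall", "mirror"])
# No mirror visible: no reflection work at all.
image_b, rays_b = render_with_queue(["wall", "wall"])
```

Compare this with the multi-pass case, where the application would have had to render the reflection map up front in both situations.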
But you gain the fact that everything is parallelizable at will. Some raytracing pioneer (I don't
remember his name) once said: it is embarrassingly parallel. So you don't even try to build a
fast intersector or a fast shading unit, but you keep it simple, simple and simple. Now you have
a slow intersector and a slow shading unit, but they are tiny. And then you pack _lots_ of them
onto a chip. This isn't a solution for general-purpose hardware, as you usually have lots of
dependencies, so most programs don't run twice as fast on twice as many CPUs, but raytracing
does, up to the point where you have one CPU per pixel on the screen. That gives
a very simple and highly efficient chip: lots of identical, small and simple units. If you get twice
the area on the die, you just pack on twice as many units. That's scalability.
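The "no dependencies between pixels" part can be illustrated with a few lines of Python. This is only a sketch: shade_pixel is a made-up placeholder for tracing and shading one pixel, and Python threads will not give a real speedup here, so it shows the independence of the work, not the performance. Because each pixel depends only on its own index, the image comes out the same no matter how many units share the work.

```python
from concurrent.futures import ThreadPoolExecutor

WIDTH, HEIGHT = 8, 4


def shade_pixel(index):
    """Hypothetical per-pixel work: depends only on the pixel index, not on other pixels."""
    x, y = index % WIDTH, index // WIDTH
    return (x * 31 + y * 17) % 256


def render_image(units):
    """Distribute all pixels over `units` identical workers; no shared state between them."""
    with ThreadPoolExecutor(max_workers=units) as pool:
        return list(pool.map(shade_pixel, range(WIDTH * HEIGHT)))


image_one_unit = render_image(1)
image_eight_units = render_image(8)
```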
GeLeTo said:
All other disadvantages of standard raytracing still apply - lots of context switching, needs the whole scene in memory at once, etc.
Here you are right. But on the other hand, raytracing hardware needs far less bandwidth
than rasterisation hardware. So you have to spend a few gigs (1-2?) of RAM on your graphics
card, but it's not as bad as it sounds, because you can take the cheapest RAM available instead
of going for SRAMs and 256-bit buses as NVIDIA and ATI do.