Real-Time Ray Tracing: Holy Grail or Fools' Errand? *Partial Reconstruction*

fascinating... I didn't read it all but I feel like posting something NOW :smile:


Bob said:
I have a great idea for optimizing ray-tracing: Instead of intersecting rays against triangles, why don't we intersect triangles against rays? If the rays can be arranged in an orthogonal grid of some sort, it becomes very easy to resolve the intersection test...

I think this comment from Bob is getting straight to the point... Along the same line:
http://www.sci.utah.edu/~wald/Publications/2006///Grid/download//grid_slides.ppt

Imo, what's important is whether the chosen algorithm is:
- bringing the observer to the scene (the "eyes" are querying where the objects are),
- or bringing the scene to the observer (the objects are "seeking" the eyes).

Nowadays there's a strong match between:
- observer-to-scene <=> raycasting
- scene-to-observer <=> rasterizing

But I'd say that's just where the story has gone so far... There's absolutely nothing to prevent us from doing:
- observer-to-scene with rasterizing: this is more or less what the "coherent rays" approach is all about (*).
- scene-to-observer with raycasting: well, that's already done with ambient occlusion techniques, where the objects cast rays.

(*) Note that it requires generalizing what rasterizing is: in the end it's all about coherency. Scanning line after line is not such a big deal after all; what's important is cache friendliness (see the toy sketch below).
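Just to make the duality concrete, here is a toy sketch of the two loop orderings (the types and stubs below are invented, and no real renderer is this naive): it is really the same double loop with the nesting swapped.

```cpp
#include <cstddef>
#include <vector>

// Toy placeholder types -- just enough to show the two loop orderings.
struct Object { float boundsMin[3], boundsMax[3]; };
struct Sample { float depth = 1e30f; int hit = -1; };

bool intersects(const Sample&, const Object&) { return false; }  // stub intersection test
bool covers(const Object&, const Sample&)     { return false; }  // stub coverage test

// Observer-to-scene ("the eyes query the objects"): raycasting's ordering.
void observerToScene(const std::vector<Object>& scene, std::vector<Sample>& image) {
    for (Sample& s : image)                          // outer loop: the observer's samples
        for (std::size_t i = 0; i < scene.size(); ++i)
            if (intersects(s, scene[i]))             // "where is the nearest object along my ray?"
                s.hit = int(i);
}

// Scene-to-observer ("the objects seek the eyes"): rasterizing's ordering.
void sceneToObserver(const std::vector<Object>& scene, std::vector<Sample>& image) {
    for (std::size_t i = 0; i < scene.size(); ++i)   // outer loop: the scene
        for (Sample& s : image)
            if (covers(scene[i], s))                 // "which samples do I land on?"
                s.hit = int(i);
}
```

Everything else (acceleration structures, tiling, packets) is about making whichever inner loop you picked coherent.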

Btw, I believe that rasterizing (resp. raycasting) is more or less equivalent to ordered (resp. random) access iterators.
In-order access is important in some cases to maintain coherent memory access and make good use of the cache. The thing is, as the paper shows, random access can be turned into ordered access. It does so using extra queues and by introducing latency... Does that sound familiar?
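To give a rough idea of what I mean (a hand-wavy sketch only; the ray record and the cell granularity are invented): instead of chasing each ray immediately, you queue rays by the scene cell they will touch next, then sweep the cells in order so each cell's data is fetched once. The price is exactly those extra queues and the added latency.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical ray record; 'cellId' is whatever spatial cell the ray will enter next.
struct QueuedRay { float origin[3], dir[3]; std::size_t cellId; };

// Random access turned into ordered access: bin rays by cell, then sweep the cells in order.
void traceBinned(const std::vector<QueuedRay>& rays, std::size_t cellCount) {
    std::vector<std::vector<QueuedRay>> bins(cellCount);    // the "extra queues"
    for (const QueuedRay& r : rays)
        bins[r.cellId].push_back(r);                        // a ray waits here -> added latency

    for (std::size_t cell = 0; cell < cellCount; ++cell) {  // in-order sweep over the scene data
        // loadCellGeometry(cell);  // fetched once, reused by every ray in this bin
        for (const QueuedRay& r : bins[cell]) {
            (void)r;                // intersect r against this cell's geometry here
        }
    }
}
```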




Arun said:
It's obvious that algorithmic innovations will happen (and have happened, as you pointed out) and that hardware will keep getting faster. The implicit disagreement we have is on the timeframes, and whether the 1% of a GPU's die size directly dedicated to rasterization will be eliminated to simplify its design. The former does matter but honestly, the latter simply doesn't. That's an implementation detail (whether you want to raytrace or rasterize primary rays once you have 10+ rays per pixel anyway) and it isn't worth making it the centerpiece of the debate, IMO.

Even if it is 1% of the die size, using rasterizing has a massive impact on the overall design, especially in terms of "thread creation": the way it works now is that a triangle creates new threads (i.e. fragment programs).

My 1-penny bet is that removing the rasterizer (or making it optional, which has the same impact), by generalizing away a lot of assumptions/constraints, could make things very hard for the hardware designer (imo, a GPU without a rasterizer is a Cell processor, but that's slightly OT, granted!).



Allow me to conclude by rephrasing myself:
- raytracing vs rasterising is not as important as the choice between those 2 "dual" approaches: observer-to-scene and scene-to-observer
- raytracing/rasterising can be seen as two ways of iterating over a container (the scene): ordered/random access
- The question remains: what would a GPU be if rasterizing were removed?


I apologize in advance if this stuff has been discussed already! And thanks for reading
 
I think this comment from Bob is getting straight to the point... Along the same line:
http://www.sci.utah.edu/~wald/Publications/2006///Grid/download//grid_slides.ppt
Heh, Bob was taking a little jab at the raytracing supporters there by making rasterization sound revolutionary and clever. We all got a kick out of it.

Allow me to conclude by rephrasing myself:
- raytracing vs rasterising is not as important as the choice between those 2 "dual" approaches: observer-to-scene and scene-to-observer
As far as the context of this thread and Intel's plans go, they are identical. To me, the debate is answered when you consider what realtime raytracing could offer over rasterization in 5 years:
A) High poly count
B) Order independent transparency
C) Correct shadows
D) Correct reflections

None of these are big issues today. Looking at how little die space we devote to triangle setup, we could do a few billion on-screen tris per second today if we wanted to, and with LOD you don't need more than that. Transparency is a bit of a headache in terms of sorting, but it's been mostly solved for 99.9% of the scenes we'll encounter. Shadows are getting better all the time, and by the time they're realtime with raytracing, we'll have great soft shadows from rasterization. Reflections don't need to be correct to appear correct.

The cons of raytracing are just too big. You need all your triangle data available at any time, and that is a far bigger hassle to realtime applications than any of the above advantages.

There's the academic goal of a near-perfect simulation of a camera in a virtual world, and there's the practical objective of being perceptibly close to reality that permeates all realtime 3D applications today. For the latter, rasterization is good enough.
- The question remains: what would a GPU be if rasterizing were removed?
It'll be a lot more than Cell, I'll tell you that. The texture filtering and high efficiency when doing arbitrary texture reads from memory is a huge differentiation point.
 
Heh, Bob was taking a little jab at the raytracing supporters there by making rasterization sound revolutionary and clever. We all got a kick out of it.

Well, I think he actually had a good point (whether he was joking about it or not) :oops:



It'll be a lot more than Cell, I'll tell you that. The texture filtering and high efficiency when doing arbitrary texture reads from memory is a huge differentiation point.

Ok for the filtering stuff (but that's easy to port to another paradigm, provided that rays are not just zero-width lines).


I also agree that on current GPUs, the texture and memory access are a huge differentiation point. What I am questioning is whether that can be kept on the die if the rasterizing constraint is removed.

As I said in the previous post, in spite of being just 1% of the die size, it has a massive influence on the overall design of the chip (basically, it freezes the whole pipeline into a very well-known path). For instance, it makes it easy to predict what memory is going to be fetched, and local coherency is automatically ensured as well.

One can still enforce the same property without rasterizing (i.e. within a more general architecture), but I'm not so sure it's going to be simple enough to be handled by the hardware.



As far as the context of this thread and Intel's plans go, they are identical.

I agree, and somehow I think it's a strategic mistake. What they want to sell is a new way of doing graphics (and to take some market share from nvidia/ati). But they shouldn't have picked ray-tracing, really; that was far too easy for the beyond3d gurus to crush (for good reasons).

Well, at the same time, everybody is talking about it, so that might be a good choice in the end. Except that they'll need to show something actually working at some point...


In the longer run, they should bite the bullet and show off what a Larrabee version of a rasterizer looks like. Of course that would be slower. But they could compensate for that by making good use of the extra flexibility. A lot of applications (maybe not games) would happily sacrifice some speed for the better shadows/reflections that come with a hybrid approach.

Of course that's far more complex to do (primarily because the comparison will be tough), but that's the real challenge imo.
 
A) High poly count
Until you get into high double-digit visible polygons per pixel, image-order rendering really doesn't offer much of an advantage over traditional object-order rendering for primary rays (the same goes for shadow rays). Even then, there is no reason you couldn't construct an object-order renderer that would hierarchically cull unsampled geometry too.

Image-order rendering just doesn't make sense when you have rays diverging from a point origin.
 
Until you get into high double-digit visible polygons per pixel, image-order rendering really doesn't offer much of an advantage over traditional object-order rendering for primary rays (the same goes for shadow rays).
Oh I agree. I think you misunderstood the purpose of my post. I was just going through all the reasons that raytracing preachers present as reasons to switch.
 
Looking at how little die space we devote to triangle setup, we could do a few billion on-screen tris per second today if we wanted to, and with LOD you don't need more than that.
Ummmmmm... a few billion tris per second? Only if we have more bandwidth than all the Gods combined.

Transparency is a bit of a headache in terms of sorting, but it's been mostly solved for 99.9% of the scenes we'll encounter.
Well, it's solved in the sense that sorting does work. That doesn't mean order-independent transparency has no value, or that sorting is inherently "good."

Reflections don't need to be correct to appear correct.
Oh yeah. It doesn't even need to be that correct in terms of source material. I laugh whenever I see a reflection in a cave is actually a shot of some random pub... at least you can find out which studios have people with a soft spot for Corsendonk... :smile:
 
Ummmmmm... a few billion tris per second? Only if we have more bandwidth than all the Gods combined.
Why? 50 bytes per vertex and 2:1 vertex:tri ratios in high-poly scenes mean the 50-100 GB/s we have today is enough. If we devoted more logic to setup, yeah, it's definitely doable. Moreover there are many ways to compress vertices today too, but few people bother because the effort doesn't pay off. Right now, vertex-limited parts of the scene usually have gobs of BW to spare.
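To spell out the back-of-the-envelope arithmetic behind that (nothing measured, just the numbers quoted above): at 2 vertices fetched per triangle and 50 bytes per vertex, each billion triangles per second costs about 100 GB/s of vertex traffic.

```cpp
#include <cstdio>

// Back-of-the-envelope check using the numbers quoted above -- nothing measured.
int main() {
    const double trisPerSec     = 1.0e9;  // take 1 Gtri/s as the low end of "a few billion"
    const double vertsPerTri    = 2.0;    // the 2:1 vertex:tri ratio quoted above
    const double bytesPerVertex = 50.0;
    const double gbPerSec = trisPerSec * vertsPerTri * bytesPerVertex / 1.0e9;
    std::printf("vertex fetch: %.0f GB/s\n", gbPerSec);  // prints 100 GB/s
    return 0;
}
```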

Well, it's solved in the sense that sorting does work. That doesn't mean order-independent transparency has no value, or that sorting is inherently "good."
Basically, I'm just saying that it's not a big enough problem to warrant all the cons of switching to raytracing. Heck, loosely speaking you have to sort everything in raytracing.
Oh yeah. It doesn't even need to be that correct in terms of source material. I laugh whenever I see a reflection in a cave is actually a shot of some random pub... at least you can find out which studios have people with a soft spot for Corsendonk... :smile:
:D

One example in this thread (before the site crash) was pointing out car headlights. Honestly, who sees the headlights in GT5 as a weak point? They're fantastic. Car environment maps, which lack correct reflection of one car on another, could easily be hacked if people wanted it.
 
Why? 50 bytes per vertex and 2:1 vertex:tri ratios in high-poly scenes mean the 50-100 GB/s we have today is enough.
That's only if you can get it all available to the vertices, which isn't easy when you've got pixel shading, and possibly even vertex texturing, running at the same time. Also considering how big a post-transform cache might have to be to avoid reprocessing some verts when you've got enough geometry to push billions of verts per second (although unless you have a billion particles, 2:1 vertex:tri ratio probably accounts for that). Although 50 bytes per vertex isn't a bad number, 160 bytes per vertex isn't unheard of (a certain company I worked at used some huge vertex formats -- excluding the shadow/Z-only passes, the smallest vertices were 128 bytes a piece). Moreover, it's generally the case as of late that the number of vertex attributes we have available to us today (well, in the context of consoles here) isn't enough even when you do pack a bunch of terms together, and especially the number of interpolators you get out of the vertex shader is still only about half-3/4 of what would be "nice" (but then that's mainly because we all want to put a million things onto each pixel).
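To illustrate how a vertex balloons to those sizes, here's a made-up layout (not the actual format from that company, just plausible attributes) that lands exactly on 128 bytes:

```cpp
#include <cstdint>

// Hypothetical "fat" vertex layout -- illustrative only, not any shipped format.
struct FatVertex {
    float         position[3];      // 12 bytes
    float         normal[3];        // 12
    float         tangent[4];       // 16 (w = handedness)
    float         uv[4][2];         // 32 (diffuse, lightmap, detail, decal)
    float         blendWeights[4];  // 16
    std::uint8_t  blendIndices[4];  //  4
    std::uint32_t color0;           //  4
    std::uint32_t color1;           //  4
    float         bakedLighting[7]; // 28 (e.g. low-order SH or per-vertex AO terms)
};                                  // = 128 bytes total
static_assert(sizeof(FatVertex) == 128, "128-byte vertex");
```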

Basically, I'm just saying that it's not a big enough problem to warrant all the cons of switching to raytracing. Heck, loosely speaking you have to sort everything in raytracing.
Yeah, the loss of immediate mode rendering is a big one. The potential need for several independent memory channels is a big one for hardware designers. Though on another tack, I'd say we're kind of approaching those ever so slowly as it is.

One example in this thread (before the site crash) was pointing out car headlights. Honestly, who sees the headlights in GT5 as a weak point? They're fantastic.
Yeah, that one kind of made me wonder. Perhaps he was talking about the illumination patterns emitted by a car headlight (i.e. a caustics problem) as opposed to rendering the model of the headlight itself? If you wanted something truly dynamic and generic, okay, that demands some sort of bidirectional raytracing at least... Even so, for a racing game, you have a sufficiently limited scope of headlight projection that you can get by with a handful of projected textures.
 
Also considering how big a post-transform cache might have to be to avoid reprocessing some verts when you've got enough geometry to push billions of verts per second.
I don't see why the cache would need to increase, as increasing the number of vertices in a mesh generally doesn't increase the connectedness or similar. That's why the post-transform caches have remained comfortable in the "few dozen" range for quite a while, and the relevant mesh optimization papers show quickly diminishing returns beyond that level.
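For anyone who wants to check the diminishing-returns point on their own meshes, here's a rough sketch of measuring it: a simple FIFO post-transform cache model that reports ACMR (vertex transforms per triangle). Real hardware caches differ in size and replacement details, so treat the numbers as relative only.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Average Cache Miss Ratio (vertex transforms per triangle) for a FIFO
// post-transform cache of 'cacheSize' entries. Rough model only.
double acmr(const std::vector<std::uint32_t>& indices, std::size_t cacheSize) {
    std::deque<std::uint32_t> cache;
    std::size_t misses = 0;
    for (std::uint32_t idx : indices) {
        if (std::find(cache.begin(), cache.end(), idx) == cache.end()) {
            ++misses;                       // vertex has to be (re)transformed
            cache.push_back(idx);
            if (cache.size() > cacheSize)
                cache.pop_front();          // FIFO eviction
        }
    }
    return indices.empty() ? 0.0 : 3.0 * double(misses) / double(indices.size());
}
```

Run it on a cache-optimized index buffer at sizes like 16, 32, and 64 and you see the flattening curve described above.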

Although 50 bytes per vertex isn't a bad number, 160 bytes per vertex isn't unheard of (a certain company I worked at used some huge vertex formats -- excluding the shadow/Z-only passes, the smallest vertices were 128 bytes a piece).
Sure, but if you're passing so much per-vertex information it will almost certainly reduce the per-fragment load. Especially in the case of a "billion" polygons (=> many polygons per pixel), pixel shading becomes rather redundant.

Moreover, it's generally the case as of late that the number of vertex attributes we have available to us today (well, in the context of consoles here) isn't enough even when you do pack a bunch of terms together, and especially the number of interpolators you get out of the vertex shader is still only about half-3/4 of what would be "nice" (but then that's mainly because we all want to put a million things onto each pixel).
Dunno about consoles, but I believe the interpolator limit is doubling (16 -> 32) in D3D10.1 which should be plenty.

Plus as Mintmaster said, these sorts of things can easily be handled if we *wanted* to push that many polygons through the pipeline. I don't think bandwidth concerns are a real issue, as the amount of bandwidth that you need per-pixel is fairly constant (or at least easy to upper-bound for most reasonable shading), and so smaller triangles would just imply less pixel data, etc. Then again you'd expect LOD to take care of the case of having *too* many more triangles than pixels.
 
I don't see why the cache would need to increase, as increasing the number of vertices in a mesh generally doesn't increase the connectedness or similar.
Depends on what types of objects dominate your scene. Stuff that demands a lot of geometric density to look good or stuff that simply exists in large quantities.

Sure, but if you're passing so much per-vertex information it will almost certainly reduce the per-fragment load. Especially in the case of a "billion" polygons (=> many polygons per pixel), pixel shading becomes rather redundant.
What actually makes it sad is that a lot of the cases where I've seen the need for lots of per-vertex data are ones where people are doing it to overcome a lack of room elsewhere (not enough constant registers, for instance).

Dunno about consoles, but I believe the interpolator limit is doubling (16 -> 32) in D3D10.1 which should be plenty.
Wait a few weeks. :D

Plus as Mintmaster said, these sorts of things can easily be handled if we *wanted* to push that many polygons through the pipeline. I don't think bandwidth concerns are a real issue, as the amount of bandwidth that you need per-pixel is fairly constant (or at least easy to upper-bound for most reasonable shading), and so smaller triangles would just imply less pixel data, etc. Then again you'd expect LOD to take care of the case of having *too* many more triangles than pixels.
Certainly if you have some real need to move that many triangles in the first place, I'd say rasterization was never suitable for you anyway. Those are the types of things you don't *draw*... you sample them.

On a side note about arguments in favor of raytracing, I'd also add proper interpolation of quantities across surfaces. This is not something that rasterizers *can't do* per se, but it is infinitely easy to run into cases where rasterizers as we know them fail miserably because they only rasterize triangles. Artists spend hours working around this when modeling, but you typically hit these in procedural geometry or cases where the modeled geometry can change shape radically at runtime (e.g. cloth).
 
Then again you'd expect LOD to take care of the case of having *too* many more triangles than pixels.
You'd be using it up close though ... the artifacts of LOD systems won't be covered up by distance anymore. IMO the LOD system in this case would have to be dynamic and image space adaptive. Which is a PITA.
 
You'd be using it up close though ... the artifacts of LOD systems won't be covered up by distance anymore. IMO the LOD system in this case would have to be dynamic and image space adaptive. Which is a PITA.
Well as long as you're doing LOD on sub-pixels (what matters is not distance - it's just the triangles/pixel ratio) I don't think "popping", etc. would be a huge issue. I'm not sure this necessarily implies dynamic/image-space adaptive algorithms. Surely it's not a simple problem for dynamic scenes with tons of moving polygons, but it's no more difficult than raytracing, and it's much more forgiving :)
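For what it's worth, a rough sketch of what "LOD driven by the triangles/pixel ratio rather than distance" could look like (the LOD data layout, the helper, and the default threshold are all invented): pick the coarsest level whose triangles still come out roughly sub-pixel for the object's current screen footprint.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical per-object LOD data: triangle count at each level (level 0 = finest).
struct LodSet { std::size_t levels; const std::size_t* triCounts; };

// Pick the coarsest level whose triangles/pixel ratio still meets the target
// for the object's current projected footprint on screen.
std::size_t selectLod(const LodSet& lod, double projectedPixelArea,
                      double targetTrisPerPixel = 1.0) {
    for (std::size_t level = lod.levels; level-- > 0; ) {       // coarsest level first
        double trisPerPixel = double(lod.triCounts[level]) /
                              std::max(projectedPixelArea, 1.0);
        if (trisPerPixel >= targetTrisPerPixel)
            return level;       // this level already gives ~1 triangle per pixel
    }
    return 0;                   // even the finest level is coarser than the target
}
```

Distance never appears; an object filling the screen up close simply reports a large footprint and ends up on the finer levels.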
 
Well as long as you're doing LOD on sub-pixels (what matters is not distance - it's just the triangles/pixel ratio)
I just meant that at the moment LOD doesn't usually change on objects which are the center of attention.

Not being adaptive might work out faster on average, but you will have lots of areas where you are going to have order of magnitude more triangles per pixel than necessary (assuming you try to keep them subpixel). That raises the specter of aliasing.
 
Not being adaptive might work out faster on average, but you will have lots of areas where you are going to have order of magnitude more triangles per pixel than necessary (assuming you try to keep them subpixel). That raises the specter of aliasing.
Oh certainly - I'm not trying to say that it's a simple problem. But I think most people would agree that it's at least as easy as the similar problems in raytracing.

To put it simply, I see no compelling reason that raytracing helps with the scene complexity problem - indeed it often makes things more difficult since it is less tolerant of rough LOD solutions, etc. That's all I was trying to say :)
 
To put it simply, I see no compelling reason that raytracing helps with the scene complexity problem - indeed it often makes things more difficult since it is less tolerant of rough LOD solutions, etc. That's all I was trying to say :)
Usually, the only arguments made are the whole "logarithmic" argument and that you don't need to apply LOD reduction to make things work, but of course the problem with no LOD on billions of tris is aliasing and getting halfway decent sampling, but I'd say that's a problem for anybody. Another argument that people don't tend to make but probably should is that increases in scene complexity and scale thereof don't necessarily damage your ability to perform a lot of things as robustly as you might otherwise have been able to do, which is not something you can say of rasterization in general (e.g. shadows).

In any case, the barriers are huge, and the benefits of the transition are ones that we wouldn't see in realtime until very far down the line. Almost all the immediate benefits we'd see today are undeniable, but largely superficial.
 
[...] but of course the problem with no LOD on billions of tris is aliasing and getting halfway decent sampling, but I'd say that's a problem for anybody.
Exactly. I consider the argument of "we don't need LOD to get logarithmic complexity when raytracing" to be equivalent to "we don't need filtering on textures". Yes you do. And once you have LOD, both raytracing and rasterization are logarithmic => dead heat here.

Another argument that people don't tend to make but probably should is that increases in scene complexity and scale thereof don't necessarily damage your ability to perform a lot of things as robustly as you might otherwise have been able to do, which is not something you can say of rasterization in general (e.g. shadows).
I'm not sure I understand what you're saying here. On one hand, shadow volumes do have trouble with complex scenes (and they're probably the most direct analog to ray cast shadows). On the other hand, shadow maps are pretty desirable IMHO even when raytracing, or else you run into significant aliasing problems. I'd say this is at least another dead heat, if not a win for rasterization (easily the most efficient way to generate a shadow map).

Anyways I took another look at Samuli Laine's instant radiosity stuff today and he gets some very nice GI results (the best I've seen at that speed) in real-time using pure rasterization. Thus I'm even less convinced that rasterization is out of that race yet either.
 
I'm not sure I understand what you're saying here. On one hand, shadow volumes do have trouble with complex scenes (and they're probably the most direct analog to ray cast shadows). On the other hand, shadow maps are pretty desirable IMHO even when raytracing, or else you run into significant aliasing problems. I'd say this is at least another dead heat, if not a win for rasterization (easily the most efficient way to generate a shadow map).
When densities get really high (e.g. the famed Sunflowers scene), the loss of information and undersampling you innately have when using shadow maps can create a ghastly mess of errors, aliasing, self-shadowing artifacts, all of which cannot be killed by the same bullet(s).

Anyways I took another look at Samuli Laine's instant radiosity stuff today and he gets some very nice GI results (the best I've seen at that speed) in real-time using pure rasterization. Thus I'm even less convinced that rasterization is out of that race yet either.
I'm more concerned with seeing these things scale up, though. Making an awesome-looking Cornell Box in realtime is not that big a deal to me. The scene complexity has quite a large impact on the number of virtual lights you need in order to get a decent result. Depending on what raytracing-based approach you use, and the types/depth of phenomena you want to simulate, the scene complexity's effect on the total number of secondary samples (not the cost of individual samples) is not as rapidly-growing, FWIW. Though there's no denying that your starting point of "decent number of samples" for raytracing is quite enormous.
 
When densities get really high (e.g. the famed Sunflowers scene), the loss of information and undersampling you innately have when using shadow maps can create a ghastly mess of errors, aliasing, self-shadowing artifacts, all of which cannot be killed by the same bullet(s).
With modern techniques like frustum/face partitioning and warping as well as some sort of nice filtering (PCF with proper filter regions, VSM, CSM, etc), and especially with shadow MSAA (VSM/CSM) it's not terribly difficult to get sub-camera-pixel accuracy with shadow maps, at which point you're already doing better than ray-traced shadows. This is particularly true if you're doing an "equal cost" comparison in which you could easily afford 4 or more 2048^2 shadow map partitions at the same cost as even the fastest GPU/Cell/CPU packet raytracer.
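For reference, the frustum-partitioning half of that is almost free to implement; here's a sketch of the usual practical split scheme, i.e. a lambda blend between logarithmic and uniform splits (the function and parameter names are mine):

```cpp
#include <cmath>
#include <vector>

// Split distances for cascaded / parallel-split shadow maps: a blend between a
// logarithmic and a uniform partition of [nearZ, farZ]. lambda = 1 is purely
// logarithmic, lambda = 0 purely uniform.
std::vector<float> cascadeSplits(float nearZ, float farZ, int numCascades,
                                 float lambda = 0.75f) {
    std::vector<float> splits(numCascades + 1);
    for (int i = 0; i <= numCascades; ++i) {
        float t        = float(i) / float(numCascades);
        float logSplit = nearZ * std::pow(farZ / nearZ, t);     // logarithmic term
        float linSplit = nearZ + (farZ - nearZ) * t;            // uniform term
        splits[i]      = lambda * logSplit + (1.0f - lambda) * linSplit;
    }
    return splits;  // splits[0] == nearZ, splits[numCascades] == farZ
}
```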

I'm sorry, but I need more to be convinced on the shadows front :)

I'm more concerned with seeing these things scale up, though. Making an awesome-looking Cornell Box in realtime is not that big a deal to me. The scene complexity has quite a large impact on the number of virtual lights you need in order to get a decent result.
I dunno, I thought the cathedral example was pretty compelling on that front and had no noticeable GI artifacts, even with only 4-8 shadow maps updated per frame. Again with an equal-cost comparison you could easily be doing literally thousands of lights for the cost of photon mapping or similar, and at that point I don't think you'd have too much trouble with scene complexity.

Please point me to a paper that discusses the relevant trade-offs here, but in my experience most things in the scene that cause trouble for an instant radiosity-like approach are going to cause at least as much trouble for the alternatives (excepting maybe AO, if you consider that a GI technique).

Anyways I could certainly see some of the examples in that paper being relevant to games in the near future (the lighting inside the cathedral looked great, and I've played levels similar to that in games :D), and the technique seems reasonably scalable to me. If I had more time I'd code it up and play around with it some more, but I really wouldn't be surprised if something similar (deferred instant radiosity-type approaches) gains traction in the next while. They're an elegant and fairly efficient way of recapturing some of the coherency that exists in a proper GI solution.
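For anyone who hasn't looked at those papers, the core of a deferred instant-radiosity frame is pleasantly small. A very rough sketch of the idea follows (not Laine's actual algorithm; every type and function name below is a made-up stand-in for whatever the engine provides):

```cpp
#include <cstddef>
#include <vector>

// Everything here is hypothetical -- it only shows the shape of an
// instant-radiosity style frame, not any particular implementation.
struct Vec3 { float x, y, z; };
struct VPL  { Vec3 position, normal, flux; };   // a "virtual point light" on a surface

// 1) Generate VPLs: in practice you'd trace a handful of rays from the primary light,
//    or rasterize a reflective shadow map, and turn each first hit into a small light.
std::vector<VPL> generateVPLs(std::size_t count) {
    return std::vector<VPL>(count);             // stub: real code fills position/normal/flux
}

// 2) Shade: direct lighting plus one cheap shadowed accumulation pass per VPL.
void renderFrame(std::size_t vplCount) {
    std::vector<VPL> vpls = generateVPLs(vplCount);
    // renderDirectLighting();                  // the usual direct pass
    for (const VPL& vpl : vpls) {
        // renderOrReuseShadowMap(vpl);         // only a few of these refreshed per frame
        // accumulateLightOverGBuffer(vpl);     // one bounce of indirect light, deferred-style
        (void)vpl;
    }
}
```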
 