Real-Time Ray Tracing : Holy Grail or Fools’ Errand? *Partial Reconstruction*

REYES subdivision handles the silhouettes ... but it won't handle something like, say, gratings, or aliasing caused by shading or displacement mapping (which is why it also used stochastic supersampling). To handle that sort of stuff without just throwing samples at it you need imposters (which will be hard to generate automatically when there is animation or when tricky shaders are used).
Pixar has an interesting LoD method for highly complex objects.
http://graphics.pixar.com/StochasticSimplification/paper.pdf
 
I'd rather see most games try to get their gamma correction situation in order. The feature at the top of my wishlist is to get 4x multisampling to an fp16 render target fast enough for a console.
That's my main bummer with consoles: I use my LCD TV as a monitor, and at that close distance I do see those jaggies.
 
As for "one year later", really if you think about it for STATIC scenes both ray tracing (RT) and rasterization (RZ) can scale the same.

RT depends on off-line pre-built acceleration structures for finding ray intersections. Likewise, you can easily build RZ acceleration structures for static geometry to pre-cull hidden geometry given a viewpoint/region. One obvious current example of this: http://www.insomniacgames.com/tech/articles/1107/occlusion.php.
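To make the comparison concrete, here is a minimal sketch of what such a rasterization-side structure for static geometry can look like (hypothetical types and names, not Insomniac's actual system): a per-region potentially-visible set built offline, so the runtime cost is just a lookup plus drawing the meshes in the set.

[code]
// Hypothetical sketch: precomputed visibility for static geometry.
// Offline, the level is split into view cells and a potentially-visible set
// (PVS) of static meshes is stored per cell; at runtime, culling is a lookup.
#include <cstddef>
#include <cstdint>
#include <vector>

struct ViewCell {
    std::vector<uint32_t> visibleMeshIds;  // built offline, never rebuilt at runtime
};

struct StaticSceneVisibility {
    std::vector<ViewCell> cells;           // one entry per region of the level

    // Runtime query: cost is proportional to the size of the PVS for the
    // camera's cell, independent of how much static geometry is hidden.
    const std::vector<uint32_t>& potentiallyVisible(std::size_t cameraCell) const {
        return cells[cameraCell].visibleMeshIds;
    }
};
[/code]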

For anything useful, i.e. highly dynamic, RT loses its scalability due to having to rebuild the acceleration structures for the ray traversal (a process which doesn't scale well). A few characters running around in a static indoor scene (Quake) is not dynamic; try ray tracing an outdoor scene with trees where the wind is moving the branches and leaves.

This is a really important point to keep in mind in these discussions!

The theoretical O(log n) of ray intersection is all nice and fine[*], but if your geometry is animated (which tends to be desirable), then in general you need to process all of it to re-build an acceleration structure. And that processing takes at least... O(n) time. So as far as asymptotic run time, the O(n) dominates and the O(log n) is meaningless.
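To make that concrete, here is a minimal sketch of why the rebuild term dominates (hypothetical structures, not any shipping tracer): even the cheapest per-frame option, refitting an existing tree's bounds without touching its topology, still visits every node once, while the per-ray traversal only walks on the order of log n of them.

[code]
// Sketch of the asymptotic argument, assuming a simple binary BVH over
// axis-aligned bounding boxes (hypothetical types, not a real renderer).
#include <algorithm>
#include <vector>

struct Aabb { float mn[3], mx[3]; };

struct BvhNode {
    Aabb bounds;
    int  left = -1, right = -1;  // interior node: child indices
    int  primitive = -1;         // leaf: index of the animated primitive
};

// Per-frame refit for animated geometry: every node (and therefore every
// primitive) is visited, so this is Omega(n) even though the topology is reused.
void refit(std::vector<BvhNode>& nodes, const std::vector<Aabb>& primBounds, int idx) {
    BvhNode& n = nodes[idx];
    if (n.primitive >= 0) { n.bounds = primBounds[n.primitive]; return; }
    refit(nodes, primBounds, n.left);
    refit(nodes, primBounds, n.right);
    for (int i = 0; i < 3; ++i) {            // merge the two child boxes
        n.bounds.mn[i] = std::min(nodes[n.left].bounds.mn[i], nodes[n.right].bounds.mn[i]);
        n.bounds.mx[i] = std::max(nodes[n.left].bounds.mx[i], nodes[n.right].bounds.mx[i]);
    }
}
// A per-ray traversal, by contrast, descends roughly one path through a
// reasonably balanced tree, i.e. it touches on the order of log(n) nodes.
[/code]

So once the geometry moves every frame, the O(n) refit/rebuild is unavoidable and swamps the per-ray O(log n), which is exactly the point above.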

There has been a bunch of recent research in lazily building those acceleration structures, so you only do the work to build them in the areas of the scene that rays are flying through (see e.g. the Razor work, http://www-csl.csres.utexas.edu/gps/publications/razor_tog07/index.html). But there is no (asymptotic) free lunch--the parallels to acceleration structures and the same asymptotic behavior as rasterization remain. (But that stuff helps the constant factors a lot!)

In the end, the state of the art in either ray tracing or rasterization is going to tend to lead to a small-ish superset of the visible geometry being processed during rendering. As such, IMHO, there's not really an argument that one or the other has an advantage on the asymptotic scaling side...

-matt

(My opinions only, not those of Intel.)

[*] If you want to be really pedantic, I believe that the O(log n) has only been proven if you use O(n^5) memory to build an acceleration structure. Which is, uh, not feasible. The observed behavior tends toward O(log n) in practice, but I don't believe that it has been proven that the regular ray tracing acceleration structures (kd-trees, etc) deliver that guaranteed asymptotic performance.
 
REYES subdivision handles the silhouettes ... but it won't handle something like, say, gratings, or aliasing caused by shading or displacement mapping (which is why it also used stochastic supersampling). To handle that sort of stuff without just throwing samples at it you need imposters (which will be hard to generate automatically when there is animation or when tricky shaders are used).

Definitely agreed that there are big open problems to be solved in doing imposters well.

FWIW, though, I think that those problems are likely to be solved well in time due to the payoff from doing so--as scenes get more and more complex and as aliasing starts to become a problem, the alternative of using brute force--100s of shaded points per pixel or 100s of rays traced per pixel--is an incredibly wasteful way to solve the aliasing problem. If someone can come up with a better solution, then their game (or what have you) will be able to visually differentiate on scene complexity much better than the other ones, which leads to a pretty strong incentive.

The Pixar paper on LOD that someone posted was a good example of this force in action--as they had scenes with 100s of different visible objects in a pixel, their rendering performance was killed--the fact that REYES decouples shading from visibility did them no good, since all of the visibility samples were hitting different objects, so each one essentially required a unique (very expensive) shading calculation. To be able to make their movie, they had to figure something out...

With similar forces acting on interactive graphics (the vision of the artists running up against the limitations of the state of the art), I am optimistic that these issues will in time be better and better addressed...

-matt

(My opinions only, not those of Intel.)
 
Well, one way or another, in the pipeline from artist conception to colored pixels, you somewhere pass through a stage that tessellates the geometry. It doesn't really matter if that is done by the 3D art package or the rendering pipeline. And at some point it will become simpler to just render the initial curved (or finely tessellated) surfaces instead of breaking them down into many different material parts and shader programs, to be reassembled (stacked) at the end. If only because it makes the whole pipeline a lot simpler and more straightforward, and so much faster to do in hardware.

IMHO.
 
It doesn't really matter if that is done by the 3D art package or the rendering pipeline.

That's not true. A lot of real-time rendering for games comes down to figuring out what you can precompute and moving work from the game's rendering loop to offline pre-processing tools. There may be good reasons to move tessellation from a pre-process to a runtime operation, but the two options will have very different performance implications and trade-offs. Saying it 'doesn't really matter' glosses over a lot of important considerations that do really matter.
 
That's not true. A lot of real-time rendering for games comes down to figuring out what you can precompute and moving work from the game's rendering loop to offline pre-processing tools. There may be good reasons to move tessellation from a pre-process to a runtime operation, but the two options will have very different performance implications and trade-offs. Saying it 'doesn't really matter' glosses over a lot of important considerations that do really matter.
I know that. But consider this: most optimizations require lots of textures "stacked" (for various effects) on the geometry, to get the initial fine details back. And those textures are in many cases the limiting factor. You can't stack them endlessly.
 
For the moment we are stuck with setup rates more than an order of magnitude smaller than peak fillrate and pixel quads making small triangles inefficient to render though ... so the pixel shaders have to do the work (poorly).
 
I know that. But consider this: most optimizations require lots of textures "stacked" (for various effects) on the geometry, to get the initial fine details back. And those textures are in many cases the limiting factor. You can't stack them endlessly.

The only limit I'm aware of is the number of textures that can be bound at once and that limit is already pretty high for DX10 (you can have up to 128 shader resources bound simultaneously and with array textures an individual resource can contain many channels of data). At some point you'll run into performance concerns but generally textures have a lot of advantages when it comes to efficiently representing a signal over a surface vs. a mesh based representation, particularly with regards to filtering / LOD.

In many cases the tessellation schemes used in art packages have been driven at least as much by the benefits of the surface representation for intuitive modelling as by concerns of efficiency or suitability as a representation for rendering. That's one reason I'm skeptical of simply translating popular tessellation schemes used in art packages to run in-game.
 
For the moment we are stuck with setup rates more than an order of magnitude smaller than peak fillrate and pixel quads making small triangles inefficient to render though ... so the pixel shaders have to do the work (poorly).
Yes, but that's with current hardware, which has a lot less geometry processing power than shading power. And half of that (with unified GPUs) is draw rate, which can be fixed by having the GPU do more of the setup and geometry processing (beyond simply transforming it), by also calculating the interactions. And not the GPGPU way (one or the other at any one time), but as part of the render pipeline.

With a possible 2-6 million polygons in each scene and unified architectures, it becomes tempting to do away with the pixel shader stage, as long as you can process the material specifications in the geometry stage.

The end result is very roughly the same workload.
 
The only limit I'm aware of is the number of textures that can be bound at once and that limit is already pretty high for DX10 (you can have up to 128 shader resources bound simultaneously and with array textures an individual resource can contain many channels of data). At some point you'll run into performance concerns but generally textures have a lot of advantages when it comes to efficiently representing a signal over a surface vs. a mesh based representation, particularly with regards to filtering / LOD.

In many cases the tessellation schemes used in art packages have been driven at least as much by the benefits of the surface representation for intuitive modelling as by concerns of efficiency or suitability as a representation for rendering. That's one reason I'm skeptical of simply translating popular tessellation schemes used in art packages to run in-game.
Well, the simplest limitation is the amount of available memory: take a previous-gen console game, double the texture resolutions and quadruple the number of textures per surface (detail, specular, bump-mapping and shadow), and you've just increased the memory needed sixteen times (doubling the resolution is 4x the memory per texture, times 4x as many textures).

And of course, you want more textures as well, to distinguish the objects from their close relatives.

So, while you might be able, theoretically, to stack 128 of them, you won't be able to do so in a real game.

And while we would all like materials that need fewer textures and can be computed instead, the simplest way to do that is to decrease the texture count needed by shrinking the sizes of the polygons, or to calculate hybrid materials on the spot using higher-order surfaces. Both are roughly equivalent.
 
Well, the simplest limitation is the amount of available memory: take a previous-gen console game, double the texture resolutions and quadruple the number of textures per surface (detail, specular, bump-mapping and shadow), and you've just increased the memory needed sixteen times (doubling the resolution is 4x the memory per texture, times 4x as many textures).
Well, obviously more textures use more memory, but extra geometry isn't free either. You can visually approximate a very high-poly model pretty well with a model with a couple of orders of magnitude fewer vertices plus a normal map. If you want other shader parameters that vary across the surface (spec, gloss, ambient occlusion) then you can represent them using textures of different resolutions according to the frequency of the signal (ambient occlusion may not need as high a resolution texture as your normal map, for example). It's difficult to vary signal sampling density if you're storing everything per vertex.

Given that we have pretty decent texture compression support in hardware the overall memory usage for a given visual quality level is likely to be better using textures to represent surface varying shader parameters than using geometry. Add to that the filtering benefits of textures and I'm not convinced very high res geometry is the best representation for in game models.
 
With a possible 2-6 million polygons in each scene and unified architectures, it becomes tempting to do away with the pixel shader stage, as long as you can process the material specifications in the geometry stage.

Here is some more food for thought,

... when the polygon and pixel converge ...

Some day perhaps. Would probably need some type of full hierarchical representation of all geometry to handle LOD issues. Once you get everything into a hierarchical representation where final output primitives are very small and similar in size, handling LOD and overdraw when rendering (even in dynamic scenes) becomes a problem which is solvable in a way which scales with framebuffer size and is very temporally coherent (I know this because I am doing this right now, just not at the pixel size). Where you prune, the hierarchical tree of each dynamic object stays very consistent between frames.
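As a rough sketch of that idea (hypothetical names, and the projected size is assumed to be already computed for the current viewpoint): walk each object's hierarchy and stop descending once a node's projection drops below roughly one output primitive, so the work scales with what ends up on screen rather than with total scene complexity.

[code]
// Hypothetical sketch of pruning a per-object hierarchy by projected size.
#include <vector>

struct HierNode {
    float projectedSize;        // assumed precomputed for the current viewpoint
    std::vector<int> children;  // indices into the node array; empty means leaf
};

// Emit the "cut" through the hierarchy: nodes small enough on screen (or true
// leaves) become the output primitives for this frame. Because the cut moves
// only gradually as the view changes, the emitted set is temporally coherent.
void collectCut(const std::vector<HierNode>& nodes, int idx,
                float cutoff, std::vector<int>& out) {
    const HierNode& n = nodes[idx];
    if (n.children.empty() || n.projectedSize < cutoff) {
        out.push_back(idx);
        return;
    }
    for (int child : n.children)
        collectCut(nodes, child, cutoff, out);
}
[/code]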

The advantage in rendering over ray-tracing even using the above concept is that for fully dynamic scenes, the hierarchy of dynamic objects need not be redone, in fact you can allow primitives even to loosely violate their parent bounding volumes (when animating the geometry). No update needed of the hierarchical acceleration structures for dynamic scenes when rendering.

In ray tracing you are doing a logarithmic search per pixel for static scenes, and in the above rendering method you basically have a logarithmic overdraw per pixel for dynamic scenes, both scaling by framebuffer size.
 
in fact you can allow primitives even to loosely violate their parent bounding volumes
Why be non-conservative and run the risk of noticeable pop-in because of occlusion culling? Especially with a fine-grained hierarchy the benefits don't seem worth the risk to me.
 
Such as adaptive frameless rendering, http://www.cs.virginia.edu/~luebke/publications/pdf/afr.egsr.pdf .... it is brutally obvious that during animation there is tremendous temporal coherence; simply compare the compression ratios of motion JPEG (spatial only) to MPEG (spatial + temporal).
The problem with this is that it usually only helps framerate where you don't need it. Great for offline rendering where all you care about is total render time, but not for realtime graphics where you mandate, say, P(frametime > 33 ms) < 1%.

I think Intel should focus on incoherent raytracing so that we can do approximate GI. If you had a very low-poly approximation of the scene, you could calculate some decent, fully dynamic secondary illumination approximation for each object and then maintain immediate-mode rendering for the high-detail rendering. Rasterization will always be cheaper for coherent rays, and raytracing there has negligible visual benefit.

So basically I want them to do a complete 180. Forget high-poly, coherent raytracing and go for low-poly, incoherent raytracing. That's what GPUs hate, and that's where we have the biggest need for smarter computation in realtime graphics.
 
Why be non-conservative and run the risk of noticeable pop-in because of occlusion culling? Especially with a fine-grained hierarchy the benefits don't seem worth the risk to me.

Who said anything about exact occlusion culling? ;) You only need exact occlusion during moments when the dynamic motion slows enough that you can visually make out the detail. Little gaps during motion can easily be filled in with motion blur.
 
Here is some more food for thought,

... when the polygon and pixel converge ...

Some day perhaps. Would probably need some type of full hierarchical representation of all geometry to handle LOD issues. Once you get everything into a hierarchical representation where final output primitives are very small and similar in size, handling LOD and overdraw when rendering (even in dynamic scenes) becomes a problem which is solvable in a way which scales with framebuffer size and is very temporally coherent (I know this because I am doing this right now, just not at the pixel size). Where you prune, the hierarchical tree of each dynamic object stays very consistent between frames.
Even more: when your input is complex surfaces, you only have to tessellate what is actually visible, front to back. So you don't need different LOD geometry.
 
The problem with this is that it usually only helps framerate where you don't need it. Great for offline rendering where all you care about is total render time, but not for realtime graphics where you mandate, say, P(frametime > 33 ms) < 1%.

I think Intel should focus on incoherent raytracing so that we can do approximate GI. If you had a very low-poly approximation of the scene, you could calculate some decent, fully dynamic secondary illumination approximation for each object and then maintain immediate-mode rendering for the high-detail rendering. Rasterization will always be cheaper for coherent rays, and raytracing there has negligible visual benefit.

So basically I want them to do a complete 180. Forget high-poly, coherent raytracing and go for low-poly, incoherent raytracing. That's what GPUs hate, and that's where we have the biggest need for smarter computation in realtime graphics.
Sounds like the ticket for complex surfaces. ;)
 
I think Intel should focus on incoherent raytracing so that we can do approximate GI. If you had a very low-poly approximation of the scene, you could calculate some decent, fully dynamic secondary illumination approximation for each object and then maintain immediate-mode rendering for the high-detail rendering. Rasterization will always be cheaper for coherent rays, and raytracing there has negligible visual benefit.

So basically I want them to do a complete 180. Forget high-poly, coherent raytracing and go for low-poly, incoherent raytracing. That's what GPUs hate, and that's where we have the biggest need for smarter computation in realtime graphics.

Some more food for thought; perhaps I can put the last nail in ray tracing's coffin.

If you have a low-poly approximation (say, for LOD) you can render a cubemap and do your GI from that. No need for classic ray tracing. You can even search in the cubemap for accurate reflection/refraction if you have a cubemap Z buffer. The GPU Gems 3 book had a good example of this. I forget if the chapter talked about using cubemap arrays to get reflections off occluded geometry.

Basically the concept is to ray trace from within a fragment shader into a cubemap. Personally I think a cubemap mipmap would be better: each layer would only have half the detail, but it would keep the search cost in check, and reflections from occluded geometry would naturally become more diffuse based on distance... Another choice is to use the un-layered cubemap, and use the mipmap for getting more diffuse reflections (set LOD in the texture lookup).
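Here is a rough CPU-side sketch of that search (hypothetical helpers standing in for the cubemap lookups; in practice the loop lives in a fragment shader): march along the reflected ray in the cubemap's space and stop once the ray point is farther from the cubemap center than the surface stored in that direction.

[code]
// Hypothetical sketch of ray-marching against a cubemap that stores distance
// (a "cubemap Z buffer") alongside color. Not real shader code.
#include <cmath>

struct Vec3 { float x, y, z; };

static Vec3  add(Vec3 a, Vec3 b)    { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3  scale(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }
static float length(Vec3 v)         { return std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z); }

// Placeholders for cube-texture lookups in the direction of a point (the
// cubemap is assumed centered at the origin of this space).
static float cubemapDistance(Vec3 dir) { (void)dir; return 10.0f; }
static Vec3  cubemapColor(Vec3 dir)    { (void)dir; return {0.5f, 0.5f, 0.5f}; }

// March along the reflection ray; the first step where the ray point lies
// beyond the stored surface distance is treated as the hit.
Vec3 traceIntoCubemap(Vec3 origin, Vec3 dir, float stepSize, int maxSteps) {
    Vec3 p = origin;
    for (int i = 0; i < maxSteps; ++i) {
        p = add(p, scale(dir, stepSize));
        if (length(p) >= cubemapDistance(p))
            return cubemapColor(p);
    }
    return cubemapColor(dir);  // missed everything: fall back to the environment
}
[/code]

Stepping at a coarser mip each iteration, as suggested above, would just mean growing the step size and the lookup LOD together.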

Now combine this with something similar to what Jawed posted, smart re-projection of the previous frame's results (so you don't have to redo the entire ray trace each frame), and bang, you will have interactive GI while rendering.

The concept is that you re-use the previous frame's cubemap ray intersection position (through re-projection) and just search from that point. It's a perfect way to amortize the ray intersection cost across multiple frames. Ray intersections would converge over multiple frames (and so would the GI), but the convergence artifacts will be hidden in any motion (especially if you are doing proper motion blur).
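As a minimal sketch of that amortization (hypothetical structure, with the actual reprojection stubbed out): keep last frame's converged hit distance per pixel and use it to seed this frame's search instead of starting the march from the surface every time.

[code]
// Hypothetical per-pixel cache used to seed the cubemap search with last
// frame's result; a real renderer would reproject with the previous camera.
#include <vector>

struct HitDistanceCache {
    int width = 0, height = 0;
    std::vector<float> hitDistance;  // last frame's converged distance per pixel

    // Where to start this frame's march. Reading the same pixel stands in for a
    // proper reprojection of the current surface point into the previous frame.
    float seed(int x, int y) const { return hitDistance[y * width + x]; }

    // Store this frame's result so the next frame can continue converging.
    void store(int x, int y, float d) { hitDistance[y * width + x] = d; }
};
[/code]

Each frame then only needs a few refinement steps around the seeded distance, which is how the intersection cost gets spread over multiple frames and why the result converges over time as described.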

I'd go as far as saying you can skip most of the traditional fragment lighting using this technique.

In fact this technique can be done on current hardware right now!

Lots of current research is converging on this method, so even if I didn't post it here, you'll probably see a paper on it soon. Also I'm guessing that you might see something similar to this technique in a finished title sometime in 2008 (assuming I get it done :oops:).
 