D. Kirk and Prof. Slusallek discuss real-time raytracing

WaltC said:
What I'd like to see is a definition of "10-20x the floating point of the Opteron"...;) I'd really be interested in how that little number is reached.
Obviously he was looking at pure FP processing power (muls, adds, and the like).

What Kirk needs to do is to demonstrate a box without a cpu, powered exclusively by an nV40 gpu running Windows or Linux, and running something like Lightwave, and doing ray-trace rendering 10x-20x faster than the lowly Opteron. Heh...;) When Kirk can do that, I'll sit up and pay attention! Otherwise, it seems like PR gobbledegook to me.
Um. He never said it could run 10x-20x faster than an Opteron. He said that if you can't make it run twice as fast you're a bad programmer. He's obviously considering that when you attempt to adapt a GPU to a general algorithm that you lose efficiency. How much? That we'll have to see.
 
Chalnoth said:
Obviously he was looking at pure FP processing power (muls, adds, and the like).

Yes, nice of him to be specific as usual, in his banal generalities, isn't it?...;) My point was that I don't see much point in comparing general cpu fp operations to gpu-bound, fp color-precision operations done entirely within gpu pixel pipelines for the express purpose of rendering pixels. Secondly, since he isn't suggesting the nV fp gpu is capable of doing *anything* sans direct cpu system support, it seems like all he's doing is talking about using the gpu as a comparatively very inefficient co-processor. Much better, I would think, to simply multithread your ray-trace application and run it on two cpus simultaneously. I also think this would be a much cheaper solution all the way around.

Um. He never said it could run 10x-20x faster than an Opteron. He said that if you can't make it run twice as fast you're a bad programmer. He's obviously considering that when you attempt to adapt a GPU to a general algorithm that you lose efficiency. How much? That we'll have to see.

Hmmm....so according to Kirk, then, it takes 10-20x the fp processing power of an Opteron to run your fp ray-tracing software 2x as fast? Heh. Does this mean nV fp gpus are 5x-10x less efficient than an Opteron per clock for this purpose?

Also, I'm not interested in Kirk's opinion of "bad programmers"--I'd like to see him produce a nVidia-programmed version of LightWave (or Maya, etc.) that runs its ray-trace rendering 2x as fast as the standard versions of those programs. A question for Kirk to ponder is whether he thinks the reason the Lightwave and Maya developers are somewhat slow to see the advantages of programming their ray-trace renderers for nV3x/4x gpus might be that they don't agree with his PR assertions in that regard, and can think of much better, easier-to-implement co-processors--like multiple cpus, for instance? Just a thought...;)
 
The NV4x hasn't been out for very long. He did not tout the NV3x as being able to do raytracing efficiently. He touted the NV4x. I'm sure somebody will write a raytracing program for the NV4x, and so if we keep our eyes open, we'll be able to figure out how well it performs in this respect. But it may be a few months.
 
What Mr. Kirk is saying is that GPU fp power is much higher than that of current high-end CPUs.
At this time you can't program your GPU like a CPU; they have very different programming models, and CPUs aren't stream processors.
The power is there, but it's also difficult to come up with a good way to tap it; it's a fairly new field, at least for most of us.
Things will be better in the future, with a more flexible programming model and virtualized resources.

ciao,
Marco
 
Re: HW raytracing

morfiel said:
GeLeTo said:
Maybe an on-chip cache of the recently fetched BSP nodes and triangles would help, but this still will not be effective enough.
Obviously, you need an on-chip cache. Experiments have shown that for practical scenes a 4 KB BSP-node cache plus a 2 KB triangle cache plus a 6 KB matrix cache is sufficient to reach cache hit rates of >99%.

Ok, it's clearer now. My initial comments were based on the earlier paper (SaarCOR - A Hardware Architecture for Ray Tracing), where the "packet of rays" approach was used rather than the more generic on-chip cache. So I am more convinced now - the architecture should handle complex scenes reasonably well.

Still I think that there is little use for a dedicated raytracing solution - the architecture should be used together with a conventional rasterizer - most likely using pixel shaders to mimic the SaarCOR functionality. I think what current graphics hardware lacks is:
1. Recursive shaders
2. Fast context switching

#1 could be implemented using modified f-buffers (where the shader can write more than one entry, or none, into the buffer).
#2 can be done with DX-next's unlimited shader lengths, flow control and texture indexing.

One example of using the hybrid approach would be to perform simplified global illumination calculations on a low-LOD instance of the scene for each pixel rendered by a traditional rasterizer - similar to the approximated GI that PDI did in Shrek 2.
 
Evildeus said:
davepermen said:
but he lacks proof.
I don't know, you are surely right, but do you have proof yourself that it is the case? :?:

if kirk could show his gpu beats saarcor, he would show it. he can't, so he doesn't.

i would be happy to see him prove it. this would mean competition.

still. the gf6 would be sort of a p4-prescott for x86, and the saarcor would be the pentium-m.

smaller, cheaper, fewer transistors, all in all much more efficient.

and this, kirk can not beat.
 
davepermen said:
if kirk could show his gpu beats saarcor, he would show it. he can't, so he doesn't.
BS. It takes time (and therefore money) to produce such a program, and neither Kirk nor nVidia is in the business of writing such software.

It couldn't have been shown yet even if Kirk wanted it, nor do I think nVidia really cares to produce robust raytracing software support.
 
Ehehe..who knows..AFAIK Nvidia Gelato is developed by the same people behind BMRT
 
Chalnoth said:
The NV4x hasn't been out for very long. He did not tout the NV3x as being able to do raytracing efficiently. He touted the NV4x. I'm sure somebody will write a raytracing program for the NV4x, and so if we keep our eyes open, we'll be able to figure out how well it performs in this respect. But it may be a few months.

i've done raytracing on the radeon 9700 pro i own. the card is definitely very very fast. but in programmability, much too limited. it simply isn't usable to try any real-world scenes; the complexity is hell, too many limits everywhere.

of course, the gf6 widens those limits massively, but still, it isn't capable of solving the problem in a generic, proper way for arbitrary massive datasets.

saarcor style solutions are. cpu solutions, too.
 
Chalnoth said:
davepermen said:
if kirk could show his gpu beats saarcor, he would show it. he can't, so he doesn't.
BS. It takes time (and therefore money) to produce such a program, and neither Kirk nor nVidia is in the business of writing such software.

there is research in this direction as well. research gets sponsored by nvidia, ati, etc.

till now, there is no research showing the gf6 can really handle it.

and, believe me, that research is much much work.

the gpu's raw power definitely DOES show interesting statistics. but it also shows too many limits. lack of integer and pointer arithmetic makes data accesses _VERY_ difficult. lots of other limits exist (scatter/gather algos are BIG issues..).
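
To make the data-access problem concrete: on a cpu a raytracer just does nodes[index], but an SM2/SM3-era shader has only float math and texture fetches, so every "pointer" has to be folded into a 2D texture coordinate. A rough C++ sketch of that workaround (the names and the packing scheme are invented for illustration, not from any implementation discussed here):

Code:
#include <cmath>

// One packed BSP node per texel of a square float texture (hypothetical layout).
struct Texel { float r, g, b, a; };

// What the shader must do instead of a pointer dereference: turn a linear
// node index into a normalized 2D texture coordinate using float math only
// (no integer modulo available on that hardware).
void indexToTexcoord(int index, int texWidth, float& u, float& v)
{
    float fi  = static_cast<float>(index);
    float w   = static_cast<float>(texWidth);
    float row = std::floor(fi / w);   // which texture row the node lives in
    float col = fi - row * w;         // emulated modulo for the column
    u = (col + 0.5f) / w;             // +0.5 samples the texel center
    v = (row + 0.5f) / w;
}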

i hope to get proven wrong. i love competition in the raytracing scene. till now, i haven't been. that's all i state.
 
Re: HW raytracing

GeLeTo said:
Still I think that there is little use for a dedicated raytracing solution - the architecture should be used together with a conventional rasterizer - most likely using pixel shaders to mimic the SaarCOR functionality. I think what current graphics hardware lacks is:
1. Recursive shaders
2. Fast context switching

#1 could be implemented using modified f-buffers (where the shader can write more than one entry, or none, into the buffer).
#2 can be done with DX-next's unlimited shader lengths, flow control and texture indexing.
I haven't really looked at raytracing algorithms that much. What specifically do you mean by Recursive shaders?

And wouldn't #2 already be handled by the GeForce 6800?
 
raytracing is an inherently recursive algorithm; just google around for tutorials and look at the pics to get a general idea.

shader hw of today, and possibly tomorrow, is so far not designed for any recursion. a max call depth of 4 is defined, i think, and a function cannot call itself either. (but i would need to read that up again).
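
For reference, a minimal C++ sketch of the recursion in question - all types and helpers below are placeholders invented for illustration (a real tracer would walk a BSP/kd-tree in intersectScene and evaluate lights and materials in shadeLocal):

Code:
struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, dir; };
struct Hit  { bool valid; Vec3 point, normal; float reflectivity; };

static Vec3 add(Vec3 a, Vec3 b)    { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
static Vec3 scale(Vec3 a, float s) { return {a.x * s, a.y * s, a.z * s}; }
static Vec3 reflect(Vec3 d, Vec3 n)
{
    float k = 2.0f * (d.x * n.x + d.y * n.y + d.z * n.z);
    return {d.x - k * n.x, d.y - k * n.y, d.z - k * n.z};
}

// Placeholder scene query and local shading.
static Hit  intersectScene(const Ray&)         { return {false, {}, {}, 0.0f}; }
static Vec3 shadeLocal(const Hit&, const Ray&) { return {0.5f, 0.5f, 0.5f}; }

static Vec3 trace(const Ray& ray, int depth)
{
    Hit hit = intersectScene(ray);
    if (!hit.valid || depth == 0)
        return {0.0f, 0.0f, 0.0f};   // background color / recursion cut-off

    Vec3 color = shadeLocal(hit, ray);
    if (hit.reflectivity > 0.0f) {
        Ray secondary{hit.point, reflect(ray.dir, hit.normal)};
        // This self-call is exactly what a small fixed call depth and the
        // "a function cannot call itself" restriction rule out on shader hw.
        color = add(color, scale(trace(secondary, depth - 1), hit.reflectivity));
    }
    return color;
}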

the bigger problem for now is accessing non-textural data, doing sorting and searching algorithms, and much more.


and one thing will stay true all the time: nothing is as perfectly designed as saarcor => best performance per chip, energy, heat, resources, size, etc.
 
davepermen said:
Evildeus said:
davepermen said:
but he lacks proof.
I don't know, you are surely right, but do you have proof yourself that it is the case? :?:

if kirk could show his gpu beats saarcor, he would show it. he can't, so he doesn't.

i would be happy to see him prove it. this would mean competition.

still. the gf6 would be sort of a p4-prescott for x86, and the saarcor would be the pentium-m.

smaller, cheaper, fewer transistors, all in all much more efficient.

and this, kirk can not beat.
Well, that also means you can't prove your claims. And your logic is a bit flawed, don't you think? :?
 
Re: HW raytracing

Chalnoth said:
And wouldn't #2 already be handled by the GeForce 6800?
Kind of. You can pack all pixel shaders into one and use conditional flow to determine which one to run (or better, use a single do-it-all shader with many parameters if possible). But binding all textures may be more troublesome - the texture indexing functionality of DX-Next should be more appropriate (maybe PS3.0 can do the same thing with some hack - packing many textures into one, using a 3D texture as a stack of 2D textures, or, if not many textures are used, just binding them all).
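
A rough sketch of the "do-it-all shader" idea, written as plain C++ rather than HLSL/GLSL - the material IDs and shade* functions are invented for illustration, and Vec3/Ray/Hit are the placeholder types from the sketch earlier in the thread:

Code:
enum MaterialId { MAT_DIFFUSE = 0, MAT_MIRROR = 1, MAT_GLASS = 2 };

// Trivial per-material shading stubs; real shaders would differ far more.
static Vec3 shadeDiffuse(const Hit&, const Ray&) { return {0.8f, 0.8f, 0.8f}; }
static Vec3 shadeMirror (const Hit&, const Ray&) { return {0.9f, 0.9f, 1.0f}; }
static Vec3 shadeGlass  (const Hit&, const Ray&) { return {0.7f, 0.9f, 1.0f}; }

// One shader body, many behaviors: the branch replaces a context switch
// between separately bound shaders.
static Vec3 uberShade(int materialId, const Hit& hit, const Ray& ray)
{
    switch (materialId) {
        case MAT_MIRROR:  return shadeMirror(hit, ray);
        case MAT_GLASS:   return shadeGlass(hit, ray);
        case MAT_DIFFUSE:
        default:          return shadeDiffuse(hit, ray);
    }
}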

davepermen said:
shader hw of today, and possibly tomorrow, is so far not designed for any recursion. a max call depth of 4 is defined, i think, and a function cannot call itself either. (but i would need to read that up again).
That is why I proposed the f-buffer-like approach. Currently a shader can store the parameters for ONE fragment in the f-buffer. What I propose is to extend this so it can write the inputs for more than one fragment (or none). This should be enough to allow for much higher recursion depths.
[Edit] - and it should work as FILO, not FIFO, which makes it a stack - so not exactly f-buffer-like, but still easy to implement.
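
To illustrate what that FILO buffer would emulate, here is a hedged C++ sketch that replaces the recursion with an explicit stack of pending ray jobs, each carrying its accumulated weight (reusing the placeholder Ray/Hit/Vec3 types and helpers from the earlier sketch; the job layout is an assumption, not an actual f-buffer format):

Code:
#include <vector>

struct RayJob { Ray ray; float weight; };   // weight = accumulated reflectivity

static Vec3 traceIterative(const Ray& primary, int maxJobs)
{
    std::vector<RayJob> stack;              // stands in for the FILO buffer
    stack.push_back({primary, 1.0f});
    Vec3 result{0.0f, 0.0f, 0.0f};

    while (!stack.empty() && maxJobs-- > 0) {
        RayJob job = stack.back();          // FILO: take the newest entry first
        stack.pop_back();

        Hit hit = intersectScene(job.ray);
        if (!hit.valid)
            continue;

        result = add(result, scale(shadeLocal(hit, job.ray), job.weight));

        // Instead of recursing, push follow-up work: zero, one or more rays.
        if (hit.reflectivity > 0.0f) {
            Ray secondary{hit.point, reflect(job.ray.dir, hit.normal)};
            stack.push_back({secondary, job.weight * hit.reflectivity});
        }
    }
    return result;
}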

davepermen said:
and there will one thing be true all the time: nothing is as perfect designed as saarcor => best performance per chip, energy, heat, resources, size, etc.
This is similar to using fixed-function vs programmable shading. The fixed-function beats the programmability in "best performance per chip, energy, heat, size, etc." but it is not flexible enough and it wastes silicon if programmable logic is already available.
Besides - have you seen some code for ray-tracing through a BSP? It's only a few lines - I bet the bottleneck will be mostly in the bandwidth, not in the shader. A GPU can deal with the bandwidth and the caching of BSP data (in the texture cache) just as well as the SaarCOR design.
And for the case that I mentioned in my previous post (using a very simplified geometry for GI calculations) the raytracing can be performed using a solid BSP tree with arbitrary (not necessarily axis-aligned like in SaarCOR) planes - this way the ray-triangle tests are not necessary.
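
For reference, a rough C++ sketch of that "only a few lines" - a recursive traversal of an axis-aligned BSP (kd-tree). The node layout and the leaf test are illustrative assumptions, not SaarCOR's actual scheme:

Code:
struct KdNode {
    bool          leaf;
    int           axis;      // 0 = x, 1 = y, 2 = z (interior nodes only)
    float         split;     // position of the splitting plane
    const KdNode* child[2];
    // leaf payload (triangle list) omitted for brevity
};

// Hypothetical leaf test against the triangles stored in the leaf.
bool intersectLeafTriangles(const KdNode* leaf, const float org[3],
                            const float dir[3], float tMin, float tMax);

// Returns true as soon as a hit is found in [tMin, tMax] along the ray.
bool traverse(const KdNode* node, const float org[3], const float dir[3],
              float tMin, float tMax)
{
    if (node->leaf)
        return intersectLeafTriangles(node, org, dir, tMin, tMax);

    float t        = (node->split - org[node->axis]) / dir[node->axis];
    int   nearSide = dir[node->axis] >= 0.0f ? 0 : 1;  // child entered first
    int   farSide  = 1 - nearSide;

    if (t >= tMax)   // ray only touches the near side
        return traverse(node->child[nearSide], org, dir, tMin, tMax);
    if (t <= tMin)   // ray only touches the far side
        return traverse(node->child[farSide], org, dir, tMin, tMax);

    // The ray crosses the splitting plane: near side first, then far side.
    if (traverse(node->child[nearSide], org, dir, tMin, t))
        return true;
    return traverse(node->child[farSide], org, dir, t, tMax);
}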
 
Re: HW raytracing

GeLeTo said:
This is similar to using fixed-function vs programmable shading. The fixed-function beats the programmability in "best performance per chip, energy, heat, size, etc." but it is not flexible enough and it wastes silicon if programmable logic is already available.
hm. rasterizing was never programmable. only 2 parts of it got programmable, recently: part of the t&l, and part of the per-pixel tasks. the rest is still fixed function. and while small in die size, it's a big part of the logic, a.k.a. the gpu is very inflexible. it can just rasterize, and whatever you wanna do, you have to rasterize it.

i'm talking about having a raytracing logic part here instead. the rest can still be shaders (and is designed to get that way). it's just replacing the fixed rasterizing logic with a fixed raytracing logic.


Besides - have you seen some code for ray-tracing through a BSP? It's only a few lines - I bet the bottleneck will be mostly in the bandwidth, not in the shader. A GPU can deal with the bandwidth and the caching of BSP data (in the texture cache) just as well as the SaarCOR design.
And for the case that I mentioned in my previous post (using a very simplified geometry for GI calculations) the raytracing can be performed using a solid BSP tree with arbitrary (not necessarily axis-aligned like in SaarCOR) planes - this way the ray-triangle tests are not necessary.

about that..
 
We aren't interested in recursive raytracing, nor in models with vast amounts of visible tris per pixel. We are interested in relatively low-detail scenes (tris larger than a pixel) for which we want intersections for primary and shadow rays, and we want programmable shading ... with or without saarcor we don't have the power to go further than that at the moment.

Professional CGI developers decided a couple of decades ago that for those tasks the raytracer is not the best algorithm in software ... and it is not as if raytracing is better suited to hardware than rasterization, so why would you expect that to change there?

Whether saarcor can beat a rasterizer at what it's good at is a non-issue - can it beat it at something we are actually interested in seeing?
 
http://www.splutterfish.com/sf/gallery_index.php

http://graphics.ucsd.edu/~henrik/images/

the ability to scale to the perfect image. something where raytracing scales far better. which is why people never got a full rendering solution working without at least partial raytracing, or without restriction to certain scene styles.

there is much raytracing involved in newer cgi. there is even much raytracing involved in current games: spherical harmonics, polybump, they all use raytracing.

algorithms fit much better with raytracing, and thus map more efficiently onto raytracing hw.

there are a lot of people denying this fact. definitely not artists. most are marketing types, trying to make people believe in rasterizing, and that gpus will one day do shrek, lotr, and all this in realtime.

we ARE interested in the recursive algorithm, we ARE interested in implementing it the real way. only THEN can we get to hw that is theoretically able to render ANYTHING in reasonable time. of COURSE saarcor cannot yet render brazil images in realtime. but compare the gf6 to the first gpus and imagine what processing-power scale-up we could get with saarcor (and the technology is already here, unlike for gpus.. we can already build at .13 and soon smaller). thus, by that extrapolation it does scale to brazil, it does scale beyond shrek, lotr, etc.

gpus of today don't have this nice scaling curve anymore. they are already hot, big, power hungry, always at their limits.
 