davepermen said: but he lacks proof.

I don't know, you are surely right, but do you have proof yourself that it is the case?
WaltC said: What I'd like to see is a definition of "10-20x the floating point of the Opteron"... I'd really be interested in how that little number is reached.

Obviously he was looking at pure FP processing power (muls, adds, and the like).

WaltC said: What Kirk needs to do is to demonstrate a box without a cpu, powered exclusively by an nV40 gpu running Windows or Linux, and running something like Lightwave, and doing ray-trace rendering 10x-20x faster than the lowly Opteron. Heh... When Kirk can do that, I'll sit up and pay attention! Otherwise, it seems like PR gobbledegook to me.

Um. He never said it could run 10x-20x faster than an Opteron. He said that if you can't make it run twice as fast, you're a bad programmer. He's obviously considering that when you attempt to adapt a GPU to a general algorithm, you lose efficiency. How much? That we'll have to see.
GeLeTo said: Maybe an on-chip cache of the recently fetched BSP nodes and triangles may help, but this still will not be effective enough.

Obviously, you need an on-chip cache. Experiments have shown that for practical scenes a buffer of 4 KB of BSP-node cache plus 2 KB of triangle cache plus 6 KB of matrix cache is sufficient to reach cache-hit rates of >99%.
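The intuition behind those small cache sizes can be sanity-checked with a toy model: coherent rays keep revisiting the same upper BSP nodes, so even a tiny LRU over node addresses catches almost every fetch. Everything below (tree depth, ray coherence, line count) is invented for illustration; it is not the SaarCOR design.

```python
from collections import OrderedDict

# Toy LRU cache over BSP node addresses. 512 lines at ~8 bytes/node
# is in the ballpark of the 4 KB node cache quoted above.
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        if addr in self.lines:
            self.hits += 1
            self.lines.move_to_end(addr)      # mark as most recently used
        else:
            self.misses += 1
            self.lines[addr] = True
            if len(self.lines) > self.capacity:
                self.lines.popitem(last=False)  # evict least recently used

def traverse_path(depth, leaf):
    """Node addresses touched walking root->leaf in a complete binary BSP."""
    path, node = [], 1
    for level in range(depth):
        path.append(node)
        node = node * 2 + ((leaf >> (depth - 1 - level)) & 1)
    return path

cache = LRUCache(capacity=512)
# Coherent bundle: 64 neighbouring rays in a row land in the same leaf.
for ray in range(10000):
    for addr in traverse_path(depth=16, leaf=(ray // 64) % 1024):
        cache.access(addr)

hit_rate = cache.hits / (cache.hits + cache.misses)
```

With this (made-up) level of ray coherence the model easily clears 99% hits, because the whole working set of recently used nodes fits in the cache.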
Evildeus said: I don't know, you are surely right, but do you have proof yourself that it is the case?

if kirk could show his gpu beats saarcor, he would show. he can't, so he don't.

i would be happy to see him proving. this would mean competition.

still. the gf6 would be sort of a p4-prescott for x86, and the saarcor would be the pentium-m. smaller, cheaper, less transistors, all in all much more efficient. and this, kirk can not beat.
davepermen said: if kirk could show his gpu beats saarcor, he would show. he can't, so he don't.

BS. It takes time (and therefore money) to produce such a program, and neither Kirk nor nVidia is in the business of writing such software. The NV4x hasn't been out for very long. He did not tout the NV3x as being able to do raytracing efficiently; he touted the NV4x. I'm sure somebody will write a raytracing program for the NV4x, and so if we keep our eyes open, we'll be able to figure out how well it performs in this respect. But it may be a few months.
GeLeTo said: Still I think that there is little use for a dedicated raytracing solution - the architecture should be used together with a conventional rasterizer - most likely using pixel shaders to mimic the SaarCOR functionality. I think what current graphics hardware lacks is:
1. Recursive shaders
2. Fast context switching
#1 could be implemented using modified f-buffers (where the shader can write more than one (or zero) entries in the buffer).
#2 can be done with DX-Next's unlimited shader lengths, flow control and texture indexing.

I haven't really looked at raytracing algorithms that much. What specifically do you mean by recursive shaders?
davepermen said:
Evildeus said: I don't know, you are surely right, but do you have proof yourself that it is the case?
if kirk could show his gpu beats saarcor, he would show. he can't, so he don't.
i would be happy to see him proving. this would mean competition.
still. the gf6 would be sort of a p4-prescott for x86, and the saarcor would be the pentium-m. smaller, cheaper, less transistors, all in all much more efficient. and this, kirk can not beat.

Well, that also means you can't prove your claims. And your logic is a bit flawed, don't you think? :?
Chalnoth said: And wouldn't #2 already be handled by the GeForce 6800?

Kind of. You can pack all pixel shaders into one and use conditional flow to determine which one to run (or better, use a single do-it-all shader with many parameters if possible). But binding all textures may be more troublesome - the texture-indexing functionality of DX-Next should be more appropriate (maybe PS3.0 can do the same thing with some hack - packing many textures into one, using a 3D texture as a stack of 2D textures, or, if not many textures are used, just binding them all).
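The "pack all shaders into one" idea is easy to sketch in plain code: a single do-it-all entry point branches on a per-object material id, so switching materials needs no shader rebind (the GPU analogue would be PS3.0 dynamic branching on an interpolated or constant id). The material ids and shading math below are made up:

```python
# A minimal uber-shader sketch: one function, conditional flow selects
# the material path. All ids and formulas are invented for illustration.
def uber_shader(material_id, normal_dot_light, base_color):
    if material_id == 0:            # flat: pass the colour through
        return base_color
    elif material_id == 1:          # diffuse: scale by N.L
        k = max(0.0, normal_dot_light)
        return tuple(c * k for c in base_color)
    else:                           # "mirror" stub: black here, the real
        return (0.0, 0.0, 0.0)      # contribution would come from a later pass
```

The cost of this approach is that every fragment pays for the branch, which is exactly why texture binding, not control flow, becomes the awkward part.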
davepermen said: shader hw of today, and possibly tomorrow, is till now not designed for any recursion. a max call depth of 4 is defined, i think, and the function can not call itself either. (but i would need to read that up again.)

That is why I proposed the f-buffer-like approach. Currently a shader can store the parameters for ONE fragment in the f-buffer. What I propose is to extend this to be able to write the inputs for more than one (or zero) fragments. This should be enough to allow for much higher recursion depths.
davepermen said: and there will one thing be true all the time: nothing is as perfect designed as saarcor => best performance per chip, energy, heat, resources, size, etc.

This is similar to using fixed-function vs. programmable shading. The fixed function beats the programmability in "best performance per chip, energy, heat, size, etc." but it is not flexible enough, and it wastes silicon if programmable logic is already available.
Evildeus said: Well, that also means you can't prove your claims. And your logic is a bit flawed, don't you think? :?

uhm? http://www.openrt.de/ -> http://www.saarcor.de/ -> http://graphics.cs.uni-sb.de/~jofis/SaarCOR/DynRT/DynRT.html and much more? i don't have to prove saarcor (nor intrace www.intrace.com). they can do that themselves.
GeLeTo said: This is similar to using fixed-function vs. programmable shading. The fixed function beats the programmability in "best performance per chip, energy, heat, size, etc." but it is not flexible enough, and it wastes silicon if programmable logic is already available.

hm. rasterizing was never programmable. only 2 parts of it got programmable, recently: part of the t&l, and part of the per-pixel tasks. the rest is still fixed function. and while small in die size, it's a big bit in the logic size, a.k.a. the gpu is very inflexible. it can just rasterize, and whatever you wanna do, you have to rasterize it.
Besides - have you seen some code for ray-tracing through a BSP? It's only a few lines - I bet the bottleneck will be mostly in the bandwidth, not in the shader. A GPU can deal with the bandwidth and the caching of BSP data (in the texture cache) just as well as the SaarCOR design.

And for the case that I mentioned in my previous post (using a very simplified geometry for GI calculations) the raytracing can be performed using a solid BSP tree with arbitrary (not necessarily axis-aligned like in SaarCOR) planes - this way the ray-triangle tests are not necessary.
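For reference, the ray-through-a-solid-BSP traversal really is only a few lines. A minimal CPU sketch (tree layout and the one-plane test scene are invented): inner nodes split space with an arbitrary plane, leaves are just solid or empty, and only ray-plane tests are performed - no ray-triangle tests.

```python
# Ray vs. solid BSP: return the parameter t of the first solid hit in
# [tmin, tmax], or None. Nodes are ("node", (normal, d), front, back)
# with the plane normal.p = d; leaves are ("leaf", is_solid).
def dot(a, b):
    return a[0]*b[0] + a[1]*b[1] + a[2]*b[2]

def cast(node, origin, direction, tmin, tmax):
    if node[0] == "leaf":
        return tmin if node[1] else None          # entered solid space here
    _, (normal, d), front, back = node
    denom = dot(normal, direction)
    dist = dot(normal, origin) - d
    if abs(denom) < 1e-12:                        # ray parallel to the plane
        return cast(front if dist >= 0 else back, origin, direction, tmin, tmax)
    t = -dist / denom                             # ray-plane intersection
    near, far = (back, front) if denom > 0 else (front, back)
    if t >= tmax:                                 # whole segment on near side
        return cast(near, origin, direction, tmin, tmax)
    if t <= tmin:                                 # whole segment on far side
        return cast(far, origin, direction, tmin, tmax)
    hit = cast(near, origin, direction, tmin, t)  # near interval first
    return hit if hit is not None else cast(far, origin, direction, t, tmax)

# Invented test scene, one plane x = 1: the half-space x >= 1 is solid.
tree = ("node", ((1.0, 0.0, 0.0), 1.0), ("leaf", True), ("leaf", False))
```

A ray marching along +x from the origin hits the solid at t = 1; a ray pointing away misses. The per-node work is one dot product and a divide, which supports the point that bandwidth, not ALU, is the likely bottleneck.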
davepermen said:
Evildeus said: Well, that also means you can't prove your claims. And your logic is a bit flawed, don't you think? :?
uhm? http://www.openrt.de/ -> http://www.saarcor.de/ -> http://graphics.cs.uni-sb.de/~jofis/SaarCOR/DynRT/DynRT.html
and much more? i don't have to prove saarcor (nor intrace www.intrace.com). they can do that themselves.

I know what can be done; that doesn't tell me how it compares to a 6800.