Papers from Graphics Hardware 2006

Gents, this seems to have enough common applicability that I'm moving this thread to 3DTech and leaving a permanent redirect in Console to maximize visibility/participation.
 
http://www.sci.utah.edu/~wald/Publications/2006///Grid/download//grid.pdf

We present a new approach to interactive ray tracing of moderate-sized animated scenes based on traversing frustum-bounded packets of coherent rays through uniform grids. By incrementally computing the overlap of the frustum with a slice of grid cells, we accelerate grid traversal by more than a factor of 10, and achieve ray tracing performance competitive with the fastest known packet-based kd-tree ray tracers. The ability to efficiently rebuild the grid on every frame enables this performance even for fully dynamic scenes that typically challenge interactive ray tracing systems.
From:

http://www.sci.utah.edu/~wald/Publications/
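
To give a rough feel for what the slice-by-slice traversal in the abstract amounts to, here's a toy sketch (my own, not the authors' code): march the packet's frustum through the grid one slice at a time along the dominant axis, work out which cells the frustum overlaps in that slice, and test the whole packet against just those cells. All the types and the overlap computation are simplified placeholders; the paper computes the overlap incrementally from the packet's corner rays rather than per ray as done here.

Code:
/* Toy frustum-bounded packet traversal of a uniform grid (not the paper's
 * code).  Assumes +x is the dominant axis and all ray x-directions are
 * positive and non-zero. */
typedef struct { float org[3], dir[3]; } Ray;
typedef struct { Ray rays[16]; } Packet;               /* e.g. a 4x4 packet */
typedef struct {
    int   nx, ny, nz;
    float cell_size[3];
    float grid_min[3];
    /* ... cell contents ... */
} Grid;

/* placeholder: intersect every ray in the packet with the triangles in cell (x,y,z) */
void intersect_cell(const Grid *g, Packet *p, int x, int y, int z);

static int clampi(int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); }

void traverse_packet(const Grid *g, Packet *p)
{
    for (int x = 0; x < g->nx; ++x) {
        /* planes bounding this slice of cells along x */
        float x0 = g->grid_min[0] + (float)x       * g->cell_size[0];
        float x1 = g->grid_min[0] + (float)(x + 1) * g->cell_size[0];

        /* y/z extent of the packet where it cuts through [x0, x1]; done
         * naively per ray here, while the paper updates it incrementally
         * from the four corner rays with one add per slice */
        float ymin = 1e30f, ymax = -1e30f, zmin = 1e30f, zmax = -1e30f;
        for (int r = 0; r < 16; ++r) {
            const Ray *ray = &p->rays[r];
            for (int k = 0; k < 2; ++k) {
                float t = ((k ? x1 : x0) - ray->org[0]) / ray->dir[0];
                float y = ray->org[1] + t * ray->dir[1];
                float z = ray->org[2] + t * ray->dir[2];
                if (y < ymin) ymin = y;
                if (y > ymax) ymax = y;
                if (z < zmin) zmin = z;
                if (z > zmax) zmax = z;
            }
        }

        /* turn the overlap into a rectangle of cell indices and test them all */
        int iy0 = clampi((int)((ymin - g->grid_min[1]) / g->cell_size[1]), 0, g->ny - 1);
        int iy1 = clampi((int)((ymax - g->grid_min[1]) / g->cell_size[1]), 0, g->ny - 1);
        int iz0 = clampi((int)((zmin - g->grid_min[2]) / g->cell_size[2]), 0, g->nz - 1);
        int iz1 = clampi((int)((zmax - g->grid_min[2]) / g->cell_size[2]), 0, g->nz - 1);
        for (int iy = iy0; iy <= iy1; ++iy)
            for (int iz = iz0; iz <= iz1; ++iz)
                intersect_cell(g, p, x, iy, iz);
    }
}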

Jawed
 
For the future of CELL<->GPU development (aka PS4), would geometry shaders on the GPU be redundant compared to a multiplicity of SPUs in a future iteration of CELL?

Just thinking aloud here, as the SPUs seem like very flexible GSs... well, by the time of PS4, GSs would also be more flexible, which is why I see a redundancy there...
Given enough SPUs, VS and GS on the GPU might be redundant. The question is how much is enough, considering the same can be said about vertex shaders on RSX being redundant.
 
I don't see what relevance D3D10 has. It's still a restricted programming model, and there will still be data structures and algorithms best built by a traditional CPU, which means there may be an opportunity for CPU<->GPU interconnect design to enable some innovation.
 
Given enough SPUs, VS and GS on the GPU might be redundant. The question is how much is enough, considering the same can be said about vertex shaders on RSX being redundant.
The answer is pretty simple: nothing is ever enough. VS on RSX being redundant? Far from it.
 
Hey Matt, good to see you respond here :)


Am I the only one who just doesn't care about coherent/packet raytracing? We have a great way to do that... it's called rasterization. I'm certainly overstating this, but it seems to me that the people who are most excited about raytracing primary rays are those who are trying to compete with GPUs...

Indeed, raytracing shows off one of the advantages of a Cell-like architecture and, in particular, a high-speed interconnect: use the GPU rasterizer to shade primary rays (and arguably shadow rays too), and handle any shaders that shoot secondary rays on the SPUs. In fact, one could do only the intersection of secondary rays on the Cell and the shading on the GPU if the interconnect is fast enough.

Now of course either piece of hardware *can* do the whole process, but not efficiently. In particular traversing the kd-tree and doing intersections will neatly bottleneck even the most dynamic-branching-friendly GPUs (maybe G80 will help, we'll see) while shading primary rays badly hurts the otherwise stellar intersection performance of the Cell.

I totally agree with Matt that there are some severe problems with the current GPU memory model going forward. One of the reasons why data structure traversal can so easily bottleneck a GPU is that we have no algorithmic control over data movement. Without local arrays and the ability to move *exactly* the data that we want, we blow a whole lot of bandwidth on incorrect assumptions that the GPU is making about graphics-like data access patterns. On the Cell we can arrange a data structure in a traversal-efficient manner, pull in a big chunk of it to local storage, and while we work on that, pull in another chunk, and so on. To some extent data structures can be organized according to the GPU's cache structure, but that's both limiting and only partially helpful; it also doesn't work for every data structure.
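
To make that concrete, here's roughly what the chunked streaming looks like on an SPU. It's just a minimal double-buffering sketch, assuming the standard MFC DMA intrinsics (mfc_get, mfc_write_tag_mask, mfc_read_tag_status_all) from spu_mfcio.h; NodeChunk and process_chunk are placeholders. The point is only that the fetch of chunk i+1 overlaps the processing of chunk i.

Code:
/* Minimal SPU double-buffering sketch: stream a big data structure from
 * main memory in fixed-size chunks, overlapping DMA with computation. */
#include <spu_mfcio.h>
#include <stdint.h>

#define CHUNK_BYTES 16384                     /* one max-size DMA per chunk */

typedef struct { uint8_t bytes[CHUNK_BYTES]; } NodeChunk;

static NodeChunk buf[2] __attribute__((aligned(128)));

void process_chunk(const NodeChunk *c);       /* placeholder traversal work */

void stream_structure(uint64_t ea, int num_chunks)
{
    if (num_chunks <= 0) return;

    int cur = 0;
    /* kick off the first transfer, tagged with the buffer index */
    mfc_get(&buf[cur], ea, CHUNK_BYTES, cur, 0, 0);

    for (int i = 0; i < num_chunks; ++i) {
        int next = cur ^ 1;
        if (i + 1 < num_chunks)               /* prefetch the next chunk */
            mfc_get(&buf[next], ea + (uint64_t)(i + 1) * CHUNK_BYTES,
                    CHUNK_BYTES, next, 0, 0);

        mfc_write_tag_mask(1 << cur);         /* wait only for the chunk we need */
        mfc_read_tag_status_all();

        process_chunk(&buf[cur]);             /* work while the next DMA is in flight */
        cur = next;
    }
}

Using the buffer index as the DMA tag lets us block on just the chunk we're about to touch while the other transfer stays in flight, which is exactly the kind of explicit control over data movement that the GPU model doesn't give us.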

Anyway, there are many other examples - graphics and otherwise - that demonstrate the usefulness of having a high-speed interconnect between pieces of hardware that are efficient at different tasks. Maybe someone will eventually make the perfect processor that does everything well (although I doubt it), but even DX10 doesn't get there. It helps, but it's not there yet...

Cheers,
Andrew Lauritzen
 
Kinda superfluous... but I agree. Eventually raytracing will become useful for primary rays when the average polygon size is << 1 pixel; before that it's a waste of time.
 
Eventually raytracing will become useful for primary rays when the average polygon size is << 1 pixel; before that it's a waste of time.
Even in that case, there are other ways to rasterize (e.g. REYES) that we seem to be trying to reinvent with all this "packet" stuff.
 
Even in that case, there are other ways to rasterize (e.g. REYES) that we seem to be trying to reinvent with all this "packet" stuff.
REYES still transforms each triangle, so it wouldn't work. You could forward-map a hierarchical bounding volume representation of the geometry to the sample points (i.e. rays) and perform hierarchical irrelevance culling. That's not raytracing, but I wouldn't call it rasterization exactly either. It's most similar to splatting, though not really in the way most people would expect when they hear the term.

PS. I personally would like to see an implementation of such a forward-mapping scheme. There aren't a lot of uses for it at the moment ... but I'd like to see it compared against raytracing on that Boeing model the Saarland guys are fond of using. My guess is that it would win by a landslide.
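
For what it's worth, here's the kind of thing I mean by forward mapping, just a rough sketch with made-up types (BVHNode, Framebuffer) and a hypothetical project() function, not a real implementation: descend the hierarchy, project each node's bound, cull what can't contribute, and splat a node once it shrinks below a sample.

Code:
/* Rough sketch of forward-mapping a bounding-volume hierarchy to the
 * screen samples.  All types and project() are placeholders. */
#include <stddef.h>

typedef struct BVHNode {
    float center[3], radius;            /* bounding sphere of the node */
    struct BVHNode *child[2];           /* both NULL at leaves */
} BVHNode;

typedef struct { int w, h; float *depth; unsigned *color; } Framebuffer;

/* hypothetical camera transform: world point -> pixel coords + view depth,
 * returns 0 if the point is behind the camera */
int project(const float p[3], float *px, float *py, float *pz);

static void splat(Framebuffer *fb, int x, int y, float z, unsigned c)
{
    if (x < 0 || y < 0 || x >= fb->w || y >= fb->h) return;
    size_t i = (size_t)y * fb->w + x;
    if (z < fb->depth[i]) { fb->depth[i] = z; fb->color[i] = c; }   /* z-test */
}

void forward_map(const BVHNode *n, Framebuffer *fb)
{
    float px, py, pz;
    if (!project(n->center, &px, &py, &pz))
        return;                                       /* behind the camera */

    /* crude projected size; enough for an "is this below a sample?" test */
    float proj_radius = n->radius * (float)fb->h / pz;

    /* hierarchical irrelevance culling would go here: if every sample the
     * node could touch is already covered by something closer, stop */

    if (proj_radius < 0.5f || n->child[0] == NULL) {
        splat(fb, (int)px, (int)py, pz, 0xffffffffu); /* sub-sample: splat it */
        return;
    }
    forward_map(n->child[0], fb);
    forward_map(n->child[1], fb);
}

The interesting part, which is only stubbed out above, is the hierarchical irrelevance culling: stop descending as soon as every sample a node could touch is already covered by something closer.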
 
REYES still transforms each triangle so it wouldn't work.
I'm not saying it doesn't, but why should REYES transform every triangle? Surely subdivision can be done in homogeneous space, and so only the final "projection" need be done.
 
In the situation I'm talking about there is no subdivision at all. All the detail in the models is well below pixel size to begin with. It's a far-future scenario (although, as the Boeing model shows, there are some cases where it can already happen today) where stochastic sampling is the only practical way to deal with aliasing. Unless we find a LOD geometry representation which doesn't suffer from rendering artifacts.
 
All the detail in the models is well below pixel size to begin with.
Yes, but that's just as much of a problem for a packet-based raytracer as for a rasterizer... LOD is necessary in both cases, both to avoid aliasing and to efficiently compute intersections. I'm still not convinced that coherent packet raytracing is of any real benefit in the long run.
 
No, LOD is an alternative to stochastic supersampling ... for the moment just not a very good one (stochastic supersampling looks good, LOD often produces artifacts).
I'm not sure what you're trying to say... my point is just that the low coherence that comes from geometry highly tessellated relative to pixel size causes just as many problems for a packet-based raytracer as for a rasterizer.

As kind of a side note, LOD can help a lot with aliasing, although it's certainly not a 100% solved problem, as you note.
 
Can you explain the difference between programmable shading and programmable graphics?
From slide 13:
• Programmable shading
– Vertex and fragment shaders, texture composition, pattern generation, lighting models
• Programmable graphics
– Shaders implement graphics algorithms using complex data structures
 
Slide #27 or so quotes the GPU-to-graphics-memory number as currently around 30 GB/s; the low 1 GB/s number is the bandwidth people are seeing from GPU to main memory (and back). (And while PCI-E promises the potential of 4 GB/s there, no one has seen anything near that in practice so far...) So it's that big bandwidth shortcoming that prevents the CPU from doing much other than just blindly sending stuff to the GPU on the PC today...



Long time listener, first time caller. :D Always happy to have an online discussion, especially when it starts with me being called full of hot air. :D

-matt

Your presentation is very interesting, and I have some questions if you can answer them.

Do you think the CELL or XeCPU are powerful enough to generate data structures like ambient occlusion disk trees for dynamic ambient occlusion, or shadow map quadtrees, at interactive rates (in-game)?

I imagine the data structure cannot be updated each frame for dynamic ambient occlusion or a shadow map quadtree. Maybe it is not for PS3/360 but for later.

After reading slides from your presentation I read the following documents:

Ambient occlusion disk trees (GPU implementation):
http://www.cad.zju.edu.cn/home/ygong/publications/eg06.pdf

Lightcuts (offline rendering; clearly not for realtime rendering):
http://www.graphics.cornell.edu/~bjw/papers.html

Shadow map quadtrees (GPU implementation):
http://graphics.idav.ucdavis.edu/research/glift/supplement/adativeShadowMapsOnGPU_glift.pdf
 
Is this only speculation, or is it currently applied?

Are GS and VS also considered fragment processors?
So which is better to use, SPUs or fragment processors?
 
Are GS and VS also considered fragment processors?

No. A fragment is a pixel and its associated data as it's being processed. That's why a pixel shader should really be called a fragment shader, but in DirectX terminology 'pixel' is used for both the existing pixel in the framebuffer and the one that's coming down the pipe. The GS and VS process primitives and vertices respectively, not fragments.
 
Correct me if I'm wrong. As far as I understand it, graphics processing in approximately 4 years will be as follows:

The SPU-like cores in future CPUs will build data structures, eliminating the need for VS and GS; the fragment processor will consume them; the CPU will process rays until divergence and then hand them over to the rasterizer (ray tracing); and the texture unit will do its job. What I said here may not be in the correct order, but this is roughly what will happen, right?
 