Cell and interactive raytracing? (IEEE 2006 Symposium on Interactive Ray Tracing, 18-20 Sept 2006)

DudeMiester said:
Say you partitioned the scene into blocks, where each block is sized to contain roughly the same amount of geometrical data. Each of these blocks of data is assigned to a processor. Then as you do your complex raytracing, you simply have the processor that's responsible for the space the ray is entering deal with that ray.

Besides the big jump in inter-node communication, you have the risk of a hot region; that is, a region through which a large number of secondary rays pass.

Think of a big glass sphere/lens with a focal point. Because you map your workload spatially onto nodes, the node that handles the focal point is essentially handling the secondary rays for a large chunk of your primary ones, and thus your big fat multi-core computing device is running with basically one core active.
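
A toy sketch of that imbalance (purely illustrative - the cell size, node count and ray positions are all made up):

```python
from collections import Counter

def node_for(ray_pos, cell=1.0, nodes=8):
    """Static spatial assignment: each node owns a slab of space."""
    return int(ray_pos[0] // cell) % nodes

# Secondary rays focused by a lens all converge near x = 3.2, so one
# node ends up with nearly the whole workload while the others idle.
focused = [(3.2 + 0.01 * i, 0.0) for i in range(100)]
load = Counter(node_for(p) for p in focused)
```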

Cheers
 
My comment was simply directed at pointing at those pictures and saying "this is why RT is a dead-end for realtime rendering". Again, I recognise the probable effort/reward issues there, but I do think that they could do better...still might not be worth it of course :p

In terms of general usefulness, I don't know enough to pass judgement (and wasn't intending to suggest anything in that regard with the above post). The authors evidently fancy their chances, though. But since you raise the point about recursive shading - are the authors referring to its suitability for the ray traversal model used, or for the shading? Or are these tied at the hip? When I first read that, my (layman's) interpretation marked it as a concern for shading and not something general to the approach (they seemed to be linking that issue with the unsuitability of Cell for the shading portion of the algorithm, which in another system could perhaps be done on a GPU - but perhaps it's an issue with the pure ray tracing side too?).
 
Gubbi said:
What makes RT interesting is the (almost) correct refraction and reflection effects that can be obtained. This however requires recursive processing of secondary rays, a task that doesn't agree well with the ray-packet-based approach the authors use in the paper to achieve the speed in the first place (they admit this themselves on page 8).

IMO, this reduces the paper to the "interesting, but not useful" category (same as GPU raytracing).
There are two areas where raytracing could be beneficial. 1) Doing things other than image creation. You have per-pixel accuracy on every aspect of the data, which you could perhaps combine with rasterizing in useful ways. 2) HOS - HOS that fit into an SPE's LS would be vastly more efficient due to reduced memory accesses. You could raytrace a snooker table with incredible efficiency using HOS versus having large triangle meshes for the balls. In fact, you could probably describe a snooker table in a way that fits entirely in the LS, and raytrace the scene without ever going into RAM except to write the pixels, including true reflections.
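
To illustrate why the sphere case is so cheap (a generic sketch of the standard analytic hit test, nothing Cell-specific):

```python
import math

def ray_sphere(origin, direction, centre, radius):
    """Analytic ray/sphere intersection: a handful of multiplies
    instead of testing hundreds of mesh triangles per ball."""
    # Vector from sphere centre to ray origin
    oc = tuple(o - c for o, c in zip(origin, centre))
    b = 2.0 * sum(d * v for d, v in zip(direction, oc))
    c = sum(v * v for v in oc) - radius * radius
    disc = b * b - 4.0 * c          # direction assumed normalised, so a == 1
    if disc < 0.0:
        return None                  # ray misses the sphere
    t = (-b - math.sqrt(disc)) / 2.0
    return t if t > 0.0 else None    # nearest hit in front of the origin

# A ball of radius 1, five units down the view axis, hit head-on:
t_hit = ray_sphere((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0, 5.0), 1.0)
```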
 
Titanio said:
But since you raise the point about recursive shading - are the authors there referring to its suitability for the ray traversal model used, or the shading? Or are these tied at the hip? When I first read that, my (layman's) interpretation marked it as a concern for shading, which the authors ideally would be doing elsewhere.

You could resolve the entire ray traversal first, then shade the resulting tree, bottom up, to calculate a fragment for the primary ray.

But resolving secondary rays with their packet-based approach is really not feasible. Better to use a two-pass approach: packet-based for primary rays, then resolve secondary rays with a regular recursive approach - the second pass being potentially vastly more costly than the first.
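
A toy sketch of that two-pass split, with a made-up 1-D scene (everything here is illustrative, not the paper's code):

```python
# Toy 1-D scene: a mirror plane at z = 0 facing up, with sky above it.
def trace(ray):
    """Return a hit record or None. A ray is (origin_z, dir_z)."""
    oz, dz = ray
    return ('mirror', oz) if dz < 0 and oz > 0 else None

def shade(hit, depth, max_depth=2):
    """Pass 2: classic per-ray recursion for the incoherent secondary rays."""
    if hit is None:
        return 1.0                      # sky radiance
    if depth >= max_depth:
        return 0.0
    bounce = (0.001, 1.0)               # reflect back up toward the sky
    return 0.8 * shade(trace(bounce), depth + 1)

def render(packet):
    """Pass 1: trace the coherent primary packet together (SIMD-friendly),
    then shade each hit individually - the potentially far costlier pass."""
    hits = [trace(ray) for ray in packet]
    return [shade(h, 0) for h in hits]

pixels = render([(1.0, -1.0), (1.0, 1.0)])   # one ray down, one ray up
```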

Cheers
 
Titanio said:
My comment was simply directed at pointing at those pictures and saying "this is why RT is a dead-end for realtime rendering". Again, I recognise the probable effort/reward issues there, but I do think that they could do better.
What is RT going to add? If you find a raytraced image that looks as good as a rasterized game that you'd want to play, you'll find it takes a darned sight longer to produce than realtime. RT adds realism at massive processing (and memory) cost. Of course, this paper likely has less to do with video games and more to do with CGI workstation designs.
 
Shifty Geezer said:
What is RT going to add? If you find a raytraced image that looks as good as a rasterized game that you'd want to play, you'll find it takes a darned sight longer to produce than realtime. RT adds realism at massive processing (and memory) cost. Of course, this paper likely has less to do with video games and more to do with CGI workstation designs.

I think you're missing my point. As I've said a number of times now, I don't disagree regarding the effort/reward ratio involved, I just don't think these pictures are the best that could/can be done. Better to use the best to make that point (even if it would no doubt still hold), that is all.

Gubbi - I assume you're referring to reflect/refract rays as being secondary, and everything else primary..? Thanks for the reply, btw.
 
I read on the Saarbrücken site that raytracing can be used in games not for rendering, but for better collision detection/physics, by using the scene geometry itself.
Does anybody know if this could be done in a usable way, or would it be slow as well?
 
Shifty Geezer said:
You could raytrace a snooker table with incredible efficiency using HOS versus having large triangle meshes for the balls.
HOS is great for snooker tables, it's all pretty much just spheres and blocks of various shapes and sizes. How do you describe an old gnarled tree with HOS tho?

Most anything organic-looking easily becomes blasted difficult to model with HOS, which is probably the big reason for the continued popularity of the good ol' polygon...
 
As I was thinking about what you said, I came to the realisation that I had confused photon mapping and ray tracing in my crazy mind. However, then I came to further see that ray tracing is just a specialised type of photon mapping, where instead of scattering photons, you scatter observation points. Thus, using the same algorithm as for photon mapping, which is basically what I have here, you can scatter your observation points, taking care to maintain a tree of their dependencies (i.e. from this observation point, these child points were generated). Then at the end of the frame you simply walk the tree, gathering the observations to create your final image. Thus, you have the duality of scatter and gather.
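
The end-of-frame tree walk might look something like this (the node layout and weights are hypothetical):

```python
def gather(node):
    """Bottom-up gather: a point's radiance is its own observation plus
    the attenuated observations of the child points it spawned."""
    colour = node['own']
    for child in node['children']:
        colour += node['weight'] * gather(child)
    return colour

# A primary observation point that spawned one reflection point,
# which observed the scene directly and spawned nothing further:
tree = {'own': 0.1, 'weight': 0.5,
        'children': [{'own': 0.6, 'weight': 1.0, 'children': []}]}
total = gather(tree)
```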

I further concluded that the reason rasterisation is faster is that it doesn't require the scatter step; instead it directly gathers the contributions from the geometry itself. Of course, that is less dynamic than if you include the scatter stage. Plus, if you are doing photon mapping for GI, which I would assume is the case since we are talking about very advanced lighting, you can scatter your observers alongside the photons, leaving only the gather stage and making it conceptually just as fast as rasterisation.

Now, as for the issues with the algorithm, the idea isn't to have nodes absolutely locked onto a region, but to give them an awareness of the overall flow of points (be they photons or observers) and exploit the efficiencies of such knowledge. If you know a certain volume of points tends to flow through a certain region of space, you can make a proportional number of processors responsible for that space. This keeps the data associated with a region of space in cache, saving system bandwidth at the expense of inter-node bandwidth, which is much cheaper assuming all the nodes are on one chip. Of course, to get this information you have to be running the simulation, so you have to make the hardware able to respond to changes in flow as they occur. This is similar to the efficiencies of the whole unified-shader idea, and is where the PPE in Cell would probably come into play as a manager.

Looking at the converse, when a ray goes in an odd direction, diverging from the main flow and entering an area with little activity, this can be solved with a bit of caching in a central manager (the PPE). You simply store the ray in a list associated with its region, waiting for a sufficient volume to build up to warrant assigning a processing node. Collectively, these lists should be small enough to remain in the L2, or at least mostly in it. At the end of the main body of processing, any remaining points can be dealt with if there is enough time, or ignored entirely. Ignoring them should be fairly safe, because whether or not there was a major flow to the area, a straggler's small contribution should be of little consequence. It's a sort of built-in importance sampling.
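
The buffering could be sketched like so (the threshold and region mapping are made up):

```python
from collections import defaultdict

THRESHOLD = 4   # rays needed before a region is worth assigning a node

def schedule(rays, region_of):
    """Buffer stray rays in per-region lists (held by the central manager);
    emit a batch only once enough accumulate to justify pulling that
    region's data into a node's cache."""
    pending = defaultdict(list)
    batches = []
    for ray in rays:
        region = region_of(ray)
        pending[region].append(ray)
        if len(pending[region]) >= THRESHOLD:
            batches.append((region, pending.pop(region)))  # hand to a node
    return batches, dict(pending)  # leftovers: finish late, or just drop

# Eight rays alternating between two regions:
batches, leftover = schedule(list(range(8)), lambda r: r % 2)
```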

As I said before, the observer points would stream out a tree of their dependencies into main memory. However, this data is not used until after the scatter operation, so any latency is a non-issue.

Don't forget, there is a whole host of other optimisations you can do when there are constraints on the observer's/light's movement. You could exploit temporal coherency by recording the regions each point passes through, streamed out alongside the observer tree, and only rework points that passed through disturbed regions. This would be useful, for example, with a fixed camera or light, and even if it rotates you could probably still reuse some of the data. You can precalculate an initial state for these sorts of objects as well, to shorten loading times. I'm sure there are many more methods.
 
Guden Oden said:
HOS is great for snooker tables, it's all pretty much just spheres and blocks of various shapes and sizes. How do you describe an old gnarled tree with HOS tho?
A relatively simple contorted SDS cylinder, perhaps 64 vertices, displacement mapped for the detail. Probably one of those for each of the major limbs, with the rest of the sticky-twiggy bits being texture planes. Though displacement mapping adds a texture dependency, which loses much of the RAM advantage and basically kills it!

It's not an ideal solution, but even for realtime purposes there are bound to be some uses for RT over scanline systems. It's probably something a lot of devs haven't given much thought to, as it hasn't really been an option. I guess if there is a use for RT in concert with scanline, we'll see it this gen on PS3. Maybe all it'll ever amount to is fluffy clouds? Or maybe we'll see a per-pixel surface-displaced reflection/refraction composite rendering method?
 
Shifty Geezer said:
A relatively simple contorted SDS cylinder, perhaps 64 vertices, and displacement mapped for the detail.
Ok boss, now just try to figure out how to combine real RT with real DM without killing performance. If you do figure it out, feel free to patent it and retire on a Caribbean island ;)

Uttar
 
Titanio said:
Not that I disagree with regard to effort versus reward using techniques like this versus regular old rasterisation, but in fairness to those pictures, they are using a very simple shading model. If, as the paper suggests, they had a PS3, they could probably offload the shading to the GPU, which would allow them to up the shading complexity significantly (and/or improve performance).
Well sure, but then you're basically just talking about Cell taking over rasterization duties, which is the fastest part of RSX. It only occupies a small part of RSX's die and performs 10-100x faster than raytracing on Cell.

The only real visual advantage of raytracing is with dynamic interreflections, which we aren't seeing in this paper. Until that becomes realtime, which I assume will need at least 10x the power of Cell, I don't see the use of RT at all.

DudeMiester said:
I don't think a CPU doing software rasterisation would fare that much better in those scenes, so I'm not sure what you're getting at here. I would think examples like the Saarcor raytracing accelerator are proof that what you say is not the case.
The performance delta between software and hardware rasterization - again, for a given piece of silicon - is astronomical compared to that for raytracing. Just because software rasterization is only a bit faster (I'm not sure you're correct in this assertion, but for the sake of argument I'll assume so) doesn't mean hardware will be only a bit faster. Rasterization maps to hardware very well.

DudeMiester said:
But, yeah, we probably won't see it until we need dynamic calculation of inter-reflection in game graphics.
Exactly. It's not the shaders. If those scenes had interreflection and caustics at these framerates, then I'd be impressed even without textures or fancy BRDF lighting models.
 
Gubbi said:
Besides the big jump in inter-node communication, you have the risk of a hot region; that is, a region through which a large number of secondary rays pass.

Think of a big glass sphere/lens with a focal point. Because you map your workload spatially onto nodes, the node that handles the focal point is essentially handling the secondary rays for a large chunk of your primary ones, and thus your big fat multi-core computing device is running with basically one core active.

you can always address that with hotspot detection, further sub-dividing the hotspot and redistributing the work (as you have idling nodes anyway). of course that's associated with extra work for the further partitioning, but it's more of a data-multiplicity than a 'data-mining' issue. the inter-node communication issue still remains, though.
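
a rough sketch of that detect-and-split step (1-d interval regions and the load proxy are made up):

```python
def split(region):
    """Halve a 1-D region of space (regions are (lo, hi) intervals)."""
    lo, hi = region
    mid = (lo + hi) / 2
    return [(lo, mid), (mid, hi)]

def rebalance(regions, load, max_load):
    """When a region's ray count exceeds the threshold, split it and
    hand the halves to idle nodes - duplicating that region's scene
    data rather than shipping work elsewhere."""
    out = []
    for region in regions:
        out.extend(split(region) if load(region) > max_load else [region])
    return out

# Load proxy: pretend the ray count is proportional to region width.
balanced = rebalance([(0, 8), (8, 10)], lambda r: r[1] - r[0], 4)
```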
 
Shifty Geezer said:
What is RT going to add?
as mintmaster mentioned there are certain things that can't be done accurately with conventional methods, eg reflections. with conventional methods u pick a point in the world and (usually) generate a cubemap based on the view in the 6 directions. the problem with this is the texture lookup is only accurate from this point, thus all reflections in the scene are gonna look wrong. raytracing does it perfectly.
whilst we won't see any ps3 raytraced games (i assume) except for maybe snooker/bowling, it gets interesting in the future with the release of many multiple-core systems; intel want a 32-core cpu in 3-4 years. so who knows where the state of affairs will be in a decade's time - 1000 cores is prolly too much to ask for. but with such cpu power the question is raised: what do i need gpus for? ala unification of cpu + gpu
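
the single-POV problem in miniature (a toy sketch - a static cubemap is indexed by the reflection direction alone, so position simply drops out of the lookup):

```python
def reflect(d, n):
    """Reflect direction d about unit surface normal n: d - 2(d.n)n."""
    dot = sum(a * b for a, b in zip(d, n))
    return tuple(a - 2 * dot * b for a, b in zip(d, n))

# Two points on a mirror floor, 100 units apart, with the same view
# direction: the reflection vector - and hence the cubemap texel -
# is identical, even though a raytracer would see different objects.
r_near = reflect((0.0, -1.0, 0.0), (0.0, 1.0, 0.0))   # at x = 0
r_far  = reflect((0.0, -1.0, 0.0), (0.0, 1.0, 0.0))   # at x = 100
```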
 
zed said:
as mintmaster mentioned there are certain things that can't be done accurately with conventional methods, eg reflections. with conventional methods u pick a point in the world and (usually) generate a cubemap based on the view in the 6 directions. the problem with this is the texture lookup is only accurate from this point, thus all reflections in the scene are gonna look wrong. raytracing does it perfectly.

shouldn't position-dependent cube-maps actually alleviate the single-POV reflections problem considerably?
 
darkblu said:
shouldn't position-dependent cube-maps actually alleviate the single-POV reflections problem considerably?
im not understanding what u mean by position dependent.
if youre suggesting creating multiple cubemaps (ie each for a certain area) then obviously its gonna hurt performance greatly (+ is still not accurate)
 
zed said:
im not understanding what u mean by position dependent.
if youre suggesting creating multiple cubemaps (ie each for a certain area) then obviously its gonna hurt performance greatly (+ is still not accurate)

no, i meant cube mapping where the point being textured does not reside at the origin of the cube but rather at a (controllable) offset from the origin. so not only reflection direction counts, but also position.
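
this resembles what later became known as parallax-corrected cube mapping: intersect the reflected ray with a proxy box around the environment, then look up the direction from the box centre to that intersection. a sketch of that offset lookup (the proxy-box geometry here is an assumption, not necessarily the exact scheme meant above):

```python
def parallax_corrected_dir(pos, refl_dir, box_min, box_max):
    """Correct a cubemap lookup for the surface point's position inside
    the proxy box the map was captured in."""
    # Distance along refl_dir to each slab of the proxy box
    ts = []
    for i in range(3):
        if refl_dir[i] > 0:
            ts.append((box_max[i] - pos[i]) / refl_dir[i])
        elif refl_dir[i] < 0:
            ts.append((box_min[i] - pos[i]) / refl_dir[i])
    t = min(ts)                                     # first wall the ray hits
    hit = tuple(p + t * d for p, d in zip(pos, refl_dir))
    centre = tuple((a + b) / 2 for a, b in zip(box_min, box_max))
    return tuple(h - c for h, c in zip(hit, centre))  # corrected lookup dir

# A point offset to x = 0.5 inside a unit-ish box, reflecting straight up:
# the corrected direction tilts toward +x instead of the raw (0, 0, 1).
corrected = parallax_corrected_dir((0.5, 0.0, 0.0), (0.0, 0.0, 1.0),
                                   (-1.0, -1.0, -1.0), (1.0, 1.0, 1.0))
```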
 