Cell and interactive raytracing at the IEEE 2006 Symposium on Interactive Ray Tracing, 18-20 Sept 2006?

chris1515

http://gametomorrow.com/blog/index.php/2006/06/19/ray-tracing-receiving-new-focus/

Ray-tracing has always been the algorithm of choice for photorealistic rendering. Simple and mathematically elegant, ray-tracing has always generated lots of interest in the software community, but its computationally intensive nature has limited its success in the interactive/real-time gaming world. However, while the rendering time of traditional polygon rasterization techniques scales linearly with scene complexity, ray-tracing scales logarithmically. This is becoming increasingly important as gamers demand larger, more complex virtual worlds. Ray-tracing also scales very well on today’s multi-core “scale-out” processors like Cell. It falls into the category of “embarrassingly parallel” and therefore scales linearly with the number of compute elements.

http://www.sci.utah.edu/RT06/

We plan to participate as we feel this topic is very important to the future of gaming and graphics in general.


If I remember correctly, Philipp Slusallek worked on implementing a real-time raytracing renderer on Cell with IBM's help, but I'm not sure. He worked on the SaarCOR FPGA before. I hope that Barry Minor presents some work on a ray tracing renderer on Cell.
 
I'd prefer if they used some multi-Opteron systems where they can plug FPGAs into the slots and thus make sort-of dedicated hardware (and still do the remaining tasks on the Opterons, as they rock anyway, smile).
 
Whilst Cell is the right direction for raytracing, on the PS3 with its 7 cores it's going to be nowhere near enough.
One with 1000s of cores, each rendering say 40x30 pixels, is a different story though. Any ideas when they'll reach this capability? 5-10 years?
 
zed said:
Whilst Cell is the right direction for raytracing, on the PS3 with its 7 cores it's going to be nowhere near enough.
One with 1000s of cores, each rendering say 40x30 pixels, is a different story though. Any ideas when they'll reach this capability? 5-10 years?
I'm just curious and want to know Cell's performance for ray tracing.

Barry Minor answered my question:

http://gametomorrow.com/blog/index.php/2006/06/19/ray-tracing-receiving-new-focus/#comments

If we can schedule it we would like to bring Cell HW out to this event and show some of the work we have been doing. We have one ray-caster and two ray-tracers running on Cell so it would be fun to show what Cell is capable of in this arena.

Yes, Philipp and his students have been working on Cell for about a year now. He demonstrated a ray-tracer running on Cell at Siggraph last year. It only took his students about 2 weeks to get it running on Cell - a very good effort.
 
Without going into detail about the number of rays cast in the scene, or a plethora of other detail settings that need to be considered, discussing how fast Cell is at ray-tracing, and how 1000s of cores would be faster, is a bit pointless.

There are "ray-tracing programs" running on Cell, but we don't know what kind of images they produce.

In general, up until now, "normal" rasterisation has been the best option for real-time graphics when it comes to the IQ/speed ratio, and that won't change for a long time. It is much cheaper and faster to "fake" things (which are already fake anyway, so let's not get into the "but they're all fake" discussion) using shaders and the real-time shadowing techniques used in games over the last few years, all at very high resolutions, than to build a processor that can raytrace a scene at an unknown detail level (which in the end could look worse than a rasterised scene to our eyes) with impossibly high performance requirements.
 
I guess the biggest problem with Cell is the inability of an SPE to access any memory other than its local store. I think it is not a problem to fit a raytracing engine into its local memory, but the amount of object data will be a different story. One or two spheres, or a simple object, would work just fine. But if you start to load many objects, then it might start to choke because of constant DMA access to move object data in and out of local memory.
 
silhouette said:
I guess the biggest problem with Cell is the inability of an SPE to access any memory other than its local store. I think it is not a problem to fit a raytracing engine into its local memory, but the amount of object data will be a different story. One or two spheres, or a simple object, would work just fine. But if you start to load many objects, then it might start to choke because of constant DMA access to move object data in and out of local memory.
That's no different to a conventional cache! One or two spheres, or a simple object, might fit into an Athlon's cache and be readily available, but the moment you start to access loads of objects you have to constantly move data in and out of the caches, slowing you down to the speed of main RAM. And that's the big issue with ray tracing that all processors face - truly random scatter-gather processing that's constantly thrashing the RAM BW.
 
Shifty Geezer said:
That's no different to a conventional cache! One or two spheres, or a simple object, might fit into an Athlon's cache and be readily available, but the moment you start to access loads of objects you have to constantly move data in and out of the caches, slowing you down to the speed of main RAM. And that's the big issue with ray tracing that all processors face - truly random scatter-gather processing that's constantly thrashing the RAM BW.

No no, I agree. You are right, this is a problem with any CPU. However, there is even more strain on Cell: an Athlon's memory controller can simply fetch the required data from main memory, while on Cell the SPE must communicate with the PPU about the missing object, and then the PPU will find that data in main RAM and fetch it for the SPE. It will be a lot more complicated for Cell.
 
No, the SPEs don't need the PPU for data. As I understand it, each SPE contains a Memory Flow Controller that dispatches DMA requests to be fed via the Cell chip's Memory Interface Controller. That is, every core on the processor can access memory through the same IO, no different to any other multicore processor with shared cache. The difference for Cell is that when you want to access memory you can't read directly from that memory address and have the 'cache' fetch it automatically if not present; you have to explicitly request it to be copied into the 'cache' if it isn't already there.
 
Shifty Geezer said:
No, the SPEs don't need the PPU for data. As I understand it, each SPE contains a Memory Flow Controller that dispatches DMA requests to be fed via the Cell chip's Memory Interface Controller. That is, every core on the processor can access memory through the same IO, no different to any other multicore processor with shared cache. The difference for Cell is that when you want to access memory you can't read directly from that memory address and have the 'cache' fetch it automatically if not present; you have to explicitly request it to be copied into the 'cache' if it isn't already there.

That's right. I thought the PPU is only needed to set up the logical memory map. Subsequent accesses to different memory areas (including the local stores of other SPUs) will be transparent to the PPU since they are done via DMA. Can someone confirm this?
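For illustration, here is a minimal sketch of an SPE fetching one object from main memory into its local store entirely on its own, assuming the Cell SDK's spu_mfcio.h DMA intrinsics; the Sphere struct, tag number and effective address are made up for the example.

/* Minimal sketch of an SPE pulling one object straight from main memory
   into its local store by DMA, with no PPU involvement. Assumes the Cell
   SDK's spu_mfcio.h intrinsics; the Sphere struct and the effective
   address passed in are hypothetical. */
#include <spu_mfcio.h>
#include <stdint.h>

typedef struct __attribute__((aligned(16))) {
    float cx, cy, cz, radius;          /* 16 bytes: a legal DMA transfer size */
} Sphere;

#define DMA_TAG 3

/* 'ea' is the object's effective (main-memory) address, e.g. handed to the
   SPE program at startup or stored in the acceleration structure. */
Sphere fetch_sphere(uint64_t ea)
{
    static Sphere buf __attribute__((aligned(16)));   /* local-store buffer */

    mfc_get(&buf, ea, sizeof(buf), DMA_TAG, 0, 0);    /* queue the transfer */
    mfc_write_tag_mask(1 << DMA_TAG);
    mfc_read_tag_status_all();                        /* block until done   */
    return buf;
}

In practice you would queue several transfers under different tags and overlap them with computation rather than blocking on each one, but the point is that the SPE issues the request itself.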
 
                  ray casting     shading        shading & shadows
2.4 GHz x86           7.2            3.0               2.5
Single-CELL          58.1 (8x)      20   (6.6x)       16.2 (6.4x)
PS3-CELL             67.8 (9.4x)    23.2 (7.7x)       18.9 (7.5x)

Table 5: Performance in frames/sec on a 2.4 GHz SPE, a single or dual 2.4 GHz CELL processor system, and a 2.4 GHz x86 AMD Opteron CPU using pure ray casting, shading, and shading with shadows (at 1024x1024 pixels). Opteron data and 2.4 GHz CELL data are measured; data for the 7-SPE 3.2 GHz processor (as used in the Playstation 3) has been extrapolated from that data. For pure ray casting, our implementation on a single 2.4 GHz SPE is roughly on par with a similarly clocked Opteron CPU. In addition, a CELL has 7–8 such SPEs, and can be clocked at a higher rate.
Thanks for the link.
Interesting, it looks like Cell does live up to the hype (beating one of the top CPUs available today by a hefty margin). Still not good enough for a game though, though as expected performance does seem to scale well with the number of cores.
 
[Image: CBIMG002-1.png]


Nice!
 
zed said:
Thanks for the link.
Interesting, it looks like Cell does live up to the hype (beating one of the top CPUs available today by a hefty margin). Still not good enough for a game though, though as expected performance does seem to scale well with the number of cores.

For these kinds of applications I don't think there was ever any doubt that Cell would beat a normal CPU...
 
Those scenes just show you how much of a dead end raytracing is for realtime graphics.

For quite a few years yet, a given area of silicon will produce much better graphics with rasterization than with raytracing. The only reason I can see this changing is if rasterization becomes so fast that we run out of things to render, and we reach a bit of a realism plateau that can't be overcome without completely switching over to multiple-bounce raytracing. I don't even know if we'll ever reach that point.
 
Not that I disagree with regard to effort versus reward using techniques like this versus regular old rasterisation, but in fairness to those pictures, they are using a very simple shading model. If, as the paper suggests, they had a PS3, they could probably offload the shading to the GPU, which would allow them to up the shading complexity significantly (and/or improve performance).

As a paper, it's very interesting. As Platon says, it's the type of thing we might have thought would be really great for Cell, but reading the paper you wouldn't necessarily think that :p What with the branching, and the random access to scene data, which they've had to accommodate with some interesting techniques (the latter is particularly interesting, roping in software caching for scene traversal and a software threading model of sorts for the SPEs). The shading is also not particularly suited to Cell.

I don't think the paper mentions it, but if anyone wants scene complexity figures, the conference room scene is 280k triangles, and the VW Beetle scene is 680k triangles (that's taken from another Saarland RT paper - presumably the models are unchanged).
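As a rough illustration of the software caching for scene traversal mentioned above, here is what a minimal direct-mapped local-store cache can look like. This is not the paper's code; it assumes the Cell SDK's spu_mfcio.h intrinsics, and the line and table sizes are made up.

/* Rough sketch of a direct-mapped software cache for scene data living in
   an SPE's local store. Not the paper's code; line/table sizes are made up. */
#include <spu_mfcio.h>
#include <stdint.h>

#define LINE_SIZE 128                    /* bytes per cached line            */
#define NUM_LINES 256                    /* 32 KB of the 256 KB local store  */
#define DMA_TAG   5

/* Tags start at 0, so main-memory address 0 is never cached correctly; a
   real version would initialise them to an invalid address.                 */
static uint64_t line_tag[NUM_LINES];     /* main-memory address of the line  */
static char     line_data[NUM_LINES][LINE_SIZE] __attribute__((aligned(128)));

/* Return a local-store pointer for main-memory address 'ea', DMA-ing the
   surrounding line in on a miss and blocking until it has arrived.          */
static void *cache_lookup(uint64_t ea)
{
    uint64_t base = ea & ~(uint64_t)(LINE_SIZE - 1);
    unsigned idx  = (unsigned)(base / LINE_SIZE) % NUM_LINES;

    if (line_tag[idx] != base) {                      /* miss                */
        mfc_get(line_data[idx], base, LINE_SIZE, DMA_TAG, 0, 0);
        mfc_write_tag_mask(1 << DMA_TAG);
        mfc_read_tag_status_all();                    /* wait for the DMA    */
        line_tag[idx] = base;
    }
    return line_data[idx] + (ea - base);
}

A real implementation would use several DMA tags and double-buffering to hide the transfer latency, but the lookup-and-fetch structure is the point.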
 
I don't think a CPU doing software rasterisation would fare that much better in those scenes, so I'm not sure what you're getting at here. I would think examples like the SaarCOR raytracing accelerator are proof that what you say is not the case. But, yeah, we probably won't see it until we need dynamic calculation of inter-reflection in game graphics.

They were saying in the paper that complex interreflections are a problem because of the memory usage issue, and I had a thought. Say you partitioned the scene into blocks, where each block is sized to contain roughly the same amount of geometrical data. Each of these blocks of data is assigned to a processor. Then as you do your complex raytracing, you simply have the processor that's responsible for the space the ray is entering deal with that ray. To do this, each processor would store a map of the grid, perhaps with some occlusion information (i.e. if a ray enters here, it will definitely not hit anything, so check the next block). If an area and the associated processor get overloaded with rays, then you can temporarily split the load between that processor and an idle one. The net effect is that each processor keeps more data within its cache, and the tradeoff is that much more inter-processor bandwidth is required (to pass ray responsibility around). Of course, on say a Cell this is just fine.
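For what it's worth, a self-contained toy of that block/forwarding idea (single-threaded, slabs along one axis, one sphere per slab; purely illustrative, nobody's real implementation):

/* Toy of the block-partitioning idea above: the scene is split into slabs
   along x (block i covers x in [2i, 2i+2)), each slab owns one sphere and
   an inbox of rays; a ray that misses everything in its slab is forwarded
   to the next slab's inbox. Single-threaded and purely illustrative. */
#include <math.h>
#include <stdio.h>

#define NBLOCKS 4
#define MAXRAYS 64

typedef struct { double ox, oy, oz, dx, dy, dz; int pixel; } Ray;

typedef struct {
    double cx, cy, cz, r;              /* the one sphere owned by this slab */
    Ray    inbox[MAXRAYS];             /* rays currently travelling in it   */
    int    n;
} Block;

static int hits_sphere(const Block *b, const Ray *ray)
{
    double ox = ray->ox - b->cx, oy = ray->oy - b->cy, oz = ray->oz - b->cz;
    double bq = ox * ray->dx + oy * ray->dy + oz * ray->dz;
    double cq = ox * ox + oy * oy + oz * oz - b->r * b->r;
    double disc = bq * bq - cq;        /* ray direction assumed unit length */
    return disc >= 0.0 && -bq + sqrt(disc) > 0.0;
}

int main(void)
{
    Block blocks[NBLOCKS];
    for (int i = 0; i < NBLOCKS; i++)
        blocks[i] = (Block){ .cx = 2.0 * i + 1.0,
                             .cy = (i % 2) ? 0.8 : -0.8,
                             .cz = 0.0, .r = 0.5, .n = 0 };

    /* seed a few +x-travelling rays, one per "pixel" row, into block 0     */
    for (int p = 0; p < 5; p++)
        blocks[0].inbox[blocks[0].n++] =
            (Ray){ .ox = 0.0, .oy = -1.0 + 0.5 * p, .oz = 0.0,
                   .dx = 1.0, .dy = 0.0, .dz = 0.0, .pixel = p };

    for (int i = 0; i < NBLOCKS; i++) {            /* process block by block */
        Block *b = &blocks[i];
        for (int k = 0; k < b->n; k++) {
            Ray *r = &b->inbox[k];
            if (hits_sphere(b, r))
                printf("pixel %d: hit in block %d\n", r->pixel, i);
            else if (i + 1 < NBLOCKS)              /* forward responsibility */
                blocks[i + 1].inbox[blocks[i + 1].n++] = *r;
            else
                printf("pixel %d: background\n", r->pixel);
        }
    }
    return 0;
}

In the real version each block would sit on its own core/SPE with its own inbox, and the forwarding step is exactly the inter-processor traffic being traded for better cache locality.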
 
Titanio said:
Not that I disagree with regard to effort versus reward using techniques like this versus regular old rasterisation, but in fairness to those pictures, they are using a very simple shading model. If, as the paper suggests, they had a PS3, they could probably offload the shading to the GPU, which would allow them to up the shading complexity significantly (and/or improve performance).

But just like the GPU raytracing project they only deal with primary rays.

What makes RT interesting is the (almost) correct refraction and reflection effects that can be obtained. This however requires recursive processing of secondary rays, a task that doesn't agree well with the ray-packet-based approach the authors use in the paper to achieve the speed in the first place (they admit this themselves on page 8).

IMO, this reduces the paper to the "interesting, but not useful" category (same as GPU raytracing).

Cheers
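To make the packet-coherence point concrete, here is a small sketch of the sort of coherence test packet tracers rely on; once secondary rays bounce, the direction signs within a packet diverge and the shared traversal has to split. The structures are hypothetical, not the paper's code.

/* Toy illustration of the packet-coherence problem: primary rays in a
   packet tend to share direction signs and can be traversed together, but
   after a reflection bounce the signs diverge and the packet must fall
   back to per-ray (masked) traversal. Hypothetical structures only. */
#include <stdbool.h>

#define PACKET 4                           /* e.g. one SIMD vector of rays  */

typedef struct {
    float ox[PACKET], oy[PACKET], oz[PACKET];
    float dx[PACKET], dy[PACKET], dz[PACKET];
    bool  active[PACKET];                  /* rays still alive in the packet */
} RayPacket;

/* A packet can share one traversal only while all active rays agree on the
   direction signs used to order children in the acceleration structure.    */
static bool packet_is_coherent(const RayPacket *p)
{
    int ref = -1;
    for (int i = 0; i < PACKET; i++) {
        if (!p->active[i]) continue;
        int sign = (p->dx[i] < 0.0f)
                 | ((p->dy[i] < 0.0f) << 1)
                 | ((p->dz[i] < 0.0f) << 2);
        if (ref < 0) ref = sign;
        else if (sign != ref) return false;   /* diverged: split the packet */
    }
    return true;
}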
 