How does the Cell + RSX rendering relationship work?

scatteh316

When Cell does post-processing, how does it work?

Does RSX write into XDR for Cell to process it, or into its VRAM with Cell working on it from there? Or does RSX pass it directly to Cell for processing, with Cell itself then placing the results in XDR/VRAM?

Or is there some other way they work together?
 
I think it's entirely open to the devs how they want to do it. Any of those suggestions would work in theory.
 
Shifty Geezer said:
I think it's entirely open to the devs how they want to do it. Any of those suggestions would work in theory.

Any ideas on which one would be more efficient at saving bandwidth?
 
I don't think you can save much bandwidth by making RSX and CELL collaborate on a scene, on the contrary.
What you can do is distribute the task and the bandwidth usage more evenly and make each device do what it is best at.
 
Squeak said:
I don't think you can save much bandwidth by making RSX and CELL collaborate on a scene, on the contrary.

Theoretically you certainly could (main memory bandwidth, that is). The example that keeps coming up is doing some or all transparency rendering on Cell, if possible (e.g. for particles). That can be a pretty bandwidth-intensive task depending on how much you're doing, and if RSX were doing it then, barring cache-related savings, it would all go over main memory bandwidth. If Cell were doing it, it would eat only internal bandwidth to local memory, of which there is a lot.

These things are theoretical, of course; we'll have to wait and see how practical they are.
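To put rough numbers on the transparency example, here is a back-of-the-envelope sketch. It assumes each blended particle fragment reads and writes one colour-buffer pixel in external memory, with no cache or compression savings and no depth traffic counted. The resolution, overdraw and pixel format are hypothetical inputs, not figures from the thread.

```python
def blend_traffic_bytes(width, height, overdraw, bytes_per_pixel):
    """Rough colour-buffer traffic for alpha-blended particles.

    Assumes every blended fragment does one read and one write of its
    destination pixel (read-modify-write), with no cache or compression
    savings and no depth-buffer traffic counted.
    """
    fragments = width * height * overdraw
    return fragments * bytes_per_pixel * 2  # x2: one read plus one write


# Hypothetical scenario: particles covering a full 1280x720 screen with an
# average overdraw of 4, blended into an FP16 RGBA target (8 bytes per pixel).
per_frame = blend_traffic_bytes(1280, 720, 4, 8)
per_second = per_frame * 60  # at 60 frames per second

print(f"{per_frame / 1e6:.1f} MB per frame, {per_second / 1e9:.2f} GB/s at 60 fps")
```

With these inputs the estimate comes to roughly 59 MB per frame, or about 3.5 GB/s. If the same blending ran out of an SPE's local store, that read-modify-write traffic would stay on-chip, which is the saving being described.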
 
Making use of the FlexIO bandwidth, you could pass data to RSX, render, and output straight to Cell for some post-processing work like colour balance and histogram jiggery-pokery. I wouldn't want to hazard a guess as to what main RAM bandwidth savings devs will get, but I'm sure they'll eventually be using every scrap of system resources.
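As an illustration of the histogram side, here is a minimal sketch of how several workers (standing in for SPEs) might each build a luminance histogram for their own tile, merge the results, and read off a percentile that a colour-balance or exposure pass could then use. The bin count, the Rec. 709 luma weights and the 8-bit RGB input are assumptions for the example.

```python
def luma(r, g, b):
    """Approximate luminance of an 8-bit RGB pixel using Rec. 709 weights."""
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def tile_histogram(pixels, bins=16):
    """Luminance histogram for one tile; pixels is a list of (r, g, b)."""
    hist = [0] * bins
    for r, g, b in pixels:
        index = int(luma(r, g, b) * bins / 256)
        hist[min(index, bins - 1)] += 1  # clamp in case of rounding at the top
    return hist


def merge_histograms(hists):
    """Combine per-tile histograms (e.g. one per SPE) into a frame histogram."""
    return [sum(counts) for counts in zip(*hists)]


def percentile_bin(hist, fraction):
    """Index of the bin where the cumulative count reaches `fraction` of the total."""
    target = fraction * sum(hist)
    running = 0
    for index, count in enumerate(hist):
        running += count
        if running >= target:
            return index
    return len(hist) - 1


# Two hypothetical tiles: one dark, one mostly bright.
dark_tile = [(10, 10, 10)] * 6 + [(40, 40, 40)] * 2
bright_tile = [(230, 230, 230)] * 7 + [(200, 200, 200)]
frame_hist = merge_histograms([tile_histogram(dark_tile), tile_histogram(bright_tile)])
print(frame_hist, percentile_bin(frame_hist, 0.5))
```

Because per-tile histograms merge by simple addition, each SPE can work on its own tile independently and only the small histograms need to be combined.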
 
scatteh316 said:
When Cell does post-processing, how does it work?

Does RSX write into XDR for Cell to process it, or into its VRAM with Cell working on it from there? Or does RSX pass it directly to Cell for processing, with Cell itself then placing the results in XDR/VRAM?

Or is there some other way they work together?
If one wants to do some post-processing work on an SPE, the data obviously has to reside in the SPE's local memory. We know 256 KB is hardly enough to contain a full frame, so it doesn't make much sense to say (in the general case) that RSX renders or passes data directly to Cell: most of the time the local stores would not be big enough, and you're not going to generate your frame touching every single pixel just once :smile:
What is likely to happen is that RSX renders a frame into VRAM, and the SPEs read data from VRAM into their local stores.
From there, the SPEs can do whatever they want.
Once this process is complete, the post-processed data has to be sent to VRAM/XDR if RSX needs it for further processing or just to display it.

ciao,
Marco
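The flow Marco describes (VRAM to local store, work there, then back out) can be sketched in Python with plain list copies standing in for DMA transfers. The function names are hypothetical and the loop is sequential; real SPU code would issue asynchronous DMA and overlap transfers with computation. The 256 KB budget is the local-store size mentioned above.

```python
LOCAL_STORE_BYTES = 256 * 1024  # per-SPE local store size quoted in the thread


def dma_get(fb, fb_width, x0, y0, tile_w, tile_h):
    """Stand-in for a DMA read: copy a tile out of the frame buffer, row by row."""
    rows = []
    for y in range(tile_h):
        start = (y0 + y) * fb_width + x0
        rows.append(fb[start:start + tile_w])
    return rows


def dma_put(fb, fb_width, x0, y0, tile):
    """Stand-in for a DMA write: copy a processed tile back into the frame buffer."""
    for y, row in enumerate(tile):
        start = (y0 + y) * fb_width + x0
        fb[start:start + len(row)] = row


def post_process_frame(fb, width, height, tile_w, tile_h, bytes_per_pixel, effect):
    """Walk the frame tile by tile, applying a per-pixel effect in 'local store'."""
    # One resident tile must fit in local store (real code also needs room for code).
    assert tile_w * tile_h * bytes_per_pixel <= LOCAL_STORE_BYTES
    for y0 in range(0, height, tile_h):
        for x0 in range(0, width, tile_w):
            w = min(tile_w, width - x0)   # clip tiles at the right edge
            h = min(tile_h, height - y0)  # clip tiles at the bottom edge
            tile = dma_get(fb, width, x0, y0, w, h)
            tile = [[effect(p) for p in row] for row in tile]
            dma_put(fb, width, x0, y0, tile)


# Usage: brighten a tiny 4x3 greyscale "frame" using 3x2 tiles.
frame = list(range(12))
post_process_frame(frame, 4, 3, 3, 2, 1, lambda p: min(p * 2, 255))
print(frame)
```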
 
Couldn't each rendered pixel be diverted to Cell instead of going to the backbuffer, though? Obviously this is no use when you need adjacent pixel data, such as for a blur. But if RSX renders in quads, each quad could be sent to an SPE and processed, perhaps in batches of 4 (I don't know how big these quads are), and, if possible, in batches of 9, which would be enough data for tile-based blurring. I know some effects can be performed per pixel/quad without needing a completed image.
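The "batches of 9" idea can be checked with a small sketch. Assuming 2x2-pixel quads (the usual granularity of GPU pixel shading, though how RSX would expose them is not known here), a 3x3 block of quads gives a 6x6-pixel patch, which is enough for a 3x3 box blur of the centre quad. The data layout and helper names are hypothetical.

```python
QUAD = 2  # assumption: quads are 2x2 pixels


def assemble(quads):
    """Stitch a 3x3 grid of 2x2 quads (list of rows) into a 6x6 pixel block."""
    size = 3 * QUAD
    block = [[0.0] * size for _ in range(size)]
    for qy in range(3):
        for qx in range(3):
            for y in range(QUAD):
                for x in range(QUAD):
                    block[qy * QUAD + y][qx * QUAD + x] = quads[qy][qx][y][x]
    return block


def blur_centre_quad(quads):
    """3x3 box blur of the centre quad, using the 8 surrounding quads as apron."""
    block = assemble(quads)
    out = [[0.0] * QUAD for _ in range(QUAD)]
    for y in range(QUAD):
        for x in range(QUAD):
            cy, cx = QUAD + y, QUAD + x  # pixel position inside the 6x6 block
            total = sum(block[cy + dy][cx + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = total / 9.0
    return out


# Usage: a uniform neighbourhood blurs to the same value.
uniform = [[[[5.0, 5.0], [5.0, 5.0]] for _ in range(3)] for _ in range(3)]
print(blur_centre_quad(uniform))
```

The neighbouring quads are only read, never written, so this is exactly the case where a pixel needs adjacent data that a single quad cannot provide.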
 
nAo said:
If one wants to do some post-processing work on an SPE, the data obviously has to reside in the SPE's local memory. We know 256 KB is hardly enough to contain a full frame

ciao,
Marco

Who said it should be stored in SPE memory? It could be stored in main XDR RAM and worked on from there; it could even be worked on from VRAM.

David Kirk: SPE and RSX can work together. SPE can preprocess graphics data in the main memory or postprocess rendering results sent from RSX.

Nishikawa's speculation: for example, when you create a lake scene by multi-pass rendering with multiple render targets, an SPE can render the reflection map while RSX does other things. Since a reflection map requires less precision, it's not much overhead, even though you have to load the related data into both main RAM and VRAM. It works like SLI, with the SPEs and RSX sharing the work.

David Kirk: Post-effects such as motion blur, depth-of-field simulation and bloom in HDR rendering can be done by SPEs processing RSX-rendered results.

Nishikawa's speculation: RSX renders a scene into main RAM, then the SPEs add effects to the frames there. Or you can composite SPE-created frames with an RSX-rendered frame.
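Combining SPE-created frames with an RSX-rendered one is ordinarily a per-pixel "over" blend. A minimal sketch, assuming the SPE layer carries straight (non-premultiplied) alpha in [0, 1] and both frames are same-sized flat lists of floating-point pixels:

```python
def composite_over(spe_layer, rsx_frame):
    """Blend an SPE-rendered RGBA layer over an RSX-rendered RGB frame.

    Each output channel is src * alpha + dst * (1 - alpha).
    """
    assert len(spe_layer) == len(rsx_frame), "frames must be the same size"
    out = []
    for (sr, sg, sb, a), (dr, dg, db) in zip(spe_layer, rsx_frame):
        out.append((sr * a + dr * (1 - a),
                    sg * a + dg * (1 - a),
                    sb * a + db * (1 - a)))
    return out


# Usage: a two-pixel example, one opaque red pixel and one half-transparent one.
spe = [(1.0, 0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 0.5)]
rsx = [(0.0, 0.0, 1.0), (0.0, 0.0, 1.0)]
print(composite_over(spe, rsx))
```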

David Kirk: Let the SPEs do vertex processing, then let RSX render the result.

Nishikawa's speculation: You can implement a collision-aware tessellator and dynamic LOD on the SPEs.
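The "dynamic LOD" half of that remark usually means switching to a coarser mesh as an object covers fewer pixels. One common heuristic is sketched below, estimating projected size from a bounding radius, the viewing distance and the vertical field of view. The thresholds are hypothetical tuning values, and the collision-aware tessellation part is not illustrated.

```python
import math


def select_lod(radius, distance, fov_y, screen_height_px, thresholds_px=(200, 100, 50, 25)):
    """Pick a LOD index (0 = most detailed) from the object's projected size.

    Projected diameter in pixels is approximated as
    radius / (distance * tan(fov_y / 2)) * screen_height_px.
    """
    if distance <= radius:
        return 0  # camera inside or touching the bounds: use full detail
    projected = radius / (distance * math.tan(fov_y / 2)) * screen_height_px
    for lod, limit in enumerate(thresholds_px):
        if projected >= limit:
            return lod
    return len(thresholds_px)  # smaller than every threshold: coarsest LOD


# Usage: a unit-radius object seen with a 90-degree vertical FOV at 720 lines.
for d in (2.0, 10.0, 100.0):
    print(d, select_lod(1.0, d, math.pi / 2, 720))
```

A per-object decision like this is branchy, scalar-friendly work, which is one reason it is often suggested for a CPU-side processor rather than the GPU.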

David Kirk: SPE and GPU work together, which allows physics simulation to interact with graphics.

Nishikawa's speculation: To depict water ripples, a normal map can be generated from a height-map texture by a pulse-driven physics simulation. This job is done by the SPEs and RSX in parallel.
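A minimal sketch of the ripple idea, assuming the widely used two-buffer height-field scheme (a discrete 2D wave equation where each new height is half the sum of its four neighbours minus the previous height, then damped) followed by a central-difference normal map. This is one possible technique, not necessarily the one meant in the interview; in a real split the simulation could run on the SPEs while RSX consumes the normals.

```python
import math


def ripple_step(prev, curr, damping=0.99):
    """Advance a height-field ripple by one step.

    Discrete 2D wave equation with the wave speed chosen so that
    next = (sum of 4 neighbours) / 2 - previous, then damped.
    Border cells are held at zero height.
    """
    h, w = len(curr), len(curr[0])
    nxt = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            neighbours = curr[y - 1][x] + curr[y + 1][x] + curr[y][x - 1] + curr[y][x + 1]
            nxt[y][x] = (neighbours / 2.0 - prev[y][x]) * damping
    return nxt


def normal_map(height, strength=1.0):
    """Per-texel unit normals from a height field via central differences."""
    h, w = len(height), len(height[0])
    normals = []
    for y in range(h):
        row = []
        for x in range(w):
            # Height differences across neighbouring texels (clamped at the edges).
            dx = height[y][min(x + 1, w - 1)] - height[y][max(x - 1, 0)]
            dy = height[min(y + 1, h - 1)][x] - height[max(y - 1, 0)][x]
            nx, ny, nz = -dx * strength, -dy * strength, 2.0
            length = math.sqrt(nx * nx + ny * ny + nz * nz)
            row.append((nx / length, ny / length, nz / length))
        normals.append(row)
    return normals


# Usage: drop an impulse in the middle of a 9x9 pond and run a few steps.
size = 9
prev = [[0.0] * size for _ in range(size)]
curr = [[0.0] * size for _ in range(size)]
curr[4][4] = 1.0  # the impulse
for _ in range(3):
    prev, curr = curr, ripple_step(prev, curr)
normals = normal_map(curr)
```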
 
How come I can't edit? Not enough posts, maybe?

Anyway, a breakdown:

SPE processing options:

Direct results from RSX that go straight to Cell and bypass RAM
Process results that are rendered by RSX and stored in XDR
Process results that are rendered by RSX and stored in VRAM
SPEs can create their own frames that are COMBINED with RSX frames.

P.S. What does it mean by "collision-aware tessellator and dynamic LOD by SPE"?
 
scatteh316 said:
Who said it should be stored in SPE memory? It could be stored in main XDR RAM and worked on from there; it could even be worked on from VRAM.
SPEs can't work on data that are not in their local stores.
An SPE needs data in its registers; to get data into a register it has to be loaded from the local store, and for data to be present in a local store it has to be read from somewhere (or some other unit has to write it into the SPE's local store).
There's no workaround for this 'problem'. If you don't want to move data into the SPEs' local stores, you can use the PPE to do some kind of post-processing :smile:
To be fair, there is another way to get data into an SPE's registers without using the local store at all, but it's completely useless when you need high bandwidth.
Remember that DMA can read/write as much as 128 bytes per clock cycle into an SPE's local store ;)

ciao,
Marco
 
scatteh316 said:
Who said it should be stored in SPE memory? It could be stored in main XDR RAM and worked on from there; it could even be worked on from VRAM.

SPUs only work directly on their local SRAM. The data needs to be there for the SPU to work on it. For frame work, you'd take a portion of the frame at a time.

edit - beaten!
 
Shifty Geezer said:
I know some effects can be performed per pixel/quad without needing a completed image.
Then you'd better do that kind of work on the GPU anyway.
 
SPE post-processing would have to work on small chunks of the framebuffer. If you don't want to stall on DMA loads/stores, you need at least triple buffering in SPE LS, which means tiles of around 64 KB by the time you've left space for programs etc.

Assuming you're talking about an FP32 framebuffer (HDR colour only), that's 16K pixels (128x128), or around 60 tiles per 720p screen.
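The tile figures can be reproduced with a few lines of arithmetic. This sketch assumes square tiles and counts partial tiles at the frame edges as whole tiles. The 128x128 / roughly-60-tiles result corresponds to 4 bytes per pixel; four FP32 channels (16 bytes per pixel) would instead give 64x64 tiles and 240 of them per 720p frame.

```python
import math

LOCAL_STORE_BYTES = 256 * 1024  # SPE local store size
TILE_BYTES = 64 * 1024          # tile size suggested above


def square_tile_grid(width, height, tile_bytes, bytes_per_pixel):
    """Side of the largest square tile that fits in tile_bytes, and how many
    such tiles cover a width x height frame (partial edge tiles included)."""
    pixels = tile_bytes // bytes_per_pixel
    side = math.isqrt(pixels)
    return side, math.ceil(width / side) * math.ceil(height / side)


# Triple buffering: three tiles in flight, the rest of local store for code/data.
print("left for code/stack:", (LOCAL_STORE_BYTES - 3 * TILE_BYTES) // 1024, "KB")
print("4 bytes/pixel: ", square_tile_grid(1280, 720, TILE_BYTES, 4))
print("16 bytes/pixel:", square_tile_grid(1280, 720, TILE_BYTES, 16))
```

Three 64 KB buffers leave 64 KB of the 256 KB local store for program code, stack and any other data.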
 
Titanio said:
But, sure, who would be doing that? :D :p

I don't know, I wouldn't... it would suck from a GPU point of view; it's just extra bandwidth with no visual gain. However, as FP16 isn't native for an SPU, you're stuck: either get the GPU to waste its bandwidth decompressing FP16 to FP32 before spitting it over the bus, or get the SPU to unpack FP16 to FP32 in its program.

Given the GPU has custom hardware, I suspect it will be better at it...
 
Titanio said:
But, sure, who would be doing that? :D :p
SPEs have no instructions (as you can check in the public docs) to convert from FP16 to FP32 format... even if it shouldn't be too difficult to do the conversion using just some integer math ;)
Edit: oops... Dean was faster than me!
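To illustrate the "just some integer math" route, here is a scalar Python sketch that rebuilds IEEE 754 binary32 bits from binary16 bits with shifts, masks and adds, covering zeros, subnormals, normal numbers, infinities and NaNs. An SPU version would do the same on whole vectors with its integer SIMD instructions; this sketch only shows the bit manipulation and uses `struct` purely to reinterpret the final 32-bit pattern as a float.

```python
import struct


def half_bits_to_float(h):
    """Convert a 16-bit IEEE 754 half-precision bit pattern to a float.

    Builds the equivalent single-precision bit pattern with integer
    operations only, then reinterprets those 32 bits as a float.
    """
    sign = (h >> 15) & 0x1
    exponent = (h >> 10) & 0x1F
    mantissa = h & 0x3FF

    if exponent == 0x1F:
        # Infinity (mantissa 0) or NaN: all-ones exponent in FP32 too.
        bits = (sign << 31) | (0xFF << 23) | (mantissa << 13)
    elif exponent != 0:
        # Normal number: rebias the exponent from 15 to 127 (difference 112).
        bits = (sign << 31) | ((exponent + 112) << 23) | (mantissa << 13)
    elif mantissa == 0:
        bits = sign << 31  # signed zero
    else:
        # Subnormal half: shift until the implicit leading 1 appears; the result
        # is a normal FP32 number with a correspondingly smaller exponent.
        shift = 0
        while not mantissa & 0x400:
            mantissa <<= 1
            shift += 1
        bits = (sign << 31) | ((113 - shift) << 23) | ((mantissa & 0x3FF) << 13)

    return struct.unpack("<f", struct.pack("<I", bits))[0]


print(half_bits_to_float(0x3C00), half_bits_to_float(0xC000), half_bits_to_float(0x7BFF))
```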
 
DeanoC said:

I don't know, I wouldn't... it would suck from a GPU point of view; it's just extra bandwidth with no visual gain. However, as FP16 isn't native for an SPU, you're stuck: either get the GPU to waste its bandwidth decompressing FP16 to FP32 before spitting it over the bus, or get the SPU to unpack FP16 to FP32 in its program.

Given the GPU has custom hardware, I suspect it will be better at it...


Hehe, I actually just thought of this; I remember someone saying that FP32 would be useful for sharing data with SPUs. Makes sense!

What would GPU decompression of FP16 to FP32 entail, or more specifically, how expensive would it be in terms of bandwidth? That'd obviously be something to consider when weighing up gains on one end against losses on the other.

And cheers nAo! Good to know that is possible, even if it's a little more work.
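A rough way to size the bandwidth question: compare one uncompressed 720p RGBA frame at FP16 and at FP32, sent once per frame at 60 fps. The sketch also includes a hypothetical separate GPU conversion pass that reads the FP16 buffer and writes an FP32 copy before the transfer; real costs would depend on how the conversion is actually done.

```python
def frame_bytes(width, height, channels, bytes_per_channel):
    """Size of one uncompressed frame in bytes."""
    return width * height * channels * bytes_per_channel


FPS = 60
fp16_frame = frame_bytes(1280, 720, 4, 2)  # RGBA, FP16 per channel
fp32_frame = frame_bytes(1280, 720, 4, 4)  # RGBA, FP32 per channel

# Extra transfer traffic if each frame is sent once at FP32 instead of FP16.
extra_per_second = (fp32_frame - fp16_frame) * FPS

# Hypothetical separate conversion pass on the GPU: read FP16, write FP32.
conversion_pass_per_second = (fp16_frame + fp32_frame) * FPS

print(fp16_frame, fp32_frame, extra_per_second / 1e9, conversion_pass_per_second / 1e9)
```

Under these assumptions, sending FP32 instead of FP16 costs about 0.44 GB/s of extra transfer, and a separate conversion pass adds roughly 1.3 GB/s of reads and writes on the GPU side.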
 
Which of these effects would benefit from being run on SPEs?

Edge Detection and outline rendering
Image warping
Image blurring
Colour balancing
Convolutions
Textured media/'rough paper'

Would these not also benefit render-to-texture effects, such as rendering a painted animation in a real-time 'photorealistic' scene?

Seemingly most effects are better suited to the GPU, except very conditional effects, such as the EyeToy background removal process linked to a while back.

Which makes one wonder: what use is the RSX-to-CPU transfer? How is it actually going to be used?
 
Shifty Geezer said:
Which of these effects would benefit from being run on SPEs?

Edge Detection and outline rendering
Image warping
Image blurring
Colour balancing
Convolutions
Textured media/'rough paper'
Everything that has a 'small' kernel is a good candidate :smile:
(convolutions here represent most of the items in your list..)
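To make "small kernel" concrete: a 3x3 filter such as a Sobel edge detector only needs a one-pixel apron around each tile, so an SPE can fetch a slightly enlarged tile and process it entirely in local store. The sketch below assumes greyscale values and that the apron has already been fetched from the neighbouring tiles; it also measures how much extra data the apron costs.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_tile(padded):
    """Edge magnitudes for a tile.

    padded is (th + 2) x (tw + 2) greyscale values: the tile plus a
    1-pixel apron. Returns th x tw gradient magnitudes.
    """
    th, tw = len(padded) - 2, len(padded[0]) - 2
    out = []
    for y in range(th):
        row = []
        for x in range(tw):
            gx = gy = 0
            for ky in range(3):
                for kx in range(3):
                    p = padded[y + ky][x + kx]
                    gx += SOBEL_X[ky][kx] * p
                    gy += SOBEL_Y[ky][kx] * p
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out


def apron_overhead(tile_w, tile_h, radius=1):
    """Fraction of extra pixels fetched because of the kernel apron."""
    return (tile_w + 2 * radius) * (tile_h + 2 * radius) / (tile_w * tile_h) - 1


# Usage: a vertical edge, and the apron cost for small versus large tiles.
print(sobel_tile([[0, 0, 1, 1]] * 3))
print(f"{apron_overhead(16, 16):.1%} vs {apron_overhead(128, 128):.1%}")
```

The apron cost shrinks as tiles grow (about 27% extra for 16x16 tiles, about 3% for 128x128), which is why a small kernel combined with reasonably large tiles suits the local store well.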

Seemingly most effects are better suited to the GPU, except very conditional effects, such as the EyeToy background removal process linked to a while back.
Even if a GPU (such as RSX) is better suited to some tasks, that doesn't mean it has to be preferred over the SPEs for those tasks all the time.
What if one can do all the post-processing work on the SPEs and leave RSX free to start rendering the next frame? Even if the SPEs take longer to perform a task than a GPU would, it could be a win overall.

ciao,
Marco
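A toy timing model shows why this can pay off. It assumes the two stages overlap perfectly (RSX renders frame N+1 while the SPEs post-process frame N) and ignores the time to move frames between RSX and Cell as well as memory contention. The millisecond figures are invented for illustration.

```python
def serial_frame_ms(render_ms, gpu_post_ms):
    """RSX renders a frame and then post-processes it itself."""
    return render_ms + gpu_post_ms


def pipelined_frame_ms(render_ms, spe_post_ms):
    """RSX renders frame N+1 while the SPEs post-process frame N.

    In steady state a new frame completes every max(stage times); the
    latency of any single frame grows to render_ms + spe_post_ms.
    """
    return max(render_ms, spe_post_ms)


# Hypothetical timings: the SPEs take twice as long as RSX for the post pass.
render, gpu_post, spe_post = 12.0, 4.0, 8.0
print(serial_frame_ms(render, gpu_post), pipelined_frame_ms(render, spe_post))
```

With these numbers the pipelined version delivers a frame every 12 ms instead of 16 ms, even though the SPE post pass is twice as slow; the trade-off is one extra stage of latency.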
 