Using the CPU to post process GPU work

Since next gen consoles have this idea of "two way" rendering, that is, work done by the GPU can be passed back to the CPU for further processing, what kind of benefits does this bring?

Put differently, what kind of problems are better solved by a SIMD CPU after having gone through the GPU?
 
Here are some examples from a WatchImpress article translated by one:

David Kirk: SPE and RSX can work together. SPE can preprocess graphics data in the main memory or postprocess rendering results sent from RSX.

Nishikawa's speculation: for example, when you have to create a lake scene by multi-pass rendering with multiple render targets, SPE can render a reflection map while RSX does other things. Since a reflection map requires less precision, it's not much of an overhead even though you have to load related data in both the main RAM and VRAM. It works like SLI between SPE and RSX.

David Kirk: Post-effects such as motion blur, simulation for depth of field, bloom effect in HDR rendering, can be done by SPE processing RSX-rendered results.

Nishikawa's speculation: RSX renders a scene in the main RAM then SPEs add effects to frames in it. Or, you can synthesize SPE-created frames with an RSX-rendered frame.

David Kirk: Let SPEs do vertex-processing then let RSX render it.

Nishikawa's speculation: You can implement a collision-aware tessellator and dynamic LOD by SPE.

David Kirk: SPE and GPU work together, which allows physics simulation to interact with graphics.

Nishikawa's speculation: For expression of water wavelets, a normal map can be generated by pulse physics simulation with a height map texture. This job is done in SPE and RSX in parallel.
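To make the last example a bit more concrete, here is a rough sketch of just the "height map in, normal map out" step, which is the kind of per-texel work an SPE could stream through. It is not from the article: the function name, the central-difference method, the edge clamping and the `strength` parameter are all my own assumptions, and the physics simulation that would produce the height map is left out.

```python
import math

def normal_map_from_height(height, strength=1.0):
    """Derive unit normals from a 2D height map (list of rows of floats).

    Hypothetical helper: uses central differences with clamped edges.
    """
    rows, cols = len(height), len(height[0])
    normals = []
    for y in range(rows):
        row = []
        for x in range(cols):
            # Neighbouring heights, clamped at the borders of the map.
            left = height[y][max(x - 1, 0)]
            right = height[y][min(x + 1, cols - 1)]
            up = height[max(y - 1, 0)][x]
            down = height[min(y + 1, rows - 1)][x]
            # Slope of the surface along x and y.
            dx = (right - left) * 0.5 * strength
            dy = (down - up) * 0.5 * strength
            # The normal of z = h(x, y) is (-dh/dx, -dh/dy, 1), normalised.
            length = math.sqrt(dx * dx + dy * dy + 1.0)
            row.append((-dx / length, -dy / length, 1.0 / length))
        normals.append(row)
    return normals

flat = normal_map_from_height([[0.0] * 3 for _ in range(3)])
ramp = normal_map_from_height([[0.0, 1.0, 2.0] for _ in range(3)])
```

A flat map gives normals pointing straight up, and a ramp rising along x tilts the normal back towards negative x. Each texel only needs its four neighbours, which is why this sort of job streams well.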

And here's the infamous slide about the 'two way rendering'
hofstee45ti.jpg


From a previous thread, Deno said the slide is rubbish as consoles have been doing "two way rendering" since PS1. I agree with Jaws/Jawed (can't remember which one :p) that the slide was talking more about one way PC architectures.
 
Cell = 218GFlops.

RSX = 1.8TFlops.

Ponder on how useful Cell will be, with around 1/10th of the computing power of RSX.

If Cell is so good at graphics, why isn't PS3 a 2 or 3 Cell design? Why is RSX in there?...

The big win, using Cell to do post-processing, is when you have a light computing load to perform on lots of data. e.g. Depth of field combines a depth texture, a blur texture and the frame buffer. This can be programmed to chew XDR bandwidth instead of the precious GDDR3 bandwidth.
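A minimal sketch of that kind of depth of field pass, to show how little maths there is per pixel compared with the data touched (three reads and one write). This is only an illustration under my own assumptions: the function name, the linear blend and the `focus_depth` / `focus_range` parameters are hypothetical, not anything Sony or NVidia have described.

```python
def depth_of_field(frame, blurred, depth, focus_depth, focus_range):
    """Blend a sharp frame with a pre-blurred copy according to depth.

    frame, blurred: 2D lists of (r, g, b) tuples of the same size.
    depth: 2D list of per-pixel depth values.
    Pixels at focus_depth stay sharp; pixels focus_range or more away
    from it take the fully blurred colour (hypothetical linear falloff).
    """
    out = []
    for f_row, b_row, d_row in zip(frame, blurred, depth):
        out_row = []
        for sharp, blur, d in zip(f_row, b_row, d_row):
            # Blend factor: 0 in focus, 1 fully out of focus.
            t = min(abs(d - focus_depth) / focus_range, 1.0)
            # One linear interpolation per colour channel.
            out_row.append(tuple(s + (b - s) * t for s, b in zip(sharp, blur)))
        out.append(out_row)
    return out

frame = [[(1.0, 0.0, 0.0)] * 3]
blurred = [[(0.0, 0.0, 1.0)] * 3]
depth = [[10.0, 12.0, 20.0]]
result = depth_of_field(frame, blurred, depth, focus_depth=10.0, focus_range=4.0)
```

Here the first pixel is in focus and keeps the sharp colour, the second is halfway out and gets an even mix, and the third is fully blurred. The loop is a handful of multiply-adds per pixel, so the cost is dominated by moving the three buffers through memory.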

So there are opportunities. But they're not earth shaking. RSX is 10x more powerful than Cell at vector maths.

Jawed
 
That 1.8 TFlop figure is NV flops though. It's not infinitely programmable and you can't set it doing whatever you want; it processes particular tasks. e.g. Cell can raytrace a terrain that RSX can't, as it is not a raytracer. One could raytrace terrain on Cell and add other details through RSX. I don't know how Cell might be used in actual games, but it adds a level of flexibility that GPUs don't have by their nature. e.g.

Nishikawa's speculation: For expression of water wavelets, a normal map can be generated by pulse physics simulation with a height map texture. This job is done in SPE and RSX in parallel.

You can't just say RSX=10x power of Cell. Otherwise why have a Cell, and not just have two GPUs! :p
 
Jawed said:
RSX is 10x more powerful than Cell at vector maths.

Jawed

And my cat is 10x more powerful than me at breaking sofas, but i can kick its ass as I am 10x more powerful at anything else.
 
Vaan said:
Jawed said:
RSX is 10x more powerful than Cell at vector maths.

Jawed

And my cat is 10x more powerful than me at breaking sofas, but i can kick its ass as I am 10x more powerful at anything else.

Breaking sofas? your cat is powerful as hell! :oops:
 
blakjedi said:
Vaan said:
Jawed said:
RSX is 10x more powerful than Cell at vector maths.

Jawed

And my cat is 10x more powerful than me at breaking sofas, but i can kick its ass as I am 10x more powerful at anything else.

Breaking sofas? your cat is powerful as hell! :oops:

Yeah, it can "parallelise" the sofa in five "threads" in one "cycle" of arm.

Cell pwned by my cat.
 
Jawed said:
RSX is 10x more powerful than Cell at vector maths.

Jawed

You sure? We don't specifically know about RSX, but Xenos can do 48 vec4 ops per cycle, Cell can do...8 (7 SPEs + 1 PPE)? Factor in clockspeed though, and you're talking about 24,000 million ops per sec (Xenos) versus 25,600 million ops per sec (Cell). Anyone want to confirm if these are comparable figures? Of course, I'm looking only at programmable power here, but I think that's fair, considering also that Cell brings a tonne more flexibility.
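For anyone who wants to check the arithmetic, here it is spelled out. The unit counts are the ones from the post; the clocks (500 MHz for Xenos, 3.2 GHz for Cell) are assumptions taken from the commonly quoted figures, and the function name is just for illustration. This is peak issue rate only and says nothing about how comparable the ops actually are.

```python
def vec4_ops_per_second(units, clock_hz):
    """Peak rate assuming each unit issues one vec4 op per cycle."""
    return units * clock_hz

# Assumed figures: 48 ALUs at 500 MHz, 7 SPEs + 1 PPE at 3.2 GHz.
XENOS_UNITS, XENOS_CLOCK_HZ = 48, 500_000_000
CELL_UNITS, CELL_CLOCK_HZ = 8, 3_200_000_000

xenos = vec4_ops_per_second(XENOS_UNITS, XENOS_CLOCK_HZ)
cell = vec4_ops_per_second(CELL_UNITS, CELL_CLOCK_HZ)

print(f"Xenos: {xenos // 1_000_000:,} million vec4 ops/s")
print(f"Cell:  {cell // 1_000_000:,} million vec4 ops/s")
```

So the two come out within about 7% of each other on this measure, with Cell getting there on clock speed rather than width.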
 
Jawed said:
These are Sony/NVidia's numbers, not mine:

Cell = 218GFlops.

RSX = 1.8TFlops.

Jawed

Hardly comparable given that probably less than 20% of RSX's power is programmable..? Same with any GPU. Maybe your point is that you wouldn't use Cell for post-processing given the amount of relatively fixed-function logic on the GPU that can handle that, but if we were talking about using Cell to help with programmable parts of the pipeline, I don't think you can compare Cell's programmable floating point power with ALL of RSX's (programmable + fixed function). Comparing to its shader power is fairer in that instance, and even then it doesn't account for the much larger amount of flexibility afforded by Cell versus a SM3.0+ shader.
 
Jawed said:
These are Sony/NVidia's numbers, not mine:

Cell = 218GFlops.

RSX = 1.8TFlops.

Jawed

What Mightyflops was the Xbox GPU supposed to be in comparison with the CPU?

80 against 3?
 
Cell will be most useful doing bandwidth-intensive tasks alongside the GPU, but leaving RSX to do the things it's best at.

That way the bandwidth of PS3 will be shared more evenly across the graphics workload. Horses for courses though.

Jawed
 
It doesn't make sense to compare CELL's fp ops per second number with RSX's fp ops per second number when the RSX number is derived by summing fixed (non-programmable) stuff and full fp programmable shaders.
Just to point out one problem: do you think a common GPU rasterizer engine works with floating point math? :?
They're summing a lot of fixed hw operations working on fp and fixed point math, with a variable number of exponent/mantissa/precision bits.
You can't compare them, it just doesn't work like that ;)
 
I suppose one unlikely scenario is that the GPU is maxed out doing normal rendering and there are still spare SPE cycles left. These cycles can be used to do post filtering.
 
Well, R500 has ~240 gflops of fairly generalized vector processing power at 500MHz while cell offers ~218 gflops at 3.2 GHz. I'm not sure about RSX, though, and how useful its vector processing units are in conjunction, since it's more of a fixed layout.
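Dividing those headline numbers by clock speed shows how differently the two chips get there. The GFLOPS and clock figures are the approximate ones quoted above, and the helper name is only for illustration.

```python
def flops_per_cycle(flops_per_second, clock_hz):
    """Floating point ops completed per clock at the quoted peak rate."""
    return flops_per_second / clock_hz

# Approximate figures from the post: ~240 GFLOPS at 500 MHz, ~218 GFLOPS at 3.2 GHz.
r500 = flops_per_cycle(240e9, 500e6)
cell = flops_per_cycle(218e9, 3.2e9)

print(f"R500: {r500:.3f} flops per cycle")
print(f"Cell: {cell:.3f} flops per cycle")
```

On these numbers R500 does roughly seven times as much per clock, and Cell closes the gap by running at more than six times the frequency.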
 
I think it's much more desirable to have the GPU "assist" the Cell/SPEs with the tasks it was specially designed for (filtering data, i.e. interpolating data from fields) than to use Cell for post-processing.
 
nAo said:
It doesn't make sense to compare CELL's fp ops per second number with RSX's fp ops per second number when the RSX number is derived by summing fixed (non-programmable) stuff and full fp programmable shaders.
Just to point out one problem: do you think a common GPU rasterizer engine works with floating point math? :?
They're summing a lot of fixed hw operations working on fp and fixed point math, with a variable number of exponent/mantissa/precision bits.
You can't compare them, it just doesn't work like that ;)

You've just explained why doing various kinds of blending and geometry interpolation on Cell is a huge waste of resources, and why this functionality should be undertaken by RSX. ;) In case you haven't noticed, ATI has gone to the next level on this kind of functionality with R500.

Clearly there are algorithms that are ill-suited to RSX's 200-odd GFLops of programmable shaders and they should be run on Cell - though it's worth pointing out that Cell is a streaming processor, which means that the SPEs are only marginally more general purpose than the programmable streaming floating point functionality of RSX.

Cell and RSX are both stream processors with some gubbins round the edges to tie them together.

This goes back to the argument that XB360's CPU is "6 cores" of general purpose computing power with small amounts of vector processing tacked on the side.

Physics and AI are not a good fit for stream processors. It will work solely because there's a brutish amount of power on Cell sat doing not much else.

Jawed
 