GPU<->CPU interconnect...what's possible?

scificube

Regular
I've seen some discussion here and there about the GPU<->CPU interconnects/abilities of the next-gen consoles. There seems to be some difficulty in thinking of things which would take advantage of the aforementioned, so I thought it might be good to have a focused discussion about it. Maybe we could come up with some interesting things... maybe not, but I at least want to try.

So far I've seen these things suggested:

For Cell:

tessellation (static, adaptive/dynamic)
post-processing effects (such as?)
shadows overlaid in scene
procedural textures
physics that affect visuals (physics on sprites/alpha textures for smoke etc., fluid dynamics, cloth sim, hair sim)

For Xenon/XeCPU:

Aid in bump mapping
procedural textures
catch-all: some of the stuff Cell can do, to the same or a lesser degree.

RSX:

So far pretty standard stuff. Cell seems to be the special spice on top of what it can offer.

Xenos:

Memexport may be used for physics on particles since it makes Xenos more GPGPU-capable (see the sketch after this list).
Memexport may be used to provide fluid dynamics.
Tessellation (static, adaptive/dynamic)

Xenos seems to serve up its own special toppings.
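As a rough illustration of the particle idea above: memexport is what would let Xenos scatter updated particle state back out to memory, but the per-particle math itself is simple. A minimal plain-C++ stand-in for that update (toy types and numbers of my own, not real Xenos code):

[code]
#include <cstdio>
#include <vector>

// Toy particle state; a memexport-style pass would stream these through
// the GPU's shader array and scatter the results back to memory.
struct Particle { float px, py, pz, vx, vy, vz; };

// Plain-CPU stand-in for the update such a pass would perform:
// integrate gravity and position, bounce off a ground plane at y = 0.
void updateParticles(std::vector<Particle>& ps, float dt) {
    const float g = -9.8f;
    for (Particle& p : ps) {
        p.vy += g * dt;
        p.px += p.vx * dt; p.py += p.vy * dt; p.pz += p.vz * dt;
        if (p.py < 0.0f) { p.py = 0.0f; p.vy *= -0.5f; }  // crude bounce
    }
}

int main() {
    std::vector<Particle> ps(4, {0.0f, 5.0f, 0.0f, 1.0f, 0.0f, 0.0f});
    for (int frame = 0; frame < 60; ++frame) updateParticles(ps, 1.0f / 60.0f);
    std::printf("p0 after 1s: %.2f %.2f %.2f\n", ps[0].px, ps[0].py, ps[0].pz);
}
[/code]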

-------------------------------

Can Cell do true displacement mapping for the system?

idea: Cell can tessellate geometry and should be able to fiddle with data in a texture. Given this, Cell takes a displacement map (or maps) and tessellates the geometry before it sends the primitives to RSX. An algorithm can interpolate between displacement maps for adaptive displacement mapping.

Possible/Impossible? Good idea/Bad Idea?
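To make the displacement idea concrete, here's a minimal sketch of the displacement step itself in plain C++. The HeightMap type and its nearest-texel sampling are toy stand-ins of mine; a real SPE version would be vectorised and use filtered sampling:

[code]
#include <algorithm>
#include <cstdio>
#include <vector>

// Toy displacement map with nearest-texel lookup, u and v in [0,1].
struct HeightMap {
    int w, h;
    std::vector<float> texels;
    float sample(float u, float v) const {
        int x = std::min(w - 1, int(u * w));
        int y = std::min(h - 1, int(v * h));
        return texels[y * w + x];
    }
};

struct Vertex { float x, y, z, u, v, nx, ny, nz; };

// Displace each (already tessellated) vertex along its normal, blending
// two displacement maps by 'detail' -- the interpolation described above.
void displace(std::vector<Vertex>& verts,
              const HeightMap& coarse, const HeightMap& fine,
              float detail, float scale) {
    for (Vertex& vt : verts) {
        float d = (1.0f - detail) * coarse.sample(vt.u, vt.v)
                +         detail  * fine.sample(vt.u, vt.v);
        vt.x += vt.nx * d * scale;
        vt.y += vt.ny * d * scale;
        vt.z += vt.nz * d * scale;
    }
}

int main() {
    HeightMap coarse{2, 2, {0, 0, 0, 0}};
    HeightMap fine{2, 2, {0, 1, 1, 0}};
    std::vector<Vertex> verts = { {0, 0, 0, 0.75f, 0.25f, 0, 1, 0} };
    displace(verts, coarse, fine, 0.5f, 2.0f);   // halfway between the maps
    std::printf("displaced y = %.2f\n", verts[0].y);
}
[/code]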

Overlay:

I've been thinking about Cell's access to the framebuffer and some of the demos we've seen of Cell handling graphics on its own, like the terrain demo and the supposed Getaway demo at E3 last year. I've also had the idea of overlaying shadows in a scene in my mind. (Not my idea... I mean to say I was thinking about it.)

I was thinking that if Cell really can handle graphics from start to finish (of course not as well as a GPU, but still to a usably significant level) then perhaps this can be leveraged. I think this is a new idea so I want to put it out there.

Why not have both Cell and RSX churn out graphics independently and then synchronize and combine the final results before output? The idea is that RSX handles as much as it can in a traditional way, and then Cell goes on to render more stuff and its results are overlaid on RSX's before output. If possible, this kind of overlay could be pretty versatile in adding more detail to a scene via Cell, or in having Cell completely handle portions of a scene.

To avoid double work, Cell could send RSX simpler representations of what it is working on (volumes, planes, collision hulls... whatever) that ask RSX to do no work on them, but at the same time tell RSX the portions behind them are occluded from view, so that RSX doesn't do unnecessary work. Then Cell does its thing and those portions of the scene are filled in with its overlay of whatever is being drawn. Of course there will still be some overdraw, but this still seems a valuable and straightforward optimization that could be used. Since RSX isn't drawing some portion of the scene, more horsepower can be dedicated to the portions that it is drawing.

Perhaps alternate render targets where nothing is drawn in them could also be worked out in some instances.

Impossible/possible? Good idea/Bad Idea?
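For the overlay itself, here's a toy sketch of what the final combine could look like, assuming both chips can leave a colour and a depth value per pixel (all types and names here are mine, not any real API):

[code]
#include <cstddef>
#include <cstdint>
#include <vector>

// One pixel of a rendered buffer: colour plus depth.
struct Pixel { std::uint32_t rgba; float depth; };

// Merge Cell's overlay into RSX's frame: whichever surface is nearer
// wins per pixel -- exactly what a shared z-buffer would give you.
void composite(std::vector<Pixel>& rsxFrame,
               const std::vector<Pixel>& cellFrame) {
    for (std::size_t i = 0; i < rsxFrame.size(); ++i)
        if (cellFrame[i].depth < rsxFrame[i].depth)
            rsxFrame[i] = cellFrame[i];
}

int main() {
    std::vector<Pixel> rsx  = { {0xff0000ffu, 10.0f} };
    std::vector<Pixel> cell = { {0x00ff00ffu,  5.0f} };
    composite(rsx, cell);   // rsx[0] now holds Cell's pixel: it was nearer
}
[/code]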

Real-time Ray-Traced Reflections?

Impossible/possible? Good idea/Bad Idea?

Well that's all the nonsense I can come up with for the moment.

note: My emphasis on Cell/RSX is due to how the GPU<->CPU interconnect seems more important between them, how the bandwidth between them is greater than between Xenon/Xenos and can be dedicated solely to GPU<->CPU interactions, and lastly how Cell has more resources it can dedicate to the effort. If someone sees it differently I welcome the insight. I also welcome any other interesting ideas, be they more inclined toward the X360's or the PS3's components or indifferent to the platforms.

Please discuss :)
 
scificube said:
Can Cell do true displacement mapping for the system?

I believe Kutaragi or Kirk mentioned this explicitly.

scificube said:
I've been thinking about Cell's access to the framebuffer and some of the demos we've seen of Cell handling graphics on its own, like the terrain demo and the supposed Getaway demo at E3 last year. I've also had the idea of overlaying shadows in a scene in my mind. (Not my idea... I mean to say I was thinking about it.)

I was thinking that if Cell really can handle graphics from start to finish (of course not as well as a GPU, but still to a usably significant level) then perhaps this can be leveraged. I think this is a new idea so I want to put it out there.

Why not have both Cell and RSX churn out graphics independently and then synchronize and combine the final results before output?

I don't know about generally doing this, but it was mentioned before that you could render transparencies separately on Cell and blend with the final buffer. This way, the rendering would be pretty much totally independent (you'd just need access to the z-buffer on Cell).
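A minimal sketch of that blend, assuming Cell has a copy of the z-buffer to test against (the types are toy stand-ins):

[code]
#include <cstddef>
#include <vector>

struct RGBA { float r, g, b, a; };

// Blend a Cell-rendered transparency layer over the final frame. A
// transparent texel only lands where it sits nearer than the opaque
// depth already in the z-buffer -- the access mentioned above.
void blendTransparencies(std::vector<RGBA>& frame,
                         const std::vector<float>& frameDepth,
                         const std::vector<RGBA>& layer,
                         const std::vector<float>& layerDepth) {
    for (std::size_t i = 0; i < frame.size(); ++i) {
        if (layer[i].a <= 0.0f || layerDepth[i] >= frameDepth[i]) continue;
        float a = layer[i].a;                  // standard "over" blend
        frame[i].r = layer[i].r * a + frame[i].r * (1.0f - a);
        frame[i].g = layer[i].g * a + frame[i].g * (1.0f - a);
        frame[i].b = layer[i].b * a + frame[i].b * (1.0f - a);
    }
}

int main() {
    std::vector<RGBA>  frame  = { {1.0f, 0.0f, 0.0f, 1.0f} };
    std::vector<float> frameZ = { 10.0f };
    std::vector<RGBA>  smoke  = { {0.5f, 0.5f, 0.5f, 0.5f} };
    std::vector<float> smokeZ = { 5.0f };
    blendTransparencies(frame, frameZ, smoke, smokeZ);
}
[/code]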

More generally, I wonder about the potential for Cell performing calculations and storing the results in a texture map for a shader on the GPU to pick up, basically doing some of the work the shader would have had to do before, or making the data the shader takes in more "dynamic". Make shading a two-stage process between Cell and RSX, with texture maps shuttling Cell results to become input for RSX. The Doc Oc demo seemed to be doing something like this - according to Phil Harrison, Cell was doing "really heavy lighting calculations" for it.
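For illustration, a toy version of that two-stage idea: the CPU evaluates a lighting term per texel and packs it into a texture for a GPU shader to sample as plain input. The single-channel format and all names are my assumptions, not anything from the demo:

[code]
#include <cstdint>
#include <vector>

// CPU-side pass: evaluate N.L per texel against one (normalized) light
// direction and pack it into a byte texture the GPU shader samples.
std::vector<std::uint8_t> bakeLighting(const std::vector<float>& normals, // xyz per texel
                                       int texelCount,
                                       float lx, float ly, float lz) {
    std::vector<std::uint8_t> tex(texelCount);
    for (int i = 0; i < texelCount; ++i) {
        float d = normals[3*i+0]*lx + normals[3*i+1]*ly + normals[3*i+2]*lz;
        if (d < 0.0f) d = 0.0f;                // clamp back-facing texels
        tex[i] = std::uint8_t(d * 255.0f);
    }
    return tex;                                // would be uploaded to VRAM
}

int main() {
    std::vector<float> normals = { 0.0f, 0.0f, 1.0f };  // one texel, facing +z
    std::vector<std::uint8_t> tex = bakeLighting(normals, 1, 0.0f, 0.0f, 1.0f);
    // tex[0] == 255: fully lit
}
[/code]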

I'd be really interested to see more on these things too - I've particularly wondered about shadow rendering on Cell. I'm probably exposing my ignorance of shadow rendering algos, but how would that work then? Is it, or can it be, fairly independent of other rendering?
 
I think a key to understanding CPU<->GPU cooperation is an understanding of the specific fixed-function parts of a GPU.

These fixed-function parts, such as the Z-buffer with its high-efficiency rejection of pixels, or texture-sampling and filtering with latency-hiding, provide a wealth of algorithms with vastly more computing power than can be found in any CPU.

So, what we're looking for are concepts that don't fit into the rendering pipeline of these GPUs at all well. Otherwise they're better kept entirely on the GPU.

In addition, of course, there's the whole GPGPU concept. RSX can fulfill this role. There are applications of GPGPU which make extremely heavy use of "bandwidth", where the texturing ability of GPUs really comes into its own. In other words there's a decent chance that some very specific algorithms could work better on RSX than Cell, simply because of the difference in the way they treat memory. The problem, really, is finding them, and whether RSX (or Xenos) can afford to do non-graphics-rendering work.

Jawed
 
I understand Jawed.

The thrust of my post is not to find ways for Cell or Xenon to replace RSX and Xenos. It is to find ways they can be helpful to RSX and Xenos and/or can act independently to make the visuals we see on screen better overall.

I understand full well that Cell and Xenon get whipped by a GPU at certain things and vice versa... at least I think I do.

I see there is power there to be used... I'm asking for ideas on how to use it. I'm trying hard to do this in the context of not directly taking a task explicitly away from the GPUs, unless of course it's obvious the CPUs can do the job better.

I think it's worth exploring both what these CPUs can bring to the table and in the case of Xenos what Memexport can allow for.

-------------------------

http://64.233.187.104/search?q=cach...++displacement+mapping&hl=en&client=firefox-a

http://translate.google.com/transla...&hl=en&ie=UTF-8&oe=UTF-8&prev=/language_tools

Found these on Google.

Seems you're right, Titanio, in that KK does specifically mention the SPEs can be used for true displacement mapping. Good news to me. Dave's article on Xenos suggests true displacement mapping is possible using Xenos's tessellator and Memexport functionality. Good news again!
 
All I'm saying is by looking for algorithms that aren't directly supported by the fixed functions of a GPU, your search for CPU<->GPU collaborative algorithms gets easier :D

As it happens, I think tessellation may just be a fantastic example. It appears to be a process that suits streamed programming really well (some datapoints in, some of those data points deleted, some new datapoints added twixt those datapoints) and which doesn't seem to benefit greatly from a traditional GPU's pipeline - although it's prolly do-able.

It's arguable, for example, that Xenos's two-pass tessellation functionality is a bit of a hack (since it requires data be written to memory and then re-read shortly afterwards). A one-pass Cell-implemented tessellation algorithm could work more efficiently. Who knows, eh?
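To show how naturally that streams, here's a toy one-pass refinement of a polyline in plain C++ - points in, points out, new ones inserted between. A real adaptive tessellator would work on surfaces and keep refining until segments meet the target; this single pass is just to show the data flow:

[code]
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

struct P { float x, y; };

// One streamed pass: emit each input point, and insert a midpoint
// wherever a segment is longer than 'maxLen'. No intermediate buffer
// is written out and re-read, which is the contrast being drawn above.
std::vector<P> tessellate(const std::vector<P>& in, float maxLen) {
    std::vector<P> out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out.push_back(in[i]);
        if (i + 1 < in.size()) {
            float dx = in[i+1].x - in[i].x, dy = in[i+1].y - in[i].y;
            if (std::sqrt(dx*dx + dy*dy) > maxLen)
                out.push_back({ in[i].x + dx*0.5f, in[i].y + dy*0.5f });
        }
    }
    return out;
}

int main() {
    std::vector<P> line = { {0, 0}, {4, 0}, {4, 1} };
    std::vector<P> fine = tessellate(line, 2.0f);
    std::printf("%zu points in, %zu out\n", line.size(), fine.size());
}
[/code]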

Jawed
 
To quote David Kirk :

David Kirk: SPE and RSX can work together. SPE can preprocess graphics data in the main memory or postprocess rendering results sent from RSX.

Nishikawa's speculation: for example, when you have to create a lake scene by multi-pass rendering with plural render targets, SPE can render a reflection map while RSX does other things. Since a reflection map requires less precision, it's not much overhead even though you have to load related data in both the main RAM and VRAM. It works like SLI between SPE and RSX.

David Kirk: Post-effects such as motion blur, depth-of-field simulation, and the bloom effect in HDR rendering can be done by SPE processing RSX-rendered results.

Nishikawa's speculation: RSX renders a scene in the main RAM then SPEs add effects to frames in it. Or, you can synthesize SPE-created frames with an RSX-rendered frame.

David Kirk: Let SPEs do vertex-processing then let RSX render it.

Nishikawa's speculation: You can implement a collision-aware tessellator and dynamic LOD by SPE.

David Kirk: SPE and GPU work together, which allows physics simulation to interact with graphics.

Nishikawa's speculation: For expression of water wavelets, a normal map can be generated by pulse physics simulation with a height map texture. This job is done by SPE and RSX in parallel.

Cell can help a LOT, it seems.
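As a sketch of the SPE half of Nishikawa's water example: deriving a normal map from a height field by finite differences. The physics that animates the heights is left out, and the layout is a toy version of my own:

[code]
#include <algorithm>
#include <cmath>
#include <vector>

struct Normal { float x, y, z; };

// Turn a w x h height field into normals via differences (clamped at
// the edges); 'bump' scales how strongly height changes tilt the normal.
std::vector<Normal> heightsToNormals(const std::vector<float>& hf,
                                     int w, int h, float bump) {
    std::vector<Normal> normals(w * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float dx = hf[y*w + std::min(x+1, w-1)] - hf[y*w + std::max(x-1, 0)];
            float dy = hf[std::min(y+1, h-1)*w + x] - hf[std::max(y-1, 0)*w + x];
            Normal n = { -dx * bump, -dy * bump, 1.0f };
            float len = std::sqrt(n.x*n.x + n.y*n.y + n.z*n.z);
            normals[y*w + x] = { n.x/len, n.y/len, n.z/len };
        }
    return normals;
}

int main() {
    std::vector<float> heights = { 0, 1, 0, 1 };          // tiny 2x2 ripple
    std::vector<Normal> nm = heightsToNormals(heights, 2, 2, 1.0f);
}
[/code]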
 
^^^nice post!

Jawed said:
All I'm saying is by looking for algorithms that aren't directly supported by the fixed functions of a GPU, your search for CPU<->GPU collaborative algorithms gets easier :D

As it happens, I think tessellation may just be a fantastic example. It appears to be a process that suits streamed programming really well (some datapoints in, some of those data points deleted, some new datapoints added twixt those datapoints) and which doesn't seem to benefit greatly from a traditional GPU's pipeline - although it's prolly do-able.

It's arguable, for example, that Xenos's two-pass tessellation functionality is a bit of a hack (since it requires data be written to memory and then re-read shortly afterwards). A one-pass Cell-implemented tessellation algorithm could work more efficiently. Who knows, eh?

Jawed

Who knows indeed!

I'm just thinking: because a GPU is great at something, why can't the CPU do more on top of that, especially if the CPU has cycles/flops to burn? I guess I'm not ready to rule out that the situation can be 1+1 = 2 vs. 1 OR 1 = 1. Maybe I'm being foolish, and I don't mind being shown that, but I want to at least see if there's anything to this.
 
mckmas8808 said:
That is nice. I wonder what devs like nAo, Mr. Wibble, and DeanoC have to say about this?

I'm scared they can't talk because of NDAs, or won't talk because they don't want to give away any secrets to their competition. :(
 
scificube said:
I'm scared they can't talk because of NDAs, or won't talk because they don't want to give away any secrets to their competition. :(

Oh crap, I forgot about that. Dang it. Dang NDAs. I hate them, I tell you. :mad:

Hopefully in January or February Sony will lift the NDAs. *prays to the gaming heavens*
 
From what David Kirk has to say, I think this should be possible:

RSX shades a texture and applies the standard bump maps, parallax maps, etc., then whips it straight to VRAM/main RAM. Then Cell adds further shading plus any post-process you can think of.

In a fighting game, RSX renders the scene then whips it straight to Cell for some lighting + post-process effects. Just imagine the lighting on the next-gen Tekken...

The possibilities are endless (well, as long as there is enough bandwidth).
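For illustration, here's about the simplest possible CPU post-process of the kind being described - a crude bloom threshold over the finished frame. A real SPE version would stream the frame through local store in tiles, but the per-pixel work really is this simple:

[code]
#include <algorithm>
#include <vector>

struct RGB { float r, g, b; };

// Walk the finished frame and boost anything brighter than 'threshold'
// (a crude bloom, using the usual luminance weights).
void cheapBloom(std::vector<RGB>& frame, float threshold, float boost) {
    for (RGB& p : frame) {
        float lum = 0.299f*p.r + 0.587f*p.g + 0.114f*p.b;
        if (lum > threshold) {
            p.r = std::min(1.0f, p.r * boost);
            p.g = std::min(1.0f, p.g * boost);
            p.b = std::min(1.0f, p.b * boost);
        }
    }
}

int main() {
    std::vector<RGB> frame = { {0.9f, 0.9f, 0.9f}, {0.1f, 0.1f, 0.1f} };
    cheapBloom(frame, 0.8f, 1.5f);   // only the bright pixel changes
}
[/code]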
 
scificube said:
^^^nice post!



Who knows indeed!

I'm just thinking: because a GPU is great at something, why can't the CPU do more on top of that, especially if the CPU has cycles/flops to burn? I guess I'm not ready to rule out that the situation can be 1+1 = 2 vs. 1 OR 1 = 1. Maybe I'm being foolish, and I don't mind being shown that, but I want to at least see if there's anything to this.

He can correct me if I'm wrong, but I think he's just taking issue specifically with the idea of rendering parts of a scene on Cell (right through rasterisation), the rest on the GPU, and merging the results. It's not that this could never be beneficial - it could be, depending on your game, where you're bound, etc. - but the gap in performance between a GPU and a CPU when it comes to something like rasterisation is particularly jarring, usually.

Again, correct me if I'm wrong, Jawed, but you're suggesting we focus mostly on where a CPU can help up to (and including) the shading/programmable part of the pipeline, and beyond that leave the GPU to handle everything?
 
Yeah, anything that a GPU does well with a combination of programmable and fixed-function blocks (e.g. texturing polys), it'll do so much better than a CPU that it's basically silly to move that process onto the CPU.

So you need to think about things that, for example, RSX doesn't directly support (according to the RSX~G70 theory). Adaptive tessellation seems like a great (and conceptually simple) example. Higher order surfaces are closely related, too (though I'm not sure what functionality for these exists within a GPU already).

I'm relatively loath to witter on simply because I'm way out of my depth :oops:

We'll have to wait for the heavy-hitters to add insight.

Jawed
 
Jawed said:
Yeah, anything that a GPU does well with a combination of programmable and fixed-function blocks (e.g. texturing polys), it'll do so much better than a CPU that it's basically silly to move that process onto the CPU.

So you need to think about things that, for example, RSX doesn't directly support (according to the RSX~G70 theory). Adaptive tessellation seems like a great (and conceptually simple) example. Higher order surfaces are closely related, too (though I'm not sure what functionality for these exists within a GPU already).

I'm relatively loath to witter on simply because I'm way out of my depth :oops:

We'll have to wait for the heavy-hitters to add insight.

Jawed

You always seem pretty well versed on the technical side, especially with GPUs... what is your expertise, by the way?
 
Erm, just a bit of a geek really, and an ex-database programmer.

As a kid I wrote some 3D-maze graphics in machine code (I didn't have an assembler), implemented on a 24x24 character display. I'm not sure, but that might be the only 3D game rendering I've ever done.

Back then I used to dream of a time when I could render in pixels.

My first chance to render pixels was on an 8-pin dot matrix printer - but that was for 3D plots of functions.

Jawed
 
I don't disagree with Jawed at all.

I don't wish to move tasks off the GPU that the GPU is already better at.

I gather Cell doing even more of what a GPU is good at is not a good idea.
 
Rather than tessellation or displacement mapping (which isn't really good unless it's micropolygon-level), I'd want to see material-based destructible environments, where chunks are cleverly created on the fly in relation to the matter they're made of.
 
ManuVlad3.0 said:
Any chance the RSX has only pixel shaders? I mean, Cell does vertex shaders and RSX only pixel shaders?

That's been suggested quite often. However, this would leave RSX's vertex engines sitting idle, and no coder worth his salt likes wasting resources. Also, RSX's vertex shaders are quite capable, especially when compared against Cell.

Of course, an obvious target for Cell is vertex-related tasks because of how well suited the SPEs seem to them; it's just that it's better to find a use for this potential that doesn't preclude RSX from utilizing its potential at the same time.
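As a sketch of the kind of vertex-side job that suits an SPE's streaming model - pushing a batch of vertices through a world transform before the GPU ever sees them (a toy 3x4 row-major matrix, nothing platform-specific):

[code]
#include <vector>

struct Vec3 { float x, y, z; };

// Stream every vertex through a 3x4 transform: rows of 'm' are the
// rotated basis vectors with a translation in the fourth column.
void transformVertices(std::vector<Vec3>& verts, const float m[12]) {
    for (Vec3& v : verts) {
        Vec3 o;
        o.x = m[0]*v.x + m[1]*v.y + m[2]*v.z  + m[3];
        o.y = m[4]*v.x + m[5]*v.y + m[6]*v.z  + m[7];
        o.z = m[8]*v.x + m[9]*v.y + m[10]*v.z + m[11];
        v = o;
    }
}

int main() {
    std::vector<Vec3> verts = { {1, 0, 0} };
    const float m[12] = { 1,0,0,5,  0,1,0,0,  0,0,1,0 };  // translate x by 5
    transformVertices(verts, m);
    // verts[0] is now (6, 0, 0)
}
[/code]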
 
How about AI, physics, and other CPU-intensive tasks? Would things like that have to be compromised to do tasks that GPUs usually do?
 