Cell and RSX: their relationship

Either this guy is lying or IGN doesn't know what the heck they are talking about. ;)

Phil: Well, I'll give you a couple of other examples. The terrain rendering demo that was done by STI, which is the people who developed the Cell, doesn't use the graphics chip at all. That 3D landscape was generated in real-time from two input data sources and a software renderer running on the Cell created the final image. All that it does is output as a bitmap straight to the video hardware - it doesn't even create a single polygon, there's no concept of a polygon in that demo.

http://www.gamesindustry.biz/feature.php?aid=9051

And it was already mentioned that they were using 6800s in SLI as the GPU for the demos.
 
I think the determining factor will be Cell's ability to generate procedural geometry and textures. These are the two inputs the GPU accepts; if they can be generated quickly on a per-frame basis, then all the things you mentioned should be possible. Sadly, for both geometry and textures, memory and bandwidth will be the biggest hurdles. You preferably want fast random access to very large datasets during generation, not to mention the math performance. We will see some of this stuff, but not all of it together. I think having 2+ gigs of memory would go a long way towards making CG quality a reality; I almost value lots of fast memory over increased CPU performance.
 
flick556 said:
I think the determining factor will be Cell's ability to generate procedural geometry and textures. These are the two inputs the GPU accepts; if they can be generated quickly on a per-frame basis, then all the things you mentioned should be possible. Sadly, for both geometry and textures, memory and bandwidth will be the biggest hurdles.

Well, one of the ideas with procedural work for vertices or textures is to minimise (memory) bandwidth usage, no? Small input, large output? That kind of stuff would seem ideal for Cell: small input to the SPEs (to fit into LS), lots of computation, then out to RSX over FlexIO.

I'm wondering about their suitability for texture generation? Vertices seem obvious, but what about computing textures on the SPEs? That could open a lot of doors for "baking" lighting algos or other results on Cell into textures before passing them to RSX's pixel shaders (making normally static data input dynamic).
 
Well, procedural synthesis (hate that term) would only save you RAM space and bandwidth from RAM to Cell. It would also save space on the disc.
After Cell calculates what it needs to calculate, be it textures or geometry, the amount of data going to RSX will be the same as it would have been if Cell used pre-defined textures and models...

Also, it might be the case that Cell itself would have to write the results of the procedural calculations back to RAM before resending them to RSX, which makes things a lot slower, and RAM space would be used up quickly. However, I'm not sure how often that would happen.
 
Also, it might be the case that Cell itself would have to write the results of the procedural calculations back to RAM before resending them to RSX, which makes things a lot slower, and RAM space would be used up quickly. However, I'm not sure how often that would happen.
If you're going to be doing procedural work on the CPU and wanting to use it on the GPU, it always has to go into memory somewhere, even if you were to send it directly to the VRAM. If you actually want to save memory, you either have to make it a vertex-level operation (which means the generated data is stored in the vertex data you have anyway) or you have to generate on the GPU (which pretty much means you have to make crap).
 
Why can't Cell create vertex data and pass it straight to RSX over FlexIO without writing to RAM? I thought that was the whole point of this direct link.
 
ShootMyMonkey said:
Also, it might be the case that Cell itself would have to write the results of the procedural calculations back to RAM before resending them to RSX, which makes things a lot slower, and RAM space would be used up quickly. However, I'm not sure how often that would happen.
If you're going to be doing procedural work on the CPU and wanting to use it on the GPU, it always has to go into memory somewhere, even if you were to send it directly to the VRAM. If you actually want to save memory, you either have to make it a vertex-level operation (which means the generated data is stored in the vertex data you have anyway) or you have to generate on the GPU (which pretty much means you have to make crap).

So I guess that answers my question as "very often".

Wouldn't it be possible to, say, bake something on Cell, send it straight to RSX and get it rendered on the fly with everything else in the frame?
 
Mythos said:
What about the microthreading feature from rambus will that be a feature for the PS3?

I think that's more an XDR-2 thing..

Also, about pushing data from Cell to memory before going to RSX - I guess that'd be how it'd have to happen anyway. RSX may not be ready for the data at the time Cell has finished with it, and it can't rightly hang around in Cell's on-chip memory waiting for RSX to take it, so it makes more sense to place it out in memory for RSX to pick up when it next needs it. That'd be no different than taking in any other "static" vertex or texture data as far as RSX is concerned.
 
Why can't Cell create vertex data and pass it straight to RSX over FlexIO without writing to RAM? I thought that was the whole point of this direct link.
Are you joking? How much data do you think goes into a single vertex? It's not like you can send a vertex a part at a time. A GPU will not do anything with partial data -- it wants a whole vertex at a time. And even if you create a vertex and send it off right away, what does a single vertex really represent? Really, you'd have to generate an entire renderpacket at a time, and that can easily be a few KB of data, which means that even if it always stays in cache with no writebacks to memory, you're going to have to at least allocate a block of memory for it. In theory, you could create this entirely on the SPE SRAMs and send it off a packet at a time that way, but then footprint becomes the issue. Character packets are small, but not, for instance, terrain packets.

Even otherwise, if you could do some kind of on-demand vertex streaming over FlexIO, that basically amounts to a hell of a lot of API calls. Each of those API calls has huge overhead. If you do what you suggest with OpenGL, you're easily going to be dead in the water at something like 50,000 polys per frame... with DirectX, it's probably more like 20,000.

Wouldn't it be possible to, say, bake something on Cell, send it straight to RSX and get it rendered on the fly with everything else in the frame?
Well, as I said above -- it's basically a matter of how big a thing you want to "bake." The other thing is that if it's something persistent like, say, a skydome... yeah, you can generate that in code, it will fit nicely in the SPE's local SRAM, and you can send it off all in one shot, but why would you bother wasting computational time every frame for something that only needs to be generated once?
 
ShootMyMonkey said:
Why can't Cell create vertex data and pass it straight to RSX over FlexIO without writing to RAM? I thought that was the whole point of this direct link.
In theory, you could create this entirely on the SPE SRAMs and send it off a packet at a time that way, but then footprint becomes the issue. Character packets are small, but not, for instance, terrain packets.
That's what I was thinking. The idea is for adaptive procedural geometry, always changing, whereas a static model would be kept in RAM and fetched; this way, changing models don't need to interfere with the main RAM BW.

But if, as you say, "If you're going to be doing procedural on the CPU and wanting to use it on the GPU, it always has to go into memory somewhere, even if you were to send it directly to the VRAM," then what's the FlexIO connection for? Just post-processing? KK even mentioned making use of this direct communication for working on the same vertex data. I thought that's what it's there for :?
 
ShootMyMonkey said:
The other thing is that if it's something persistent like, say, a skydome... yeah, you can generate that in code, it will fit nicely in the SPE's local SRAM, and you can send it off all in one shot, but why would you bother wasting computational time every frame for something that only needs to be generated once?

Really? :D

Maybe if you want a boring static sky :p

What if you wanted realistically modelled clouds that morph and billow as they pass overhead? The "passing overhead" bit at least wouldn't be too intensive - just recalculate things every few frames or so, since the sky is probably slow moving - but if you wanted instantaneous response (for example, when I wave this wand, I want a thunderstorm pronto!), modelling that would require more regular updates. Of course, you can do such things without realistic or procedural modelling, but it's just not as unique then ;)

Sky domes, though, are a reasonably "entry-level" use of procedural CPU involvement (at least for the "passing overhead" bit - not necessarily ultra-realistic cloud modelling!) :devilish: I'm wondering what kind of information can be encoded in the data passed to RSX and how much Cell can handle. It'll surely be very interesting..


Shifty Geezer said:
But if, as you say, "If you're going to be doing procedural on the CPU and wanting to use it on the GPU, it always has to go into memory somewhere, even if you were to send it directly to the VRAM," then what's the FlexIO connection for?

Yeah, I'm slightly confused too. On the one hand I can appreciate the issues involved in chip-to-chip direct communication/synchronisation etc. and how that can't be taken for granted, but on the other hand I then wonder why, for example, Cell doesn't just connect directly to VRAM and forgo the FlexIO connection altogether. Or, if UMA was ever on the cards, why don't they just both have their own separate busses into one pool of memory (one less bus than in the current setup!)?
 
ShootMyMonkey said:
In theory, you could create this entirely on the SPE SRAMs and send it off a packet at a time that way, but then footprint becomes the issue. Character packets are small, but not, for instance, terrain packets.
Erhm - there's no 'theory' here; it's been done this way for the last 5 years on PS2, with a measly 16KB of available memory on VU1.
Heck, terrain renderers are probably one of the best showcases of this type of processing on VU1, and SPEs are only more capable.
 