CELL and transparent surface rendering

chris1515

Legend
Supporter
I saw this nAo post on another forum, but I know he had already expressed this idea on B3D:

For non opaque surfaces edram is exceptional as it gives you so much real-world fill rate to burn..but it seems to me that you have to completely design a GPU around that, while a simple on-chip buffer that covers a small portion of the screen (say 64x64 pixels) would be enough to simply render transparent primitives (which are usually the vast minority of on-screen primitives), using the CPU to tile them (imho the 360 and PS3 CPUs would be extremely good at that..)
I have some questions for Cell developers, if they can answer them:

He is speaking about a small tile cache on a hypothetical GPU, but is it possible to do the same thing on an SPE?

And if it is possible, I understand one advantage (SPE bandwidth), but what are the limitations?

The Warhawk engine renders volumetric raymarched clouds via CELL, but it seems that not all the alpha blending is rendered via CELL.

Do you think it would be better to divide the alpha blending between CELL and the GPU, or to do all the work on CELL?

And do you know of any PS3 titles with alpha blending on CELL? I don't need the title names.

That's a lot of questions.

thanks in advance

Chris
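For concreteness, the tiling nAo suggests (the CPU binning transparent primitives into 64x64-pixel screen tiles) could be sketched roughly like this in C; all names and the bounding-box representation here are assumptions for illustration, not from any real engine:

```c
/* Hypothetical sketch of the CPU-side tiling nAo describes: divide the
 * screen into 64x64-pixel tiles and bin each transparent primitive into
 * every tile its screen-space bounding box touches. An SPE (or a CPU
 * thread) could then fetch one bin at a time and composite it against a
 * tile-sized buffer held in local store. */

#define TILE_SIZE 64

typedef struct { int x0, y0, x1, y1; } BBox;  /* inclusive screen bounds */

/* Count how many primitives land in each tile of a w x h screen.
 * bins must hold tiles_x * tiles_y ints and be zero-initialised. */
void bin_primitives(const BBox *prims, int n, int w, int h, int *bins)
{
    int tiles_x = (w + TILE_SIZE - 1) / TILE_SIZE;
    int tiles_y = (h + TILE_SIZE - 1) / TILE_SIZE;
    for (int i = 0; i < n; ++i) {
        if (prims[i].x1 < 0 || prims[i].y1 < 0 ||
            prims[i].x0 >= w || prims[i].y0 >= h)
            continue;                                /* fully off-screen */
        int tx0 = prims[i].x0 < 0 ? 0 : prims[i].x0 / TILE_SIZE;
        int ty0 = prims[i].y0 < 0 ? 0 : prims[i].y0 / TILE_SIZE;
        int tx1 = prims[i].x1 >= w ? tiles_x - 1 : prims[i].x1 / TILE_SIZE;
        int ty1 = prims[i].y1 >= h ? tiles_y - 1 : prims[i].y1 / TILE_SIZE;
        for (int ty = ty0; ty <= ty1; ++ty)      /* touch every tile the */
            for (int tx = tx0; tx <= tx1; ++tx)  /* clamped bbox overlaps */
                bins[ty * tiles_x + tx]++;
    }
}
```

In a real renderer each bin would hold primitive indices (kept in back-to-front order) rather than a count, and a 64x64 RGBA tile is only 16 KB, which fits easily alongside code in a 256 KB SPE local store.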
 
When I read that comment, or one like it before, I thought the suggestion was to take advantage of a framebuffer cache with tiling like that, which I believe GPUs/the RSX do have (?)

Still, that aside, many of chris1515's questions stand, whether you can use any framebuffer cache in such a manner or not, since they mostly relate to Cell doing that kind of processing, or part of it.

I'm not a developer, but judging from what's been said before, you could do all your transparent surfaces on Cell, or just some (though that may be more complicated), but whether it would make sense to do so instead of just leaving it all to RSX would probably depend on the game. In some situations it might make a lot of sense, in others it may not at all. I'm not sure if any developer has publicly spoken about doing this specifically.
 
He is talking about a hypothetical system - not the PS3. There is no 64x64 pixel buffer on the RSX.

I know.

My question is about CELL. If someone renders the alpha on CELL, they probably need to tile the transparent primitives in SPE local store.
 
The idea doesn't seem to be totally new; here are some more thoughts about it:

> Slightly off-topic, but a thought that occurred to me in this regard was to
> tile rendering. Basically, do a logical divide of the framebuffer into
> rectangles of, say, 64x64 pixels. During rasterization, all primitives are
> split according to those tiles and rendered separately. This has some
> advantages:
>
> a) It could help reduce the interpolation issues you mentioned. It's
> obviously not a magic bullet, but it can avoid the need for insane
> precision in inner loops.
> b) Better control of the size of scratch structures, possibly even better
> caching behaviour.
> c) One could build a multi-threaded rasterizer (where work queues are per
> framebuffer tile), which is going to become all the more interesting once
> dualcore CPUs are widespread.
Slightly different application, I know, but the idea of splitting the workload and the caching behaviour may be similar. :)
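The per-tile scratch idea in the quote above could be sketched like this: a 64x64 RGBA tile is only 16 KB, small enough for a CPU cache or an SPE's 256 KB local store, and since tiles never share pixels, each tile is an independent job for a thread or SPE. Everything here (the names, the flat fill standing in for real rasterisation) is illustrative:

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sketch of per-tile scratch rendering: each 64x64 tile is
 * rendered into a small buffer that fits in cache (or local store), then
 * resolved (copied) into the full framebuffer. Because no two tiles touch
 * the same pixels, one such job per tile can run on a separate worker. */

#define TILE 64

/* Render one tile (a flat fill stands in for real rasterisation here)
 * and copy it into the framebuffer at tile coordinates (tile_x, tile_y). */
void render_tile(uint32_t *fb, int fb_width, int tile_x, int tile_y,
                 uint32_t color)
{
    uint32_t scratch[TILE * TILE];          /* 16 KB: cache-sized scratch */
    for (int i = 0; i < TILE * TILE; ++i)
        scratch[i] = color;                 /* real code rasterises here */

    for (int y = 0; y < TILE; ++y)          /* resolve: copy rows out */
        memcpy(&fb[(tile_y * TILE + y) * fb_width + tile_x * TILE],
               &scratch[y * TILE], TILE * sizeof(uint32_t));
}
```

A multi-threaded version, as the quote suggests, would pop tile indices off a shared work queue and call something like `render_tile` per worker; no locking of the framebuffer itself is needed because the writes never overlap.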
 
And if it is possible I understand one advantage ( SPE bandwidth) but what are the limitations?
Anything is theoretically possible, but rendering on CELL is not in any way a good idea. Even with 7 SPEs, you simply don't have the ability to hide the latencies associated with random access into textures. There simply aren't enough SPEs or thread contexts per SPE to fill in those gaps. Maybe 100 SPEs would do the trick, but not 7.

Things like the raytracing demo only really worked because it contained a single heightmapped scene element and a single texture image, which is pretty much the limit of CELL as it stands for rendering. And even otherwise, it's useless for a game: a demo that does nothing but rendering, runs at a few fps, and needs to buffer a few frames ahead to account for uneven performance is nowhere near fast enough for a real game.

The Warhawk engine renders volumetric raymarched clouds via CELL, but it seems that not all the alpha blending is rendered via CELL.
They don't render ANYTHING using CELL. CELL is used for computing the lighting on the particles and raycasts to test how much of the rest of the cloud's particles (which are probably represented as spheres internally) are intersected by the shadow rays to determine absorption/scattering of light. Rendering is still done by the GPU.

Do you think it would be better to divide the alpha blending between CELL and the GPU, or to do all the work on CELL?
Neither. Cell is for computation. There will never be a PS3 game containing software rendering functions that aren't entirely trivial performance-wise (e.g. UI), and even then, the CPU isn't going to do any alpha blending -- at best it will fill in a texture with an alpha channel that is blended onto the screen by the GPU. After all, alpha blending is an operation that reads from the framebuffer and writes a blended result back to the same pixel(s).
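That read-modify-write can be made concrete with the standard "over" blend. A minimal sketch (not any console's actual blend path; the 0xAARRGGBB layout is just one common convention):

```c
#include <stdint.h>

/* The read-modify-write at the heart of alpha blending: "over" blending,
 * dst' = src*a + dst*(1-a), applied per 8-bit channel. Every blended
 * pixel requires reading the framebuffer before writing it back, which
 * is why blending is so framebuffer-bandwidth-hungry. */

/* Blend one 0xAARRGGBB source pixel over a destination pixel. */
uint32_t blend_over(uint32_t src, uint32_t dst)
{
    uint32_t a = src >> 24;                  /* source alpha, 0..255 */
    uint32_t out = 0;
    for (int shift = 0; shift < 24; shift += 8) {
        uint32_t s = (src >> shift) & 0xFF;  /* source channel        */
        uint32_t d = (dst >> shift) & 0xFF;  /* READ the framebuffer  */
        uint32_t b = (s * a + d * (255 - a)) / 255;
        out |= b << shift;                   /* WRITE it back blended */
    }
    return out | (dst & 0xFF000000);         /* keep destination alpha */
}
```

A CPU doing this for the whole screen must pull every covered destination pixel across the bus and push it back, which is exactly the traffic a GPU's dedicated ROPs (or the 360's edram) are built to absorb.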

All the talk about Cell aiding the GPU is about taking computational load off the GPU or preprocessing geometry to save GPU time or precalculating something so that the GPU has something to draw that looks better than it might have otherwise. Things like Warhawk's clouds are faster done on the CPU because it's so dependent on the state of the world, which is a large dataset that the CPU has full access to without having to pervert the datastream into something else to suit the GPU.
 
He is speaking about a small tile cache on a hypothetical GPU, but is it possible to do the same thing on an SPE?
It would be possible to do the same on a CPU such as CELL, but I can't see how one could make it efficient.

Do you think it would be better to divide the alpha blending between CELL and the GPU, or to do all the work on CELL?
No, I don't; in the general case RSX would be much faster than CELL at that. It's not a good way to shift a workload from one processor to another.
 
Thanks ShootMyMonkey and nAo. OK, so CELL is useful for preprocessing, helping with vertex shading if needed, and maybe postprocessing, but not for rendering.
 
They don't render ANYTHING using CELL. CELL is used for computing the lighting on the particles and raycasts to test how much of the rest of the cloud's particles (which are probably represented as spheres internally) are intersected by the shadow rays to determine absorption/scattering of light. Rendering is still done by the GPU.

As was presented at GDC, it sounded like they were going the last mile on Cell with the clouds and actually rendering them there, then compositing with the RSX rendering:

Dylan Jobe said:
These clouds I'm flying through right now are being rendered using a volumetric software ray-tracer running on another cluster of SPUs, and really this is a paradigm shift for us, because it is the first time we are mixing Cell-based software rendering, with RSX-based hardware rendering. Certainly, you don't need to use the Cell for software rendering - you can do so if you choose.
 
As was presented at GDC, it sounded like they were going the last mile on Cell with the clouds and actually rendering them there, then compositing with the RSX rendering:
I don't know about the GDC presentation, but when I read through the sketch and listened in on one of Sony's local presentations (which was a private affair they gave here at the office), they made it sound more like they were computing cloud lighting by raycasting and checking absorption and then computing scattering and just rendering the particles on the GPU. Apparently, they light per particle vertex (my original guess was per particle) to get nice gradations. Since the cloud particle textures they use are alpha-only, it works out nicely.

I don't know which to believe for sure, but simply lighting particles with raycasts sounds a lot more plausible and reasonably cheap. Besides, they wouldn't be the first to do it -- they'd just be the first to write an SPE-driven parallel version on a large scale.
 
Question for either ShootMyMonkey or nAo

Not to get way off topic here, but I'm a little confused. :???:

When a person like Julian Eggerbrecht (Factor 5 / Lair) states on more than one occasion that the Cell generates all the polygonal models in the game, from the soldiers and creatures to the terrain (32 square kilometers), while handling all the A.I... He also stated a while back (video interview) that they hadn't begun to dig into the RSX yet.

The word “generates” IMO seems to imply rendering by the Cell. I just don't see him bragging about something like “calculations”… when that's something the CPU is supposed to do anyway.

Is this PR FUD? Or am I missing something here? :oops:
 
Maybe he meant they generate geometry with SPUs (think procedural geometry, or common meshes that go through some kind of progressive refinement scheme) and then send everything to the GPU (they could also pre-cull non-visible geometry..). Why one would start to render stuff with CELL (it would take quite a while just to get a running software shader/rasterizer) instead of using the GPU is beyond me; it does not make any sense.
 
In the Julian Eggerbrecht interview I saw, it was very clear that RSX was rendering the polys.

He was talking about the decision to go to 1080p and the fill rate hit. He said that the fill rate of the RSX is satisfactory for 1080p as long as you use Cell to process the geometry and take care of culling before sending it to the RSX, and that is exactly what they are doing in Lair.
 
When a person like Julian Eggerbrecht (Factor 5 / Lair) states on more than one occasion that the Cell generates all the polygonal models in the game, from the soldiers and creatures to the terrain (32 square kilometers), while handling all the A.I... He also stated a while back (video interview) that they hadn't begun to dig into the RSX yet.

The word “generates” IMO seems to imply rendering by the Cell. I just don't see him bragging about something like “calculations”… when that's something the CPU is supposed to do anyway.

Is this PR FUD? Or am I missing something here? :oops:
To me, there isn't a single word in there that suggests "rendering on the CPU." "Generating polygonal models" in no uncertain terms says that the CPU is doing pretty much all the geometry work... in essence, taking the *vertex* shading load off the GPU. If they'd said "drawing all the polygonal models to the screen", I might agree with you. And what they are doing is plenty to brag about because of the sheer quantity of work that amounts to and that they're doing it every frame. It's not as though just any CPU can do any computational load like it was nothing just because "that's what CPUs do."

And "not digging into the RSX" is still several thousand leaps away from "not using the RSX." So yeah, you're missing something.
 
maybe he meant they generate geometry with SPUs (think about procedural geometry or also common meshes that go through some kind of progressive refinement scheme) and then they send everything to the GPU (they could also pre cull non visibile geometry..) Why would one start to render stuff with CELL (it would take quite a while just to have a running sw shader/rasterizer) instead of using the GPU is beyond me..it does not make any sense.

Could one of the reasons be (mostly rumor on this part) that Factor 5 wants to pretty much preserve most of the RSX for pixel shading, thus allowing the Cell SPUs to take care of most if not all of the vertex data and other chores?

Not to badmouth Lair or Factor 5 or anything… but Lair does look pretty raw at times.

Thanks for your reply nAo!! :D
 
To me, there isn't a single word in there that suggests "rendering on the CPU." "Generating polygonal models" in no uncertain terms says that the CPU is doing pretty much all the geometry work... in essence, taking the *vertex* shading load off the GPU. If they'd said "drawing all the polygonal models to the screen", I might agree with you. And what they are doing is plenty to brag about because of the sheer quantity of work that amounts to and that they're doing it every frame. It's not as though just any CPU can do any computational load like it was nothing just because "that's what CPUs do."

And "not digging into the RSX" is still several thousand leaps away from "not using the RSX." So yeah, you're missing something.


I thought so... :oops:

Thanks ShootMyMonkey!!

inefficient said:
In the Julian Eggerbrecht interview I saw, it was very clear that RSX was rendering the polys.

He was talking about the decision to go to 1080p and the fill rate hit. He said that the fill rate of the RSX is satisfactory for 1080p as long as you use Cell to process the geometry and take care of culling before sending it to the RSX, and that is exactly what they are doing in Lair.

inefficient, I must not have seen that interview... I'll look it up.

Thanks!!
 