Can the RSX framebuffer be split between GDDR3 and XDR RAM?

Shifty Geezer said:
Can't you render to texture on the eDRAM, then read that back in to post-process? There's a 2-way bus there so I can't see why not. Unless the eDRAM isn't addressable as texture-space for reads so there's no way to get the data directly, and as you say it has to be copied to GDDR before it can be processed.

From Dave's article:

Render to texture operations will also be rendered out to the eDRAM first and then read out to UMA memory, when complete, in order to be used as a texture surface for the final frame rendering.

http://www.beyond3d.com/articles/xenos/index.php?p=04
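
To put a rough size on the extra copy described in the quote above, here is a back-of-the-envelope sketch. It assumes a 1280x720 target at 4 bytes per pixel and the 22.4 GB/s UMA figure quoted later in this thread; the function name and the resolution are illustrative assumptions, and a real resolve will not reach peak bandwidth, so treat the result as a lower bound.

```python
def resolve_copy_ms(width, height, bytes_per_pixel, bandwidth_gb_s):
    """Lower-bound time in milliseconds to write one render target to main memory.

    All inputs are assumptions supplied by the caller; nothing here is measured.
    """
    size_bytes = width * height * bytes_per_pixel
    seconds = size_bytes / (bandwidth_gb_s * 1e9)
    return seconds * 1000.0


if __name__ == "__main__":
    # Assumed 720p, 32-bit colour target and the 22.4 GB/s figure from the thread.
    ms = resolve_copy_ms(1280, 720, 4, 22.4)
    print(f"{ms:.3f} ms per resolve at peak bandwidth")
```

The copy has to be paid once per render-to-texture pass, so a post-processing chain with several passes multiplies this cost.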
 
nAo said:
All details are already there since I was talking about frame buffer effects: in the vast majority of cases you'll end up being texture bw limited and last time I checked Xenos does not have 2 separate busses to fetch textures.

Actually, my post processing pipeline on 360 is not texture bw limited. It tends to be shader limited according to PIX, since I'm running shaders made of up to 200 ALU instructions and 20/30 fetches. That's where the 48 ALU units come in handy. The big problem on the 360 is to reduce the number of temporary registers used by the shader to help hide latency.

But with simple filter kernels and simple tonemapping, I can believe a post processing pipeline can quickly become texture bw limited. I think it all depends on what we are trying to do. I wouldn't claim that one is better than the other in every situation.
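
A minimal sketch of the shader-limited versus texture-limited distinction made above. It assumes 48 ALU units (from the post) and 16 texture fetch units (an assumption, not stated in this thread), and it treats fetch throughput as a crude stand-in for texture bandwidth. The function name and the simple-kernel numbers are hypothetical; this is not how PIX reports a bottleneck.

```python
def likely_limiter(alu_instructions, texture_fetches, alu_units=48, fetch_units=16):
    """Guess which side of a shader dominates, using issue rates only.

    alu_units=48 comes from the post; fetch_units=16 is an assumed value.
    Latency hiding, register pressure and caches are ignored.
    """
    alu_cycles = alu_instructions / alu_units
    fetch_cycles = texture_fetches / fetch_units
    return "ALU" if alu_cycles >= fetch_cycles else "texture fetch"


if __name__ == "__main__":
    # Heavy shader from the post: about 200 ALU instructions and 30 fetches.
    print(likely_limiter(200, 30))
    # Hypothetical simple filter kernel: 12 ALU instructions and 9 fetches.
    print(likely_limiter(12, 9))
```

Under these assumptions the heavy shader comes out ALU bound and the simple kernel comes out fetch bound, which matches the "it depends on what we are trying to do" point.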
 
nAo said:
Sure, but as soon as you get serious with texture bw (thus you are texture bw limited) the ratio between color and texture bw is so small that it becomes almost irrelevant, as the color bw just costs as much as another texel to sample.

That's very true indeed. For example, I can never see the EDRAM resolution being a performance issue during post processing, nor the bandwidth to the EDRAM.
 
Mintmaster said:
Xenos doesn't have a bandwidth advantage for "framebuffer effects" (assuming that term is referring to post-processing). The eDRAM can't be read as a texture source, so the data must be copied out to the GDDR3 first. That extra copy takes time and bandwidth, and you don't get any performance gain over a system without eDRAM for these particular effects. For this reason he's probably right about Xenos being a bit slower.

Xenos' advantage is in the actual scene rendering before you apply these effects.

That would be true, but once you take into account the fillrate factor, where RSX without eDRAM would lose bandwidth to 4xAA compared to no loss on 360 (when using predicated tiling, as most games would after the recent update of the API), the post-effects bandwidth issue would be negated to the point that Xenos has the advantage, as AA (in terms of applying effects afterwards) takes up a lot of bandwidth.

Wouldn't it also be an issue that the 360 UMA is 22.4 GB/s between the CPU/GPU, while there is a whole other bandwidth path for pixel bandwidth, which is separate in 360 at 36 or 256 GB/s, and which would give further leeway to the 360 GPU?


I am interested in seeing the Heavenly Sword devs having more knowledge about what the 360 GPU can and cannot do, when I don't think those guys even have a 360 kit, so I wouldn't vouch for it being a 100% fair comment, regardless of it coming from a Sony 2nd party developer, no matter how honest and good the game itself is.
 
spdistro said:
I am interested in seeing the Heavenly Sword devs having more knowledge about what the 360 GPU can and cannot do, when I don't think those guys even have a 360 kit, so I wouldn't vouch for it being a 100% fair comment, regardless of it coming from a Sony 2nd party developer, no matter how honest and good the game itself is.

I don't think Marco is being unfair in his comment here, since he's just giving his opinion based on the techniques and shaders he is using. I think it's fair to say that in his situation the RSX might have an advantage on the R500, while I'd say the opposite in my situation with the shaders I'm using.

I think we are simply naturally optimising for the architecture we mostly develop for and assume that the other architecture is not quite as good at running that technique. Which is fair.

I wouldn't draw general conclusions from our experience though.
 
[maven] said:
Mostly generated I should think.
In that case, could one XeCPU core or an SPE be used to generate some of the textures needed for this operation and alleviate the hit on memory BW? I mean, could both Xenos and RSX directly consume that kind of data (by cache locking or accessing the LS)?
 
Fran, just want to say welcome. It's nice to have someone working on the 360 side of development to answer some of the questions that come up here (sorry if I missed anyone else).
 
vblh said:
Fran, just want to say welcome. It's nice to have someone working on the 360 side of development to answer some of the questions that come up here (sorry if I missed anyone else).

Agreed, we would love to hear more about Xenos around here, your input is greatly appreciated.
 
scooby_dooby said:
Agreed, we would love to hear more about Xenos around here, your input is greatly appreciated.
absolutely

We hear a lot from the Sony dev camp (also greatly appreciated :)) with their speculations on Xenos, but real hands-on Xenos stuff is few and far between.

thanks
 
spdistro said:
That would be true, but once you take into account the fillrate factor, where RSX without eDRAM would lose bandwidth to 4xAA compared to no loss on 360 (when using predicated tiling, as most games would after the recent update of the API), the post-effects bandwidth issue would be negated to the point that Xenos has the advantage, as AA (in terms of applying effects afterwards) takes up a lot of bandwidth.
Remember that we're only talking about framebuffer effects. There's no AA work here, and you operate on the resolved framebuffer. eDRAM as implemented in XB360 doesn't help for post-processing.
I am interested in seeing the Heavenly Sword devs having more knowledge about what the 360 GPU can and cannot do, when I don't think those guys even have a 360 kit, so I wouldn't vouch for it being a 100% fair comment, regardless of it coming from a Sony 2nd party developer, no matter how honest and good the game itself is.
The HS devs are not out of line at all for standard effects like bloom and tone-mapping even if they don't have the hardware. However, dynamic branching can really make a difference in some post-processing effects. Motion blur and depth of field, to name two, have adaptive kernel sizes, so there's no need to take the same number of samples everywhere. That can really reduce the work that needs to be done.
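
A small sketch of why adaptive kernel sizes reduce the work, assuming a per-pixel blur radius (for example from a depth-of-field circle of confusion) and a tap count that scales linearly with that radius. The function names, the linear scaling rule and the sample data are all illustrative assumptions, not taken from any real engine.

```python
import math


def fixed_kernel_samples(blur_radii, max_taps):
    """Total samples when every pixel uses the full kernel."""
    return len(blur_radii) * max_taps


def adaptive_kernel_samples(blur_radii, max_radius, max_taps):
    """Total samples when the tap count follows each pixel's blur radius.

    Pixels with zero radius take a single sample, which is the early-out
    a dynamic branch would provide. The linear scaling is an assumption.
    """
    total = 0
    for radius in blur_radii:
        if radius <= 0:
            total += 1
        else:
            clamped = min(radius, max_radius)
            total += max(1, math.ceil(max_taps * clamped / max_radius))
    return total


if __name__ == "__main__":
    # Hypothetical scanline: half the pixels are in focus.
    radii = [0, 0, 0, 0, 2, 4, 8, 8]
    print(fixed_kernel_samples(radii, max_taps=16))
    print(adaptive_kernel_samples(radii, max_radius=8, max_taps=16))
```

The saving depends entirely on how much of the frame needs a small kernel, and on real hardware it also depends on how coherent the branch is across neighbouring pixels.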
 
Mintmaster said:
Remember that we're only talking about framebuffer effects. There's no AA work here, and you operate on the resolved framebuffer. eDRAM as implemented in XB360 doesn't help for post-processing.


hm.. ok, maybe this was answered in the jungle of technobabble I don't understand above:

Q: What (reasonable) hardware features/extensions to eDRAM would developers/you want for assisting post-processing?
 
Fran said:
I think we are simply naturally optimising for the architecture we mostly develop for and assume that the other architecture is not quite as good at running that technique. Which is fair.

Couldn't agree more. Would you be able to share what works well with Xenos in practice?
I have seen a lot of good feedback about Xenos's high triangle setup rate, unified architecture with 48 ALUs, free 4xAA with eDRAM, and MEMEXPORT; also somewhat negative comments about FP10 limitations, the 22.4 GB/s shared (contended) bandwidth between CPU and GPU, plus predicated tiling constraints. In general, these are fragmented discussions, mostly by non-XB360 programmers.

spdistro said:
That would be true, but once you take into account the fillrate factor, where RSX without eDRAM would lose bandwidth to 4xAA compared to no loss on 360 (when using predicated tiling, as most games would after the recent update of the API), the post-effects bandwidth issue would be negated to the point that Xenos has the advantage, as AA (in terms of applying effects afterwards) takes up a lot of bandwidth.

Wouldn't it also be an issue that the 360 UMA is 22.4 GB/s between the CPU/GPU, while there is a whole other bandwidth path for pixel bandwidth, which is separate in 360 at 36 or 256 GB/s, and which would give further leeway to the 360 GPU?

spdistro has made a good start here by lining up the pipeline in a more comprehensive fashion and highlighting XB360's strengths. Could you elaborate more, please (possibly in a standalone thread, since this is OT)? Please~~ :)
 
Fran said:
Actually, my post processing pipeline on 360 is not texture bw limited. It tends to be shader limited according to PIX, since I'm running shaders made of up to 200 ALU instructions and 20/30 fetches. That's where the 48 ALU units come in handy. The big problem on the 360 is to reduce the number of temporary registers used by the shader to help hide latency.

But with simple filter kernels and simple tonemapping, I can believe a post processing pipeline can quickly become texture bw limited. I think it all depends on what we are trying to do. I wouldn't claim that one is better than the other in every situation.

If I understand you correctly, you're saying ALU-op-heavy shaders benefit from Xenos's unified shader model during post-processing, since you are bound more by ALU ops than by texture reads/writes. Whereas you feel simpler shaders may become bound by texture bandwidth, that is not so much the case for you, given that your shader spends most of its time working within the temp registers on intermediate work.

Given RSX's architecture (which of course is not 1:1 with the Xenos VPU), even though the ALU ratio dedicated to pixel and vertex work is fixed (and thus ALUs sit idle while under-utilized), there are quite a few ALUs to toss at methods similar to the ones you are using. Since, as I understand it, each ALU in Xenos can issue a vec4+scalar, RSX in turn can, with its 24 ALUs dedicated to pixel work, dual issue 2 vec3+scalar instructions while the vertex pipes can issue a vec4+scalar instruction simultaneously. It seems to me that, despite under-utilization of the available ALUs at times, the power to handle ALU-op-heavy shaders is there in RSX just as it is with Xenos.

Barring a lack of register space of course and ignoring dynamic branching for the moment.

My question is this. As I take it, using ALU-op-heavy shaders is something that works well on Xenos. Is it your opinion that doing the same on RSX does not work as well, and if so, why do you feel that way? Or could you elaborate on why it may not be such a bad thing on RSX but is significantly better on Xenos as far as performance in the end goes?
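
The unit counts in this post can be lined up as simple arithmetic. This is only a sketch of the paper numbers as the post describes them: 48 Xenos ALUs at vec4+scalar, and 24 RSX pixel units dual-issuing vec3+scalar. The 8 vertex pipes are an assumption not stated in the post. Clock speed, utilisation, and differences between the individual ALUs are ignored, so this says nothing about real shader performance.

```python
def peak_component_ops(units, issues_per_unit, components_per_issue):
    """Peak component operations per clock for a group of identical units."""
    return units * issues_per_unit * components_per_issue


# Counts as described in the post: vec4+scalar = 5 components, vec3+scalar = 4.
xenos = peak_component_ops(units=48, issues_per_unit=1, components_per_issue=5)
rsx_pixel = peak_component_ops(units=24, issues_per_unit=2, components_per_issue=4)
# Assumption: 8 vertex pipes, each issuing vec4+scalar.
rsx_vertex = peak_component_ops(units=8, issues_per_unit=1, components_per_issue=5)

if __name__ == "__main__":
    print("Xenos:", xenos)
    print("RSX pixel:", rsx_pixel)
    print("RSX pixel + vertex:", rsx_pixel + rsx_vertex)
```

On a post-processing pass the vertex pipes have almost nothing to do, so the RSX pixel figure alone is the more relevant one for this comparison.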
 
patsu said:
I am interested in seeing the Heavenly Sword devs having more knowledge about what the 360 GPU can and cannot do, when I don't think those guys even have a 360 kit, so I wouldn't vouch for it being a 100% fair comment
They had 360 kits before PS3 ones (obviously they no longer do now).
Besides, guys like nAo, Deano and a few others here make it a point to keep informed about what technologies are out there, even if we don't work with them directly.
Sticking your head in the sand is never beneficial in this industry (though I've seen people still do it regardless).

Fran said:
I wouldn't draw general conclusions from our experience though.
Aside from ports/cross-platform titles, generalizations across closed platforms are mostly meaningless anyhow.
If you're exclusive, you tailor the workloads to the target hw, rather than trying to shoehorn the hw into performing some arbitrarily chosen workload.

Alstrong said:
Q: What (reasonable) hardware features/extensions to eDRAM would developers/you want for assisting post-processing?
There are no special extensions needed - just take the eDram configuration of PS2, or better, PSP. That's pretty much as good as it gets for postprocessing as well as all render-to-texture stuff.
I'd also vote PSP as far as the whole ram layout goes (but not the bus configuration), neither PS3 nor 360 are my preferred choice there.
 
it's hm.. pretty easy to get full X360 documentation...You don't have to be an official x360 dev to get some curiosity satisfied...
 
scificube said:
Given RSX's architecture (which of course is not 1:1 with the Xenos VPU), even though the ALU ratio dedicated to pixel and vertex work is fixed (and thus ALUs sit idle while under-utilized), there are quite a few ALUs to toss at methods similar to the ones you are using. Since, as I understand it, each ALU in Xenos can issue a vec4+scalar, RSX in turn can, with its 24 ALUs dedicated to pixel work, dual issue 2 vec3+scalar instructions while the vertex pipes can issue a vec4+scalar instruction simultaneously. It seems to me that, despite under-utilization of the available ALUs at times, the power to handle ALU-op-heavy shaders is there in RSX just as it is with Xenos.
Something to bear in mind which isn't analogous between the two architectures is that where each of Xenos's ALUs is the same, RSX's aren't. If you just had a shader full of MADDs then you'd get full ALU utilisation out of it, but there are other instructions that are different between the two ALUs in G7x.
 
Dave Baumann said:
Something to bear in mind which isn't analogous between the two architectures is that where each of Xenos's ALUs is the same, RSX's aren't. If you just had a shader full of MADDs then you'd get full ALU utilisation out of it, but there are other instructions that are different between the two ALUs in G7x.

I understand this. I'm just asking for his perspective here :)
 
_phil_ said:
it's hm.. pretty easy to get full X360 documentation...You don't have to be an official x360 dev to get some curiosity satisfied...

documentation would be an empty glass, writing code would be glass half full, debugging the game would be a full glass
 
Fafalada said:
They had 360 kits before PS3 ones (obviously they no longer do now).
I know Deano's early HS development and discussion were based on Radeon 9800 and X800, but I don't recall anything centering around final dev kits. Do you know if they did get those?
 