There is a debate about whether RSX can really take advantage of "aggregate" bandwidth due to the expected situation with the framebuffer.
I'm wondering if this is more a matter of creativity than limitation, so I've thought of some ideas and I'd like to know why they are or aren't possible, along with some thoughts on their merit.
----------
A common idea is placing texture and vertex data in XDR so that RSX could further utilize the available aggregate bandwidth.
Why not place the front buffer in XDR as well? RSX would access the front buffer for full-screen post-processing effects such as DoF, HDR blooms and such, and the number of accesses to the front buffer should eat some decent bandwidth for these effects. Wouldn't this save on ROP work and ultimately leave more bandwidth for the back buffer? Lastly, wouldn't this be a good place for the front buffer if you elected to use Cell for post-processing effects (additive on top of RSX, even) vs. having to contend with the added latency of going to VRAM with Cell to the same end?
----------
Why not exchange capacity in RAM for bandwidth to RAM by using more than one framebuffer...or rather backbuffer? (Actually, if VRAM bandwidth is not saturated it could be looked at as eating capacity to free capacity in memory.)
This would probably require some work to pull off effectively and efficiently, but maybe something like the following could work.
1) Both buffers are exact copies.
2) Use split-screen rendering to avoid double work
- split the screen unevenly; the VRAM buffer is used for the majority of what's in
the frustum because latency to VRAM is lower etc.; conversely, the buffer in XDR
is used for less of the scene.
- use scissor tests to split the rendering
- the split is set dynamically based on load to maximise utilization of
aggregate bandwidth
3) Resolve to the front buffer in later stages and continue with post processing etc.
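To make the dynamic split in step 2 a bit more concrete, here is a rough sketch of how the scissor boundary might be moved from frame to frame. Everything in it is an assumption for illustration: the names `split_scissor` and `rebalance`, the left/right split, the step size and the clamp range are all made up, and the per-region timings would have to come from whatever profiling is actually available.

```python
def split_scissor(width, height, vram_share):
    """Split the screen left/right into two scissor rects (x, y, w, h).

    vram_share is the fraction of the width rendered into the VRAM buffer;
    the remainder goes to the (hypothetical) XDR buffer.
    """
    if not 0.0 <= vram_share <= 1.0:
        raise ValueError("vram_share must be between 0 and 1")
    split_x = round(width * vram_share)
    vram_rect = (0, 0, split_x, height)
    xdr_rect = (split_x, 0, width - split_x, height)
    return vram_rect, xdr_rect


def rebalance(vram_share, vram_ms, xdr_ms, step=0.02, lo=0.5, hi=0.95):
    """Nudge the split toward whichever region finished sooner last frame.

    step, lo and hi are illustrative tuning values, not measured ones.
    """
    if vram_ms > xdr_ms:
        vram_share -= step   # VRAM side was the bottleneck: shrink it
    elif xdr_ms > vram_ms:
        vram_share += step   # XDR side was the bottleneck: shrink it
    return min(hi, max(lo, vram_share))
```

The clamp keeps the VRAM region the larger of the two, which matches the uneven split described above.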
Alternative:
1) Buffers are not exact copies, e.g. one is FP16 and one is FP32 or INT8 etc.
2) Screen is not split.
- Buffers are used for different stages in the rendering pipeline:
opaque geometry, high quality blending effects, low quality blending
effects etc.
- to be effective, at least 2 stages in the rendering pipeline must be
active
- stages with higher backbuffer accesses are handled with the buffer
in VRAM; stages with lower backbuffer accesses are handled with
the buffer in XDR
- if possible, have more stages active in the pipeline to further utilize
aggregate bandwidth to RSX.
3) Resolve to front buffer as above, or use post processing as an active pipelined stage being worked on concurrently.
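As a sketch of the "heavier stages to VRAM, lighter stages to XDR" rule in this alternative, something like the following could decide the placement. The function name `assign_stages`, the per-stage access estimates and the `vram_share` knob are all hypothetical; the real numbers would depend on the scene and on how much of the aggregate bandwidth each pool can actually deliver.

```python
def assign_stages(stages, vram_share):
    """Place rendering stages in the VRAM or XDR buffer by estimated accesses.

    stages maps a stage name to an estimated backbuffer access count.
    vram_share is the fraction of total accesses we want VRAM to absorb.
    Stages are walked from heaviest to lightest; the heaviest always goes
    to VRAM, and once the VRAM budget would be exceeded every remaining
    (lighter) stage goes to XDR.
    """
    budget = sum(stages.values()) * vram_share
    placement = {}
    used = 0.0
    in_vram = True
    ordered = sorted(stages.items(), key=lambda kv: kv[1], reverse=True)
    for name, accesses in ordered:
        if in_vram and placement and used + accesses > budget:
            in_vram = False
        if in_vram:
            used += accesses
        placement[name] = "VRAM" if in_vram else "XDR"
    return placement
```

For example, with estimates of 100, 60 and 20 accesses for opaque geometry, high quality blending and low quality blending and a `vram_share` of 0.7, the opaque pass stays in VRAM and both blending passes land in XDR.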
-----------------
Cell assisted TBDR of a portion of the scene to lower load on RSX and thus RSX's bandwidth needs
1) The LSs act as on-chip cache, as in the Kyro series
2) Cell transforms all geometry for a portion of the scene
3) Cell ray casts that portion of the scene for occlusion culling
4) Cell skins and shades the geometry
5) Combine or overlay Cell rendered portion of the scene with RSX generated front buffer representing the rest of the scene in the frustum and output.
Cell may be able to handle it all, given that it doesn't have to render the whole or even most of the scene. That helps offset the cost of ray casting (kept simple: only for visibility tests and simple lighting, with the rest of the graphics work done by other means) and the fact that Cell is still not a GPU, so some pixel-shader-like effects etc. will be handled less efficiently by Cell.
Alternative 1: Don't allocate a portion of the scene to Cell, but rather intelligently select a number of objects or pieces of environmental geometry whose cost comes in just under the processing you have left available on Cell. Selection could be dynamic, or these objects etc. could be tagged in advance to remove overhead, as long as things are within a range of predictability.
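A minimal sketch of the "select just under the budget" idea, assuming each candidate object comes with some per-frame cost estimate on Cell. The name `select_for_cell`, the cost figures and the greedy largest-first strategy are my own assumptions; a pre-tagged list would skip this step entirely.

```python
def select_for_cell(objects, budget_ms):
    """Greedily pick objects for Cell without exceeding its spare time.

    objects maps an object name to an estimated Cell cost in milliseconds.
    Largest costs are tried first; anything that does not fit is left
    for RSX. Returns the chosen names and the total time spent.
    """
    chosen = []
    spent = 0.0
    ordered = sorted(objects.items(), key=lambda kv: kv[1], reverse=True)
    for name, cost in ordered:
        if spent + cost <= budget_ms:
            chosen.append(name)
            spent += cost
    return chosen, spent
```

Greedy selection is not guaranteed to find the best fit, but it is cheap enough to run every frame if the selection is meant to be dynamic.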
Cell will need to overlay what it's responsible for in the scene onto the front buffer image before it's displayed. To do so correctly, it will most likely need to have its work done just before any full-screen post-processing effects like DoF are calculated, and after any work leading up to post processing begins. If post processing isn't used but instead such effects happen elsewhere along the pipeline, this idea may not be viable...not to say it is in the first place.
I've wondered if selective rendering by Cell of any sort could be used in conjunction with RSX's output to really make cut-scenes even more outstanding than expected. Err...I guess that's a question?
Alternative 2: Have Cell handle a stage or two in the rendering pipeline to save RSX work, and have Cell consume XDR bandwidth, leaving RSX more available bandwidth to VRAM. RSX gets pre-baked data that it can continue work on.
Alternative 3: Have Cell handle only part of the work of a stage (or stages), via method 1 or alternative 1.
Alternative 4: If TBDR won't work then by some other means.
-----------------------------------
Well that's my mental exercise for the day...any sense to anything I said?
Any interesting ideas out there still?