Can the RSX framebuffer be split between GDDR3 and XDR RAM?

bleon said:
The article goes into great detail about how these great-looking effects are created. I'm really surprised that Marco was so open about the methods they used, especially since they seem to have done some research and come up with a solution that's pretty ingenious.

Aren't developers concerned about giving away secrets to their competitors? For example, I couldn't imagine Namco revealing their secrets to better IQ and AA on the PS2.
I know in these days of screwing people over with software patents it seems strange that any tricks of the trade would come out, but generally devs support each other, like in many creative fields. That's why you have developer conferences, where devs share their experiences and solutions in the hope of helping others. There's a lot of research out there for research's sake that is shared in the hope of advancing the field. NAO32 is a progression of ongoing research in the field, building on the work of others, which in its turn will likely be progressed further by other parties too.

You'll probably find that a good number of devs know exactly how Namco get their 'better IQ' on PS2 games, but the techniques aren't usable for their games, which place different demands on the system.
 
Ostepop said:
Because Xenos has a dedicated bus for pixel bandwidth.

The two buses on PS3 are the main Cell/XDR at 25.6GB/s (which FlexIO can access at a lower rate) and the 22.4GB/s for RSX/GDDR3 - here you can achieve a number of combinations, for example: Cell/XDR - system, RSX/GDDR3 - texture / pixel; Cell/XDR - system / texture, RSX/GDDR3 - texture / pixel; etc. With Xenon we have a 22.4GB/s UMA for system and graphics, and another 32GB/s (or 256GB/s) purely for pixel.
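To put those peak figures in per-frame terms, here's a minimal sketch (purely illustrative: peaks are never sustained, and Cell shares the XDR bus with RSX's FlexIO traffic):

```python
# Per-frame budgets implied by the peak bus figures quoted above.
for name, gbps in (("Cell/XDR", 25.6), ("RSX/GDDR3", 22.4)):
    for fps in (30, 60):
        mib_per_frame = gbps * 1e9 / fps / 2**20
        print(f"{name}: ~{mib_per_frame:.0f} MiB/frame at {fps} fps")
```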

scificube said:
If pixel fill is not the limit then what does that matter?
Over the balance of operations for a modern graphics process, pixel fillrate is the largest consumer of bandwidth - at least according to any of the engineers and independents I've spoken to. Pixel consumption is the primary user and texture the secondary for local bandwidths; which of course is why Xenos is designed in the fashion it is and why multiple consoles beforehand have also utilised some kind of eDRAM or fast on-chip pixel processing (e.g. PowerVR tiling).
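As a back-of-the-envelope illustration of why pixel output can swamp a local bus, here's a sketch with hypothetical numbers (8 ROPs at 500 MHz, 4-byte colour and Z - assumptions for illustration, not figures from the post):

```python
# Peak pixel traffic for a hypothetical 8-ROP, 500 MHz part: each pixel costs
# a 4-byte Z read, a 4-byte Z write and a 4-byte colour write (no blending).
rops, clock_hz = 8, 500e6
bytes_per_pixel = 4 + 4 + 4
pixel_traffic = rops * clock_hz * bytes_per_pixel
print(f"peak pixel traffic: {pixel_traffic / 1e9:.1f} GB/s")  # 48.0 GB/s
# Far beyond a 22.4 GB/s bus - hence eDRAM or tile-based designs for pixel work.
```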

If you're referencing the eDRAM, I would think that has to do with accessing buffers... not texture access. Unless what you're saying is that, since buffer access is not something done in system memory, Xenos has more free from the 22.4GB/s available from system memory than RSX has from VRAM (22.4GB/s) + XDR (measured 26.1GB/s, theoretical 35GB/s) - framebuffer usage - Cell consumption + whatever texture caches alleviate for texturing.
You'll note that I said local. And, yes, where that pixel data goes to the eDRAM on Xenos, that's an alleviation of bandwidth on the UMA for other operations such as texturing (relative to not having the eDRAM).

As a note: You've got 26.1GB/s of measured FlexIO bandwidth, which I assume is a sum of the read and write bandwidths here; however I would have thought that this wouldn't be concurrent bandwidth, but individually measured in either direction and the concurrent bandwidth would be less than this (not least because this figure actually exceeds the maximum XDR memory bandwidth that it is ultimately reading and writing from according to those tests).
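In numbers, the sanity check being made here is simply (figures from the posts above):

```python
# A summed read+write figure can't all happen concurrently when the XDR pool
# behind the link peaks lower than the sum.
flexio_read_plus_write = 26.1   # GB/s, measured in each direction then summed
xdr_peak = 25.6                 # GB/s, XDR main memory peak
concurrent_ceiling = min(flexio_read_plus_write, xdr_peak)
print(f"concurrent FlexIO<->XDR ceiling: <= {concurrent_ceiling} GB/s")
```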

Lastly, it seems to me that pixel shaders often sample textures during intermediate work much more often than they output pixels at the end - of course barring blending effects, which are fillrate consumers... but then... why do you have to do both at the same time anyway?
Although G7x has skewed it slightly, if you look at the configurations of graphics processors over time it's the other way around - the ratio of texture processors has decreased in relation to the number of ROPs. Not only that, but ROPs have generally increased in sample rate per ROP whereas texture units have more or less stayed the same. MSAA is also a consideration here.
 
Over the balance of operations for a modern graphics process, pixel fillrate is the largest consumer of bandwidth - at least according to any of the engineers and independents I've spoken to. Pixel consumption is the primary user and texture the secondary for local bandwidths; which of course is why Xenos is designed in the fashion it is and why multiple consoles beforehand have also utilised some kind of eDRAM or fast on-chip pixel processing (e.g. PowerVR tiling).

I agree that, in general, pixel fillrate is the largest bandwidth consumer.

The odds are that for a post-processing operation texture reads will be the limitation; most of the more recent ones I've seen read MANY (I've seen as high as 20) texels from the source image for every write to the frame buffer.

RSX has more texture units, but whether RSX would actually be faster at post-processing filters is a little harder to answer; it depends on the filter and the specific implementation. Some filters are separable and better done as multiple passes, in which case the eDRAM bandwidth is a win, and some filters you can optimise with dynamic branches to limit texture reads, which favours Xenos.
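A quick tap-count sketch of the separable-filter trade-off (the 5x5 kernel is a hypothetical example):

```python
# Texel reads per output pixel for a k x k filter, done in one 2-D pass versus
# two separable 1-D passes (horizontal then vertical).
k = 5
reads_one_pass = k * k     # 25 taps per output pixel in a single pass
reads_two_pass = k + k     # 10 taps total across the two 1-D passes
print(reads_one_pass, reads_two_pass)
# The two-pass version trades fewer texture reads for an extra intermediate
# buffer write - exactly where Xenos' eDRAM write bandwidth is a win.
```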

No simple answer, but RSX isn't at as big a disadvantage in most of these cases as it is in regular blending operations.
 
The odds are that for a post-processing operation texture reads will be the limitation; most of the more recent ones I've seen read MANY (I've seen as high as 20) texels from the source image for every write to the frame buffer.
Curious - where do you generally take the source from for these types of ops? Are we talking pre-stored textures or generated?
 
You're right that I did just sum the bandwidths from XDR. I tend to keep thinking Cell can also produce data directly to RSX at the same time, which is in excess of what XDR could provide. Mental error. The rest of the post is more or less in line with common sense.

I understand where you are coming from with the pixel bandwidth, but texturing will still not occur from the eDRAM, to which the pixel shaders have the 32GB/s link to the daughter die you are describing. Texture bandwidth may be secondary overall but is still significant, especially considering the role multi-texturing still plays in games, and the hit for this will be on the unified memory in the X360, not the eDRAM - that is all I have ever understood to be the case.

My question was: what would it matter if pixel fill was not the limit but texture reads were? That is why I wanted to know what you were thinking. I was more after why you thought the limit now was pixel fill vs. texture access, and I only sought to qualify that, if this is not the case, then nAo's comments make a good bit of sense to me.

Again, sorry for the quick sum of the XDR bandwidths... I'm sure no dev in their right mind would ever approach saturating XDR with RSX requests, as that would starve Cell; and secondly, doing this would suggest a band-aid for a much bigger problem, regardless of whether access was concurrent in both directions or not.
 
Dave Baumann said:
True enough, but unless you just aren't outputting any pixels at all, then it probably has a greater local texture bandwidth in the first place.
Sure, but as soon as you get serious with texture bw (thus you are texture bw limited), the ratio between color and texture bw is so small that it becomes almost irrelevant, as the color bw just costs as much as another texel to sample.
Another important factor here is texture cache; complex post-process effects require big texture caches because they use wide filter kernels and/or non-coherent sampling patterns.
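As a rough illustration of how kernel width inflates the cache working set (the tile size and texel format here are assumptions, not nAo's figures):

```python
# Working set of a post-process pass: roughly (tile + kernel - 1)^2 texels for
# a square block of pixels in flight, even with perfectly coherent access.
tile, texel_bytes = 32, 4   # hypothetical 32x32 pixel tile, 4-byte texels
for kernel in (3, 9, 17, 33):
    working_set = (tile + kernel - 1) ** 2 * texel_bytes
    print(f"{kernel:>2}-wide kernel: ~{working_set / 1024:.0f} KB working set")
```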
 
While we are talking about the 10MB eDRAM in Xbox360: since PS3 is supposed to incorporate hardware for backward compatibility with PS2, there must presumably be 4MB of eDRAM in the PS3.

Does anybody know if the eDRAM is accessible at all by RSX/Cell, what it might usefully be used for, and whether it will remain in the PS3 after software backward compatibility allows replacement of the PS2 components in PS3? I mean, how do you emulate the 256GB/s or so speed using XDR or GDDR3 RAM?
 
SPM said:
While we are talking about the 10MB eDRAM in Xbox360: since PS3 is supposed to incorporate hardware for backward compatibility with PS2, there must presumably be 4MB of eDRAM in the PS3.

Does anybody know if the eDRAM is accessible at all by RSX/Cell, what it might usefully be used for, and whether it will remain in the PS3 after software backward compatibility allows replacement of the PS2 components in PS3? I mean, how do you emulate the 256GB/s or so speed using XDR or GDDR3 RAM?

I don't know if that'll be practical. Dave's article here says that the framebuffer size would be too big for 4 MB of eDRAM; in fact only an SD frame with no antialiasing would fit - that's soooo 2001 ;). So it's not like you can use it to assist in rendering. I think the system space between the PS2 and PS3 hardware will be separate and one will not know of the presence of the other.
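Rough arithmetic behind that claim, assuming 4-byte colour plus 4-byte Z per sample (my assumptions, not from the article):

```python
# Framebuffer size in MB for a given resolution and MSAA sample count.
def fb_mb(w, h, samples=1, bytes_per_sample=8):   # 4 B colour + 4 B Z
    return w * h * samples * bytes_per_sample / 2**20

print(f"640x480, no AA:  {fb_mb(640, 480):.1f} MB")     # ~2.3 MB -> fits in 4 MB
print(f"640x480, 4xAA:   {fb_mb(640, 480, 4):.1f} MB")  # ~9.4 MB -> too big
print(f"1280x720, no AA: {fb_mb(1280, 720):.1f} MB")    # ~7.0 MB -> too big
```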

And why would you want to emulate 256 GB/s throughput using PS3 hardware? It's not necessary.
 
SPM said:
While we are talking about the 10MB eDRAM in Xbox360: since PS3 is supposed to incorporate hardware for backward compatibility with PS2, there must presumably be 4MB of eDRAM in the PS3.

Does anybody know if the eDRAM is accessible at all by RSX/Cell, what it might usefully be used for, and whether it will remain in the PS3 after software backward compatibility allows replacement of the PS2 components in PS3? I mean, how do you emulate the 256GB/s or so speed using XDR or GDDR3 RAM?

It is my understanding that the PS2 hardware included for backwards compatibility is just a temporary solution, which will be removed as soon as Sony improves the software emulation, in order to save manufacturing costs.

With that approach, you cannot rely on having the eDRAM there. It could be there at the beginning, but we don't know for how long.
 
bleon said:
The article goes into great detail about how these great-looking effects are created. I'm really surprised that Marco was so open about the methods they used, especially since they seem to have done some research and come up with a solution that's pretty ingenious.

Aren't developers concerned about giving away secrets to their competitors? For example, I couldn't imagine Namco revealing their secrets to better IQ and AA on the PS2.
I might add to Xbs's and Shifty's comments that showing you are using cutting-edge technology, and even pushing the envelope with your own research, is good for your company in many ways. It strengthens the image of the company, which helps attract investors and competent personnel; skilled people often want to work with skilled people. Ninja Theory probably receives CVs from programmers all over the world who would kill for the opportunity to work with nAo et al. Once you've established such an image it will be more or less self-perpetuating, as long as you can keep the employees happy. :)

nAo said:
Another important factor here is texture cache, complex post process effects require big texture caches cause they use wide fllter kernels and/or non coherent sampling patterns.
Are you talking about an associative cache or some kind of addressable local store? :???:

Alstrong said:
minor correction: 48GB/s for PS2 eDRAM
Do you have a latency figure? I imagine that an on-die DRAM might be hard to emulate by using discrete DRAM ICs if the timing is crucial for the function being emulated.

EDIT: fixed
 
Sorry, no. I was just going off of...memory.

(and btw, I didn't say what you first quoted ;) )
 
Crossbar said:
Are you talking about an associative cache or some kind of addressable local store?
Cache by definition isn't addressable.
Not to mention for purposes of texturing you'd need that LS to be quite big to be of any practical use.

Do you have a latency figure? I imagine that an on-die DRAM might be hard to emulate
Which one - latency for page buffer access or eDRAM<->page buffer copy? :p
Anyway, the problems to solve with emulating GS aren't really in bandwidth, IMO.
 
Fafalada said:
Cache by definition isn't addressable.
Yes, I just wanted to make sure that's what he meant.

Fafalada said:
Which one - latency for page buffer access or eDRAM<->page buffer copy? :p
Anyway, the problems to solve with emulating GS aren't really in bandwidth, IMO.
Thanks. Actually, all latency figures would be of interest, but all in all it sounds like the GS would be a bitch to emulate with some slightly modified PC GPU anyhow. :smile:

It sounds like they would need to integrate some heavily geared emulation hardware, if not more or less the complete GS logic.
 

Crossbar said:
Yes, I just wanted to make sure that's what he meant.


Thanks. Actually, all latency figures would be of interest, but all in all it sounds like the GS would be a bitch to emulate with some slightly modified PC GPU anyhow. :smile:

It sounds like they would need to integrate some heavily geared emulation hardware, if not more or less the complete GS logic.

Or use the super-high-bandwidth Cell for full software emulation (EE & GS) and use GDDR3 for the framebuffer. So different SPEs take turns at different EE & GS operations. No need for RSX, no?
 
Xenos doesn't have a bandwidth advantage for "framebuffer effects" (assuming that term is referring to post-processing). The eDRAM can't be read as a texture source, so the data must be copied out to the GDDR3 first. That extra copy takes time and bandwidth, and you don't get any performance gain over a system without eDRAM for these particular effects. For this reason he's probably right about Xenos being a bit slower.

Xenos' advantage is in the actual scene rendering before you apply these effects.
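Putting rough numbers on that extra copy (a hypothetical 720p, 4-byte-per-pixel frame):

```python
# Resolving the eDRAM contents to GDDR3 costs a full-frame write that a
# no-eDRAM design wouldn't pay before post-processing can start.
w, h, bpp = 1280, 720, 4
resolve_bytes = w * h * bpp
print(f"~{resolve_bytes / 2**20:.1f} MB of GDDR3 writes per colour resolve")
# The post-process pass then reads those same bytes back as a texture.
```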
 
Can't you render to texture on the eDRAM, then read that back in to post-process? There's a 2-way bus there so I can't see why not. Unless the eDRAM isn't addressable as texture-space for reads so there's no way to get the data directly, and as you say it has to be copied to GDDR before it can be processed.
 
nAo said:
All details are already there, since I was talking about frame buffer effects: in the vast majority of cases you'll end up being texture bw limited, and last time I checked Xenos does not have 2 separate buses to fetch textures.
The only thing I'm wondering about here is that your high-bandwidth textures are all dynamically rendered, so to take advantage of the 2 buses you'd have to either copy them from GDDR3 to XDR (which consumes as much bandwidth as simply reading from GDDR3 in the first place) or render directly to XDR (which is slow). It's the same reasoning as why Xenos-style eDRAM doesn't help post-processing compared to a system without it, because the copying is basically just re-ordering the same memory cycles. (ERP, I don't see how eDRAM saves anything, regardless of separability.)
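A minimal sketch of that copy-vs-direct-read accounting (the 720p source size is a hypothetical):

```python
# Staging a dynamically rendered texture in XDR to split later reads across
# two buses still costs a full GDDR3 read plus an XDR write up front.
tex_bytes = 1280 * 720 * 4             # hypothetical source, 4 bytes/texel
direct = {"GDDR3 reads": tex_bytes}    # sample it where it already lives
staged = {"GDDR3 reads": tex_bytes,    # the copy must read the source once anyway
          "XDR writes":  tex_bytes}    # plus write it into XDR before sampling
print(direct, staged, sep="\n")
```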
nAo said:
Sure, but as soon as you get serious with texture bw (thus you are texture bw limited), the ratio between color and texture bw is so small that it becomes almost irrelevant, as the color bw just costs as much as another texel to sample.
Another important factor here is texture cache; complex post-process effects require big texture caches because they use wide filter kernels and/or non-coherent sampling patterns.
How big are the filter kernels you're using? Every technique I've seen is reasonably coherent, especially given that you have 4000+ pixels in flight. For the most part, each texel should only be read once unless you're really incoherent, say sampling 50+ pixels away from the centre tap. Even if your shader has 20 taps, it should only incur one net new texel read if the cache is doing its job. This is the same reason that I don't think copying to XDR and reading from two buses simultaneously saves you anything.
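Here's a toy model of that cache behaviour (a 1-D scanline with a hypothetical 20-tap kernel):

```python
# With a coherent k-tap kernel, each texel is fetched from memory once and the
# later taps hit the cache, so net memory reads stay near one per output pixel.
width, k = 1024, 20          # hypothetical scanline length and tap count
fetched, memory_reads = set(), 0
for x in range(width):                        # one output pixel per x
    for tap in range(-(k // 2), k - k // 2):  # k taps centred on x
        t = min(max(x + tap, 0), width - 1)   # clamp to the image edge
        if t not in fetched:                  # cache miss -> one memory read
            fetched.add(t)
            memory_reads += 1
print(f"{memory_reads / width:.2f} net memory reads per output pixel")  # ~1.00
```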

Anyway, I do agree with you about RSX possibly having an edge, assuming dynamic branching isn't a factor as ERP suggested. There could be some situations where you can get post-processing data to XDR without much of a penalty, and the 24 texture units are a boon as well if math ops are low enough.
 
Shifty Geezer said:
Unless the eDRAM isn't addressable as texture-space for reads so there's no way to get the data directly, and as you say it has to be copied to GDDR before it can be processed.
Exactly. Can't remember where I read it, but it's probably in the B3D article. The cost would go up notably if such a capability was included.
 