A technical question re: Xenos and RSX

RAZurrection

Newcomer
Hello, I'm new here and was wondering if some of the more technically minded of you would mind settling a bet for me.

I am currently in a discussion about the bandwidth of both GPUs, and there are a few points where we need to clarify who is correct.

1)

Me: The framebuffer is the biggest bandwidth hog for a console

Her: Accessing textures from main memory is the biggest bandwidth hog

2)

Me: The Xenos's eDRAM and the 32GB/sec pipe from the mother die to the framebuffer mean that 360 games are less likely to be bandwidth bound than PS3 games, where the total bandwidth to the framebuffer is 22GB/sec

*cited this watchimpress article

http://pc.watch.impress.co.jp/docs/2006/0926/kaigai303.htm

http://www.neogaf.com/forum/showthread.php?t=121307

Her: PS3 will be less bandwidth bound due to having a secondary 128-bit bus to XDR memory

Many thanks for any responses
 
"her" :?: .... :idea: -> :cool:

Well, 1.) Women, ARE ALWAYS right. Don't even think about it, just deal with it.

...

and 2.) Is she hot? number? :cool:
 
Oh well, it was worth a try. :D

Anyway, without going into too much detail, I think the point both of you should understand is that the "largest bottleneck" doesn't have to be the same single thing in every case. Whether the biggest bandwidth hog is the framebuffer, texture access or something else depends on the application, not the hardware. If there's more bandwidth available for the framebuffer (i.e. Xenos), developers will find ways to use it, while on PS3 (RSX), developers will probably find different ways to play to the strengths of that architecture.

Even though you'd like to have a definite answer, there simply isn't one.
 
Not really.

I just really need to know which you think matters more to keep free of a bottleneck in your average game: the framebuffer or texture access.
 
Does the framebuffer take up that much bandwidth? Maybe I'm crazy, but I don't think it does.

32bpp x 1080 x 1920 x 60fps / 8bits/Byte / 1000^3 = 0.5 GB/s of bandwidth.

Seems to me that 22 GB/s is sufficient for the framebuffer.
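
The arithmetic above can be checked with a short sketch. It assumes the ideal case the post describes, where every displayed pixel is written exactly once per frame; the function name is made up for illustration.

```python
def scanout_bandwidth_gb_s(width, height, bits_per_pixel, fps):
    """Bandwidth in GB/s (10^9 bytes) if each displayed pixel is touched once per frame."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps / 1000**3

# 1080p, 32 bits per pixel, 60 frames per second, as in the post
bw = scanout_bandwidth_gb_s(1920, 1080, 32, 60)
print(f"{bw:.3f} GB/s")  # 0.498 GB/s
```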
 
You usually write SEVERAL pixels for each one that you ultimately display (overdraw, particle effects, multipass, transparency on the main buffer, plus others for cubemaps, shadowmaps, etc., or multiple buffers as in Bungie's HDR method). In the worst case, you may have to not only write but also read 64 bits per pixel. As for the internal bandwidth on the Xenos eDRAM die, it gets no benefit from compression, and it is sized for the worst-case scenario of reading + writing 8 pixels per clock, 4 samples per pixel, with 64 bits per sample (256GBytes/sec).
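
The 256GBytes/sec figure can be reproduced roughly as follows. The post gives the per-clock numbers but not the clock; the 500 MHz used here is an assumption based on the commonly quoted Xenos clock, and the function name is hypothetical.

```python
def edram_peak_gb_s(pixels_per_clock=8, samples_per_pixel=4, bits_per_sample=64,
                    clock_hz=500_000_000, read_and_write=True):
    """Peak internal eDRAM bandwidth in GB/s (10^9 bytes).

    clock_hz is an ASSUMPTION (500 MHz); the other defaults come from the post.
    """
    bytes_per_clock = pixels_per_clock * samples_per_pixel * bits_per_sample // 8
    if read_and_write:
        bytes_per_clock *= 2  # worst case: every sample is both read and written
    return bytes_per_clock * clock_hz / 1000**3

print(edram_peak_gb_s())  # 256.0
```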
 
Sorry, but that would be far better than even an ideal case. First of all, you haven't counted the Z buffer in your calculations, which is another full-size 32-bit buffer.

Second, and this is more important, traffic is bidirectional. Theoretically, you'll read a Z value for each pixel to check if it's visible, and if it is you'll fill the pixel and the Z buffer. This is already 3 times 4 bytes per pixel of traffic (Z read, color write, Z write).
If you also have to do alpha blending for a transparent/translucent polygon (think effects), then you'll also have to read a color value, which is 4 more bytes.

Third, and most importantly, you haven't included any overdraw in the bandwidth. Most of the pixels have to be filled several times; for example, you could draw the sky first, then the terrain, then a house, and then a character in front of it whose arms are crossed in front of his chest. This means that you have to draw some of the pixels five times, without any effects.
Now most particles are transparency mapped quad polygons and effects like fire, smoke, dust, blood etc. use a lot of these polygons - and they're overlapping each other. So we'll get a lot of overdraw here as well, and because of the alpha blending, all these pixels will require both a color and Z read and a write, 4 times 4 bytes of traffic.
However, it is very hard to give an exact value for overdraw, as it varies from game to game, and even within a single game from scene to scene. A sudden explosion could increase framebuffer traffic several times for a few dozen frames, for example.
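
To get a feel for how the per-pixel costs and overdraw above multiply out, here is a rough sketch. The byte counts per layer follow the post (12 bytes for an opaque fill, 16 for a blended one); the resolution, frame rate and layer counts are assumptions picked purely for illustration, and the model ignores compression, early Z rejection and every other optimization.

```python
BYTES = 4  # one 32-bit color or Z value

def framebuffer_traffic_gb_s(width, height, fps, opaque_layers, blended_layers):
    """Uncompressed worst-case traffic in GB/s (10^9 bytes), every layer covering the full screen."""
    opaque_bytes = 3 * BYTES   # Z read, color write, Z write
    blended_bytes = 4 * BYTES  # Z read, color read, color write, Z write
    per_pixel = opaque_layers * opaque_bytes + blended_layers * blended_bytes
    return width * height * per_pixel * fps / 1000**3

# ASSUMED scene: 1080p at 60 fps, 5 opaque layers, 2 full-screen blended layers
total = framebuffer_traffic_gb_s(1920, 1080, 60, opaque_layers=5, blended_layers=2)
print(f"{total:.2f} GB/s")  # 11.45 GB/s
```

Even this crude model lands more than twenty times above the 0.5 GB/s estimate, and changing the assumed layer counts moves the result a lot, which is why a single fixed number is hard to give.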

And this is only the framebuffer traffic, but nextgen games are far more complex and many of them use additional rendering targets for various things like HDR lighting and bloom effects, real time reflections, multipass rendering and so on. Some of the games use a higher color depth of FP16 which doubles the data for each color value (8 bytes instead of 4). Then there are shadow buffers, which mean that you have to render the scene from the light's point of view, which requires 2-4 bytes of data per pixel for resolutions of 512*512 to as high as 2048*2048. All these extra buffers also require a lot of extra traffic.
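
As a small illustration of the shadow buffer sizes mentioned above, this sketch computes the memory footprint of the smallest and largest cases. The helper name is hypothetical, and it only gives the size of one buffer; the traffic depends on how often each texel is read and written.

```python
def buffer_mib(width, height, bytes_per_pixel):
    """Size of a single render target in MiB (2^20 bytes)."""
    return width * height * bytes_per_pixel / 2**20

# The two extremes from the post: 512*512 at 2 bytes and 2048*2048 at 4 bytes
for size, bpp in [(512, 2), (2048, 4)]:
    print(f"{size}x{size} @ {bpp} B/px: {buffer_mib(size, size, bpp):.1f} MiB")
# 512x512 @ 2 B/px: 0.5 MiB
# 2048x2048 @ 4 B/px: 16.0 MiB
```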

Adding everything up would show that no game could run even at 10 fps on the RSX. Fortunately there are many bandwidth-saving optimizations: Z and framebuffer compression, hierarchical Z-buffer, occlusion culling and so on. All these help to reduce the framebuffer traffic a lot, but they make it nearly impossible to calculate a theoretical bandwidth usage for any engine. This is why developers have to use a variety of tools to measure what their engine is actually doing at any given time.

So, all we can conclude is that the 22GB/s of bandwidth for the RSX is a lot less than the eDRAM's bandwidth on the Xenos. But for some operations, RSX can make use of the XDR memory as well, temporarily doubling its bandwidth, and in some cases Xenos has to access the external memory, which is just as fast as the GDDR bus on RSX, and which it also has to share with the Xenon CPU.
 
In some cases Xenos has to access the external memory, which is just as fast as the GDDR bus on RSX, and it also has to share it with the Xenon CPU.

How much of the 22GB/s GDDR bandwidth would the Xenon CPU steal from the GPU in a typical game? Would this be peaky maybe causing the GPU to drop frames or causing the CPU to stutter?
 
That's something only actual developers can tell.

Keep in mind, though, that it's not as simple as n GB here, n GB there. During a single second a game will render at least 30 frames, and during the rendering of every single frame, very different tasks follow each other in a certain order. It may happen that the bus is completely idle, and it may also happen that both the CPU and GPU want to transfer some data at the same time.
 
IIRC, according to one forum member's GPU simulation (maybe RoOoBo?) frame buffer traffic alone accounted for somewhere between 75-80% of utilized bandwidth. Keep in mind, this was one test on one game title two or three years ago; there is no hard and fast answer that's globally true.
 
I think I've read something about developers liking the RSX because of its efficiency in rejecting pixels (early Z???) to avoid wasting bandwidth on overdraw...

If only I could remember where I read it...

It's probably here in the forum somewhere... Anyone want to lend a hand and link it?
 
Are there any figures showing what percentage of the available theoretical memory bus bandwidth a 2, 3 or 4 core CPU typically uses if it doesn't have to share it with anything else? This of course depends on the type of application running and how much use it can make of the cache, but it might give an indication of the performance drop to be expected when the Xenos GPU is using the system RAM intensively.

This certainly would be interesting to know since memory speed is generally what is supposed to limit CPU performance.
 
Trying to answer the original question, rather than get involved in RSX performance/architecture discussions, I think both the OP and his friend could be correct - possibly even in the same game.

There is no good answer. Some games will be FB bandwidth bound, others will be texture bound. Some will be bound by either during different parts of a level or even different parts of the same frame.

So the first question should probably be considered a draw.

The second question kind of depends on there being an answer to the first question, so that's also not really clear cut. Yes, the Xenos has good FB bandwidth, but the PS3 does have potentially more bandwidth in total - so again you're both kind of right.

I am quite certain there will be cases where both machines will shine over the other.
 