Does Cell Have Any Advantages Over XCPU Other Than FLOPS?

Could someone explain fairly simply how tile rendering works? Is there no difference between processing a frame that fits in the 10MB and one that has been "tiled"?

And again, how much bandwidth does the framebuffer regularly use? I don't really see the big bandwidth loss. Would it take more than what Xenon requires?

Meh, I just can't understand how 10MB of 32GB/s memory can almost replace 256MB of 22.4GB/s memory. :???:
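
For a rough intuition on both questions, here's a back-of-the-envelope sketch in C. Every figure in it (the 4x AA, the assumed number of times each sample is touched per frame, the 60fps) is an illustrative assumption on my part, not an official number; it just shows why a 720p framebuffer with AA overflows the 10MB eDRAM and has to be split into tiles, and why the same buffer kept in ordinary GDDR3 would eat a meaningful chunk of a 22.4GB/s bus:

Code:
#include <stdio.h>

int main(void) {
    /* All figures here are assumed/illustrative, not measured data. */
    const double width = 1280.0, height = 720.0;   /* 720p */
    const double bytes_per_sample = 8.0;           /* 32-bit colour + 32-bit Z */
    const double aa_samples = 4.0;                 /* 4x multisampling */
    const double fps = 60.0;
    const double touches_per_sample = 8.0;         /* assumed Z/colour reads+writes
                                                      per sample per frame */

    double fb_bytes = width * height * aa_samples * bytes_per_sample;
    double tiles = fb_bytes / (10.0 * 1024.0 * 1024.0);   /* 10MB eDRAM per tile */
    double traffic = fb_bytes * touches_per_sample * fps / 1e9;

    printf("Framebuffer: %.1f MB -> ~%.0f eDRAM tiles\n",
           fb_bytes / (1024.0 * 1024.0), tiles);
    printf("Rough framebuffer traffic without eDRAM: ~%.0f GB/s\n", traffic);
    return 0;
}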

:smile:
 
scooby_dooby said:
Right, but you should also point out that the PC counterpart to this card seems already BW limited in some games when you simulate a 128-bit bus.

Let's not draw conclusions about a closed platform from the PC space... there are things you can do and assumptions you can make with a closed box to help overcome weaknesses and play to strengths, more than you ever could with PC software. You can much more easily play with where you're bound and shift it around, too.

The comparison you're making isn't particularly fair regardless, though, when you consider that RSX still has potentially more bw available to it than the original GTX would (which has 38.4GB/s). PS3 as a whole has 48GB/s there - sure, the CPU takes a bite, but if you limited the CPU to a "regular" PC's memory consumption, there'd still be more left over for RSX than the GTX has (I think?).

Dave Baumann said:
That suggests that, other than command information, you are thinking that geometry is only ever going to be shoved between the two chips - in which case, how much is that going to be valid for, and does RSX have the setup rate for it?

nAo was already arguing very much against this when the concept of the split memory was first talked about.

I'm really not sure what you're getting at. I'm simply saying that as a baseline, PS3's CPU and GPU (for non-framebuffer tasks) are probably always going to have at least 25.6GB/s, which is more than X360 will have.


Dave Baumann said:
How much of a change is it going to be to get the code working on two radically different CPUs? How different is the code going to be given the two different APIs? How different do the shaders have to be, presumably, given the two different HLSL compilers? How different does the code have to be given the different properties of the graphics chips?

Changes will always have to be made because they are different platforms. Other factors may also come into play, such as what the primary development platform will be for the developers.

True, I simply ask the question because an apparently known "MS guy" raised some questions about that on another board, and seemed to imply that middleware support for predicated tiling wasn't necessarily guaranteed, and he wasn't holding his breath in some instances. I think it's a good question for any devs here.
 
scooby_dooby said:
Right, but you should also point out that the PC counterpart to this card seems already BW limited in some games when you simulate a 128-bit bus.

Also, in the newest refresh of that card they've bumped up BW by 43%, which suggests it certainly can 'use' all the bandwidth it can get, and shows that Nvidia feels that even with the 256-bit bus at 600MHz the G70 could still gain from a significant increase in bandwidth.

PC comparisons really can't be used in this case... unless you think it's okay to compare Q4 for PC to Q4 on X360 and get anything valid from it?

PCs are a wildly different situation where you can't optimize worth a shit, really, so nothing is used very efficiently.

I question Dave's sanity for actually doing that little benchmark he did and acting as if there was any useful information to be gained from it. I'll say this again: you don't program to a piece of hardware's "weakness", you make something that works around the limitations and uses the strengths -- it's always a juggling act. If that weren't the case, we'd see every game use an FB that fits in a single tile on Xbox 360.
 
Titanio said:
Let's not draw conclusions about a closed platform from the PC space...
Right. And let's not 'gloss over' one of the biggest limitations to the PS3 hardware.

I'm not making direct comparisons, just pointing out that 22GB/s is not exactly a ton of bandwidth. You seem to be making the case that it's perfectly sufficient, which I think is very optimistic, especially considering the latest refresh from nvidia, with a whopping 42% increase on a card that already has double the BW of RSX.
 
scooby_dooby said:
Right. And let's not 'gloss over' one of the biggest limitations to the PS3 hardware.

I'm not making direct comparisons, just pointing out that 22GB/s is not exactly a ton of bandwidth. You seem to be making the case that it's perfectly sufficient, which I think is very optimistic, especially considering the latest refresh from nvidia, with a whopping 42% increase on a card that already has double the BW of RSX.

You're comparing total GPU requirements on the PC side to suggested framebuffer requirements on the PS3 side. You're also neglecting that RSX has access to PS3's total BW minus CPU consumption (i.e. more than 22GB/s), which would leave it with more than the original GTX, at least if we limited CPU consumption to that of a typical PC. Of course, that's an apples-to-oranges comparison - I'm sure Cell will use more bw than the typical PC CPU - but then, really, so is yours.
 
Jawed said:
Yes, by leaving AA turned off.

Perhaps, yeah, or a lower level of AA. That's the implication. (Again, though, that all hinges on ideal use of the eDRAM.) There are implications to greater bw on the other side too, though. It's all tradeoffs...
 
Bobbler said:
PC comparisons really can't be used in this case... unless you think it's okay to compare Q4 for PC to Q4 on X360 and get anything valid from it?

PCs are a wildly different situation where you can't optimize worth a shit, really, so nothing is used very efficiently.

I question Dave's sanity for actually doing that little benchmark he did and acting as if there was any useful information to be gained from it. I'll say this again: you don't program to a piece of hardware's "weakness", you make something that works around the limitations and uses the strengths -- it's always a juggling act. If that weren't the case, we'd see every game use an FB that fits in a single tile on Xbox 360.
At a graphics level there is plenty of optimisation that can occur, which is one of the reasons why the IHVs spend much of their time evangelising to developers how best to optimise for a particular API generation and the hardware supporting that API - it's no coincidence that many of those optimisations are equally valid for ATI and NVIDIA; many of those scenarios will be equally applicable to XBOX 360 and PS3 (for instance, I'm not sure I see much difference in preferred sorting methods between any of the parts out there).

Where there is a difference is the level of communication that will occur between the CPU and GPU, and hence, in PS3's case, what may be drawn from/to system RAM as opposed to local RAM, or what geometry is generated by the CPU. Geometry, though, doesn't really factor too highly in framebuffer bandwidth consumption, with pixel and texture being the two primary consumers, especially when "high quality" pixels (AA/FP) are in operation. The point of the test was to look at some bandwidth-bound situations to see what types of scenarios are likely to require alternative bandwidth to be used should those targets be required - what it did show is that there are many cases in which fairly high quality pixels could be used without consuming too much of the system bandwidth needed for elements such as texture storage/retrieval, etc.
 
Has it not been shown that in most cases the performance increase of the 7800 GTX 512MB scales linearly with the increased clock rate of the card over the 7800 GTX 256MB?

It was my understanding that only a few titles actually benefitted from the increased bandwidth the faster memory provided, such as Doom3 and Quake4, due to them using many texture layers on just about everything. I can't recall exactly, but are the gains mostly notable with the Ultra quality modes where textures are uncompressed? I'm only asking this because I HIGHLY doubt console developers are going to use uncompressed textures if they can avoid it, and furthermore, as a purely subjective observation, Doom3's Ultra quality mode BARELY looks better than its High quality mode even if you stand still and look up close at a wall or something.
 
When it's not CPU limited, Doom 3 actually scales very well with bandwidth, because many of the sample layers are just stencil passes burning bandwidth. Quake 4 appears to be more vertex limited than Doom 3 is.

In many cases, though, performance on the PC is more CPU bound. Increasing the "quality" of the pixels always shows the benefits of more bandwidth.
 
Dave Baumann said:
When it's not CPU limited, Doom 3 actually scales very well with bandwidth, because many of the sample layers are just stencil passes burning bandwidth. Quake 4 appears to be more vertex limited than Doom 3 is.

In many cases, though, performance on the PC is more CPU bound. Increasing the "quality" of the pixels always shows the benefits of more bandwidth.

Makes sense to me.

Although increasing the quality of pixels would suggest more pixel processing power is needed before the benefits of more bandwidth can be taken advantage of. In any case, I don't expect either Xenos or RSX to be lacking in the ability to make prettier pixels.

If anything, it highlights that different games will ultimately benefit from different aspects of the hardware, even if we can find some commonality at times in what games demand.
 
So the graphics memory is basically just for the framebuffer? (Textures and models, are those stored in main RAM?) How do you count it up to get that high a bandwidth usage? Is there anything I should read?
 
weaksauce said:
So the graphics memory is basically just for the framebuffer? (Textures and models, are those stored in main RAM?) How do you count it up to get that high a bandwidth usage? Is there anything I should read?

Realistically the vram bandwidth is not going to be used solely by the framebuffer - devs will aim to keep framebuffer usage lower than the available vram bw, in order to make use of some of it for texture/vertex access etc. If they didn't, you'd basically be wasting an awful lot of vram, as you'd be tying up 256MB's worth of bw for maybe ~7MB, rendering the rest useless.

No one really knows how much bw a game's framebuffer is sucking up unless you're there working on it, really ;) There's no one formula for all games. It'd be nice if you could keep RSX's framebuffer usage to maybe 15-16GB/s, though (which with compression would effectively be more). That'd leave the CPU and GPU with ~32GB/s. Take out 10GB/s for the CPU, perhaps, leaving 22GB/s for the GPU (texture/vertex fetch). If you apportioned the same bw to the CPU on X360, you'd be left with only 12GB/s for the GPU in comparison. That's just one set of figures - it seems, though, that with low CPU usage and high FB usage on PS3, the increase versus X360 will be small (20%), but with higher CPU usage and lower FB usage, the difference grows quite a lot (up to or over 100%, as above - in particular, as CPU usage goes up, the difference gets larger).
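
Just to make that budget arithmetic explicit, here's a tiny sketch using the same figures (the 16GB/s framebuffer and 10GB/s CPU numbers are the illustrative assumptions from the paragraph above, not measurements):

Code:
#include <stdio.h>

int main(void) {
    double ps3_total  = 48.0;   /* 22.4 GB/s GDDR3 + 25.6 GB/s XDR */
    double x360_total = 22.4;   /* unified GDDR3; the framebuffer lives in eDRAM */
    double fb_usage   = 16.0;   /* assumed RSX framebuffer traffic */
    double cpu_usage  = 10.0;   /* assumed CPU consumption on either machine */

    /* What's left for texture/vertex fetch in each case. */
    double ps3_gpu_fetch  = ps3_total - fb_usage - cpu_usage;
    double x360_gpu_fetch = x360_total - cpu_usage;

    printf("PS3 GPU fetch budget:  %.1f GB/s\n", ps3_gpu_fetch);   /* 22.0 */
    printf("X360 GPU fetch budget: %.1f GB/s\n", x360_gpu_fetch);  /* 12.4 */
    return 0;
}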
 
Can someone explain what the front buffer portion of the framebuffer is typically used for?

The front buffer still resides in main memory on the X360, so a good follow-up question is how much bandwidth front buffer tasks typically consume. (A relative answer will suffice... more than texture/vertex access... about the same... less.)

Are there any interesting things Cell/Xenon could do with the frontbuffer alone?
 
scificube said:
Can someone explain what the front buffer portion of the framebuffer is typically used for?

The front buffer still resides in main memory on the X360, so a good follow-up question is how much bandwidth front buffer tasks typically consume. (A relative answer will suffice... more than texture/vertex access... about the same... less.)

Are there any interesting things Cell/Xenon could do with the frontbuffer alone?


The front buffer is just the image that is being scan-converted by the video scaler.
How much bandwidth it needs depends on the number of taps the scaler is using and how it actually applies the filter, but overall it will be a minimal bandwidth cost.
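
For scale, here's a quick sketch of the raw scan-out cost at 720p/60Hz (ignoring scaler taps, which add a little on top; figures are illustrative):

Code:
#include <stdio.h>

int main(void) {
    double width = 1280.0, height = 720.0;   /* 720p front buffer */
    double bytes_per_pixel = 4.0;            /* 32-bit colour */
    double refresh_hz = 60.0;

    /* Bandwidth needed just to read the front buffer out for display. */
    double gb_per_s = width * height * bytes_per_pixel * refresh_hz / 1e9;
    printf("Front buffer scan-out: ~%.2f GB/s\n", gb_per_s);   /* ~0.22 GB/s */
    return 0;
}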
 
iknowall said:
Also, what is more important for making games: flops power or integer power?

:???:
You can calculate everything as floating point. Integer 1234 is float 1234.0F, no problem there except maybe for higher memory usage and a few more CPU cycles needed.

I ask this because I know that only with DX10 will all the geometry be created directly on the GPU.

You're the only one to know that, methinks ;)

I would like to know if this flops power will translate into something, or will be useless.

Well that depends on the programming.
 
_xxx_ said:
:???:
You can calculate everything as floating point. Integer 1234 is float 1234.0F, no problem there except maybe for higher memory usage and a few more CPU cycles needed.

Actually that's not a good idea, because floating point math results are inexact. For example, it would be stupid to calculate pointer offsets or loop counters using floating point. ;-) Use the right tool for the right job.

To borrow a famous example:

http://docs.sun.com/source/806-3568/ncg_goldberg.html
Code:
#include <stdio.h>

int main(void) {
    double q;

    q = 3.0 / 7.0;
    /* The right-hand expression may be evaluated at a different
       (often higher) precision than the double stored in q. */
    if (q == 3.0 / 7.0) printf("Equal\n");
    else printf("Not Equal\n");
    return 0;
}

What this code will print actually depends on the CPU and compiler (and sometimes even the compiler settings) you run it on. :devilish: ;)
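
A related sketch of my own (not from the linked paper) showing why integer counters are the safer tool for things like loop counts: 0.1 has no exact binary representation, so a float accumulator drifts while the int counter stays exact:

Code:
#include <stdio.h>

int main(void) {
    float f = 0.0f;
    int i;

    /* 0.1 cannot be represented exactly in binary floating point,
       so the accumulated sum drifts away from the true value. */
    for (i = 0; i < 1000; i++)
        f += 0.1f;

    printf("int count: %d, float sum: %f (expected 100.0)\n", i, f);
    return 0;
}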
 
aaaaa00 said:
Actually that's not a good idea, because floating point math results are inexact. Unless an algorithm actually requires floating point math, I would avoid it.

I never said it's a good idea, just that it is possible with a rather small loss. And I'm pretty sure that the compiler will know what to do with the given CPU in this case. And what will you get when you calculate that with ints?
 