How much work must the SPUs do to compensate for the RSX's lack of power?

I would like to see what numbers the poster who came up with the 70 to 80% figure used, because it can't be the pretty credible Wikipedia numbers I'm looking at, where RSX comes out on top more often than not.

Sorry, but you certainly know developers can't discuss RSX performance openly. We can, however, estimate the performance of the G70 GPUs the RSX is based on, and that gives us a reasonable approximation.

What are your numbers?

I based the numbers on a blog that tested Xenos in XNA and compared it against a GeForce 7400, which has roughly 1/6 of the RSX's pixel shader ALUs and less than 1/3 of its bandwidth (including FlexIO). Unfortunately I can't say more about who gave me the figures that led me to the 70/80% estimate...
 
Ah, well. We're all really just killing time until the next gen starts to come out and we can salivate over the new technology all over again.

Seems we have a couple of years at least to wait, though. ;-/
 
So the 16MB/s is just the Cell that can't read fast from the graphics memory. So why is the GPU the boss of the graphics memory, while the main memory can be accessed so 'freely' by both the Cell and the GPU? It's because the main memory is RAMBUS stuff, made and slotted into the motherboard to be efficiently accessed by several components at once. The graphics memory, on the other hand, is GDDR memory designed to be accessed very quickly by one component only. Hence the RSX being master of it.

That the Cell can in fact sort of write to it directly at all is because, for DVD/Blu-ray playback, it's convenient to be able to use the Cell to process the compressed stream and put it straight into the framebuffer. The 4GB/s allotted to this is just for that purpose (but note that it was also useful in the Linux environment, where the RSX was taken out of the picture completely for security purposes; in hindsight a correct decision, I think).
I always thought the tiny read speed from RSX was because the access has to be coherent; it must take into account all the small caches everywhere on the GPU. I don't think the RAM being GDDR3 is much of a problem (Xenos/Xenon handle it fine).

Having reasonably fast reads and writes could help those devs that run out of system RAM, so I think fast reads were left out because of technical restrictions and not because there is no use for them.
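To put those two figures in perspective, here's a quick back-of-the-envelope in plain C. The 16 MB/s and 4 GB/s numbers are the ones quoted above; the 1280x720 RGBA frame size is just an illustrative choice on my part:

```c
#include <stdio.h>

int main(void)
{
    /* Illustrative 1280x720 RGBA8 render target (~3.7 MB) */
    const double frame_bytes   = 1280.0 * 720.0 * 4.0;
    const double cell_read_bw  = 16.0 * 1024.0 * 1024.0;           /* 16 MB/s */
    const double cell_write_bw = 4.0 * 1024.0 * 1024.0 * 1024.0;   /*  4 GB/s */

    printf("Cell reading one frame back from GDDR3: %6.1f ms\n",
           frame_bytes / cell_read_bw * 1000.0);    /* roughly 220 ms */
    printf("Cell writing one frame into GDDR3:      %6.2f ms\n",
           frame_bytes / cell_write_bw * 1000.0);   /* roughly 0.9 ms */
    return 0;
}
```

In other words, reading a whole frame back over the 16 MB/s path would cost more than a dozen 60 fps frames of time, while pushing one in over the 4 GB/s path is well under a millisecond, which fits the "write path for decoded streams, read path only as a convenience" picture above.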
 
Ah, well. We're all really just killing time until the next gen starts to come out and we can salivate over the new technology all over again.

Seems we have a couple of years at least to wait, though. ;-/

If what Arwin said is true, then I think we may see more new approaches in graphics rendering akin to MLAA.

These new architectural features may be more interesting than pure number comparison.
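As a rough illustration of what an MLAA-style SPU pass looks like structurally: post-processing is embarrassingly parallel over rows of the framebuffer, so it splits naturally into strips handed to a handful of worker cores. The sketch below is generic C with pthreads standing in for SPU jobs and a trivial brightness halve standing in for the actual filter; nothing in it is Cell-specific.

```c
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

/* Split a framebuffer into horizontal strips and hand each strip to a
 * worker.  pthreads stand in for SPU jobs; the "filter" is just a
 * brightness halve so the sketch stays short. */
enum { WIDTH = 1280, HEIGHT = 720, WORKERS = 5 };

typedef struct {
    uint32_t *pixels;   /* whole RGBA framebuffer            */
    int       y0, y1;   /* strip of rows this worker handles */
} strip_job;

static void *process_strip(void *arg)
{
    strip_job *job = arg;
    for (int y = job->y0; y < job->y1; ++y)
        for (int x = 0; x < WIDTH; ++x) {
            uint32_t *p = &job->pixels[y * WIDTH + x];
            *p = (*p >> 1) & 0x7F7F7F7Fu;   /* halve every 8-bit channel */
        }
    return NULL;
}

int main(void)
{
    uint32_t  *fb = calloc((size_t)WIDTH * HEIGHT, sizeof *fb);
    pthread_t  th[WORKERS];
    strip_job  jobs[WORKERS];
    const int  rows = HEIGHT / WORKERS;

    for (int i = 0; i < WORKERS; ++i) {
        jobs[i] = (strip_job){ fb, i * rows,
                               (i == WORKERS - 1) ? HEIGHT : (i + 1) * rows };
        pthread_create(&th[i], NULL, process_strip, &jobs[i]);
    }
    for (int i = 0; i < WORKERS; ++i)
        pthread_join(th[i], NULL);

    free(fb);
    return 0;
}
```

A real MLAA pass does edge classification and blending instead of this dummy filter, but the strip-per-worker structure is the part that maps onto the SPUs.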
 
I don't think the RAM being GDDR3 is much of a problem (Xenos/Xenon handle it fine).

Are Xenos and Xenon sharing access to the GDDR3, or is one of them in charge of feeding the other?
 
Are Xenos and Xenon sharing access to the GDDR3, or is one of them in charge of feeding the other?
Does it matter? There is always a sort of bus controller involved; be it (G)DDR or something else, you can't just attach two devices directly to RAM (I believe the GDDR RAM is physically connected to the GPU in the case of the XB360). You can also instruct RSX to push memory to Cell, so there shouldn't be a problem achieving higher transfer rates because of the RAM type.
I believe it's just a matter of RSX having to lock down the accessed memory from its operational units.
 
Does it matter? There is always a sort of bus controller involved; be it (G)DDR or something else, you can't just attach two devices directly to RAM (I believe the GDDR RAM is physically connected to the GPU in the case of the XB360). You can also instruct RSX to push memory to Cell, so there shouldn't be a problem achieving higher transfer rates because of the RAM type.

But that's exactly what I just said. The reason I think you don't see how this matters, however, is because you're 100% GPU-focused (which is typical PC guy behavior, no offense). But the Cell processor has 8 cores, 7 of which are stream processors, and it just wouldn't be nearly as effective with GDDR3-style memory, let alone sharing it with everything else.
 
Has everyone just forgot all the info we have got from devs and game analysts?

Xenos is better than RSX in a few key ways: 1. it has a unified shader pipeline, which allows it to make better use of its resources for whatever it is rendering, and 2. the eDRAM allows it to do AA and heavy alpha effects a lot more easily than RSX (and no, devs don't try to get around it!).
 
It is rather telling to me that Cell's SPUs don't seem to be used so much for physics or AI or whatnot, or at least that's not what gets the media's attention; they're mostly used to help out the GPU. Not that I'm biased, but my longstanding assertion is that graphics drive the hardcore market, and this certainly bears that out. Developers aren't saying "how can we use the SPUs to drive better AI", they're saying "how can we use the SPUs to help our renderer".

I think you can almost consider the PS3 a form of SLI, another tactic I've always thought would be interesting in a console (a dual GPU console).
Best AI in a AAA Game 2009 - Killzone 2
Technical Innovation in Game AI 2009 - Killzone 2

http://aigamedev.com/open/editorial/2009-awards-results/
 
But that's exactly what I just said. The reason I think you don't see how this matters, however, is because you're 100% GPU-focused (which is typical PC guy behavior, no offense). But the Cell processor has 8 cores, 7 of which are stream processors, and it just wouldn't be nearly as effective with GDDR3-style memory, let alone sharing it with everything else.
I never said it would be very effective, but it would still beat running out of memory while having free RAM on the GPU. Think of it like paging out sparsely used parts of memory; I hope you don't think of paging as useless just because it's way slower than using primary RAM and should be totally avoided in a perfect world.
Even with its meager 16MB/s, Linux made good use of it. Just think of having a few GB/s at relatively low latencies ready for the cases where you would otherwise have to grab stuff from HDD or Blu-ray. If the GPU needs the GDDR3 RAM, just prioritize it; there's still no need for an artificial 16MB/s bottleneck even if the GPU is doing nothing. The bus certainly can push GB/s in both directions.

As said, I think coherence is the limiting factor, but that's just my guess considering the points above. Just so we don't discuss needlessly: is your post on why the bandwidth is crippled based on some information from devs, or just a similar guess (like mine) on your end?
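For what it's worth, the "use spare VRAM as overflow" idea is essentially a two-pool allocator: prefer main memory, spill into the slower pool when it runs out. The sketch below is purely illustrative C; the two pools stand in for XDR and leftover GDDR3, how data would actually move (e.g. RSX-pushed DMA, as mentioned above) is outside the sketch, and none of it is a real PS3 API.

```c
#include <stdio.h>
#include <stdlib.h>

/* Purely illustrative two-pool allocator: prefer main (XDR) memory and
 * spill rarely-used allocations into a slower overflow pool (leftover
 * GDDR3).  Nothing here is a real PS3 API. */
typedef struct {
    unsigned char *base;
    size_t         used, size;
} pool;

typedef struct {
    pool fast;   /* main memory                   */
    pool slow;   /* overflow pool in video memory */
} spill_allocator;

static void *pool_alloc(pool *p, size_t bytes)
{
    if (p->used + bytes > p->size)
        return NULL;
    void *out = p->base + p->used;
    p->used += bytes;
    return out;
}

/* Allocate from main memory first; fall back to the overflow pool. */
static void *spill_alloc(spill_allocator *a, size_t bytes, int *is_slow)
{
    void *p = pool_alloc(&a->fast, bytes);
    *is_slow = 0;
    if (!p) {
        p = pool_alloc(&a->slow, bytes);
        *is_slow = (p != NULL);   /* caller knows reads from here cost more */
    }
    return p;
}

int main(void)
{
    /* Tiny pools just to exercise the fallback path. */
    spill_allocator a = {
        { malloc(64),  0, 64  },   /* "XDR"   */
        { malloc(256), 0, 256 }    /* "GDDR3" */
    };
    int slow;
    spill_alloc(&a, 48, &slow);
    printf("first block in %s pool\n", slow ? "slow" : "fast");
    spill_alloc(&a, 48, &slow);    /* no longer fits in the fast pool */
    printf("second block in %s pool\n", slow ? "slow" : "fast");
    free(a.fast.base);
    free(a.slow.base);
    return 0;
}
```

The interesting policy question is of course what counts as "sparsely used", which is exactly the paging analogy above.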
 
I'm not saying that bandwidth is crippled necessarily. I'm inferring how it works from the actually released specs (bandwidth figures between the Cell/RSX/GDDR/XDR) and previous discussions on this forum. (I personally believe that the right choices have been made for getting the most out of the Cell and RSX combined.)
 
We didn't see a lot of articles written about using Cell for AI, but we read a lot of boasts about it helping graphics rendering (Naughty Dog, Insomniac, Guerrilla, etc.).

Because graphics has the most visible impact. You should be able to find articles on Cell AI, but the real problem may be that the press doesn't know how to appreciate it, let alone write an article about the topic. The devs themselves published AI articles to further their work.

Although standard enemy AI does not consume many resources, there were doubts that Cell could handle tree traversal efficiently.

I remember ND talked about their pathfinding AI as early as U1, Insomniac talked about hedgehog physics and bot AI in MP, and Guerrilla wrote about their AI and AI-driven animation.

Naturally, the whole PSEye and Move area includes a major and ongoing AI component. Pitching the technology and games at E3 and GDC endorses the movement.
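On the tree-traversal doubts: the usual answer for an SPU-style core is to lay the tree out as index-linked nodes in one flat array, so a chunk of it can be DMA'd into local store and walked without chasing pointers through main memory. A generic C sketch of that layout, with illustrative names of my own and nothing PS3-specific:

```c
#include <stdio.h>
#include <stdint.h>

/* Index-based nodes in one flat array: a whole block of the tree can be
 * streamed into a small local store in one transfer, and traversal needs
 * no pointer chasing through main memory. */
typedef struct {
    float   split;    /* e.g. a kd-tree/BVH split value        */
    int32_t left;     /* child indices, -1 means "no child"    */
    int32_t right;
    int32_t payload;  /* index into a separate leaf-data array */
} flat_node;

/* Iterative descent: no recursion, bounded memory use. */
static int32_t find_leaf(const flat_node *nodes, int32_t root, float key)
{
    int32_t i = root;
    while (i >= 0) {
        const flat_node *n = &nodes[i];
        if (n->left < 0 && n->right < 0)
            return n->payload;                        /* reached a leaf    */
        i = (key < n->split) ? n->left : n->right;    /* descend by index  */
    }
    return -1;                                        /* empty / not found */
}

int main(void)
{
    /* Tiny 3-node tree: root splits at 0.5, leaves carry payloads 10 and 20. */
    flat_node nodes[] = {
        { 0.5f,  1,  2, -1 },
        { 0.0f, -1, -1, 10 },
        { 0.0f, -1, -1, 20 },
    };
    printf("%d %d\n", find_leaf(nodes, 0, 0.25f), find_leaf(nodes, 0, 0.75f));
    return 0;   /* prints "10 20" */
}
```

With nodes at fixed indices, prefetching the next block while walking the current one is straightforward, which is the usual way around the small local store.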
 
The eDRAM die is mainly a cost-saving measure. It's nice for a few things, but the features it offers are far from free. When some of the highest-profile exclusive and 1st-party games on the platform jump through hoops to use or not use it, something is wrong.
Every Xbox 360 game uses eDRAM regardless of AA implementation. Also, eDRAM was not a cost-saving measure; it's what the designers felt was necessary to achieve the highest performance. The cost-saving measure was only having 10 MB and requiring tiling for AA at 720p. 4x AA at 480p fits perfectly.

Are Xenos and Xenon sharing access to the GDDR3, or is one of them in charge of feeding the other?
It's like x86 designs were prior to Opteron moving the memory controller on die. The memory controller is in Xenos (i.e. the northbridge).
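For reference, the framebuffer arithmetic behind the "4x AA at 480p fits, 720p needs tiling" point, assuming the usual 32-bit colour plus 32-bit depth/stencil per sample:

```c
#include <stdio.h>

/* Rough eDRAM footprint: (colour + depth/stencil) per sample, times the
 * MSAA sample count.  Back-of-the-envelope only. */
static double fb_megabytes(int w, int h, int samples)
{
    const double bytes_per_sample = 4.0 /* colour */ + 4.0 /* depth/stencil */;
    return (double)w * h * samples * bytes_per_sample / (1024.0 * 1024.0);
}

int main(void)
{
    printf(" 640x480, 4xAA: %5.1f MB\n", fb_megabytes(640, 480, 4));   /* ~9.4 MB, fits in 10 MB  */
    printf("1280x720, 2xAA: %5.1f MB\n", fb_megabytes(1280, 720, 2));  /* ~14.1 MB, needs 2 tiles */
    printf("1280x720, 4xAA: %5.1f MB\n", fb_megabytes(1280, 720, 4));  /* ~28.1 MB, needs 3 tiles */
    return 0;
}
```

Without AA, 720p comes to about 7 MB and fits without tiling, which is why tiling only enters the picture once MSAA does.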
 
Every Xbox 360 game uses eDRAM regardless of AA implementation. Also, eDRAM was not a cost-saving measure; it's what the designers felt was necessary to achieve the highest performance. The cost-saving measure was only having 10 MB and requiring tiling for AA at 720p. 4x AA at 480p fits perfectly.

Heck, wasn't MS originally only going to include 5 MB of eDRAM? And it was at the behest of devs (I think Epic mainly) that MS upped it to 10 MB, even though the devs would have liked more.

So it ended up being a compromise between cost of EDRAM and devs desires.

And yeah I don't think any dev doesn't use it, except possibly for quick and dirty ports from PS3. But that's extremely rare.

Regards,
SB
 
Heck, wasn't MS originally only going to include 5 MB of eDRAM? And it was at the behest of devs (I think Epic mainly) that MS upped it to 10 MB, even though the devs would have liked more.

So it ended up being a compromise between cost of EDRAM and devs desires.

And yeah I don't think any dev doesn't use it, except possibly for quick and dirty ports from PS3. But that's extremely rare.

Regards,
SB
I believe Epic said go with 512MB of RAM instead of 256MB.
 
We didn't see a lot of articles written about using Cell for AI, but we read a lot of boasts about it helping graphics rendering.

AI's performance requirements are usually dominated by raycasting for visibility calculations (can the AI agent see the player, can the player see this spot so is it in cover, etc.).
The rest isn't really CPU-intensive, but requires a lot of good design work on what to do and how to do it: finite state machines, the number of behaviours and the requirements for changing between them, mid- to high-level scripting, and so on.
It's basically trying to ship a designer with the game :)
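A minimal sketch of that split in C: the per-frame cost sits in the visibility raycast, while the state machine itself is a handful of branches encoding designer intent. The raycast here is a stub of my own standing in for whatever the engine actually provides.

```c
#include <stdio.h>
#include <stdbool.h>

typedef enum { AI_IDLE, AI_SEARCH, AI_ATTACK, AI_TAKE_COVER } ai_state;
typedef struct { float x, y, z; } vec3;

/* Stand-in for the engine's ray test; a real implementation would trace
 * against level geometry.  Stubbed so the sketch is self-contained. */
static bool raycast_line_of_sight(vec3 from, vec3 to)
{
    (void)from; (void)to;
    return true;
}

static ai_state update_agent(ai_state s, vec3 agent_pos, vec3 player_pos,
                             bool under_fire)
{
    /* The expensive bit: one (or a few) raycasts per agent per frame. */
    bool sees_player = raycast_line_of_sight(agent_pos, player_pos);

    /* The cheap bit: the finite state machine. */
    switch (s) {
    case AI_IDLE:       return sees_player ? AI_ATTACK : AI_IDLE;
    case AI_SEARCH:     return sees_player ? AI_ATTACK : AI_SEARCH;
    case AI_ATTACK:     if (under_fire) return AI_TAKE_COVER;
                        return sees_player ? AI_ATTACK : AI_SEARCH;
    case AI_TAKE_COVER: return under_fire ? AI_TAKE_COVER : AI_SEARCH;
    }
    return s;
}

int main(void)
{
    vec3 agent  = { 0.0f, 0.0f, 0.0f };
    vec3 player = { 5.0f, 0.0f, 2.0f };
    ai_state s = AI_IDLE;
    s = update_agent(s, agent, player, false);  /* sees player -> AI_ATTACK     */
    s = update_agent(s, agent, player, true);   /* under fire  -> AI_TAKE_COVER */
    printf("final state: %d\n", s);
    return 0;
}
```

Presumably that raycast budget is also where the SPUs can help AI most directly, since it's the same kind of batched, data-parallel work as the rendering jobs.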
 
Every Xbox 360 game uses eDRAM regardless of AA implementation.

Yeah, eDRAM is also the reason why we don't see that many games on the 360 using quarter-size or even smaller effects buffers. How much of a visual impact this has compared to the PS3 versions is of course debatable, but if we're discussing technical issues, it is an important one IMHO.
 