How much work must the SPUs do to compensate for the RSX's lack of power?

The old G7x architecture has trouble with more complex shader programs. Dynamic branching is not a G7x strength, for example. ATI was ahead of the game with R520 for complex shaders compared to G7x. Xenos' design is somewhat like a hybrid of R520/R600 so it's possibly better than RSX at complex shaders.

G7x also has performance issues if you go beyond bilinear filtering. Nvidia cheated in their PC drivers, with aggressive trilinear and anisotropic tweaks, to be more competitive.

RSX is also likely limited to max 4X MSAA, but that probably doesn't matter for console-land.
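
To see why coarse dynamic branching hurts, here's a toy cost model in C++ (the block size and per-path costs are made-up numbers purely for illustration, not G7x specifics): a branch only saves work if every pixel in a whole block takes the cheap path, so with coarse granularity almost every block ends up paying for the expensive path anyway.

```cpp
#include <cstdio>
#include <cmath>

int main() {
    const int pixels = 1 << 20;        // 1M shaded pixels
    const int blockSize = 1024;        // assumed coarse branching granularity
    const double expensive = 20.0;     // "cycles" for the heavy branch
    const double cheap = 2.0;          // "cycles" for the light branch
    const double hot = 0.05;           // fraction of pixels needing the heavy branch

    // Ideal per-pixel branching: pay the heavy path only where it's needed.
    const double idealCost = pixels * (hot * expensive + (1.0 - hot) * cheap);

    // Coarse blocks: the whole block pays the heavy path if any pixel in it is "hot".
    const double pBlockHot = 1.0 - std::pow(1.0 - hot, blockSize);
    const double coarseCost = pixels * (pBlockHot * expensive + (1.0 - pBlockHot) * cheap);

    std::printf("ideal : %.0f cycles\n", idealCost);
    std::printf("coarse: %.0f cycles (%.1fx worse)\n", coarseCost, coarseCost / idealCost);
    return 0;
}
```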
 
RSX seems to be a strange animal in that it does certain things very well yet others not so well, depending on the context. No, it can't compete with Xenos at certain things, as it doesn't have the raw bandwidth with the ROPs etc.

Sometimes it's nice to see what real developers say about it. For example, the most recent developer comments I've seen about RSX are in http://www.eurogamer.net/articles/digitalfoundry-lbp2-tech-interview

Research-wise, we just dove in. I knew that we'd have these six SPU processors, and I'd heard rumours (incorrect in detail, helpful in spirit) that the vertex pipe was weak on the GPU. That doesn't really count as research, but it was enough of a nudge to just force us to build an engine where all the vertices are pushed through SPU.

...

Because the geometry of the whole scene is now in a volume texture, sampling for occlusion information just turns into volume texture lookups, which the RSX, in this case, is good at.

...

The RSX is an odd beast in that sometimes it can surprise you in how fast it chews through things - perhaps it's the bus - and sometimes its performance just 'drops off a cliff'.

Every GPU has its foibles - and with the PS3, we didn't take a particularly scientific or analytic approach. We just threw lots of pasta at the wall and some of it stuck.
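
The volume-texture trick described there maps to code quite naturally. A minimal CPU-side sketch of the idea (hypothetical structures and names, not the actual LBP2 implementation): voxelise the scene into a 3D occupancy grid, and an occlusion query becomes a handful of filtered volume lookups.

```cpp
#include <cstdio>
#include <vector>
#include <algorithm>
#include <cmath>

// Hypothetical sketch: the scene voxelised into a dense 3D grid of "occupancy"
// values (0 = empty, 1 = solid). On the GPU this would be a volume texture;
// here it is just a flat array for illustration.
struct VoxelVolume {
    int nx, ny, nz;
    std::vector<float> occupancy;   // nx*ny*nz values in [0,1]

    float at(int x, int y, int z) const {
        x = std::clamp(x, 0, nx - 1);
        y = std::clamp(y, 0, ny - 1);
        z = std::clamp(z, 0, nz - 1);
        return occupancy[(z * ny + y) * nx + x];
    }

    // Trilinearly filtered lookup at normalised [0,1]^3 coordinates, i.e. what a
    // linearly filtered volume-texture sample gives you.
    float sample(float u, float v, float w) const {
        float fx = u * nx - 0.5f, fy = v * ny - 0.5f, fz = w * nz - 0.5f;
        int x0 = (int)std::floor(fx), y0 = (int)std::floor(fy), z0 = (int)std::floor(fz);
        float tx = fx - x0, ty = fy - y0, tz = fz - z0;
        float c00 = at(x0, y0,     z0) * (1 - tx) + at(x0 + 1, y0,     z0) * tx;
        float c10 = at(x0, y0 + 1, z0) * (1 - tx) + at(x0 + 1, y0 + 1, z0) * tx;
        float c01 = at(x0, y0,     z0 + 1) * (1 - tx) + at(x0 + 1, y0,     z0 + 1) * tx;
        float c11 = at(x0, y0 + 1, z0 + 1) * (1 - tx) + at(x0 + 1, y0 + 1, z0 + 1) * tx;
        float c0 = c00 * (1 - ty) + c10 * ty;
        float c1 = c01 * (1 - ty) + c11 * ty;
        return c0 * (1 - tz) + c1 * tz;
    }
};

// Crude occlusion estimate at a point: average occupancy of a few samples around
// it. Each query is just volume lookups, no per-triangle geometry tests.
float occlusionAt(const VoxelVolume& vol, float u, float v, float w, float radius) {
    static const float offs[6][3] = {{1,0,0},{-1,0,0},{0,1,0},{0,-1,0},{0,0,1},{0,0,-1}};
    float occ = 0.0f;
    for (const auto& o : offs)
        occ += vol.sample(u + o[0] * radius, v + o[1] * radius, w + o[2] * radius);
    return occ / 6.0f;   // 0 = fully open, 1 = fully occluded
}

int main() {
    VoxelVolume vol{16, 16, 16, std::vector<float>(16 * 16 * 16, 0.0f)};
    // Mark a solid slab through the middle of the volume.
    for (int y = 0; y < 16; ++y)
        for (int x = 0; x < 16; ++x)
            vol.occupancy[(8 * 16 + y) * 16 + x] = 1.0f;
    std::printf("near slab : %.2f\n", occlusionAt(vol, 0.5f, 0.5f, 0.55f, 0.1f));
    std::printf("open space: %.2f\n", occlusionAt(vol, 0.5f, 0.5f, 0.10f, 0.1f));
    return 0;
}
```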
 
... which is why the SPUs come in handy to keep the RSX focused on the task at hand -- like having a dedicated vertex processor and a dedicated pixel processor working at the same time.

Culling may take up as much as roughly 40% of the SPU power, but there should be plenty to go around. They can fit other jobs in between culling jobs, or even while culling.
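
For a feel of what that culling work is, here's a rough plain-C++ sketch of per-triangle backface/frustum rejection over a batch (hypothetical structures; real SPU code would use SIMD intrinsics on data DMA'd into local store and hand RSX back a compacted index list):

```cpp
#include <cstdio>
#include <cstdint>
#include <cstddef>

struct Vec4 { float x, y, z, w; };      // clip-space position
struct Tri  { uint16_t i0, i1, i2; };   // indexed triangle

// True if the clip-space triangle can be rejected before the GPU ever sees it:
// backfacing, or entirely outside one frustum plane (D3D-style 0..w depth assumed).
static bool cullTriangle(const Vec4& a, const Vec4& b, const Vec4& c) {
    // Backface test via the sign of the 2D signed area after perspective divide.
    float ax = a.x / a.w, ay = a.y / a.w;
    float bx = b.x / b.w, by = b.y / b.w;
    float cx = c.x / c.w, cy = c.y / c.w;
    if ((bx - ax) * (cy - ay) - (cx - ax) * (by - ay) <= 0.0f) return true;

    // Trivial frustum rejection: all three vertices outside the same plane.
    if (a.x >  a.w && b.x >  b.w && c.x >  c.w) return true;
    if (a.x < -a.w && b.x < -b.w && c.x < -c.w) return true;
    if (a.y >  a.w && b.y >  b.w && c.y >  c.w) return true;
    if (a.y < -a.w && b.y < -b.w && c.y < -c.w) return true;
    if (a.z >  a.w && b.z >  b.w && c.z >  c.w) return true;
    if (a.z < 0.0f && b.z < 0.0f && c.z < 0.0f) return true;
    return false;
}

// Walk a batch of triangles and keep only the survivors' indices -- roughly the
// output an SPU culling pass would hand back for the GPU to draw.
static size_t cullBatch(const Vec4* clipVerts, const Tri* tris, size_t triCount,
                        uint16_t* outIndices) {
    size_t out = 0;
    for (size_t t = 0; t < triCount; ++t) {
        const Tri& tr = tris[t];
        if (!cullTriangle(clipVerts[tr.i0], clipVerts[tr.i1], clipVerts[tr.i2])) {
            outIndices[out++] = tr.i0;
            outIndices[out++] = tr.i1;
            outIndices[out++] = tr.i2;
        }
    }
    return out;   // number of indices kept
}

int main() {
    Vec4 verts[3] = {{-0.5f, -0.5f, 0.5f, 1.0f}, {0.5f, -0.5f, 0.5f, 1.0f}, {0.0f, 0.5f, 0.5f, 1.0f}};
    Tri  tris[1]  = {{0, 1, 2}};
    uint16_t out[3];
    std::printf("indices kept: %zu of 3\n", cullBatch(verts, tris, 1, out));
    return 0;
}
```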
 
It's interesting how that worked out. They bought an old-school GPU and threw in a funky new-school CPU. I don't think they knew how it was all going to work out. The fake pre-rendered videos and stupid marketing stuff was proof enough of that. I think the CPU and Blu-ray were mostly Sony pushing their own tech for some ROI and to push their corporate world view or whatever. Devs discovered that the CPU could be used to supplement the gimpy GPU. How "fortuitous", aside from the extra pain involved for the people working unreal hours to make the games.
 
It's interesting how that worked out. They bought an old-school GPU and threw in a funky new-school CPU. I don't think they knew how it was all going to work out. The fake pre-rendered videos and stupid marketing stuff was proof enough of that. I think the CPU and Blu-ray were mostly Sony pushing their own tech for some ROI and to push their corporate world view or whatever. Devs discovered that the CPU could be used to supplement the gimpy GPU. How "fortuitous", aside from the extra pain involved for the people working unreal hours to make the games.

While the Cell is a general-purpose CPU, the PS3 is clearly designed to allow Cell to help in graphics. Otherwise, there'd be no need to give Cell fast read/write access to the video memory directly. I believe they profiled games for a year or more when they designed Cell. At that time, rendering was probably the only heavyweight job. Game physics and video/audio recognition AI were in their infancy (nothing much to profile).
 
If you're talking about the KZ2 2005 video, it's Guerrilla's estimation of what they can/want to achieve on the PS3.

While the Cell is a general-purpose CPU, the PS3 is clearly designed to allow Cell to help in graphics. Otherwise, there'd be no need to give Cell fast read/write access to the video memory directly. I believe they profiled games for a year or more when they designed Cell. At that time, rendering was probably the only heavyweight job. Game physics and video/audio recognition AI were in their infancy (nothing much to profile).

Given that the Cell reads from RSX's RAM at 16MB/s, I wouldn't lean so hard on the Cell having fast read/write of the video memory. ;)
 
Given that the Cell reads from RSX's RAM at 16MB/s, I wouldn't lean so hard on the Cell having fast read/write of the video memory. ;)

Ah yes, it's the other way round. The Cell can write to video memory fast for RSX to read.

I believe the RSX also has a larger texture cache (maybe to smooth data access from the video and system memory?), but it's all NDA'd.
 
Given that the Cell reads from RSX's RAM at 16MB/s, I wouldn't lean so hard on the Cell having fast read/write of the video memory. ;)
That was proven looong ago to be FUD. Yes, it's true, but copying the buffer over to CPU RAM has little impact on rendering.

RSX has roughly 1.25x the transistor budget of Xenos if you leave out the daughter die's eDRAM cells. Nvidia would have to be incredibly bad engineers for that not to make a difference.
The eDRAM die is mainly a cost-saving measure. It's nice for a few things, but the features it offers are far from free. When some of the most high-profile exclusive and first-party games on the platform jump through hoops to use or not use it, something is wrong.
The fact that some games look slightly better on 360 can, as I already said, have many reasons other than technical superiority.
What matters is that PS3 beat 360 pretty thoroughly when comparing the top games on either platform. It should: it's a year younger, and the hardware's first iteration didn't look like the components were thrown in with a shovel in the mid-eighties.
I would like to see what numbers the poster who came up with the 70 to 80% figure used, because it can't be the pretty credible Wikipedia numbers I'm looking at, where RSX comes out on top more often than not.
 
Ah yes, it's the other way round. The Cell can write to video memory fast for RSX to read.

And RSX can read/write XDR fast, yeah.

It's very nice that the SPUs are there, but I wish Sony hadn't handicapped the RSX so much. A 256 bit interface to the GDDR3 would have helped a whole lot.

Everyone says that makes a system much more expensive and harder to price-reduce, unfortunately.
 
The majority of CPU time is generally spent on AI and physics, especially if you have AI with complex pathfinding routines and behavioural guidelines. Then multiply that by the number of AIs and the number of players (if co-op). Physics can use up all the processing in either of the consoles and still not be satisfied. Thus there are people dedicated to finding ways to approximate physics calculations so that, while not entirely accurate, they look and perform well enough for a given game.

Physics is an entirely different ballgame; I'm not talking about physics. Obviously more CPU resources allow for better physics, that goes without saying.

I can see AI becoming burdensome if you have hundreds of AIs to control at once, but that has nothing to do with the quality of AI, rather the magnitude. So more CPU power may make it more feasible to have more objects with AI, or allow you to not scale back your AI when supporting many objects, but it doesn't really do anything towards making an individual AI object "smarter". That's a programming/logic issue.

Pathfinding can get fairly processor intensive for sure, but I'm thinking more of decision making.

All AIs in games these days are still just as stupid as they were last generation. E.g. Halo had great AI last gen, and it has basically not improved whatsoever with this latest generation.
 
And RSX can read/write XDR fast, yeah.

It's very nice that the SPUs are there, but I wish Sony hadn't handicapped the RSX so much. A 256 bit interface to the GDDR3 would have helped a whole lot.

Everyone says that makes a system much more expensive and harder to price-reduce, unfortunately.
After one or two die shrinks they could've replaced the 256-bit GDDR3 RAM (512MB) with 128-bit GDDR5. Initially it would've been added cost for sure, but at least you'd have an observable advantage in multiplatform games, instead of parity or a disadvantage for more money compared to the XB360.
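
The trade-off is easy to put rough numbers on: peak bandwidth is just bus width times per-pin data rate. A quick sketch (the GDDR5 rate is an assumed figure purely for illustration; the GDDR3 rate is RSX's commonly quoted 1.3 Gbps/pin on a 128-bit bus):

```cpp
#include <cstdio>

// Peak bandwidth in GB/s = bus width (bits) * data rate (Gbit/s per pin) / 8.
static double bandwidthGBs(int busBits, double gbpsPerPin) {
    return busBits * gbpsPerPin / 8.0;
}

int main() {
    std::printf("128-bit GDDR3 @ 1.3 Gbps: %4.1f GB/s\n", bandwidthGBs(128, 1.3)); // RSX as shipped, ~20.8
    std::printf("256-bit GDDR3 @ 1.3 Gbps: %4.1f GB/s\n", bandwidthGBs(256, 1.3)); // the wished-for wide bus, ~41.6
    std::printf("128-bit GDDR5 @ 3.6 Gbps: %4.1f GB/s\n", bandwidthGBs(128, 3.6)); // assumed later swap, ~57.6
    return 0;
}
```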
 
Just a curiosity: if SPUs are used for vertex processing, which work is left to the RSX vertex shader units? Are they underutilised in the latest games?
 
All AIs in games these days are still just as stupid as they were last generation. E.g. Halo had great AI last gen, and it has basically not improved whatsoever with this latest generation.

Again, not for recognition AI though. The software is smarter today, but still trying to chase after live human actions, speech, and other forms of expression, etc. MS just invested millions into this relatively new branch of gaming. Sony started to work on it with EyeToy. I think they both have a long way to go.
 
It's not FUD. It's just not an intended use case.

It's FUD insofar as the way these things work is misunderstood, I think. The RSX is the boss of the memory bus between itself and the other components for the most part. From what I understand, the RSX can stream textures from main memory at high speed, and read a framebuffer from there in the same way, as well as vertex data.

But similarly, the RSX can push data straight at the Cell, or even (only?), I think, directly at an SPU, which can then process it, pass it on, and write to, say, a framebuffer or a texture in main memory, from where the RSX can pull the data again.

So the 16MB/s just means the Cell can't read quickly from the graphics memory. So why is the GPU boss of the graphics memory, while the main memory can be accessed so 'freely' by both the Cell and the GPU? Because the main memory is Rambus XDR, made and slotted into the motherboard to be efficiently accessed by several components at once. The graphics memory, on the other hand, is GDDR memory designed to be accessed very quickly by one component only. Hence the RSX being master of it.

That the Cell can in fact sort of write to it directly at all is because, for DVD/Blu-ray playback, it's convenient to be able to use the Cell to process the compressed stream and put it straight into the framebuffer. The 4GB/s allotted to this is just for that purpose (but note that it was also useful in the Linux environment, where the RSX was taken out of the picture completely for security purposes; in hindsight a correct decision, I think).
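
To put those two figures in perspective, a quick back-of-the-envelope (assuming a 1280x720, 32bpp buffer, roughly 3.7MB) shows why the read path is a non-starter for rendering while the write path is perfectly usable:

```cpp
#include <cstdio>

int main() {
    // Assumed buffer: 1280x720 at 4 bytes per pixel (~3.7 MB).
    const double bufferBytes = 1280.0 * 720.0 * 4.0;

    const double cellReadBps  = 16.0e6;   // Cell reading GDDR3: ~16 MB/s (figure quoted above)
    const double cellWriteBps = 4.0e9;    // Cell writing GDDR3: ~4 GB/s (figure quoted above)

    // Time to move one such buffer each way.
    std::printf("read : %.0f ms per buffer\n", bufferBytes / cellReadBps  * 1000.0); // ~230 ms
    std::printf("write: %.2f ms per buffer\n", bufferBytes / cellWriteBps * 1000.0); // ~0.9 ms
    return 0;
}
```

Pulling a single 720p buffer back over the 16MB/s path would take around 230ms, several frames' worth of time on its own, while pushing one into VRAM at 4GB/s costs well under a millisecond.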

So at any rate, this means that the RSX has the high speed read/write necessary for close cooperation with the Cell processor and its SPUs to make an interesting, integrated rendering pipeline.

Note that this also answers the question of why the RSX was linked to its graphics memory with a 128-bit interface - the other 128 bits are hooked up to main memory.
 
It's FUD insofar as the way these things work is misunderstood, I think. The RSX is the boss of the memory bus between itself and the other components for the most part. From what I understand, the RSX can stream textures from main memory at high speed, and read a framebuffer from there in the same way, as well as vertex data.

I suspect the (rumored) larger texture cache may have something to do with it.

But similarly, the RSX can push data straight at the Cell, or even (only?), I think, directly at an SPU, which can then process it, pass it on, and write to, say, a framebuffer or a texture in main memory, from where the RSX can pull the data again.

Can RSX push data to a Local Store directly? (I don't know either way.) That's rather interesting if true. I only know you can map the Local Stores and main memory into a logical global memory map.

EDIT: So maybe by writing into the right range of memory, it's equivalent to loading data into an SPU's Local Store?
 
I remember reading Graham's post about Uncharted 2 using 40% of Cell for culling, but I can't find the link right now. What's the right figure, insider? :p

A lot less than that. ;)
Maybe he was talking about all of geometry processing, which could make sense. I don't remember seeing Uncharted's SPU schedules, but 40% for culling seems excessive, unless you do some pretty sophisticated occlusion culling.

In any case, 80ms is 256M cycles, or about 1G operations if you account for 4-wide SIMD. So even if you have 10M triangles in your scene, that would be roughly 100 ops per triangle.
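
Spelling that arithmetic out (assuming the 3.2GHz SPU clock and 4-wide single-precision SIMD):

```cpp
#include <cstdio>

int main() {
    const double spuHz     = 3.2e9;   // SPU clock
    const double budgetSec = 0.080;   // the 80 ms of SPU time discussed above
    const double simdWidth = 4.0;     // 4-wide single-precision SIMD
    const double triangles = 10.0e6;  // hypothetical 10M-triangle scene

    const double cycles = spuHz * budgetSec;   // ~256M cycles
    const double ops    = cycles * simdWidth;  // ~1G lanes' worth of work
    std::printf("cycles: %.0fM, ops: %.2fG, ops per triangle: %.0f\n",
                cycles / 1e6, ops / 1e9, ops / triangles);   // ~100 per triangle
    return 0;
}
```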
 