How much work must the SPUs do to compensate for the RSX's lack of power?

Status
Not open for further replies.
Keep in mind that KZ2 isn't that geometry heavy, for example the sniper character presented in the D'artiste book from Ballistic has only 11000 triangles, which is quite a bit below UC2's average of 18K. Granted, that average probably goes from the highly detailed Drake to the most simple enemies, but there definitely is a difference. KZ2 also seems to have less environment detail, somewhat lower number of enemies at the same time, and so on. So what I'm saying is that it might not be completely representative :)
 
True, and no game is going to be completely representative. I mean, Ratchet & Clank, a game which a lot of people consider to be really good looking, doesn't use the SPEs for culling at all - the SPUs are reserved exclusively for stuff like weapons, animations, physics, AI and so on.
 
Number of enemies is one thing which KZ2 does just fine imo...the highest count should be around 8-10 on screen enemies at the very least (could be more in the biggest battles), & that's already more than UC2.
 
It appears the answer to the thread title is "not much".

I could go on for pages listing the types of things the SPUs are used for to make up for the machine's aging GPU, which may be a 7 series NVidia part, but that's basically a tweaked 6 series for the most part. But I'll just type a few off the top of my head:


1) Two ppu/vmx units
There are three ppu/vmx units on the 360, and just one on the PS3. So any load on the 360's remaining two ppu/vmx units must be moved to spu.

2) Vertex culling
You can look back a few years at my first post talking about this, but it's common knowledge now that you need to move as much vertex load as possible to the SPUs, otherwise the RSX won't keep pace with the 360.

3) Vertex texture sampling
You can texture sample in vertex shaders on 360 just fine, but it's unusably slow on PS3. Most multi platform games simply won't use this feature on 360 to make keeping parity easier, but if a dev does make use of it then you will have no choice but to move all such functionality to spu.

4) Shader patching
Changing variables in shader programs is cake on the 360. Not so on the PS3, because the variables are embedded into the shader programs. So you have to use the SPUs to patch your shader programs.

5) Branching
You never want a lot of branching in general, but when you do really need it the 360 handles it fine, PS3 does not. If you are stuck needing branching in shaders then you will want to move all such functionality to spu.

6) Shader inputs
You can pass plenty of inputs to shaders on 360, but do it on PS3 and your game will grind to a halt. You will want to move all such functionality to spu to minimize the amount of inputs needed on the shader programs.

7) Msaa alternatives
Msaa runs full speed on 360 gpu needing just cpu tiling calculations. Msaa on PS3 gpu is very slow. You will want to move msaa to spu as soon as you can.

8) Post processing
360 is unified architecture meaning post process steps can often be slotted into gpu idle time. This is not as easily doable on PS3, so you will want to move as much post process to spu as possible.

9) Load balancing
360 gpu load balances itself just fine since it's unified. If the load on a given frame shifts to heavy vertex or heavy pixel load then you don't care. Not so on PS3 where such load shifts will cause frame drops. You will want to shift as much load as possible to spu to minimize your peak load on the gpu.

10) Half floats
You can use full floats just fine on the 360 gpu. On the PS3 gpu they cause performance slowdowns. If you really need/have to use shaders with many full floats then you will want to move such functionality over to the spu's.

11) Shader array indexing
You can index into arrays in shaders on the 360 gpu no problem. You can't do that on PS3. If you absolutely need this functionality then you will have to either rework your shaders or move it all to spu.

Etc, etc, etc...
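
To make point 2 a bit more concrete, here is a minimal sketch of the kind of test a culling pass runs before handing triangles to the GPU. It is only an illustration in Python, not SPU code and not any shipping engine's method. The names `signed_area2` and `cull_triangles` are made up for this example, and it assumes triangles are already projected to a y-up screen space with counter-clockwise front faces.

```python
from typing import List, Tuple

Vec2 = Tuple[float, float]
Triangle = Tuple[Vec2, Vec2, Vec2]


def signed_area2(tri: Triangle) -> float:
    """Twice the signed area of a screen-space triangle.

    Positive means counter-clockwise winding, negative means clockwise,
    zero means the three points are collinear (degenerate triangle).
    """
    (ax, ay), (bx, by), (cx, cy) = tri
    return (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)


def cull_triangles(tris: List[Triangle]) -> List[Triangle]:
    """Keep only front-facing, non-degenerate triangles.

    Assumption for this sketch: front faces wind counter-clockwise.
    """
    return [t for t in tris if signed_area2(t) > 0.0]


if __name__ == "__main__":
    tris = [
        ((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)),  # counter-clockwise: kept
        ((0.0, 0.0), (0.0, 1.0), (1.0, 0.0)),  # clockwise: back-facing, dropped
        ((0.0, 0.0), (1.0, 1.0), (2.0, 2.0)),  # collinear: degenerate, dropped
    ]
    kept = cull_triangles(tris)
    print(f"{len(kept)} of {len(tris)} triangles survive culling")
```

Every triangle dropped here is one the GPU's vertex and setup stages never have to touch, which is the whole point of doing the test ahead of time.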
 
:oops:

Uhh I guess Cell saves the day then? ;)

I would qualify: yes, particularly for developers who develop their game for 360 (and/or starting from PC), and then port to PS3, it's pretty clear you need the SPUs if you want to do things the same way as much as possible. I think if you had any doubt ever before, then this clarifies 100% joker's perspective in these matters. ;)

It's still very interesting information though of course! (thanks joker)

(Point 1, though, is pretty irrelevant to this discussion, as it's not a GPU feature.)
 
Can you do some kind of memexport on the RSX? I just read an interesting presentation about crowd rendering on the MS Gamefest site. I remember something called "TurboCache", but not whether it was related or what it was.
 
:oops:

Uhh I guess Cell saves the day then? ;)

It is more than that ! ^_^
It is fundamentally a difference in philosophy (and hence, resource investment).

The Cell doesn't just save the day, it is meant to be integral to everything the PS3 does (graphics, security, physics, AI, system utilities). You can tell from joker454's post, where he lumps Xenon's general VMX workload into his GPU task list.

By designing and implementing the pipeline differently, the savvy developers can maximize the combined power more readily because the hardware supports parallel CPU + GPU work at every level (from memory access to cores). Most people only look at specific/narrow features (e.g., shader limitations), but it is only a partial picture.

The problem is also more than just substituting numbers. Replacing VMX load with SPU load is a very simplistic way of porting. In a system that is designed to take advantage of the SPUs specifically, a good developer may be able to maximize the advantage because the SPU is a standalone core with built-in memory and "dedicated" bandwidth. e.g., in the security world, merely performing security operations on the VMX or SPU is not so interesting. On top of the hard number crunching, Kana Shimizu also designed the SPU security to run separately from the PPU, and hence gained the special ability to stop hackers even when they have gained access to the kernel. It is a game changer from this perspective.

If Shimizu had simply ported traditional security operations to an SPU's vector engine and called it a day, the PS3 might have been hacked by now. And I wouldn't bother to remember her name.

Similar efforts exist in the graphics pipeline (See MLAA). This is one of the reasons the Cell CPU is an interesting animal. It's hard to program though, thanks to the small Local Store, slow PPU, etc. :p

Can you do some kind of memexport on the RSX?

Nope. You use the more general SPUs to do CPU-like operations. With that, you also have the opportunity to perform traditional graphics operations differently, or integrate them with other subsystems. It is a lot of work. I suppose most developers don't have that luxury, especially if they need to maintain both PS3 and 360 code.
 
question I still want answered....

where is the strength of the RSX? at 300 million transistors....what work are they doing? Where are the transistors wasted at?

Is it just horribly designed? Or is Xenos just superbly designed?
 
Nope. You use the more general SPUs to do CPU-like operations. With that, you also have the opportunity to integrate traditional graphics operations differently, or with other subsystems. It is a lot of work though. I suppose most developers don't have that luxury, especially if they need to maintain both PS3 and 360 code.
Oops, sorry for my brain fart. I did a search: TurboCache has nothing to do with memexport... nor with RSX.
 
where is the strength of the RSX? at 300 million transistors....what work are they doing? Where are the transistors wasted at?

Is it just horribly designed? Or is Xenos just superbly designed?
It's just an old design; I don't remember the GF7xxx aging that well vs. their ATI relatives. On top of that, Xenos was an in-between generation of GPU.
The difference in transistors is not that much if you factor in the RBE (on the daughter die).
 
Don't think so, otherwise people wouldn't bother to get the SPUs involved at all. I'd say a 5 to 10% gain in vertex processing would already be worth some SPU time, as it'd help with those pesky tiny bottlenecks that can cause framerate fluctuation; but if they dedicate an entire array of them, even for a short time, then it presents a significant gain, indicating the lack of RSX's ability to deal with the issue on its own.
I'm not saying there aren't jobs that need to be picked up by the SPUs. I'm saying, with these numbers in mind, that it doesn't have to eat much of the SPUs' time. I mean, look at what we're seeing from the top graphical games on one platform, and the developers are saying they can go a good deal further. Meanwhile, there is a definite gap between those and the top graphical games on the other platform. I don't think the "no one (not even 1st party devs) is pushing the 360 hardware" argument can really hold up. I think the quote below says a lot:

He estimated one SPU to slightly exceed the vertex performance of a 6600 for a workload consisting of skinning and backface culling. A 6600 has 3 vertex units, RSX/7800s have 8. This is in line with the number thrown around earlier in this thread ("2 or 3 SPUs dedicated to geometry processing").

That still leaves another 3 or 4 SPUs (and PPU) for other tasks beyond that. Then, there is nAo's statement below.

Different platforms require different care; I would not be surprised if people working on 360 and suddenly dropping their datasets onto RSX did not observe good numbers (and vice versa).
Now..I can't see how RSX, if used in the right way, should be so limited at vertex processing: in HS we easily render 2-2.5 MTriangles per frame at 30 fps without being VS limited and without making any use of CELL to speed up vertex shading and I know for sure that being more clever we could even go faster..(just using the GPU)
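
The numbers in the two quotes above can be worked through in a few lines. This is back-of-the-envelope arithmetic only. It assumes vertex throughput scales linearly with the number of vertex units and ignores clock speed differences, and the figure of 6 SPUs available to a game is an assumption inferred from "2 or 3" plus "3 or 4" in the post. The function names are made up for the example.

```python
import math

VERTEX_UNITS_6600 = 3  # from the quoted estimate
VERTEX_UNITS_RSX = 8   # from the quoted estimate
SPUS_FOR_GAMES = 6     # assumption: implied by "2 or 3" used plus "3 or 4" left


def spus_to_match_rsx(units_rsx: int = VERTEX_UNITS_RSX,
                      units_6600: int = VERTEX_UNITS_6600) -> float:
    """SPUs needed to match RSX vertex throughput, if 1 SPU ~ one 6600.

    Assumes linear scaling with vertex unit count (a simplification).
    """
    return units_rsx / units_6600


def triangles_per_second(triangles_per_frame: float, fps: float) -> float:
    """Convert a per-frame triangle count into a per-second rate."""
    return triangles_per_frame * fps


if __name__ == "__main__":
    needed = spus_to_match_rsx()
    print(f"SPUs to match RSX vertex units: {needed:.2f}")
    print(f"SPUs left over: {SPUS_FOR_GAMES - math.ceil(needed)}"
          f" to {SPUS_FOR_GAMES - math.floor(needed)}")
    for per_frame in (2.0e6, 2.5e6):
        rate = triangles_per_second(per_frame, 30)
        print(f"{per_frame / 1e6:.1f} MTri/frame at 30 fps = {rate / 1e6:.0f} MTri/s")
```

Under those assumptions the ratio comes out at roughly 2.7, which is where "2 or 3 SPUs" comes from, and the Heavenly Sword figure works out to 60-75 million triangles per second on the GPU alone.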
 
Realize that vertex processing is not going on all the time, the load is constantly shifting throughout the rendering of any given frame. If RSX is a bottleneck only 5-10% of the time, it won't take much SPU use to seriously improve the situation; and yet it'd cause serious setbacks if there was no way to compensate for the weakness.


As for nao's comments, I respect him and his work, but there are far too many examples of 1st party developers utilizing SPUs for vertex processing to ignore. Heavenly Sword was a first generation title and probably pushed the tech a lot less than today's games.
I'd say - and this is full speculation - that those 2-2.5 million triangles were mostly rendered as crowds of enemies using heavy LOD, just a few polygons in a segmented character model with no self-shadows or skinning or any complex processing at all, located in a large open arena type of environment with very little occlusion and a simple directional/spot light as the sun. There are many games today that are doing a lot more... which is pretty much expected from third generation titles.

So, I'd modify his statement like this:
Depending on what the game is doing as vertex processing, RSX may or may not be limited on its own.
 
Keep in mind that KZ2 isn't that geometry heavy, for example the sniper character presented in the D'artiste book from Ballistic has only 11000 triangles, which is quite a bit below UC2's average of 18K. Granted, that average probably goes from the highly detailed Drake to the most simple enemies, but there definitely is a difference. KZ2 also seems to have less environment detail, somewhat lower number of enemies at the same time, and so on. So what I'm saying is that it might not be completely representative :)
I've completely missed your point :???: The sniper's poly count is fine, but can you post more technical data about the lower environment detail, low geometry and fewer enemies in KZ2? To me that seems pretty unfair... KZ2 has a lot more physics objects simulated with Havok compared to Uncharted 2, the enemy count doesn't appear low to me, and I'm not sure about the low geometry either... :???:
 
Old school type thread right here, wish I'd seen it earlier!

The whole RSX <--> Cell memory transfer thing was discussed to conclusion several years ago in a couple of threads, (or maybe I'm remembering side conversations), but I don't think so. It became *the* topic after some ridiculous Charlie D Inquirer article if I recall correctly; I'm sure that triggers the memories of some folk here. Granted, slightly pathetically for a mod I'm not actually going to search for these discussions of old to post a thread link, but rather moving on...

I think we have to view the PS3 graphics subsystem on a macro level and the RSX itself separately. The former, yes, can get the job done - and the utilization of the cross-pool memory writes/reads comes into play there as part of a clever solution to allow Cell to play a very tangible role; all the more so lately, much to PS3's benefit.

But the RSX itself when viewed in isolation - I think we can be honest - is just old tech with minor tweaking on NVidia's part for the benefit of Sony. The caches, the memory controller, maybe some other stuff... but it's more of a pretty direct 7x00 port than it is a custom part at the end of the day. I hate to say it, but "typical NVidia" at the time. And the transistor budget analysis is a waste of time IMO between RSX and Xenos; two completely different architectures, but for what they are I think Xenos adds more to its respective system on a per transistor basis.

On some other points...

I agree with the thought (or am stating it now) that AI hasn't really advanced this gen, and certainly not on the level hoped for by some (including myself) when Cell debuted. The discussion here on AI in this thread split across two lines; one was the pathfinding/decision tree aspect, and yes, Cell has 'arrived' in this area in a big way. I feel that developers, and especially our usual top Sony tech showcase studios, have really pushed the algorithm porting to Cell in such a fashion that even stuff that is "bad" on Cell is now fast because with some half-decent code you can essentially brute force the thing to run quickly. And brute force is the wrong image, but... the raw speed is there to make it work and work well is the idea.

The other 'AI' tangent seems to be on the stuff that Patsu is highlighting in terms of Move, PS Eye, etc., but to me it's a misnomer to call it AI - that is simply the signal processing realm where Cell has excelled from day 1, and one of the easier areas to make 'happen' with the chip. So much so that as I've harped in the past, IMO Sony has really dropped the ball in pushing the PSEye and/or related technologies as viable input methods from system launch until today. The whole Minority Report interface, the games, all of it... shame.

The AI I was hoping to see though was some of the academic highly parallel/stream-style projects ported over into gaming to make use of the architecture. To get some "real" intelligence in some games so to speak. In practical terms it's probably too compute intensive given what else is required of the CPU to build a game AI around - and plus what do you even do with it - but that's the sort of 'supercomputing in your game' stuff I would have found novel to see, and something the PS3 would in theory have been capable of that its competitors simply would not have.

Other random thoughts...

1) As we know if the eDRAM had been 12MB vs 10MB, things would have been better at launch for the 360 and certainly easier to implement, both then and now. So Xenos had its own minor misstep for whatever reason (yields?)

2) Let's not quote devs saying something uses "40%," "100%," or anything else like that; whether it uses 0%, or 100%, there's always room for improvement - both in efficiency and in the algorithms themselves (and task distribution).
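
For a sense of scale on point 1, here is a rough sketch of how much eDRAM a 720p render target needs at different MSAA levels and how many tiles that implies. It assumes 4 bytes of color plus 4 bytes of depth/stencil per sample and ignores the alignment constraints real tiling has, so treat the tile counts as a lower bound. The function names are made up for the example.

```python
import math

EDRAM_BYTES = 10 * 1024 * 1024  # the 360's 10 MB of eDRAM


def framebuffer_bytes(width: int, height: int, msaa_samples: int = 1,
                      bytes_color: int = 4, bytes_depth: int = 4) -> int:
    """Size of color + depth storage for one render target.

    Assumption: every MSAA sample stores its own color and depth value.
    """
    return width * height * msaa_samples * (bytes_color + bytes_depth)


def tiles_needed(fb_bytes: int, edram_bytes: int = EDRAM_BYTES) -> int:
    """Minimum number of tiles if each tile must fit entirely in eDRAM."""
    return math.ceil(fb_bytes / edram_bytes)


if __name__ == "__main__":
    for samples in (1, 2, 4):
        size = framebuffer_bytes(1280, 720, samples)
        print(f"720p, {samples}x MSAA: {size / 2**20:.2f} MB, "
              f"{tiles_needed(size)} tile(s)")
```

Passing a different `edram_bytes` value to `tiles_needed` lets you try other capacities or render target layouts.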
 
The other 'AI' tangent seems to be on the stuff that Patsu is highlighting in terms of Move, PS Eye, etc., but to me it's a misnomer to call it AI - that is simply the signal processing realm where Cell has excelled from day 1, and one of the easier areas to make 'happen' with the chip.

Gah... computer vision problems are commonly classified as AI problems by the academia, not me. ^_^

e.g., http://en.wikipedia.org/wiki/Computer_vision

Much of artificial intelligence deals with autonomous planning or deliberation for robotical systems to navigate through an environment. A detailed understanding of these environments is required to navigate through them. Information about the environment could be provided by a computer vision system, acting as a vision sensor and providing high-level information about the environment and the robot. Artificial intelligence and computer vision share other topics such as pattern recognition and learning techniques. Consequently, computer vision is sometimes seen as a part of the artificial intelligence field or the computer science field in general.

...

Yet another field related to computer vision is signal processing. Many methods for processing of one-variable signals, typically temporal signals, can be extended in a natural way to processing of two-variable signals or multi-variable signals in computer vision. However, because of the specific nature of images there are many methods developed within computer vision which have no counterpart in the processing of one-variable signals. A distinct character of these methods is the fact that they are non-linear which, together with the multi-dimensionality of the signal, defines a subfield in signal processing as a part of computer vision.

There are many overlaps and schools of thought, and scientists work on many different levels. There is no one way to classify knowledge.
 
Well fair enough, but I think when we are talking a console and pixel mapping/rendering vs robotics and the role said processing plays in 'awareness,' it adds a layer of confusion to use the term AI interchangeably. But that could just be me. :)
 
It's more than pixel mapping/rendering though. e.g., estimating the number, age, mood and other attributes of the users, tracking their motion, speech recognition (e.g., in SingStar), sketch recognition and transformation into 3D objects, gesture recognition, etc.
 
That's the signal processing of which I speak though; to me I don't want to lump it under AI - as I think you legitimately could as a sub-set if you were talking a robot - because in this case it is an application running that will have a pre-defined result based on the input. The ability to capture the input in 'real time' and process the result is the impressive aspect here in terms of Cell/PS3, not what the application (game) is doing with that result, which is very basic/simple. When we are talking robotics AI generally what we are then impressed by is what the machine will do with those data points, and that to me is what qualifies it for the 'artificial intelligence' designation.
 
Ah I see what you're saying. A title like EyePet and concept like Milo are selling on the awareness and companionship idea though. Last gen, it's all motion tracking (i.e., signal processing stuff in your taxonomy).

In general, the aforementioned areas are all building blocks of a complete AI system. Looking from a computer science perspective, I don't think the researchers will separate them cleanly. There are many ways to frame a high level problem into smaller ones.

e.g., Dr. Marks separates PS Move into 2 layers, a data layer and a recognition layer, depending on how high level the objective is. All the different techniques will be used to create the illusion of intelligent software or a personality at the recognition level (and above). It doesn't have to be manifested as a robot per se. A smart air conditioner can also run AI programs.
 