PC Watch (my translation): PS3 Evaluation System, and much,

PC-Engine · Jul 24, 2005

seismologist said:
PC-Engine said:

What happens if you're vertex bound?

Allocate another SPE.

If you're vertex bound then all of your SPEs are already being used up.

nondescript · Jul 24, 2005

Titanio said:
Presuming you can, is it generally practical with the kind of bandwidth in the evaluation system? I think that whilst you may be able to do it, having a lot more bandwidth may change your approach to accessing main memory compared with the evaluation system. I think that's probably the author's main point: devs probably are not going to be using the XDR memory from the GPU in the same way they will be able to in the final box.

Yup.

The reason I translated this is that I wanted to hear what devs have to say about it. This seems to be a major weakness in PS3 development support if the PS3 Eval System architecture encourages a different data-passing scheme and CPU-GPU workload allocation than the one that will bring out maximum performance in PS3.

Shifty Geezer · Jul 24, 2005

I don't know how the current setup will affect first-gen games. There's still an existing system comparable with a PC setup, so devs can use existing PC communication.

Process data, fill RAM, GPU collects data from RAM and renders pixels. This works well enough for, say, UE3 first-gen games even on the eval kits. The huge CPU<>GPU bandwidth of PS3 is something devs will probably need to have a good long look at to make real use of it for more than just feeding the GPU.

Worst case, a dev can work on an algorithm for doing something over FlexIO by communicating over the slow main RAM, knowing that they'll have 15x the bandwidth for the real thing. As long as they get the algorithm working, even if slowly, they'll make headway into using FlexIO. The only unknown is *how* to address the GPU or CPU storage. I think they've all got their own addresses so it's all accessed with pointers, in which case going from reading/writing main memory to reading/writing directly across FlexIO should be a doddle.
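
The scaling argument in the post above can be put in rough numbers. This is only a sketch: the eval-kit bandwidth and payload size below are placeholder assumptions, not figures from the thread, and `transfer_ms` is a hypothetical helper. Only the 15x factor comes from the post.

```python
# Rough transfer-time estimate: same algorithm, two link speeds.
# EVAL_KIT_GBPS is a placeholder assumption, not a measured figure.
EVAL_KIT_GBPS = 1.0
FINAL_SPEEDUP = 15          # "15x" bandwidth factor quoted in the post


def transfer_ms(megabytes, gbps):
    """Milliseconds to move `megabytes` over a link running at `gbps` GB/s."""
    return megabytes / 1024 / gbps * 1000


payload_mb = 8              # hypothetical per-frame data set
slow = transfer_ms(payload_mb, EVAL_KIT_GBPS)
fast = transfer_ms(payload_mb, EVAL_KIT_GBPS * FINAL_SPEEDUP)
print(f"eval kit: {slow:.2f} ms, final hardware: {fast:.2f} ms")
```

Whatever the real baseline turns out to be, the transfer cost shrinks by the same factor, which is why an algorithm that merely works on the eval kit is still useful groundwork.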

Incidentally, how is memory addressed on Cell in a scalable environment? If you have, say, two PS3s linked up sharing resources, how would PS3a access PS3b's RAM? I'm guessing this is intrinsic to Cell's design, given that scalability was a key priority. Are there indications as to where in the memory map the SPE local stores and GPU caches (or other onboard storage on RSX) are located?

seismologist · Jul 24, 2005

PC-Engine said:
If you're vertex bound then all of your SPEs are already being used up.

Then you're CPU bound.

PC-Engine · Jul 24, 2005

seismologist said:
PC-Engine said:

If you're vertex bound then all of your SPEs are already being used up.

Then you're CPU bound.

...due to the non load balancing setup that you've proposed....

seismologist · Jul 24, 2005

PC-Engine said:
seismologist said:

PC-Engine said:

If you're vertex bound then all of your SPEs are already being used up.

Then you're CPU bound.

...due to the non load balancing setup that you've proposed....

So that's a performance advantage in every situation except the oddball case of being so extremely vertex bound that the CPU can no longer keep up.
Seems like a decent tradeoff to me.

By the way, next time you respond please back it up with actual numbers as evidence. I'm not going to keep responding to multiple pages of one-liners.

PC-Engine · Jul 24, 2005

So that's a performance advantage in every situation except the oddball case of being so extremely vertex bound that the CPU can no longer keep up.

In a unified shader architecture, the CPU doesn't need to keep up since everything is already being load balanced by the GPU.

By the way, next time you respond please back it up with actual numbers as evidence.

Numbers are not required in this kind of simple comparison.

seismologist · Jul 24, 2005

PC-Engine said:
So that's a performance advantage in every situation except the oddball case of being so extremely vertex bound that the CPU can no longer keep up.

In a unified shader architecture, the CPU doesn't need to keep up since everything is already being load balanced by the GPU.

By the way, next time you respond please back it up with actual numbers as evidence.

Numbers are not required in this kind of simple comparison.

Though I'm not supposed to be working on Sunday, I went ahead and quickly dug up the numbers. It appears that the combined unified shader performance of Xenos is 240 Gflops.
Each SPE is capable of 32 Gflops, plus an additional 32 for the GPU vertex shaders.

If these numbers sound right I'll let you do the math from here.

Titanio · Jul 24, 2005

seismologist said:
Each SPE is capable of 32 Gflops, plus an additional 32 for the GPU vertex shaders.

If these numbers sound right I'll let you do the math from here.

The SPEs provide peak performance of 25.6Gflops each. The vertex shaders in RSX should provide 44Gflops, assuming it's the same configuration as the RSX.

Total peak vertex processing capability: 223.2Gflops (7 SPEs + VS)
Total peak pixel processing capability: 264Gflops

Assuming the SPEs can be leveraged for vertex work, on a high level the PS3 can accommodate something close to a 50:50 balance of power between pixel and vertex shading.
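
The totals in the post above can be checked with a few lines of arithmetic. This is a sketch that takes the quoted peak figures at face value and, like the post, assumes all 7 SPEs are devoted to vertex work. The variable names are illustrative only.

```python
# Peak figures as quoted in the post above (Gflops).
SPE_GFLOPS = 25.6
NUM_SPES = 7
RSX_VS_GFLOPS = 44.0
RSX_PS_GFLOPS = 264.0

# Vertex capability: every SPE plus the RSX vertex shaders.
vertex_total = NUM_SPES * SPE_GFLOPS + RSX_VS_GFLOPS

# Share of the combined vertex + pixel peak that is vertex capability.
vertex_share = vertex_total / (vertex_total + RSX_PS_GFLOPS)

print(f"vertex: {vertex_total:.1f} Gflops")
print(f"vertex share: {vertex_share:.1%}")
```

The vertex share comes out a little under half, consistent with the "close to 50:50" description, though in practice some SPEs would be busy with other work.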

Snyder · Jul 24, 2005

Titanio said:
The SPEs provide peak performance of 25.6Gflops/s each. The vertex shaders in RSX should provide 44Gflops/s, assuming it's the same configuration as the RSX.

Total peak vertex processing capability: 223.2Gflops/s (7 SPEs + VS)
Total peak pixel processing capability: 264Gflops/s

Sorry for this OT...but: You shouldn't take "3D acceleration" that literally.

Flops = floating point operations per second...no /s needed. Sorry, I'm in some anal mood today...

Titanio · Jul 24, 2005

Snyder said:
Titanio said:

The SPEs provide peak performance of 25.6Gflops/s each. The vertex shaders in RSX should provide 44Gflops/s, assuming it's the same configuration as the RSX.

Total peak vertex processing capability: 223.2Gflops/s (7 SPEs + VS)
Total peak pixel processing capability: 264Gflops/s

Sorry for this OT...but: You shouldn't take "3D acceleration" that literally.

Flops = floating point operations per second...no /s needed. Sorry, I'm in some anal mood today...

Haha, you're quite right, corrected

PC-Engine · Jul 24, 2005

seismologist said:
PC-Engine said:

So that's a performance advantage in every situation except the oddball case of being so extremely vertex bound that the CPU can no longer keep up.

In a unified shader architecture, the CPU doesn't need to keep up since everything is already being load balanced by the GPU.

By the way, next time you respond please back it up with actual numbers as evidence.

Numbers are not required in this kind of simple comparison.

Though I'm not supposed to be working on Sunday, I went ahead and quickly dug up the numbers. It appears that the combined unified shader performance of Xenos is 240 Gflops.
Each SPE is capable of 32 Gflops, plus an additional 32 for the GPU vertex shaders.

If these numbers sound right I'll let you do the math from here.

And I'll let you figure out what a game is comprised of...

Neeyik · Jul 25, 2005

I was wondering how long it would be before we started to see threads degenerate into tit-for-tat arguing. For God's sake - pack it in!

mckmas8808 · Jul 25, 2005

Yeah, to me this sounds like great news for guys like us starving for information. There is so much information here and in earlier articles that I think 50% of it goes over our heads.

randycat99 · Jul 25, 2005

Neeyik said:
I was wondering how long it would be before we started to see threads degenerate into tit-for-tat arguing. For God's sake - pack it in!

Perhaps, if you PM that person to rein it in, maybe we'll get another week or so of good behavior outta him? Given the sheer number of jabs he has left around in less than a week, it's amazing the rest of us have done so well to largely ignore him thus far. What'dya say?

PC-Engine · Jul 25, 2005

Can somebody tell me where all of the AI, physics, etc. will be calculated if the SPEs are occupied doing vertex work?

randycat99 said:
Neeyik said:

I was wondering how long it would be before we started to see threads degenerate into tit-for-tat arguing. For God's sake - pack it in!

Perhaps, if you PM that person to rein it in, maybe we'll get another week or so of good behavior outta him? Given the sheer number of jabs he has left around in less than a week, it's amazing the rest of us have done so well to largely ignore him thus far. What'dya say?

Stay on topic.

Sonic · Jul 25, 2005

randycat, there is no reason for you to come into this thread and share your thoughts about it. You contradicted yourself by asking Neeyik to PM the offending posters. In turn, when you have a post like this I advise you to PM it to a mod instead of posting it in a thread.

PC-Engine, ease up a little bit.

scificube · Jul 25, 2005

If I understand things correctly, there are two schools of thought here...

For the X360:

1. You can let Xenos handle all the load balancing

2. If you let the X360's CPU handle all the vertex processing and dedicate the GPU to only pixel processing you may see a net gain in the overall amount of processing power available to you.

Caveat: If you become vertex limited you've nowhere to shift the load, whereas if the CPU were not already doing vertex operations it could take some of that load.

If you allow Xenos to do all the load balancing, then when it gets limited beyond what it can internally balance, you could pull from the CPU. I don't understand why it's not better to have the CPU dedicated to that portion of the vertex load that would cripple Xenos, so that this never occurs. Meanwhile, when Xenos is not approaching this limit it will automatically dedicate itself to more pixel processing, and you could take advantage of this.

Is what I'm thinking in error somehow?

With the Cell:

The balancing act will really occur on the Cell. It could dynamically allocate more or less of its resources to vertex processing, but really only in addition to the vertex processing capabilities of the RSX, as removing the load entirely only ensures RSX's vertex pipelines sit idle.

Basically, this approach can be used in two ways. Dynamically, Cell can take on the portion of a vertex load that exceeds what the RSX can handle. Alternatively, Cell can be used in combination with RSX to handle a constant vertex load greater than what the RSX can manage alone. Either way, this is something that could be taken advantage of.

I hope I'm on track with that.

It would seem a good thing to always have the CPUs doing some vertex processing if the resources could be spared. Did I miss something?
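
The "overflow" idea in the post above (let RSX's vertex shaders run flat out and hand only the excess to Cell) can be sketched as a simple calculation. It reuses the peak figures quoted earlier in the thread. The loads and the helper name `spes_for_overflow` are hypothetical, and real workloads would not map this cleanly onto peak Gflops.

```python
import math

RSX_VS_GFLOPS = 44.0   # peak vertex figure quoted earlier in the thread
SPE_GFLOPS = 25.6      # peak per-SPE figure quoted earlier in the thread


def spes_for_overflow(vertex_load_gflops):
    """SPEs needed to absorb vertex work beyond RSX's vertex shaders."""
    overflow = max(0.0, vertex_load_gflops - RSX_VS_GFLOPS)
    return math.ceil(overflow / SPE_GFLOPS)


for load in (30.0, 60.0, 100.0):   # hypothetical vertex loads in Gflops
    print(load, "->", spes_for_overflow(load), "SPE(s)")
```

Under this model no SPEs are pulled in until RSX's vertex pipelines are saturated, so they never sit idle while the CPU does their work.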

PC-Engine · Jul 25, 2005

On Xenos you can have all of the shaders do vertex or pixel work; on RSX you cannot. If you're PS limited on RSX, the VS just sits idle and the CPU cannot assist in PS work. Correct me if I'm wrong, but that is my understanding. If CELL can offload some PS work then I stand corrected.

Powderkeg · Jul 25, 2005

scificube said:
If I understand things correctly, there are two schools of thought here...

For the X360:

1. You can let Xenos handle all the load balancing

2. If you let the X360's CPU handle all the vertex processing and dedicate the GPU to only pixel processing you may see a net gain in the overall amount of processing power available to you.

Caveat: If you become vertex limited you've nowhere to shift the load, whereas if the CPU were not already doing vertex operations it could take some of that load.

If you allow Xenos to do all the load balancing, then when it gets limited beyond what it can internally balance, you could pull from the CPU. I don't understand why it's not better to have the CPU dedicated to that portion of the vertex load that would cripple Xenos, so that this never occurs. Meanwhile, when Xenos is not approaching this limit it will automatically dedicate itself to more pixel processing, and you could take advantage of this.

Is what I'm thinking in error somehow?

With the Cell:

The balancing act will really occur on the Cell. It could dynamically allocate more or less of its resources to vertex processing, but really only in addition to the vertex processing capabilities of the RSX, as removing the load entirely only ensures RSX's vertex pipelines sit idle.

Basically, this approach can be used in two ways. Dynamically, Cell can take on the portion of a vertex load that exceeds what the RSX can handle. Alternatively, Cell can be used in combination with RSX to handle a constant vertex load greater than what the RSX can manage alone. Either way, this is something that could be taken advantage of.

I hope I'm on track with that.

It would seem a good thing to always have the CPUs doing some vertex processing if the resources could be spared. Did I miss something?

There is another option, and it's the one that happens in real life.

It is extremely rare for any game to be vertex bound. So rare that typical GPUs (non-unified shaders) have 3-4 Pixel Shading units for every Vertex Shading unit. In truth, there should never be a situation where you have to revert back to the CPU for Vertex Shader operations.

The real-life situation is that games will typically leave 1-2 Vertex Shader units unused, or underused. They simply don't push enough geometry to really push these GPUs to the limits in Vertex Shading. This is where a Unified Shader has an advantage, because those unused shader units can be switched over to Pixel Shading instead.

I can't think of a single game that is Vertex Shader bound while having plenty of Pixel Shader processing power left over. It simply never happens. These guys are fighting over a hypothetical situation that is in truth a moot point.

Besides, the added latency of having to revert back to the CPU for Vertex Shading operations makes it an unattractive option for either system.
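
The fixed-versus-unified argument running through this thread can be illustrated with a toy model. This is a sketch under strong assumptions: every shader unit has equal throughput, the unified pool balances perfectly with no scheduling overhead, and the workloads are made-up numbers in arbitrary units. Neither function describes how Xenos or RSX actually schedules work.

```python
def fixed_time(vertex_work, pixel_work, vs_units, ps_units):
    """Frame time when units are hard-partitioned: the slower stage dominates."""
    return max(vertex_work / vs_units, pixel_work / ps_units)


def unified_time(vertex_work, pixel_work, total_units):
    """Frame time when any unit can take either kind of work (ideal balance)."""
    return (vertex_work + pixel_work) / total_units


# Hypothetical workloads; 8 VS + 24 PS mirrors the roughly 1:3 ratio
# mentioned in the post above.
for v, p in ((8, 24), (2, 30), (20, 12)):
    print(v, p, fixed_time(v, p, 8, 24), unified_time(v, p, 32))
```

When the workload matches the hardware split, the two designs tie. The fixed partition only loses when the mix drifts away from it, which is the pixel-heavy case the post describes as typical and the vertex-heavy case it describes as rare.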
