SPU usage in games

Alan Wake on PS3… I don't think that Microsoft Game Studios allowed this… maybe on DS ;)

So PS3 and PC only, not Xbox 360? Strange, I thought I heard about it being demonstrated on the Xbox 360 last year or so. :smile:

As for Uncharted animation: they use "clever" blending + IK.
So it's not "procedural"...

Interesting, thanks for the info!
 
You are wrong; you just forget the fact that you cannot access the GPU directly on PC.
All access to the GPU goes through the driver + API, and this is killing your performance.
So in fact, ALL PC games are CPU limited these days.

Take any modern game and run it on these two hypothetical machines:

1) An ancient 2.4 GHz Pentium with an NVIDIA 8800 GTX
2) A 1000 GHz overclocked 50-core infinite-power CPU with an NVIDIA 6800 GT.

Run the game at 1920x1200 with all features on that the game allows. Which machine would you bet the game would run better on? If all PC games are CPU limited as you say, then clearly machine #2 would be the victor? It may still suck on both, but I'd wager that machine #1 would win more often.

Further, if the DirectX API layer is such a 'dramatic' burden, then how the heck did anyone ever get DirectX games running on 700 MHz PCs?
 
Further, if the DirectX API layer is such a 'dramatic' burden, then how the heck did anyone ever get DirectX games running on 700 MHz PCs?

Is that subjective? If someone makes a 3D Tetris running on Windows with DirectX, does that defeat your point? :oops:

But we know what you meant.
 
I think you're talking about 2 different concepts. To validate his points, you (or someone else) will need to do the following tests for CPU-boundedness:

1) An ancient 2.4 GHz Pentium with an NVIDIA 8800 GTX
2) A 1000 GHz overclocked 50-core infinite-power CPU with an NVIDIA 8800 GTX

If there is a significant enough performance increase from 1) to 2), then the CPU (and/or something between the 8800 GTX and the CPU) is holding the 8800 GTX back. Otherwise, the CPU is not the bottleneck.

Your test contains 2 variables (8800 GTX vs 6800 GT, 2.4 GHz vs 50 cores), so it's hard to conclude anything specific about the CPU (or GPU).
 
Take any modern game and run it on these two hypothetical machines:

1) An ancient 2.4 GHz Pentium with an NVIDIA 8800 GTX
2) A 1000 GHz overclocked 50-core infinite-power CPU with an NVIDIA 6800 GT.

Run the game at 1920x1200 with all features on that the game allows. Which machine would you bet the game would run better on? If all PC games are CPU limited as you say, then clearly machine #2 would be the victor? It may still suck on both, but I'd wager that machine #1 would win more often.
#2 would win every single time

I wouldn't even use the graphics card;
just do all rendering on the CPU in software. 50x 1000 GHz processors (what's that, greater than 1000x what CPUs have currently?) should handle all games at 1920x1080 with ease (based on what I've seen of CPU rendering).

(though I do agree that, unlike whatever NVIDIA/AMD PDFs might say to the contrary, GPUs are often the bottleneck)
 
Well, my GPU is what limits performance in several games.

CoH, PREY, FSX, C&C3, HL2, FEAR, Far Cry, Oblivion, Stalker, X3, Hitman: Blood Money, to name some. Disabling mapping techniques and lowering the resolution, as well as disabling AA and AF, makes those games run much faster.
 
Your test contains 2 variables (8800 GTX vs 6800 GT, 2.4 GHz vs 50 cores), so it's hard to conclude anything specific about the CPU (or GPU).

I purposely put two variables in there! Let me word it in a different way. Say you have machine #1. Check your framerate. Now upgrade your CPU to the latest and greatest, and check your framerate difference. Now say you have machine #2. Check your framerate. Now upgrade your GPU to the latest and greatest and check your framerate difference.

After the upgrades, both machines are now identical. But which of the two machines do you think will exhibit the greater % increase in framerate from pre-upgrade, assuming you aren't playing Tetris? ;) Whichever one does will tell you where the bottleneck is. I'd bet it's the GPU swap.
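To make the arithmetic of that comparison concrete, here's a tiny sketch (the framerates are invented purely for illustration, not measured from any real game):

```cpp
// Toy illustration: whichever swap gives the larger percentage gain points at the
// component that was the bottleneck before the upgrade. All numbers are made up.
#include <cstdio>

float pct_gain(float before_fps, float after_fps) {
    return 100.0f * (after_fps - before_fps) / before_fps;
}

int main() {
    // machine #1: ancient CPU + 8800 GTX, then the CPU is swapped for a fast one
    std::printf("CPU swap: %+.0f%%\n", pct_gain(20.0f, 26.0f));
    // machine #2: fast CPU + 6800 GT, then the GPU is swapped for an 8800 GTX
    std::printf("GPU swap: %+.0f%%\n", pct_gain(20.0f, 55.0f));
    // the bigger jump (here the GPU swap) names the old bottleneck
    return 0;
}
```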


zed said:
#2 would win every single time
I wouldn't even use the graphics card;
just do all rendering on the CPU in software. 50x 1000 GHz processors (what's that, greater than 1000x what CPUs have currently?) should handle all games at 1920x1080 with ease (based on what I've seen of CPU rendering).

Ok, you got me there ;) But you'd have one whopper of an electric bill!
 
I purposely put two variables in there! Let me word it in a different way. Say you have machine #1. Check your framerate. Now upgrade your CPU to the latest and greatest, and check your framerate difference. Now say you have machine #2. Check your framerate. Now upgrade your GPU to the latest and greatest and check your framerate difference.

After the upgrades, both machines are now identical. But which of the two machines do you think will exhibit the greater % increase in framerate from pre-upgrade, assuming you aren't playing Tetris? ;) Whichever one does will tell you where the bottleneck is. I'd bet it's the GPU swap.

Naturally, the two variables (CPU and GPU) are needed.
What I am saying is that psocerer and you are arguing about 2 different things.
In your example, it can also be true that both your pre- and post-upgrade GPU machines are CPU bound -- which would make psocerer right too.

Ok, you got me there ;) But you'd have one whopper of an electric bill!

I am still curious about SPU usage in games. The ability for a CPU to work closely with the SPU is an interesting idea. I hope the Killzone 2 technical slides will be out soon enough.
 
You are wrong; you just forget the fact that you cannot access the GPU directly on PC.
All access to the GPU goes through the driver + API, and this is killing your performance.
So in fact, ALL PC games are CPU limited these days.
And because you can do nothing about it (you cannot change the way the driver or Direct3D behaves), games use settings that reduce graphics quality, which in turn reduces the load on the API and CPU.
Of course you can always make any game GPU limited by pumping up the resolution or AA/AF settings, but in most cases this means you just stress ONE GPU component (namely the ROPs), and this causes a GPU bottleneck.


As for Uncharted animation: they use "clever" blending + IK.
So it's not "procedural", but it's way more advanced than anything a PC GPU can do, just because it's SPU-based.
And if you have just one IK agent it's not hard to do proper IK solutions.

:LOL:

Best post ever!

Thoroughly made my day that did! :LOL:
 
I don't remember playing any PC game that seems to use the CPU heavily for non-graphical tasks.

But that doesn't necessarily mean PC games are not CPU limited. For one, games can scale much better on the GPU than on the CPU, as the latter affects gameplay.

That pretty much means your CPU code should be able to run well on an older CPU, with smaller caches and slower memory, compared to your GPU code.
 
Run the game at 1920x1200 with all features on that the game allows.

OK, I've got a ROP bottleneck, what next?
Does it mean that the game engine is GPU limited? No, it means that on PC you can always artificially get bottlenecked anywhere you want.
You just need to understand that the GPU is a very special piece of hardware which does not work as one unit, so when you over-stress one of its modules it does not mean you've used all the GPU resources. You're not limited, you've just hit one of the walls with stupid over-stressing.
 
Well, my GPU is what limits performance in several games.

CoH, PREY, FSX, C&C3, HL2, FEAR, Far Cry, Oblivion, Stalker, X3, Hitman: Blood Money, to name some. Disabling mapping techniques and lowering the resolution, as well as disabling AA and AF, makes those games run much faster.

Let's do some explaining.
First of all, how do you measure the speed of a processing unit?
Very simple: you take the amount of work you want to do and divide it by the time it took to do that work. That's simple, but people often forget: if you cannot use the processing unit for some time, you cannot do any work, and thus your time increases for nothing.
So let's then talk about CPU <-> GPU collaboration.
When you render frames you prepare some data and then issue a command to the GPU through some kind of call. And this is done on the CPU.
So what exactly happens when you issue some Direct3D call, for example?
The CPU waits for the call to complete!
Waits = does nothing for a certain amount of time.
Which in turn means: if you've got enough Direct3D calls in your frame, your CPU will wait all the time! And you cannot do any work.
This way you're CPU limited because of the GPU.

What's really funny here is this: the amount of time the CPU waits has little to do with CPU speed, which makes it even worse - faster CPUs will lose more speed than slower ones on the same code.
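For what the argument boils down to: a fixed CPU cost per API call that doesn't shrink no matter how fast your CPU is. A back-of-the-envelope sketch (both numbers are assumptions for illustration, not measurements of any real driver):

```cpp
// Toy model: each API call eats a fixed slice of CPU time that the game's own
// code cannot use. The per-call cost and game-logic cost below are invented.
#include <cstdio>

constexpr double kPerCallOverheadMs = 0.02; // assumed driver/API cost per call
constexpr double kGameLogicMs       = 5.0;  // assumed CPU time for the game itself

double cpu_frame_time_ms(int draw_calls) {
    return kGameLogicMs + draw_calls * kPerCallOverheadMs;
}

int main() {
    for (int calls : {500, 2000, 10000}) {
        double t = cpu_frame_time_ms(calls);
        std::printf("%5d calls -> %.1f ms CPU frame time (%.0f%% spent in the API)\n",
                    calls, t, 100.0 * calls * kPerCallOverheadMs / t);
    }
    // a faster CPU shrinks kGameLogicMs but not the per-call term, which is why
    // the relative loss gets worse as the CPU gets faster
    return 0;
}
```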
 
So PS3 and PC only, not Xbox 360? Strange, I thought I heard about it being demonstrated on the Xbox 360 last year or so. :smile:
I think you misunderstand.

It's a PC/360 exclusive, as Microsoft Games is the publisher now. It was originally planned for PS3 until MS bought up the rights.
 
Interesting, so CPUs spend time wasting cycles and limit GPU performance due to driver and API overhead (how about DX10, though?).
Yet PC CPUs and GPUs, even though they are limited by drivers and APIs, still have the raw performance to drive games with cutting-edge features such as graphics, AI, physics, etc. that are only possible on PC at the kind of scope they are used in.
Fascinating, one could even say PC CPUs and GPUs are barely tapped (well, not much at least)! :smile:
 
Interesting, so CPUs spend time wasting cycles and limit GPU performance due to driver and API overhead (how about DX10, though?).
Yet PC CPUs and GPUs, even though they are limited by drivers and APIs, still have the raw performance to drive games with cutting-edge features such as graphics, AI, physics, etc. that are only possible on PC at the kind of scope they are used in.
Fascinating, one could even say PC CPUs and GPUs are barely tapped (well, not much at least)! :smile:

DX10 lifts some of the barriers, but not even close to what you can use on PS3.
And I do not mean that the PS3 is more advanced, no, it's just that you can use the hardware much, much more efficiently.
For example, the now famous "MSAA with deferred shading in Killzone 2".
You can do MSAA with DS on any PC GPU starting from the GF6x, but you cannot do it through Direct3D 9, because D3D9 abstracts the memory layout and you cannot access it directly.
What was done in DX10: layout standardization, so you can now copy almost any resource to any other resource. On PS3 you do not have this problem, because you know exactly where the resources are and can access them directly.
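To illustrate why sample-level access matters for that combination, here's a toy sketch (no real engine or API code, just two made-up G-buffer samples on an edge pixel): lighting an averaged G-buffer is not the same as averaging lit samples, which is why a plain resolve before shading breaks MSAA for a deferred renderer:

```cpp
// Toy example: one pixel with two MSAA samples straddling a silhouette edge.
// Averaging the G-buffer first (all a fixed-function resolve can do) gives a
// different answer than lighting each sample and averaging the results.
#include <cstdio>

struct Normal { float x, y, z; };

// hypothetical toy "lighting": clamped N dot L against a fixed light direction (0,0,1)
float light(const Normal& n) {
    return n.z > 0.0f ? n.z : 0.0f;
}

int main() {
    Normal s0{0.0f, 0.0f, 1.0f};   // sample hits a front-facing surface
    Normal s1{0.0f, 0.0f, -1.0f};  // sample hits a back-facing surface

    // resolve-then-light: average the G-buffer, light once
    Normal avg{(s0.x + s1.x) * 0.5f, (s0.y + s1.y) * 0.5f, (s0.z + s1.z) * 0.5f};
    float wrong = light(avg);                        // 0.0 -- the edge goes black

    // light-then-resolve: light each sample, then average (needs per-sample access)
    float right = (light(s0) + light(s1)) * 0.5f;    // 0.5 -- a properly antialiased edge

    std::printf("resolve-then-light = %.2f, light-then-resolve = %.2f\n", wrong, right);
    return 0;
}
```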

I can even make the rather bold statement that you can "emulate" any DX10 feature on the PS3, in most cases even with very good speed.
 
DX10 lifts some of the barriers, but not even close to what you can use on PS3.
And I do not mean that the PS3 is more advanced, no, it's just that you can use the hardware much, much more efficiently.
For example, the now famous "MSAA with deferred shading in Killzone 2".
You can do MSAA with DS on any PC GPU starting from the GF6x, but you cannot do it through Direct3D 9, because D3D9 abstracts the memory layout and you cannot access it directly.
What was done in DX10: layout standardization, so you can now copy almost any resource to any other resource. On PS3 you do not have this problem, because you know exactly where the resources are and can access them directly.

How about OpenGL, was it/is it better when it comes to efficiency?


I can even make the rather bold statement that you can "emulate" any DX10 feature on the PS3, in most cases even with very good speed.

Of course but in the end it is about the scope of use and how many different effects are done at the same time with good speed.
 
How about OpenGL, was it/is it better when it comes to efficiency?

I don't really know, because I don't use it, but it seems that it's better.

Of course but in the end it is about the scope of use and how many different effects are done at the same time with good speed.

What I can tell you with 100% confidence is that PS3 games will, in time, be much more interactive than anything else.
Which means: a lot of moving and movable objects, tons of animations and a lot of characters on screen.
 
So let's then talk about CPU <-> GPU collaboration.
When you render frames you prepare some data and then issue a command to the GPU through some kind of call. And this is done on the CPU.
So what exactly happens when you issue some Direct3D call, for example?
The CPU waits for the call to complete!
Waits = does nothing for a certain amount of time.
Which in turn means: if you've got enough Direct3D calls in your frame, your CPU will wait all the time! And you cannot do any work.
This way you're CPU limited because of the GPU.

Graphics APIs are mostly non-blocking. You pass them your data and they return immediately, with the GPU performing the commands asynchronously as much as possible with the CPU. Calls that needlessly sync the two are bugs, outside of things like final frame sync. The problem with making too many calls is in their setup, with an expensive context switch on top of the normal function call overhead.
OpenGL drivers are more complex and flexible than DX drivers, and thus necessarily slower. That won't change until their major API refresh in Mt. Evans.
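A minimal sketch of that model, assuming a toy command queue (it stands in for no real driver): the producer thread plays the game's render loop, the consumer plays the driver/GPU, and the only stalls are a full buffer or an explicit sync point:

```cpp
// Toy non-blocking submission model: submit() encodes and enqueues a command and
// returns immediately; a separate thread drains the queue asynchronously.
#include <condition_variable>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

struct CommandQueue {
    static constexpr std::size_t kCapacity = 1024;  // assumed ring size
    std::queue<int> q;                              // each int stands in for one encoded command
    std::mutex m;
    std::condition_variable cv;
    bool done = false;

    void submit(int cmd) {                          // roughly what a Draw*() call does
        std::unique_lock<std::mutex> lk(m);
        cv.wait(lk, [&] { return q.size() < kCapacity; });  // stall only if far ahead of the "GPU"
        q.push(cmd);
        cv.notify_all();
    }
    void finish() {                                 // explicit sync point, e.g. end of frame
        std::lock_guard<std::mutex> lk(m);
        done = true;
        cv.notify_all();
    }
    void consume() {                                // the "GPU" side draining the queue
        for (;;) {
            std::unique_lock<std::mutex> lk(m);
            cv.wait(lk, [&] { return !q.empty() || done; });
            if (q.empty() && done) return;
            q.pop();                                // "execute" the command
            cv.notify_all();
        }
    }
};

int main() {
    CommandQueue cq;
    std::thread gpu(&CommandQueue::consume, &cq);
    for (int i = 0; i < 10000; ++i)
        cq.submit(i);                               // the CPU does not wait per call
    cq.finish();
    gpu.join();
    std::printf("all commands consumed without per-call CPU stalls\n");
    return 0;
}
```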
 
What I can tell you with 100% confidence is that PS3 games will, in time, be much more interactive than anything else.
Which means: a lot of moving and movable objects, tons of animations and a lot of characters on screen.

"anything else" is a pretty bold claim..

PC tech moves along pretty quickly I hope you realise?
 
The problem with making too many calls is in their setup, with an expensive context switch on top of the normal function call overhead.

It does not matter why you have syncs; the problem is that you have them in DX9 and DX10, and you will have them in any API which tries to abstract the hardware.
In DX9 you have a problem not only with state changes, but also with any push buffer sync. Which means: yes, the calls are "non-blocking" to the programmer, but they absolutely need to sync their data anyway, so it doesn't matter whether you get small blocks on every call or a large block every 100 calls - it's overhead in any case.
And one more point: on PC the internals of the driver and API are top secret, so you cannot even optimize properly in most cases, because you have to rely on outside data. I hope we will change that.
 