GPU<->CPU interconnect...what's possible?

"Nerve-Damage" · Dec 3, 2005

BlueTsunami said:
How about AI, Physics and other CPU intensive tasks? Would things like that have to be compromised to do tasks that GPUs usually do?

Ok, let’s say this: :smile:
* PPE: Runs a lot of the main game logic and so forth.
* SPE: Sound
* SPE: AI
* SPE: Physics
* SPE: Certain post-processing effects
* SPE: Vertex shading, etc.
* SPE: Vertex shading, etc.
* SPE: Whatever else (extra), or used for something listed above.
* RSX: Does all the main pixel rendering and some vertex shading or whatever else.

BlueTsunami · Dec 3, 2005

Nerve-Damage said:
Ok, let’s say this: :smile:
* PPE: Runs a lot of the main game logic and so forth.
* SPE: Sound
* SPE: AI
* SPE: Physics
* SPE: Certain post-processing effects
* SPE: Vertex shading, etc.
* SPE: Vertex shading, etc.
* SPE: Whatever else (extra), or used for something listed above.
* RSX: Does all the main pixel rendering and some vertex shading or whatever else.

Wow, seeing the general skeleton laid out like that makes it seem very simple (I know from what I've heard that it won't be as simple to write the code and get it all working together). So each SPE can actually do all the AI, sound, physics, etc. (each task being assigned to one SPE)?

"Nerve-Damage" · Dec 3, 2005

BlueTsunami said:
Wow, seeing the general skeleton laid out like that makes it seem very simple (I know from what I've heard that it won't be as simple to write the code and get it all working together). So each SPE can actually do all the AI, sound, physics, etc. (each task being assigned to one SPE)?

From what I understand (from certain developers), if the code is written correctly, offloading specific tasks, algorithms, and so forth to each SPE shouldn’t be a problem. The SPEs aren’t the stone-age technology of the PC past that certain members would have you believe; at the same time, programming them is no cakewalk either. Only time will tell…

ERP · Dec 3, 2005

I doubt you'll run much of the high-level AI on SPEs; the memory is too much of a constraint.

Architecturally, what you're likely to see is a lot of the gameplay, including AI, on the PPE, with more expensive operations batched to SPEs; animation, audio, and some collision detection (possibly orchestrated by the PPE) are all obvious candidates. Even doing this much requires rearchitecting a lot of current game engines so that they are not dependent on the results of these farmed-out jobs being immediately available.

You could certainly have some of the SPEs help out with graphics; things like subdivision spring to mind. But you're really going to want to do things that the SPE is good at and the RSX isn't, and you're not going to want to be continuously banging the bus with your SPE tasks. The short version is that you need to play to the architectural strengths.
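
The point about engines not depending on farmed-out results being immediately available can be sketched as a frame that submits its batched jobs, carries on with other work, and only blocks at a sync point. This is a minimal, hypothetical Python sketch that uses a thread pool as a stand-in for SPEs; `animate`, `mix_audio`, and `run_frame` are illustrative names, not part of any PS3 SDK.

```python
from concurrent.futures import ThreadPoolExecutor

def animate(bones):
    # Stand-in for an expensive batched job (e.g. posing a skeleton).
    return [b * 2 for b in bones]

def mix_audio(samples):
    # Stand-in for another batched job (e.g. summing a voice buffer).
    return sum(samples)

def run_frame(pool, bones, samples):
    # Kick off the expensive jobs first; do not block on them.
    anim_job = pool.submit(animate, bones)
    audio_job = pool.submit(mix_audio, samples)

    # Main-thread work that does not need the job results yet.
    game_logic = {"tick": 1}

    # Sync point: only here does the frame depend on the results.
    return game_logic, anim_job.result(), audio_job.result()

with ThreadPoolExecutor(max_workers=2) as pool:
    logic, pose, mixed = run_frame(pool, [1, 2, 3], [0.5, 0.25, 0.25])
```

The design choice being illustrated is only the ordering: submit early, consume late, so the main thread never stalls waiting on a job it has just issued.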

"Nerve-Damage" · Dec 3, 2005

ERP said:
I doubt you'll run much of the high-level AI on SPEs; the memory is too much of a constraint.

Architecturally, what you're likely to see is a lot of the gameplay, including AI, on the PPE, with more expensive operations batched to SPEs; animation, audio, and some collision detection (possibly orchestrated by the PPE) are all obvious candidates. Even doing this much requires rearchitecting a lot of current game engines so that they are not dependent on the results of these farmed-out jobs being immediately available.

You could certainly have some of the SPEs help out with graphics; things like subdivision spring to mind. But you're really going to want to do things that the SPE is good at and the RSX isn't, and you're not going to want to be continuously banging the bus with your SPE tasks. The short version is that you need to play to the architectural strengths.

If I’m not mistaken, Crytek was building a special sub-engine specifically for the PS3's Cell (SPEs) for offloading many tasks. But I could be wrong…

Titanio · Dec 3, 2005

Nerve-Damage said:
If I’m not mistaken, Crytek was building a special sub-engine specifically for the PS3's Cell (SPEs) for offloading many tasks. But I could be wrong…

They are.

"We scale the individual modules such as animation, physics and parts of the graphics with the CPU, depending on how many threads the hardware offers."

That's what they want more CPU power for. SPUs should be quite well suited to those things.

As for AI, I'm not sure what techniques they're using, but the MoH team at EA for one is reportedly using SPUs for AI.

Edge · Dec 3, 2005

Using one SPE for sound would be equivalent to sending a package across town on a 747 Jumbo Jet.

Any SPE used for sound, would have plenty of processing left over for some other significant task.

Do not underestimate the power of EACH and EVERY SPE. They are extremely powerful processors.

I think AI is OK on the SPEs as long as you use a number of them to do your AI. You could set up a routine that farms out AI routines to whatever SPE is free, and when the demand for that AI is completed, you remove the job. Of course, it all depends on how you set up your SPE jobs and how well you code your AI routines to take advantage of the SPEs' strengths and differences.

The SPEs are certainly not useless for random memory access work or branch-heavy jobs, as long as you use the branch prediction routines on the SPEs and pay careful attention to interrupts and DMA requests, possibly buffering data ahead of time based on upcoming branches in your code. Of course, that's all for programmers that can at least tie their shoelaces.

wireframe · Dec 3, 2005

Edge said:
Using one SPE for sound would be equivalent to sending a package across town on a 747 Jumbo Jet.

Any SPE used for sound, would have plenty of processing left over for some other significant task.

Are you saying this because you know or because you are estimating like many others? I don't think what you are suggesting is close to being true. Let's think about this. Let's say you want 7.1 sound and you want to process 128 channels in-engine. You then want to use sound effects (something like EAX). This all adds up. Even if an SPE can do this comfortably, I don't think it will be a case where you want to throw lots of other stuff at that unit. Switching context is something they don't like doing and goes against the design philosophy. Indeed, this is one "quirk" that has been discussed extensively, if not directly at least as a premise for the performance characteristics being talked about. I would guess that instead of throwing "some other significant task" at the audio processing SPE, developers simply maximize their audio potential.

Do not underestimate the power of EACH and EVERY SPE. They are extremely powerful processors.

You don't have to underestimate something to be realistic and consider that bounds may be introduced for reasons other than hardware limitations; convenience comes to mind.

"Nerve-Damage" · Dec 3, 2005

Edge said:
Using one SPE for sound would be equivalent to sending a package across town on a 747 Jumbo Jet.

Any SPE used for sound, would have plenty of processing left over for some other significant task.

I know that Edge! I was just giving a simple generalized overview.

Do not underestimate the power of EACH and EVERY SPE. They are extremely powerful processors.

100% Agree

I think AI is OK on the SPEs as long as you use a number of them to do your AI. You could set up a routine that farms out AI routines to whatever SPE is free, and when the demand for that AI is completed, you remove the job. Of course, it all depends on how you set up your SPE jobs and how well you code your AI routines to take advantage of the SPEs' strengths and differences.

I believe the AI routines will most likely be done between the PPE and one SPE, but relying more on the PPE end. The SPE will pretty much free up the PPE from ever doing physics, thus leaving the PPE to run routine game code and certain other functions.

The SPEs are certainly not useless for random memory access work or branch-heavy jobs, as long as you use the branch prediction routines on the SPEs and pay careful attention to interrupts and DMA requests, possibly buffering data ahead of time based on upcoming branches in your code. Of course, that's all for programmers that can at least tie their shoelaces.

Agree..........

How about this then? (Generalized View)
* PPE: Heavy AI routines, general game data, and certain sound data.
* SPE: Lower-level AI routines, physics (partial), etc.
* SPE: Physics (main)
* SPE: Sound (5.1-7.1): Dolby Digital, DTS, etc., and whatever else
* SPE: Vertex shading, etc.
* SPE: Vertex shading, etc.
* SPE: Vertex shading, certain post-processing effects.
* SPE: Extra unit for whatever else or anything needing help listed above.
* RSX: Mostly pixel shaders, some vertex shading, etc.

Edge · Dec 3, 2005

wireframe said:
Switching context is something they don't like doing and goes against the design philosophy.

It all depends on how it's coded. With that huge number of registers (128 × 128-bit), you could easily set things up so that you are not loading and storing registers and local RAM on a context switch. You could set up your sound routine to stay permanently in the local RAM and have half the RAM (128KB) left over for other routines to come in and execute. Same with the registers: you could reserve 64 of them for other routines.

Because of the real-time nature of sound, I would not context switch the sound routines and buffers out of the SPE for another job. That would just kill the sound.

7.1 channels and 128 "internal" sound channels is a piece of cake for a 3.2 GHz monster of a processor. Buffering would be an issue, and would have to be watched carefully.

Of course, all this is for programmers that know how to tie their shoelaces.

Edge · Dec 3, 2005

Nerve-Damage said:
How about this then? (Generalized View)
* PPE: Heavy AI routines, general game data, and certain sound data.
* SPE: Lower-level AI routines, physics (partial), etc.
* SPE: Physics (main)
* SPE: Sound (5.1-7.1): Dolby Digital, DTS, etc., and whatever else
* SPE: Vertex shading, etc.
* SPE: Vertex shading, etc.
* SPE: Vertex shading, certain post-processing effects.
* SPE: Extra unit for whatever else or anything needing help listed above.
* RSX: Mostly pixel shaders, some vertex shading, etc.

The great thing about CELL is that not only can you devote SPEs to specific jobs, but you can also set things up so that you dynamically assign jobs to the SPEs based on workload demand. Like the unified shaders on the X360 GPU, you could use the PPE for some very clever load balancing and, depending on the demands being placed on the game engine, allocate however many SPEs are needed to get the job done. That involves the heavier context switch of having to swap your registers and local memory, but it might be a better way of doing things for some game engines. And you are not forced to use this for every SPE: you can have SPEs devoted to processing one specific routine, and have maybe 4 of them processing loads based on demand.

That's the beauty of the CELL architecture: its variety in allowing different methods of getting the job done, based on the type of workload.

Titanio · Dec 3, 2005

wireframe said:
Are you saying this because you know or because you are estimating like many others? I don't think what you are suggesting is close to being true. Let's think about this. Let's say you want 7.1 sound and you want to process 128 channels in-engine. You then want to use sound effects (something like EAX). This all adds up. Even if an SPE can do this comfortably, I don't think it will be a case where you want to throw lots of other stuff at that unit. Switching context is something they don't like doing and goes against the design philosophy. Indeed, this is one "quirk" that has been discussed extensively, if not directly at least as a premise for the performance characteristics being talked about. I would guess that instead of throwing "some other significant task" at the audio processing SPE, developers simply maximize their audio potential.

It really depends on how much time a SPU has left for that frame after doing the audio. Assuming a 60fps target, a context switch would eat about 1/800th of a SPU's cycles for that frame. So unless a task was pretty much taking the entire length of a frame, I think a SPU certainly will switch to another within that frame. If a task takes 75% of the SPUs available cycles for the frame, and a switch costs you less than 0.125% of the same cycles, are you really going to avoid that and not reclaim the remaining ~25%?

When people discuss context switching in negative terms with regard to SPUs, I think that's just in the context of frequent switches every few cycles, or whatever, as you could do if you had multiple threads in context. It's not worth switching every few cycles when that costs you tens of thousands of cycles. But it can certainly be worthwhile if you're switching every few tens of millions of cycles or more (upon completion of tasks/threads).
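
The arithmetic behind those figures can be checked with a short sketch. It assumes the 3.2 GHz clock quoted earlier in the thread, the 60fps target, and the 1/800th-of-a-frame switch cost from the post; the names are illustrative only.

```python
CLOCK_HZ = 3.2e9           # SPE clock quoted earlier in the thread
FPS = 60                   # frame-rate target assumed in the post
SWITCH_FRACTION = 1 / 800  # switch cost as a fraction of one frame's cycles

cycles_per_frame = CLOCK_HZ / FPS
switch_cycles = cycles_per_frame * SWITCH_FRACTION

def reclaimed_fraction(task_fraction, switches=1):
    """Fraction of the frame left for other work after one task plus switches."""
    return 1.0 - task_fraction - switches * SWITCH_FRACTION

print(f"cycles per frame:  {cycles_per_frame:,.0f}")
print(f"cycles per switch: {switch_cycles:,.0f}")
print(f"left after a 75% task and one switch: {reclaimed_fraction(0.75):.3%}")
```

Under these assumptions a frame is roughly 53 million cycles and a switch roughly 67 thousand, which lines up with the "tens of thousands of cycles" figure above.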

That's before we consider things like this:

Another option is to load multiple pieces of code and data to a single SPE; this can be the basis of a primitive multitasking environment on the SPE, useful for multiple small jobs which do not need a dedicated processor. You cannot provide memory protection between tasks running on a single SPE, and such multitasking is necessarily cooperative, not preemptive. However, in cases where the tasks are small enough, and complete reliably enough, this can dramatically improve performance, freeing up other SPEs for dedicated tasks. The cost of transferring a small program to the SPE might be too high in some cases, however.

http://www-128.ibm.com/developerworks/library/pa-fpfunleashing/
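
The cooperative (not preemptive) multitasking described in that passage can be illustrated, very loosely, with Python generators: each task runs until it chooses to yield, and a simple loop rotates through whatever is resident. This is only an analogy for the scheduling idea, not SPE code; `small_task` and `run_cooperatively` are hypothetical names.

```python
from collections import deque

def small_task(name, steps, log):
    # Each yield is a voluntary hand-back of control; nothing preempts the task.
    for i in range(steps):
        log.append((name, i))
        yield

def run_cooperatively(tasks):
    # Round-robin over resident tasks until every one has finished.
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
            ready.append(task)
        except StopIteration:
            pass  # task completed; drop it

log = []
run_cooperatively([small_task("audio", 2, log), small_task("particles", 3, log)])
```

A task that never yields would starve the others, which is the reliability caveat the quoted passage raises.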

I'd wager that most if not all SPUs will see duty with multiple tasks within a frame. I also amn't sure about considering this in terms of "dedicating" certain SPUs to certain tasks. I think something more like a job queue, where any free SPU takes any available task, is more likely - they just take a task, finish it, move on to the next and so on. Of course, devs can divvy things up however they want.
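
A job queue of that kind, where any free worker takes the next available task, might look like the sketch below. Threads stand in for SPUs, and the job names and worker count are made up for illustration.

```python
import queue
import threading

def worker(worker_id, jobs, done):
    # Each worker loops: take the next job, run it, record the result.
    while True:
        job = jobs.get()
        if job is None:          # sentinel: no more work
            break
        name, func, arg = job
        done.put((name, func(arg), worker_id))

jobs, done = queue.Queue(), queue.Queue()
work = [("physics", sum, [1, 2, 3]), ("audio", max, [4, 9, 2]), ("anim", len, "bones")]
for job in work:
    jobs.put(job)

NUM_WORKERS = 2  # stand-in for however many SPUs are free
threads = [threading.Thread(target=worker, args=(i, jobs, done)) for i in range(NUM_WORKERS)]
for _ in threads:
    jobs.put(None)               # one sentinel per worker
for t in threads:
    t.start()
for t in threads:
    t.join()

results = {}
while not done.empty():
    name, value, _ = done.get()
    results[name] = value
```

Which worker ends up running which job is not fixed ahead of time; only the set of results is.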

!eVo!-X Ant UK · Dec 3, 2005

w00t i want this on PS3...

DeanoC · Dec 3, 2005

Edge said:
Do not underestimate the power of EACH and EVERY SPE. They are extremely powerful processors.

And equally don't overestimate an SPE's power or underestimate how hard good audio is.

A psychoacoustic decompressor and Fourier mixer with a few simple DSP effects (basic reverb; if you're going to try a large windowed impulse reverb, for example, expect a lot more processing power to be sucked out) is going to take a fair bit of processing power per channel. Now repeat that for, say, a hundred audio channels and see how much change you get from a single SPE...

And we are still talking about 2D audio; add in a 3D sound system with a decent HRTF and watch those FLOPs disappear.
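
To put a rough number on the per-channel budget, here is a back-of-the-envelope sketch. It takes the 3.2 GHz clock quoted earlier in the thread and assumes a 48 kHz sample rate, which nobody in the thread actually states. It is an upper bound that ignores SIMD width, DMA, and stalls.

```python
CLOCK_HZ = 3.2e9         # SPE clock quoted earlier in the thread
SAMPLE_RATE_HZ = 48_000  # assumed output sample rate (not stated in the thread)

def cycles_per_sample_per_channel(channels):
    """Upper bound on cycles available for one sample of one channel."""
    return CLOCK_HZ / SAMPLE_RATE_HZ / channels

for channels in (100, 128):
    budget = cycles_per_sample_per_channel(channels)
    print(f"{channels} channels: {budget:.0f} cycles per sample per channel")
```

Under those assumptions that works out to a few hundred cycles per sample per channel, and decompression, mixing, and effects all have to fit inside it.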

"Nerve-Damage" · Dec 3, 2005

!eVo!-X Ant UK said:
w00t i want this on PS3...

That shouldn’t be a problem for PS3, since it’s not bound by a dedicated sound chip. PS3 can offer many different sound formats through firmware or OS updates with the latest audio codecs.

Another great aspect of the Cell processor that I’m looking forward to is the true real-time representation of individual sounds that can be applied to each individual object or multi-faceted environment setting. “The Cell leaves demo, anyone?”…

"Nerve-Damage" · Dec 3, 2005

DeanoC said:
And equally don't overestimate an SPE's power or underestimate how hard good audio is.

A psychoacoustic decompressor and Fourier mixer with a few simple DSP effects (basic reverb; if you're going to try a large windowed impulse reverb, for example, expect a lot more processing power to be sucked out) is going to take a fair bit of processing power per channel. Now repeat that for, say, a hundred audio channels and see how much change you get from a single SPE...

And we are still talking about 2D audio; add in a 3D sound system with a decent HRTF and watch those FLOPs disappear.

So DeanoC, what are you up to?

DeanoC · Dec 3, 2005

Nerve-Damage said:
“The Cell leaves demo, anyone?”…

Let's not talk about the leaf demo; it really upsets sound guys as being the most pointless, stupid-arse sound demo ever made.

"Nerve-Damage" · Dec 3, 2005

DeanoC said:
Let's not talk about the leaf demo; it really upsets sound guys as being the most pointless, stupid-arse sound demo ever made.

But at the same time :?:

Edit: Is it that what they (Sony) showed wasn’t feasible for gaming, or that the demo just sucked?

DeanoC · Dec 3, 2005

Nerve-Damage said:
Edit: Is it that what they (Sony) showed wasn’t feasible for gaming, or that the demo just sucked?

Because most sound designers spend a lot of time making sure that whenever you get multiple sounds of the same or a similar type, you swap to a custom sound. Playing a sound 50 times sounds rubbish compared to a single sound that sounds like 50 individual sounds.

It's the exact opposite of a good sound demo from a sound designer's POV. You can tell it was just some techy who has never made a soundscape in his or her life.
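
The swap described above, replacing many instances of one sound with a single authored group sound, can be sketched as a simple voice-selection pass. The threshold and asset names are hypothetical and not taken from any real audio engine.

```python
from collections import Counter

GROUP_THRESHOLD = 4  # hypothetical cutoff; a real mix would tune this per sound type

def choose_voices(requests, group_assets):
    """Replace many instances of one sound type with a single authored group sound."""
    voices = []
    for sound, count in Counter(requests).items():
        if count >= GROUP_THRESHOLD and sound in group_assets:
            voices.append(group_assets[sound])   # one "crowd" asset
        else:
            voices.extend([sound] * count)       # few enough to play individually
    return voices

requests = ["leaf"] * 50 + ["footstep"] * 2
voices = choose_voices(requests, {"leaf": "leaves_rustling_loop"})
```

Here fifty leaf requests collapse into one looped asset while the two footsteps still play individually.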

"Nerve-Damage" · Dec 3, 2005

Thanks

DeanoC said:
Because most sound designers spend a lot of time making sure that whenever you get multiple sounds of the same or a similar type, you swap to a custom sound. Playing a sound 50 times sounds rubbish compared to a single sound that sounds like 50 individual sounds.

It's the exact opposite of a good sound demo from a sound designer's POV. You can tell it was just some techy who has never made a soundscape in his or her life.

Thanks DeanoC for clearing that up! :smile:

Now how about some PS3 RSX development kit news…j/k…or am I?
