Are developers ready to cope with multi-core consoles?

Most of the parallelism in game code right now is an afterthought. The game code is written largely oblivious to maximizing parallelism for the majority of the dev process. At some point the main engineer starts profiling the code and moving code around to maximize parallelism with the GPU or coprocessors. Only the most trivial form of synchronization is needed: has the GPU finished drawing, or has the VBL started?
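To illustrate how trivial that synchronization is, here's a sketch - every name in it (gpuDrawDone, vblankStarted, flipBuffers) is made up for illustration, not any real console API:

// Hypothetical platform queries, stubbed so the sketch compiles.
static bool gpuDrawDone()   { return true; }   // GPU consumed the display list?
static bool vblankStarted() { return true; }   // vertical blank begun?
static void flipBuffers()   {}

void endOfFrameSync()
{
    while (!gpuDrawDone())   { /* spin */ }    // wait for the GPU to finish drawing
    while (!vblankStarted()) { /* spin */ }    // wait for VBL so the flip doesn't tear
    flipBuffers();
}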

The main code stream executes in serial order; only things like hardware callbacks run outside the linear execution of the game logic.

For an architecture like Cell, I can see two different types of parallelism.

1) Game code still has a main thread that spawns sub-threads on the other units whose computational results are independent of each other. An example would be a game no more complex than what we have today, except it uses the massive floating-point parallelism to do insane amounts of patch tessellation. Synchronization would be not much more difficult than what is being done today with a standard CPU+GPU pair. Console engineers are already doing similar types of coding on the PS2.

2) Game code that is completely broken up into execution packets. An example would be a game that has order(s) of magnitude more active objects (people, cars, zombies) being updated in the world. Objects are updated in groups, in parallel, on multiple execution units. As far as I know no one has written game code of this structure (see the sketch after the loop below).

So far almost all parallelism still fits into the classic game structure of:

while (true)
{
    world.Update();
    world.Render();
}

As long as the structure is some form of the above, adding more execution units is just another form of optimization. If the architecture requires a break with the classic game structure, game engineers are going to have to do some major homework.
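To make option 2 concrete, here is a minimal sketch of what the loop might become - my own construction (using standard C++ threads for brevity), not any shipping engine's code. The assumption that batches share no state is doing all the work here:

#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

struct GameObject { void Update() { /* AI, physics, ... */ } };

struct World {
    std::vector<GameObject> objects;

    // One contiguous batch of objects per execution unit; the batches
    // are assumed independent, so no locking is needed while updating.
    void UpdateParallel(unsigned units)
    {
        std::vector<std::thread> workers;
        const std::size_t batch = (objects.size() + units - 1) / units;
        for (unsigned u = 0; u < units; ++u) {
            const std::size_t begin = u * batch;
            const std::size_t end   = std::min(begin + batch, objects.size());
            if (begin >= end) break;
            workers.emplace_back([this, begin, end] {
                for (std::size_t i = begin; i < end; ++i)
                    objects[i].Update();
            });
        }
        for (auto& w : workers) w.join();  // barrier: every batch done
    }

    void Render() { /* build display lists, kick the GPU */ }
};

// The classic structure survives, only Update changed shape:
// while (true) { world.UpdateParallel(8); world.Render(); }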
 
SCEI has yet to make known how the hell they are gonna help developers work with this multicore Cell system. They just laughed it off when someone asked about the "crazy architecture"... :?
 
SCEI has yet to make known how the hell they are gonna help developers work with this multicore Cell system. They just laughed it off when someone asked about the "crazy architecture"...

You're dead wrong on this one.

Remember all those Cell interviews in Japanese? Kutaragi said there would be a PS3 API, sample code, and the works.
 
Pipelining won't get you enough parallelism unless you want to delay rendering for tens of frames.
 
This?

"S C E made 3D hardware access via a graphic library in Founder PS. In PS2, both sides of the development technique using the approach which squeezes out direct power from hardware, and the middleware which a third vender offers are supported. In PSP, the Founder[ PS ]-like technique which restricts hardware access to a new function and is controlled by the library side was taken.

Although the method of not making it conscious [ by a certain method ] of hardware or network composition by PS3 will be offered, the burden which the S C E side bears will increase sharply. In this interview, although not touched about PS of PS3, development environment, and the tool, if the portion becomes clear, the figure of software which will be realized on PS3 can imagine as a more concrete thing. "



Can't exactly catch WTF that translation is about... :LOL: BUT it seems like he is saying PSP will be as easy to work with as PS1 (duh! known), WHILE it also seems that nothing has been unveiled about the PS3 development environment?

I think we need a Japanese guy to work out the Japanese (duh! :LOL:)!

I love the opening sentence though! :LOL:
 
A Japanese guy did translate it right in the topic.

9) To achieve these goals, Sony needs to solve several problems: network latency, scalability, hardware abstraction. PS3 will work much like the original PS, with SCE providing high-level libraries to abstract the hardware for the applications.

 
In that case, if true, then I do hope PS3 will work like the original PS. Did he also translate this part,

"In this interview, although not touched about PS of PS3, development environment, and the tool, if the portion becomes clear, the figure of software which will be realized on PS3 can imagine as a more concrete thing. "

Sounds like KK "plans" (inside joke :LOL:) to make PS3 easy to work with, but not much was revealed in that interview.

Oh well, 2004 can't come fast enough. :oops:
 
...

FYI, Kutaragi did comment about PSX3 development while discussing PSP development during an interview with impress.co.jp, stating something like "PSP will be abstracted like PS1, leave the metal banging to PS3".
 
I can say this, Chap: PS3 will provide you with some awesome eye candy. I know how you love graphics ;)
 
From what I understood, this is an absolutely transparent thing in the CELL case, where the compiler or JIT compiler will do all that's necessary to distribute work between the APUs.

this is a recurring theme in parallel computing; if it were easy we could scale back the R&D, alas...

Yeah, a bunch of code that 99% of developers can't follow - only the few and proud supercomputing guys do...

probably not, but I imagine the sample code (if it exists - we have heard nothing as yet) would be used when looking much more closely at CELL.

the real question is, *if* they do plan to ship an API to devs for PS3, what are their options?

Pipelining won't get you enough parallelism unless you want to delay rendering for tens of frames.

ditto
 
Re: ...

DeadmeatGA said:
FYI, Kutaragi did comment about PSX3 development while discussing PSP development during an interview with impress.co.jp, stating something like "PSP will be abstracted like PS1, leave the metal banging to PS3".

That is not what he said, but nice job trying to bend the truth as always.
 
ShinHoshi said:
this is an absolutely transparent thing in the CELL case, where the compiler or JIT compiler will do all that's necessary to distribute work between the APUs.
Actually, while this would be nice - I'll believe it when I see it. Personally I am skeptical about what compilers can do.
Consider that even writing serial code, a skilled C++ programmer can do a hell of a lot better job than any C++ compiler will do on its own (at least in math-intensive code) - and that's without writing a single line of asm.
Ideally I would expect an API that gives you at least one - or a couple of - basic rendering paths that don't need low-level handling, but also allows you to create your own paths if you want to go lower.
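Something in this shape, say - every name below is invented for illustration, not anything SCE has announced:

class Scene;  // placeholder for whatever scene representation the library uses

class RenderPath {
public:
    virtual ~RenderPath() {}
    virtual void Submit(const Scene& scene) = 0;
};

// The stock path: no low-level handling required, the library
// schedules the work across the execution units itself.
class DefaultRenderPath : public RenderPath {
public:
    void Submit(const Scene&) { /* library-managed scheduling */ }
};

// The escape hatch: a team that wants to go lower implements its own
// path against the raw command interface.
class HandRolledRenderPath : public RenderPath {
public:
    void Submit(const Scene&) { /* custom low-level code */ }
};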


Mfa said:
Pipelining won't get you enough parallelism unless you want to delay rendering for tens of frames.
I remember discussions about this years ago when GSCube was unveiled, even on other forums - rendering processes can break down very nicely into layers, giving similar utilization to pipelining without the absurd latency.
The rest of the code is a different beast - but say you break the game code loop into several iterations; loop unrolling + pipelining will work within a single frame.
I've been doing it with loop optimizations at the asm level long enough that I think I could find a few spots in higher-level code to use it too.
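Roughly this shape, to illustrate (my sketch, with standard C++ threads standing in for the real execution units) - the pipeline is only one frame deep, so the latency cost is a single frame rather than tens:

#include <functional>
#include <thread>

struct FrameState { /* transformed geometry, display lists, ... */ };

void Update(FrameState& out) { /* produce state for the next frame */ }
void Render(const FrameState& in) { /* consume the previous frame's state */ }

int main()
{
    FrameState buffers[2];
    Update(buffers[0]);  // prime the pipeline

    for (int frame = 0; ; ++frame) {
        const FrameState& current = buffers[frame & 1];
        FrameState&       next    = buffers[(frame + 1) & 1];

        // The two stages touch different buffers, so they can run on
        // different execution units at the same time.
        std::thread renderer(Render, std::cref(current));
        Update(next);
        renderer.join();  // one-frame pipeline barrier
    }
}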
 
I can't help but harp on this again, but I think imperative languages need to fuck off for the most part. They're really not very useful in these computing paradigms unless you're doing some seriously low-level stuff, or unless you want to put in the massive programmer overhead of building the infrastructure and using various non-standard libraries/terminologies. What's needed is a quasi-imperative language where, at most, the programmer merely needs to hint at what is atomic.
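To be fair, even plain C++ can get close to that with a single hint - OpenMP pragmas, for instance, where the programmer only asserts that the loop iterations are independent and the compiler/runtime handles the distribution:

struct GameObject { void Update() { /* ... */ } };

void updateObjects(GameObject* objs, int n)
{
    // The pragma is the whole "hint": iterations are declared
    // independent, and the runtime splits them across processors.
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        objs[i].Update();
}

It's still imperative underneath, but the parallelism itself is declarative.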

In most large projects one spends a lot of time and a lot of money simply building up the language to a reasonable level of abstraction where they're basically achieving "procedural zen".

Now if you'll excuse me, I have to wash my mouth out, my post sounds a bit too much like LISP propaganda.
 
Rendering in layers to get parallelism? I never understood how that is supposed to be better than simply rendering separate parts of the screen in parallel; frustum culling is cheap.

To me it seems the obvious way to exploit parallelism is batch processing, not pipelining.
 
Mfa said:
Rendering in layers to get parallelism? I never understood how that is supposed to be better than simply rendering separate parts of the screen in parallel; frustum culling is cheap.
Layer rendering has nearly complete data locality and very little redundant processing (note also that I'm talking about the whole rendering process here, not just rasterization).
Rendering in screen tiles, on the other hand, has almost no data locality, so you stand a good chance that bandwidth use will also scale linearly with the number of nodes, not just the 'redundant' culling processing.
Moreover, the culling part may not be that cheap at all depending on your setup (in the case of the GSCube discussion it definitely would not be).
- After all, if it were that simple and cheap, every PS2 game would break rendering into tiles and come with 4x4 AA ;)

To me it seems the obvious way to exploit parallelism is batch processing, not pipelining.
That goes without saying, but I'm not sure right now how often you can do it in more general-purpose code.
 
Fafalada said:
Layer rendering has nearly complete data locality and very little redundant processing (note also that I'm talking about the whole rendering process here, not just rasterization).
Rendering in screen tiles, on the other hand, has almost no data locality, so you stand a good chance that bandwidth use will also scale linearly with the number of nodes, not just the 'redundant' culling processing.

In the framebuffer, data locality is obviously poorer for the compositing approach; that's hard to avoid with such an incredible amount of memory devoted to it. For textures, the only way it would be much better is if there were a lot of repetitive use of textures. As for geometry, it depends on the average size of the projection of the bounding volumes at the lowest level of the bounding volume hierarchy. If that's small enough (it depends on the tile size), it won't be a factor either.

With unique texturing and highly detailed scenes I don't think locality has much chance of being better for compositing.

After all, if it were that simple and cheap, every PS2 game would break rendering into tiles and come with 4x4 AA ;)

It would have needed separate texture memory too.

That goes without saying, but I'm not sure right now how often you can do it in more general-purpose code.

With timestep simulations most of the work on objects can be done independently. That takes care of physics/collision detection and AI, which pretty much covers the heavy-lifting part of the code (presumably with Cell the smaller housekeeping stuff is pretty much insignificant compared to the number of cycles which go into those tasks and rendering).
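For instance (my illustration, not anyone's shipping code): double-buffer the simulation state so each object reads only the previous step and writes only its own slot - then the loop can be split across any number of units with no synchronization until the end of the step:

#include <cstddef>
#include <vector>

struct Body { float pos, vel; };

void step(const std::vector<Body>& prev, std::vector<Body>& next, float dt)
{
    for (std::size_t i = 0; i < prev.size(); ++i) {
        // Reads only the read-only prev[], writes only next[i], so the
        // iterations are independent and can run on any unit.
        next[i].vel = prev[i].vel;  // + dt * forces(prev, i), omitted
        next[i].pos = prev[i].pos + next[i].vel * dt;
    }
}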
 
Mfa said:
In the framebuffer, data locality is obviously poorer for the compositing approach; that's hard to avoid with such an incredible amount of memory devoted to it. For textures, the only way it would be much better is if there were a lot of repetitive use of textures. As for geometry, it depends on the average size of the projection of the bounding volumes at the lowest level of the bounding volume hierarchy. If that's small enough (it depends on the tile size), it won't be a factor either.
Good point about the FB and sizes (I forgot to mention that).
But it seems to me you got the texture situation reversed. Assuming we're sorting into layers by geometry, repeating textures a lot will only force you to duplicate texture accesses across layers - the more unique they are, the more you keep them local to each layer.
Following on from that, geometry should be completely local to layers, so at best tiles can approach that but never quite get there.

Alternatively you could sort by texture if the nature of the content requires it.

With unique texturing and highly detailed scenes I don't think locality has much chance of being better for compositing.
So long as the FB sizes are not a problem (which I have to admit can be a really big issue), I would disagree (as mentioned above).


It would have needed separate texture memory too.
You can have them in GS mem and it still won't work well - frustum culling brings along a ton of other CPU overhead (generation of display lists etc.), and it's not like we usually have CPU time to spare on PS2 - rather the opposite :?.
And I don't need to tell you that using a single display list and moving gross culling to VU1 won't work well either.

presumably with Cell the smaller housekeeping stuff is pretty much insignificant compared to the number of cycles which go into those tasks and rendering
I should hope so - but until I see it I won't presume anything on this one :p
 