Why would Xenon be so hard to program for?

blakjedi

Veteran
Three symmetrical cores... if you can program (threads) for one you can do it for all...

is it switching/moving instructions from one thread to another?

Help me understand... it seems like a no brainer... BTW has anyone ever seen the Xenon?
 
blakjedi said:
Three symmetrical cores... if you can program (threads) for one you can do it for all...

is it switching/moving instructions from one thread to another?

Help me understand... it seems like a no brainer... BTW has anyone ever seen the Xenon?

Well, beyond just the basic difficulties in writing multithreaded code (and writing distributed algorithms in general), there will likely be unique constraints in the Xenon design that might make it difficult in certain ways to extract performance. The cache configuration has been a rather hot topic, for example.

Having said that, I really think developers are mostly bitching about having to finally get around to writing multithreaded engines. They've avoided it for years, and now it's finally coming home to roost. It will really separate the weak development houses from the strong ones. I imagine a lot of studios are going to end up contracting engines rather than attempting to write their own. Certainly it seems the UE3 engine is quite popular this generation.

Nite_Hawk
 
Before the alpha kits went out, was JC the only one who had even attempted multithreaded code in a game engine (commercial release, not just for internal testing purposes)?
 
Alstrong said:
Before the alpha kits went out, was JC the only one who had even attempted multithreaded code in a game engine (commercial release, not just for internal testing purposes)?

This is far from true.

Multithreading is a common tool in game engines to make operations asynchronous. The difference is that on platforms with multiple cores you are trying to balance workload, not just hide operation latencies.

Multithreaded code is difficult to get right, and can be hideous to debug with large numbers of threads. It is also extremely difficult to test, since many of the possible bugs are timing related.
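To make the distinction concrete, here's a rough sketch of the two patterns (modern C++ with std::thread purely for illustration, not what you'd actually write against a console SDK): the first thread only exists to hide the latency of a slow operation, while the second approach splits one job across cores so the wall-clock time actually drops.

```cpp
#include <cstddef>
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

// Pattern 1: hide latency. Kick a slow operation (streaming a file, say)
// onto a worker thread and keep the main loop running in the meantime.
void load_level_async()
{
    std::thread loader([] {
        std::puts("pretend this blocks on disk for a while...");
    });
    // ... main loop keeps simulating/rendering here ...
    loader.join();
}

// Pattern 2: balance workload. Split one big job across N cores so the
// total time shrinks instead of just overlapping with something else.
long long parallel_sum(const std::vector<int>& data, unsigned cores)
{
    std::vector<long long> partial(cores, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = data.size() / cores;
    for (unsigned c = 0; c < cores; ++c) {
        std::size_t begin = c * chunk;
        std::size_t end   = (c + 1 == cores) ? data.size() : begin + chunk;
        workers.emplace_back([&, begin, end, c] {
            partial[c] = std::accumulate(data.begin() + begin,
                                         data.begin() + end, 0LL);
        });
    }
    for (auto& w : workers) w.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

Even a toy like the second one already shows where the timing bugs hide: forget one join, or let two workers write the same element, and it only fails some of the time.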
 
I meant in games where we would actually see performance increases by having more than one processor... It was demonstrated in Quake III, but I haven't seen any other game in the same situation...

:?:
 
ERP said:
Multithreading is a common tool in game engines to make operations asynchronous. The difference is that on platforms with multiple cores you are trying to balance workload, not just hide operation latencies.

Multithreaded code is difficult to get right, and can be hideous to debug with large numbers of threads. It is also extremely difficult to test, since many of the possible bugs are timing related.

Interjecting a few geek terms into the metaphor, it sounds as if those hapless developers are paddling canoes across an angry, multiprocessing ocean that is being stirred by the eye of a parallel storm.

Now in these grisly conditions, allocating (and protecting) resources must be mind-boggling. :cry:
 
blakjedi said:
Three symmetrical cores... if you can program (threads) for one you can do it for all...

is it switching/moving instructions from one thread to another?

Help me understand... it seems like a no brainer... BTW has anyone ever seen the Xenon?

Well, it would have been easier to program if there was a multithread-friendly language such as Java for early games development.
 
Java_man said:
Well, it would have been easier to program if there was a multithread-friendly language such as Java for early games development.

How would you multithread zipping one file in Java?

The problem isn't multithreading, that's a pretty simple concept. The problem is taking your code and actually splitting it into parts that are as independent as possible. The problem is at a much higher level than C++ or Java - it's a design problem. For serial problems (like zipping, since it depends on previous data) it's nontrivial or even impossible.
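A toy illustration of that design problem (C++, names invented for the example): the first loop is inherently serial because each iteration needs the previous result - much like a compressor whose output depends on everything it has already seen - while the second can be chopped into chunks and farmed out to as many cores as you like.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Inherently serial: iteration i needs the result of iteration i-1, so you
// cannot hand half of the loop to another core.
std::vector<float> running_filter(const std::vector<float>& in)
{
    std::vector<float> out(in.size());
    float prev = 0.0f;
    for (std::size_t i = 0; i < in.size(); ++i) {
        prev = 0.9f * prev + 0.1f * in[i];  // depends on the previous output
        out[i] = prev;
    }
    return out;
}

// Independent per element: any chunk of this loop can run on any core, with
// no coordination beyond joining at the end.
std::vector<float> per_element(const std::vector<float>& in)
{
    std::vector<float> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = std::sqrt(in[i]);          // no cross-iteration dependency
    return out;
}
```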
 
MS' console diagram made it look so conceptually easy.


Essentially I think of it as

Processor 0/Thread 0 (P0/T0) equals game code,

P0/T1 equals AI

P1/T0 equals physics

P1/T1 equals geometry

P2/T0 equals sound

P2/T1 equals extra....
 
To blakjedi: that would be hard to coordinate while keeping the workload balanced. Basically, that means dividing the threads into smaller independent ones; what they want to do, or will do, is multithread the physics, multithread the AI, multithread...

At least that is what I understand from what I have been hearing, please correct me :!:
 
MS' console diagram made it look so conceptually easy.

Essentially I think of it as
Processor 0/Thread 0 (P0/T0) equals game code,
P0/T1 equals AI
P1/T0 equals physics
P1/T1 equals geometry
P2/T0 equals sound
P2/T1 equals extra....
You cannot multithread the outer loop of your game but for a few minor things (like ambient sound and effects playback). Unless you want rendering to lag behind physics for a frame, you can't run physics and "geometry" in parallel. Physics tells you where everything is... AI tells you where the bots are. The rather nebulous term of "game code" will include telling you about positions of player, camera, what animations will be playing, sound/graphical effects that should be triggered at some point, messages passed around from object to object and machine to machine.

If you are all right with letting certain computations lag behind others, you have to be sure to buffer off several copies of world states. Object transforms, bounding boxes, no big deal. Bone transforms of each and every skinned object... ha... And if anything, accessing memory on Xenon can be considered a capital offense punishable by 13 forms of torture.
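For the "buffer off several copies of world states" part, the usual shape is a double buffer: readers (rendering, AI) look at last frame's set while writers (physics) fill in this frame's, and the two swap at the end of the frame. A bare-bones sketch, with the structure names invented for illustration:

```cpp
#include <cstddef>
#include <vector>

struct Transform { float pos[3]; float rot[4]; };

// Two copies of every object's transform: consumers read the "previous"
// set while physics writes the "current" set, then the roles swap.
class WorldStateBuffer {
public:
    explicit WorldStateBuffer(std::size_t objects)
        : states_{std::vector<Transform>(objects),
                  std::vector<Transform>(objects)} {}

    const std::vector<Transform>& readable() const { return states_[read_]; }
    std::vector<Transform>&       writable()       { return states_[1 - read_]; }

    // Called once per frame, after all writers have finished.
    void flip() { read_ = 1 - read_; }

private:
    std::vector<Transform> states_[2];
    int read_ = 0;
};
```

Which is exactly where the memory cost comes from: do that for the bone transforms of every skinned character and the duplication adds up fast.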
 
ShootMyMonkey said:
You cannot multithread the outer loop of your game but for a few minor things (like ambient sound and effects playback). Unless you want rendering to lag behind physics for a frame, you can't run physics and "geometry" in parallel. Physics tells you where everything is... AI tells you where the bots are. The rather nebulous term of "game code" will include telling you about positions of player, camera, what animations will be playing, sound/graphical effects that should be triggered at some point, messages passed around from object to object and machine to machine.

If you are all right with letting certain computations lag behind others, you have to be sure to buffer off several copies of world states. Object transforms, bounding boxes, no big deal. Bone transforms of each and every skinned object... ha... And if anything, accessing memory on Xenon can be considered a capital offense punishable by 13 forms of torture.

As I was typing it, I knew that you're correct... however, I was pointing out more that P0/T1 would be the standard thread for all AI, P1/T0 would be the standard thread for all physics, etc...

Unless it's just more efficient to keep it all in the same thread...
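As a rough sketch of what I mean by a "standard thread" per subsystem (modern std::thread/std::barrier purely for illustration, nothing Xenon-specific), each subsystem loops on its own thread and they all meet at a frame barrier:

```cpp
#include <barrier>
#include <cstdio>
#include <functional>
#include <thread>

// One long-lived thread per subsystem; everyone meets at the barrier at the
// end of each frame so nobody reads a neighbour's half-updated data.
void subsystem_loop(const char* name, std::barrier<>& frame_sync, int frames)
{
    for (int f = 0; f < frames; ++f) {
        std::printf("%-7s frame %d\n", name, f);  // this subsystem's work
        frame_sync.arrive_and_wait();             // end-of-frame sync point
    }
}

int main()
{
    constexpr int frames = 3;
    std::barrier<> frame_sync(3);                 // AI, physics, sound
    std::thread ai     (subsystem_loop, "AI",      std::ref(frame_sync), frames);
    std::thread physics(subsystem_loop, "physics", std::ref(frame_sync), frames);
    std::thread sound  (subsystem_loop, "sound",   std::ref(frame_sync), frames);
    ai.join(); physics.join(); sound.join();
}
```

Of course, as pointed out above, the hard part isn't spawning the threads, it's making what's inside each loop genuinely independent.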
 
just a remark on that thread load distribution - due to the 'hyperthreading' nature of those cores you'd want to pair time-critical with non-time-critical threads (or non-critical with non-critical), so that the lower priority thread wouldn't get in the way of the higher priority thread while smt-ing. actually, the more critical one of those threads is, the greater the difference in priorities you want. you may also want to consider memory access patterns across the threads running on a single core and make sure they play nice to each other's caches.
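something along these lines, in plain win32 terms (just a sketch - the real 360 sdk has its own calls for this, and BackgroundJob here is made up):

```cpp
#include <windows.h>

// a hypothetical low-priority background job (decompression, streaming...)
// meant to soak up spare cycles on the second hardware thread of a core
// without getting in the way of the frame-critical thread next to it.
DWORD WINAPI BackgroundJob(LPVOID)
{
    /* ...chew through work in small slices... */
    return 0;
}

void SpawnPairedThreads()
{
    HANDLE worker = CreateThread(nullptr, 0, BackgroundJob, nullptr, 0, nullptr);

    // hint that the worker should live on logical processor 1 (core 0's
    // second hardware thread) and drop its priority so it yields whenever
    // the critical thread on the same core has work to do.
    SetThreadIdealProcessor(worker, 1);
    SetThreadPriority(worker, THREAD_PRIORITY_BELOW_NORMAL);

    // the frame-critical caller keeps the sibling hardware thread and a
    // higher priority. (handle cleanup omitted for brevity.)
    SetThreadIdealProcessor(GetCurrentThread(), 0);
    SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_ABOVE_NORMAL);
}
```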
 
Trying to explain this differently.

The problem is that, say, Physics and AI are not independent problems.
It's not that they can't be, but in most games they are not. It takes time to design a system that allows them to be separated, and there is a cost associated with separating them.

There is an interesting paper on memory allocation for parallel systems, where they run a web application on multicore SPARC servers. The app never runs faster than 1.2x the sequential app, and above 3 or 4 processors it actually starts to slow down. The reason in this case is that it does a lot of memory allocations, and the locks on the call to malloc basically serialise the app: all the threads spend their time blocked waiting for memory. In a multithreaded environment this is potentially an issue on any shared resource. So you have to identify them and minimise potential conflicts to get good performance.

Saying "I will run physics over here and AI over there" is easy; actually doing it efficiently in a real app isn't.
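To put the malloc problem in code form (a deliberately crude sketch, not any particular allocator): every thread funnels through one lock in the first version, which is what serialises the app; the per-thread arena in the second version is the usual kind of workaround.

```cpp
#include <cstddef>
#include <mutex>
#include <new>
#include <vector>

// Version 1: one lock guards the heap, so eight "parallel" threads that all
// allocate a lot spend most of their time queued up right here.
std::mutex g_heap_lock;

void* shared_alloc(std::size_t bytes)
{
    std::lock_guard<std::mutex> lock(g_heap_lock);
    return ::operator new(bytes);       // stand-in for the locked malloc
}

// Version 2: give each thread its own bump arena so the common case never
// touches a shared lock. (No freeing, no alignment handling - illustration only.)
struct ThreadArena {
    std::vector<unsigned char> storage;
    std::size_t used = 0;

    explicit ThreadArena(std::size_t bytes) : storage(bytes) {}

    void* alloc(std::size_t bytes)
    {
        if (used + bytes > storage.size()) return nullptr;  // arena exhausted
        void* p = storage.data() + used;
        used += bytes;
        return p;
    }
};

thread_local ThreadArena t_arena(1 << 20);  // 1 MB of scratch per thread
```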
 
It seems like multi-threaded middleware is going to be the key here.

I mean, there's no reason each dev should have to reinvent the wheel each time they go to tackle the problem of multithreading geometry and physics (for example), right?

Doesn't it make sense to have game engines that are built to be multi-threaded, where the designers of these engines will tackle each multi-threading issue and devise workarounds as best they can?

Then developers could take these engines and supplement them with their own tricks/knowledge, without having to rethink every problem from the ground up.
 
darkblu said:
just a remark on that thread load distribution - due to the 'hyperthreading' nature of those cores you'd want to pair time-critical with non-time-critical threads (or non-critical with non-critical), so that the lower priority thread wouldn't get in the way of the higher priority thread while smt-ing. actually, the more critical one of those threads is, the greater the difference in priorities you want. you may also want to consider memory access patterns across the threads running on a single core and make sure they play nice to each other's caches.

My theory on this is that you want to pair the same work on both threads, i.e. say you have two vertex transform threads on one CPU. That way you get the best possible cache coherence, and you use up some of the cycles lost to instruction and memory latency.

Of course the problem is that in theory that's all well and good, but in practice you have to measure everything on a parallel system, because you get non-obvious interference.

Referencing a piece of data on one CPU that is in the same cache line as a piece of data on a second CPU will eject the cache line from the first's L1 cache. If Xenon didn't have a shared L2 it would eject it from the L2 as well.

All you can really do is theorise, try, and measure.

Often on a multithreaded system the fastest solution to A may not be the fastest solution when paired with doing B.
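The cache line interference is easy to show in a sketch (the 64-byte figure is just illustrative - the line size depends on the CPU): in the first struct the two counters share a line, so the cores keep stealing it from each other; in the second each counter gets its own line and the fight stops.

```cpp
#include <atomic>
#include <thread>

// Two counters that happen to share a cache line: the core writing `a`
// keeps yanking the line away from the core writing `b`, even though the
// threads never touch each other's data.
struct SharedLine {
    std::atomic<long> a{0};
    std::atomic<long> b{0};
};

// Same counters, each padded out onto its own line, so the cores stop
// fighting over ownership.
struct PaddedLines {
    alignas(64) std::atomic<long> a{0};
    alignas(64) std::atomic<long> b{0};
};

template <typename Counters>
void hammer(Counters& c)
{
    std::thread t0([&] { for (int i = 0; i < 10000000; ++i) c.a++; });
    std::thread t1([&] { for (int i = 0; i < 10000000; ++i) c.b++; });
    t0.join();
    t1.join();
}
```

Time hammer() on a SharedLine versus a PaddedLines and the gap is the cost of the line bouncing around - which is also why measuring beats theorising here.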
 
As far as multi-threaded game engines go, don't forget that basically all PS2 games are multi-threaded (EE has three processor cores). The better Saturn games were multi-threaded too (Saturn had two CPUs). Compared to either of those examples, programming X360 games will be simple!
 
ERP said:
My theory on this is that you want to pair the same work on both threads, i.e. say you have two vertex transform threads on one CPU. That way you get the best possible cache coherence, and you use up some of the cycles lost to instruction and memory latency.

yes, that would work as well, as long as you're careful not to throw more work at a single core than the latter's practical capacity (tm) - i.e. if a single-threaded job on this core does work amount X, and you know that it's at, say, 80% core efficiency, you can throw an extra 20% of X at this core in a sibling thread, and if everything goes well you can expect you'll get your job done on time with a bonus. but surely it is easier said than done and does take much tuning. that's why i'm inclined to believe that many devs will prefer to pair demanding threads with threads that actually sleep most of the time and/or do a modest amount of memory accesses, for a quick and cheap gain.
 
As far as multi-threaded game engines go, don't forget that basically all PS2 games are multi-threaded (EE has three processor cores). The better Saturn games were multi-threaded too (Saturn had two CPUs). Compared to either of those examples, programming X360 games will be simple!
Well, I also picture CELL being used in a pretty similar way to VUs. Namely that you'll multithread off the individual independent repetitive tasks like skinning verts or performing line-of-sight tests. In essence, while it's not impossible to do things like make AI independent of physics (for the current frame, anyway), it's a lot easier to find individual things within the loop that can be multithreaded, than to adjust the design on a larger scale.

Well, it'll be a change in the fact that vector intrinsics won't entirely be dogs, ternaries will be spread about everywhere where once there was an if-then, certain keywords which were meaningless before will actually be useful (restrict, register), and for crying out loud, maybe, just maybe, malloc/new will actually align pointers correctly!!!
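In spirit, something like this (the spelling of restrict is compiler-specific, and the 16-byte figure is just a typical SIMD alignment, nothing 360-specific):

```cpp
#include <cstddef>
#include <new>

// `__restrict` promises the compiler that out and in never alias, which lets
// it keep values in registers/vector units instead of reloading after every
// store. (Spelling varies: __restrict, __restrict__, or C99's restrict.)
void scale(float* __restrict out, const float* __restrict in,
           float s, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] = in[i] * s;
}

// SIMD loads/stores generally want aligned pointers; asking the allocator
// for a specific alignment up front (C++17 aligned operator new here) beats
// hoping the default heap happens to cooperate.
float* alloc_simd_array(std::size_t n)
{
    return static_cast<float*>(::operator new[](n * sizeof(float),
                                                std::align_val_t{16}));
}

void free_simd_array(float* p)
{
    ::operator delete[](p, std::align_val_t{16});
}
```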
 
Alstrong said:
I meant in games where we would actually see performance increases by having more than one processor... It was demonstrated in Quake III, but I haven't seen any other game in the same situation...

:?:
Dungeon Siege increased in performance with 2 slow CPUs. In my case I had 2 P3 733s, and there wasn't a huge performance increase, but there was an increase. Most importantly, the minimum frame rate increased.
 