*OK warning first.
Keep in mind,
This is a very rough comparison of managed code. One system is running a speed-optimised compiler with a generational GC, while the 360 is running what we can assume is a size-optimised, non-generational GC - aka the Compact Framework.
*anyway*
I've just done my first half-baked performance test of XNA on both the 360 and my PC.
The results are surprising, in both good and bad ways.
Firstly I'll state that I expected XNA to run like a dog on the 360.
These are *not* accurate numbers, but give a good rough idea.
The first surprise was that (as I should have realised) there is no thread scheduling across cores. Doh. You have to manually set the thread affinity if you want decent threaded performance. *oops*. Otherwise everything runs on the default core.
Second surprise: of the 6 'cores', you have access to only 4. Cores 0 and 2 are restricted (for the GC, I'd presume). How much of an effect this has on overall performance is an open question.
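A minimal sketch of what those two surprises mean in practice - assuming the Compact Framework's `Thread.SetProcessorAffinity()` (Xbox 360 only, hence the `#if XBOX` guard), and that hardware threads 0 and 2 are the reserved ones:

```csharp
// Sketch: start one worker per usable hardware thread and pin it there.
// On desktop .NET the affinity call compiles out and this is just a
// plain thread pool. Names here are mine, not the original code.
using System;
using System.Threading;

public static class Worker
{
    // The four usable hardware threads once 0 and 2 are excluded.
    static readonly int[] UsableCores = { 1, 3, 4, 5 };

    public static Thread[] StartWorkers(ThreadStart work)
    {
        var threads = new Thread[UsableCores.Length];
        for (int i = 0; i < UsableCores.Length; i++)
        {
            int core = UsableCores[i];
            threads[i] = new Thread(() =>
            {
#if XBOX
                // Must be called on the thread itself, before any work;
                // without it everything stays on the default core.
                Thread.CurrentThread.SetProcessorAffinity(new[] { core });
#endif
                work();
            });
            threads[i].Start();
        }
        return threads;
    }
}
```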
The final surprise was the lack of WaitHandle.WaitAny() support. This method wraps the native WaitForMultipleObjects() function, which waits on several events and returns once one of them has been signalled. I had to replace it with a rough equivalent that was effectively while { poll(signals); Sleep(0); }. So this *will* have had a performance effect, but not a significant one.
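In sketch form (not the exact code), that fallback is just a polling loop - check each handle with a zero timeout, yield the time slice, repeat:

```csharp
// Poll-based stand-in for WaitHandle.WaitAny(): test each handle
// without blocking, give up the rest of the slice, and loop until
// one of them is signalled.
using System;
using System.Threading;

public static class WaitAnyFallback
{
    public static int WaitAny(WaitHandle[] handles)
    {
        while (true)
        {
            for (int i = 0; i < handles.Length; i++)
                if (handles[i].WaitOne(0))   // non-blocking poll
                    return i;
            Thread.Sleep(0);                 // yield to other threads
        }
    }
}
```

The Sleep(0) is what keeps this from pegging the core while nothing is signalled, but it also means the wake-up latency is a scheduler quantum rather than immediate - hence the (small) performance cost.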
The program is as follows:
My tree, from the other day, contains a bunch of objects. Either 5,000 or 25,000 depending on the test.
One lower-priority thread collects objects from the tree and farms them off in batches of 64 to the worker threads. They do the work, then the process repeats; however, while the worker threads are running out of their caches, the low-priority thread is applying position changes to the tree. On top of that there is rendering too, which currently ain't too quick, and blocks the updating (I'm getting there...).
So each object gets updated twice per iteration: once to calculate, once to apply changes. For both of these, I 'simulate' complex code with a large bunch of trig in a loop. For the 25k test this trig is removed (so it's really more of a single-threaded test).
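Roughly, one iteration of that update looks like this - a sketch with my own names, using desktop .NET's CountdownEvent and ThreadPool for brevity rather than the real hand-rolled workers:

```csharp
// One iteration: batches of 64 are farmed to workers for the calculate
// pass (with a fake trig workload), then a single thread applies the
// results - so each item is touched twice per iteration.
using System;
using System.Threading;

public static class BatchedUpdate
{
    const int BatchSize = 64;

    // Fake 'complex' workload: a bunch of trig in a loop.
    static double FakeWork(double x)
    {
        double acc = x;
        for (int i = 0; i < 64; i++)
            acc += Math.Sin(acc) * Math.Cos(acc);
        return acc;
    }

    public static void Iterate(double[] items, double[] results)
    {
        int batches = (items.Length + BatchSize - 1) / BatchSize;
        using (var done = new CountdownEvent(batches))
        {
            for (int b = 0; b < batches; b++)
            {
                int start = b * BatchSize;
                int end = Math.Min(start + BatchSize, items.Length);
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    for (int i = start; i < end; i++)
                        results[i] = FakeWork(items[i]); // pass 1: calculate
                    done.Signal();
                });
            }
            done.Wait(); // collector blocks until all batches finish
        }
        // Pass 2: the low-priority thread applies the changes back.
        for (int i = 0; i < items.Length; i++)
            items[i] = results[i];
    }
}
```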
Anyway. Results:
My PC:
Athlon 64 X2 4200+, 2 GB RAM, etc.
(results as framerate)
5,000 item:
Code:
        ST    MT
360:     7    26
PC:     28    51
25k (more single threaded, no fake workload)
Code:
        ST    MT    MT*
360:    10    16    17
PC:     36    47    40
*only a single worker thread
Overall I expected the 360 to lose out. If anything I expected it to get slaughtered, but it hits about 50% of the performance of my A64. Given it's running the Compact Framework (a lot less optimisation, the GC isn't generational, etc.), I'm actually quite happy.
When the 360 was new, this CPU was the same price as the hot white box, so all things considered I think it did very well.
Perhaps what surprised me the most was the significant jump when using all 4 'cores'. The speedup *is* consistently higher than 3x with the heavy workload (26 fps vs 7 fps is roughly 3.7x).
Graphics will be interesting.