Is the 360 cpu limited?

pjbliverpool · Mar 14, 2007

Thanks for clearing that up 3dillentante and inefficient. It seems Xenon is more comparable in both performance and functional unit capability to an Athlon X2 than to a single core solution.

3dilettante · Mar 14, 2007

pjbliverpool said:
Thanks for clearing that up 3dillentante and inefficient. It seems Xenon is more comparable in both performance and functional unit capability to an Athlon X2 than to a single core solution.

There aren't many head to head benchmarks to be sure about that. I'd be cautious to say that for a dual-core X2. My comparison was for a single A64 core only.

The X2 still has better branch prediction, more cache overall, and would be able to tolerate less optimal code.

In heavily threaded code that is more optimized, Xenon could do better in cases where the ILP is low and the A64's AGUs are less utilized. While the AGUs are specialized units, they are used a lot for x86 code.
Equivalent code on Xenon would lean heavily on the extra int unit.

DJ12 · Mar 14, 2007

It's only a year old, I doubt it's anywhere near the limit yet.

If it's anything limited it's "Best practise" or "current knowledge".

swaaye · Mar 14, 2007

It has hit its limits in lots of apps already. Quake 4 is an example of how bad that CPU can be in some situations. They couldn't get all 3 cores going well and the result was a very poor port. Doom3 engine uses a lot of CPU power.

So, it may be very fast, but as with Cell, it will take custom tailored code for the chip to get there. Game ports probably will not be a good spot to see such efforts taking place. Just keep an eye on the exclusive titles coming up for the best it can do.

crystall · Mar 15, 2007

3dilettante said:
Perhaps in unit count, but not in functionality.
Since Xenon has an additiona int unit, there is a total of six full integer units for three cores.
A64 has three int units and three AGUs, which limits the full range they can be applied to.

Each Xenon core has only a fixed-point unit (split into an ALU/logic pipe and a complex/mul pipe but it can issue only one instruction per cycle to it) and an adder in the L/S unit but not two separate FX units IIRC.

For scalar FP, each VMX unit can issue one math op and one memory op.
A64 can issue one ADD + MUL + MEM.
Over three cores, Xenon can manage 3 math and 3 store ops.
A64 in one core could handle a max of 3 ops of the prescribed mix, period.

The picture is more complex here IMHO. Xenon can issue one 128-bit SPFP multiply-add operation per cycle to the vector unit plus another different vector op (probably a permute as it is common in other IBM designs) or another unrelated instruction (FX, L/S, CR). Current A64 cores can issue one 128-bit wide SPFP multiply and one 128-bit wide SPFP add operation every two cycles but can issue freely a third instruction every cycle (for example a L/S one).

The load/store unit on A64 can handle two ops. I don't know about Xenon, but if each core's load/store can only handle one op, it's still more than A64.

It's a mixed bag, the A64 L/S pipe can handle two 64-bit ops per cycle but not two 128-bit ops per cycle whilst Xenon's core can do one 128-bit op per cycle.

Acert93 · Mar 15, 2007

swaaye said:
It has hit its limits in lots of apps already. Quake 4 is an example of how bad that CPU can be in some situations. They couldn't get all 3 cores going well and the result was a very poor port. Doom3 engine uses a lot of CPU power.

So, it may be very fast, but as with Cell, it will take custom tailored code for the chip to get there. Game ports probably will not be a good spot to see such efforts taking place. Just keep an eye on the exclusive titles coming up for the best it can do.

I think what you are talking about there is "application limited" and not hardware limited. Even if Doom 3 / Q4 required a lot of CPU power and ran poorly on the 360 doesn't say the hardware was a limiting factor to performance but only that their code was. You see Q4 as an example of how bad the CPU is in some situations, I think what we know tells us that it shows how rushing a title out the door to hit the launch with limited time on development kits and quickly ported code can make a mess. This is not a new phenomena, games have been ported from slower systems to much faster systems but have performed poorly due to poor execution. This isn't like transfering a game from an Intel to AMD based chip where they are both performing on the same basic platform. The entire structure of the platform changes and takes weeks, of not months, typically to even get your code up and running and not seconds like a CPU swap in a PC.

ERP · Mar 15, 2007

I was going to avoid responding to this thread, but since it isn't a flame war....

It's a somewhat pointless question for any console. It is what it is, analasis of parts outside the context of the whole is meaningless at some level.

As a dev I decide how CPU heavy my app iss, I decide how GPU heavy it is, I can make any system CPU or GPU limited and make it look like shit doing it.

As has been pointed out some ports run badly, mainstream PC's are generally very CPU biased (we still have to support 5X00 card on the game I'm working on), so PC games tend to put more stress on the CPU to alleviate batch overhead and general GPU load. Porting a single threaded app that stresses the CPU to either X360 or PS3 is not going to gve you the best of results without a LOT of work.

Most apps are actually CPU and GPU limited at different points in a frame, if your careful and don't force synchronisation, then it can all work out. However it's fairly common to either accidentally or deliberately force synchronisation during a frame, which can waste a lot of performance on either side of the fence.

There isn't one way to submit geometry or organise your data that's best it's all dependant on what you want to do. If I can submit all my geometry and do all my AI in 1ms and have to wait 15ms for the next frame I should be preprocessing it to help out the GPU. If I'm spending 20ms computing occlusion results on the CPU, and my GPU renders the final scene in 5ms I should probably forget the occlusion culling or significantly simplify it and let the GPU do it.

You can't easilly make tradeoffs like this on a PC because of no fixed configuration, and it's one of the reasons that console games tend to push hardware harder.

Clockwork · Mar 15, 2007

Joshua Luna said:
I think what you are talking about there is "application limited" and not hardware limited. Even if Doom 3 / Q4 required a lot of CPU power and ran poorly on the 360 doesn't say the hardware was a limiting factor to performance but only that their code was. You see Q4 as an example of how bad the CPU is in some situations, I think what we know tells us that it shows how rushing a title out the door to hit the launch with limited time on development kits and quickly ported code can make a mess. This is not a new phenomena, games have been ported from slower systems to much faster systems but have performed poorly due to poor execution. This isn't like transfering a game from an Intel to AMD based chip where they are both performing on the same basic platform. The entire structure of the platform changes and takes weeks, of not months, typically to even get your code up and running and not seconds like a CPU swap in a PC.

Not to mention a complete engine port. If I'm mistaken the original game used OpenGL which the 360 version obviously does not....

Please correct me if I am mistaken.

(similarly, I noticed a similar issue with Prey which coincidently also used the Doom 3 engine)

The_legend_of_drtre · Mar 21, 2007

jayco said:
Is the 1MB of L2 Cache enough for the three G5 cores?

Considering the Gamecube has a 512k L2 cache and was only single core... I'd say no.

And as stated above, those aren't 3 G5 level cores.

Zeross · Mar 21, 2007

The_legend_of_drtre said:
Considering the Gamecube has a 512k L2 cache and was only single core... I'd say no.

The Gekko has only 256Ko of L2 cache

The_legend_of_drtre · Mar 21, 2007

Zeross said:
The Gekko has only 256Ko of L2 cache

Yup you are right.

V-G · Mar 21, 2007

Just wanted to add that all this cache size talk means little to nothing if we do not look at the latancy of the RAM too.

Megadrive1988 · Mar 22, 2007

Joshua Luna said:
I think what you are talking about there is "application limited" and not hardware limited. Even if Doom 3 / Q4 required a lot of CPU power and ran poorly on the 360 doesn't say the hardware was a limiting factor to performance but only that their code was. You see Q4 as an example of how bad the CPU is in some situations, I think what we know tells us that it shows how rushing a title out the door to hit the launch with limited time on development kits and quickly ported code can make a mess. This is not a new phenomena, games have been ported from slower systems to much faster systems but have performed poorly due to poor execution. This isn't like transfering a game from an Intel to AMD based chip where they are both performing on the same basic platform. The entire structure of the platform changes and takes weeks, of not months, typically to even get your code up and running and not seconds like a CPU swap in a PC.

well said.

archie4oz · Mar 22, 2007

V-G said:
Just wanted to add that all this cache size talk means little to nothing if we do not look at the latancy of the RAM too.

Or even just cache latencies... I mean the PPE has a pretty abysmal cache latency (the L2 macros run at 1/2 clock speed).

Personally I didn't quite get MS's decision for going that route (makes more sense w/Cell).

swaaye · Mar 22, 2007

Joshua Luna said:
I think what you are talking about there is "application limited" and not hardware limited. Even if Doom 3 / Q4 required a lot of CPU power and ran poorly on the 360 doesn't say the hardware was a limiting factor to performance but only that their code was. You see Q4 as an example of how bad the CPU is in some situations, I think what we know tells us that it shows how rushing a title out the door to hit the launch with limited time on development kits and quickly ported code can make a mess. This is not a new phenomena, games have been ported from slower systems to much faster systems but have performed poorly due to poor execution. This isn't like transfering a game from an Intel to AMD based chip where they are both performing on the same basic platform. The entire structure of the platform changes and takes weeks, of not months, typically to even get your code up and running and not seconds like a CPU swap in a PC.

Yeah I should've been a bit more explicit in what I said.

I've said before that ports are not going to work well between PC, PS3, and 360. Ports rarely show off each platform as well as a title that is tuned to a specific machine.

The CPUs especially are just so different. And then there's the big fundamental challenge of actually taking good advantage of multiple CPU cores in games. On the PC, individual cores are so powerful already and ready for non-optimal code that it's not as big of an issue if the devs don't pull it off well. And, there's the need to program "smarter" and cleaner for the in-order cores. And to best manage the limited RAM resource and each machine's unique RAM layout.

With Quake 4 And Prey we definitely saw this application-limited scenario. The potential performance of the consoles is mind blowing at their price levels, but the range of possible performance is huge and the bottom of that range isn't so hot.

BadTB25 · Aug 11, 2007

Now that the Xbox360 has been out a while, is there any advantage to the WMX units units compared to cell?

"Nerve-Damage" · Aug 11, 2007

BadTB25 said:
Now that the Xbox360 has been out a while, is there any advantage to the WMX units units compared to cell?

Err, VMX units.

Graham · Aug 11, 2007

A while ago, I posted a thread on XNA performance.

Now take this with a quarry of salt, because there are millions of factors at play (primary one being I don't have access to a 360 devkit... yet). - However given the 360's compact framework has very limited floating point optimisation, has a non-generational garbage collector, very limited (if any) inlining, no ability do automatic pass by ref instead of pass by value (sortof like inlining), and only gives you access to 4/6 threads... I still managed to get over half the performance of my dual core X2. And I was hardly touching the XNA maths libs, which are apparently optimised to buggery boo.

So assuming the 360's CF is 75% efficient compared to the windows framework (which is crazy generous), and the 2/3 threads, you are looking at something that is *faster* than my X2.
Which at the time I got it was more expensive than the 360

So no, I don't think it's cpu limited. I think it's a very good console cpu. Though it does mean that for single threaded stuff, XNA performance is pants

[EDIT]

ahh. Didn't realise this was a resurrected thread sorry

Carl B · Aug 11, 2007

Yeah, exactly - necromancy alert!

BadTB25 normally raising a thread from the dead is frowned upon... if you'd like to start a new thread on the topic, feel free.

Is the 360 cpu limited?

pjbliverpool

B3D Scallywag

3dilettante

DJ12

swaaye

Entirely Suboptimal

crystall

Acert93

Artist formerly known as Acert93

ERP

Clockwork

The_legend_of_drtre

Zeross

The_legend_of_drtre

V-G

Megadrive1988

archie4oz

ea_spouse is H4WT!

swaaye

Entirely Suboptimal

BadTB25

"Nerve-Damage"

Graham

Hello :-)

Carl B

Friends call me xbd

Similar threads