Ars Technica part 2: inside the Xbox 360

Actually it's pretty easy to design a chip to do 115GFLOPS double precision very well, it's just that the XeCPU doesn't need the precision.

For example the Clearspeed CSX600 can do 50GFLOPS double OR single peak at only 250MHz and only consumes 5W of power. :LOL:

Imagine a quad-core CSX600 running at only 250MHz and consuming 20W while having 200GFLOPS peak of double OR single precision. ;)
 
aaaaa00 said:
At any rate, Playstation 3 fanboys shouldn't get all flush over the idea that the Xenon will struggle on non-graphics code. However bad off Xenon will be in that department, the PS3's Cell will probably be worse. The Cell has only one PPE to the Xenon's three, which means that developers will have to cram all their game control, AI, and physics code into at most two threads that are sharing a very narrow execution core with no instruction window. (Don't bother suggesting that the PS3 can use its SPEs for branch-intensive code, because the SPEs lack branch prediction entirely.) Furthermore, the PS3's L2 is only 512K, which is half the size of the Xenon's L2. So the PS3 doesn't get much help with branches in the cache department. In short, the PS3 may fare a bit worse than the Xenon on non-graphics code, but on the upside it will probably fare a bit better on graphics code because of the seven SPEs.

Major Nelson vindicated? ;)

That of course is a lie. Cell does have branch prediction (it predicts not-taken automatically and can also use branch hinting). While Cell will suffer, it's not as bad as some people make out.
 
The PPE has branch prediction; the SPEs don't have branch prediction, just branch hints.
I wonder if branch hints will be useful for something more than static loops.
 
PC-Engine said:
Actually it's pretty easy to design a chip to do 115GFLOPS double precision very well, it's just that the XeCPU doesn't need the precision.

For example the Clearspeed CSX600 can do 50GFLOPS double OR single peak at only 250MHz and only consumes 5W of power. :LOL:

Imagine a quad-core CSX600 running at only 250MHz and consuming 20W while having 200GFLOPS peak of double OR single precision. ;)
:rolleyes: It has the same external bandwidth as a 1999 PS2, 6kb per processor (all its processors together have half the SRAM of ONE SPE :) ), a 128-byte register file per processor (SPEs have a 3.4 kb octal-ported register file), and its internal bandwidth is a joke.
Of course it consumes so little power.. :eek:
I really doubt this thing can approach 50 DP Gigaflop/s (or even 50 SP Gigaflop/s, for what it's worth) in anything but a very restricted range of applications.
 
nAo said:
The PPE has branch prediction; the SPEs don't have branch prediction, just branch hints.
I wonder if branch hints will be useful for something more than static loops.

The SPEs will use the hint bit to speculate one way or the other on a conditional branch if the condition hasn't been produced yet. So the hint can make a difference in tight if-then-else constructs.

The SPEs seem to have a branch bubble of 8 cycles on taken branches, even if they are speculated (or decided) correctly. Branches are very much a thing to avoid in the SPEs.
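The usual way around that bubble is to compute both outcomes and pick one with a bit mask instead of branching. A minimal sketch of the idea in Python, assuming 32-bit unsigned values; on the SPU the compare instructions already produce all-ones/all-zeros masks and the spu_sel intrinsic does the blend, but the helper names below are made up for illustration:

```python
MASK32 = 0xFFFFFFFF  # model 32-bit unsigned words (inputs assumed in [0, 2**32))


def select_branchless(cond: bool, a: int, b: int) -> int:
    # Turn the condition into an all-ones or all-zeros mask, the way a
    # compare instruction would, then blend the two values bitwise.
    mask = -int(cond) & MASK32  # True -> 0xFFFFFFFF, False -> 0
    return ((a & mask) | (b & ~mask)) & MASK32


def clamp_branchless(x: int, lo: int, hi: int) -> int:
    # Both candidates are always computed; only the bit selection depends
    # on the data, so there is no taken branch to pay a bubble for.
    x = select_branchless(x < lo, lo, x)
    return select_branchless(x > hi, hi, x)
```

For example, clamp_branchless(25, 10, 20) gives 20 without any conditional jump in the selection itself. The trade-off is that both sides are always evaluated, which only pays off when they are cheap compared to a taken branch.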

Cheers
Gubbi
 
PC-Engine said:
Actually it's pretty easy to design a chip to do 115GFLOPS double precision very well, it's just that the XeCPU doesn't need the precision.

For example the Clearspeed CSX600 can do 50GFLOPS double OR single peak at only 250MHz and only consumes 5W of power. :LOL:

Imagine a quad-core CSX600 running at only 250MHz and consuming 20W while having 200GFLOPS peak of double OR single precision. ;)
So what do you think the possibility is that Microsoft was using double-precision GFLOPS as a performance metric instead of the typical single precision? I know it is possible, but I just lack any evidence right now to prove it. We are still missing a lot of information, it seems... those VMX128 units are obviously more than just additional registers. I am still under the impression that people are getting confused over the dot-product figures for the PS3 and the Xbox 360 (including myself), but I guess I am going to have to put it in the same bucket as using FLOPS and shader operations as a performance metric.

A second note though... if there are indeed 2 VMX128 units per core, as that article suggests, instead of the 1 VMX128 unit per core listed in the specification released by Microsoft... would that not effectively double their floating-point performance? I need to verify that article's claim...

The GameMaster...
 
The GameMaster said:
PC-Engine said:
Actually it's pretty easy to design a chip to do 115GFLOPS double precision very well, it's just that the XeCPU doesn't need the precision.

For example the Clearspeed CSX600 can do 50GFLOPS double OR single peak at only 250MHz and only consumes 5W of power. :LOL:

Imagine a quad-core CSX600 running at only 250MHz and consuming 20W while having 200GFLOPS peak of double OR single precision. ;)
So what do you think the possibility is that Microsoft was using double-precision GFLOPS as a performance metric instead of the typical single precision? I know it is possible, but I just lack any evidence right now to prove it. We are still missing a lot of information, it seems... those VMX128 units are obviously more than just additional registers. I am still under the impression that people are getting confused over the dot-product figures for the PS3 and the Xbox 360 (including myself), but I guess I am going to have to put it in the same bucket as using FLOPS and shader operations as a performance metric.

A second note though... if there are indeed 2 VMX128 units per core, as that article suggests, instead of the 1 VMX128 unit per core listed in the specification released by Microsoft... would that not effectively double their floating-point performance? I need to verify that article's claim...

The GameMaster...

What exactly would be the point of having fast double precision? Why spend the resources on it rather than improving your single-precision performance for a game console?

Nite_Hawk
 
The GameMaster said:
I am still under the impression that people are getting confused over the dot-product figures for the PS3 and the Xbox 360 (including myself), but I guess I am going to have to put it in the same bucket as using FLOPS and shader operations as a performance metric.

It's pretty easy to derive. MS has even given us their figures in that regard (9bn per sec). In general, a 4-component vector dot product = 4 multiplies and 3 additions, i.e. 7 floating-point ops. A VMX unit or SPE can do 8 per cycle, so effectively one dot product per cycle per VMX unit or SPE (with a flop to spare, if you can use it ;)). All these figures work out pretty transparently:

X360: 1 VMX unit per core * 3 cores * 8 flops = 24 flops per cycle; * 3.2GHz = 76.8 GFLOPS; / 8 flops per dot product = 9.6bn dot products per second

PS3: 1 VMX unit * 8 flops + 7 SPEs * 8 flops = 64 flops per cycle; * 3.2GHz = 204.8 GFLOPS; / 8 flops per dot product = 25.6bn dot products per second

Of course, we're ignoring the FPUs' contributions, simply because you wouldn't be using them for dot products.
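Those back-of-the-envelope numbers are easy to reproduce. A quick sketch in Python, assuming 8 single-precision flops per VMX unit or SPE per cycle and charging 8 flops per dot product as above (the helper names are hypothetical):

```python
FLOPS_PER_VECTOR_UNIT = 8  # assumed: 4-wide single-precision multiply-add per cycle
CLOCK_GHZ = 3.2


def peak_gflops(vector_units: int, flops_per_unit: int, clock_ghz: float) -> float:
    # Peak = units * flops issued per unit per cycle * clock (in GHz).
    return vector_units * flops_per_unit * clock_ghz


def dot_products_per_second_bn(gflops: float, flops_per_dot: int = 8) -> float:
    # A 4-component dot product is 7 flops (4 mul + 3 add), but one unit
    # retires at most one per cycle, so 8 flops of capacity are charged.
    return gflops / flops_per_dot


x360 = peak_gflops(3, FLOPS_PER_VECTOR_UNIT, CLOCK_GHZ)      # 1 VMX unit x 3 cores
ps3 = peak_gflops(1 + 7, FLOPS_PER_VECTOR_UNIT, CLOCK_GHZ)   # PPE VMX + 7 SPEs

print(f"X360: {x360:.1f} GFLOPS, {dot_products_per_second_bn(x360):.1f}bn dot products/s")
print(f"PS3:  {ps3:.1f} GFLOPS, {dot_products_per_second_bn(ps3):.1f}bn dot products/s")
```

It prints the same 76.8 GFLOPS / 9.6bn and 204.8 GFLOPS / 25.6bn figures; the FPUs are left out on purpose, as in the post.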

The GameMaster said:
A second note though... if there are indeed 2 VMX128 units per core, as that article suggests, instead of the 1 VMX128 unit per core listed in the specification released by Microsoft... would that not effectively double their floating-point performance? I need to verify that article's claim...

A VMX unit has multiple execution units inside. They're not talking about separate VMX units.

MS isn't understating their spec here.
 
The MS specs say 1 VMX unit per core. 115 GFLOPS could be achieved with 2 FP units but not with two VMX units: 3.2 GHz * (2 flops * 2 FPUs + 8 flops * 1 VMX) * 3 cores = 115.2 GFLOPS
 
Love_In_Rio said:
The MS specs say 1 VMX unit per core. 115 GFLOPS could be achieved with 2 FP units but not with two VMX units: 3.2 GHz * (2 flops * 2 FPUs + 8 flops * 1 VMX) * 3 cores = 115.2 GFLOPS


Only 1 FPU, no?
 
nAo said:
PC-Engine said:
Actually it's pretty easy to design a chip to do 115GFLOPS double precision very well, it's just that the XeCPU doesn't need the precision.

For example the Clearspeed CSX600 can do 50GFLOPS double OR single peak at only 250MHz and only consumes 5W of power. :LOL:

Imagine a quad-core CSX600 running at only 250MHz and consuming 20W while having 200GFLOPS peak of double OR single precision. ;)
:rolleyes: It has the same external bandwidth as a 1999 PS2, 6kb per processor (all its processors together have half the SRAM of ONE SPE :) ), a 128-byte register file per processor (SPEs have a 3.4 kb octal-ported register file), and its internal bandwidth is a joke.
Of course it consumes so little power.. :eek:
I really doubt this thing can approach 50 DP Gigaflop/s (or even 50 SP Gigaflop/s, for what it's worth) in anything but a very restricted range of applications.

Don't be so quick to discount it, considering each CSX600 looks to be more powerful than an SPE while running at a fraction of the clock speed and consuming a fraction of the power. BTW, you're right: it's designed for all types of physics calculations, which makes it ideal as a coprocessor inside a CPU. ;)

The CSX600 has a 128-byte register file for each PE, and there are 96 PEs. Those 96 execution units also give it 576kb of SRAM total.
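Taking the figures quoted in this thread at face value (and reading "kb" as kilobytes, which is an assumption), the per-PE numbers work out to chip-wide totals like this. It is only arithmetic on the posted specs, not vendor data, and the constant names are made up:

```python
# Figures as quoted in this thread; units assumed to be kilobytes/bytes.
PES_PER_CHIP = 96
SRAM_KB_PER_PE = 6
REGFILE_BYTES_PER_PE = 128
PEAK_GFLOPS_PER_CHIP = 50
CLOCK_MHZ = 250
WATTS_PER_CHIP = 5

total_sram_kb = PES_PER_CHIP * SRAM_KB_PER_PE                   # 576 KB
total_regfile_kb = PES_PER_CHIP * REGFILE_BYTES_PER_PE / 1024   # 12 KB of registers
flops_per_cycle = PEAK_GFLOPS_PER_CHIP * 1000 / CLOCK_MHZ       # 200 flops/cycle, chip-wide
flops_per_pe_per_cycle = flops_per_cycle / PES_PER_CHIP         # roughly 2 per PE
gflops_per_watt = PEAK_GFLOPS_PER_CHIP / WATTS_PER_CHIP         # 10 GFLOPS/W

# The hypothetical quad-chip configuration from the start of the thread.
quad_gflops = 4 * PEAK_GFLOPS_PER_CHIP  # 200
quad_watts = 4 * WATTS_PER_CHIP         # 20
```

So the 50 GFLOPS peak at 250MHz implies roughly two flops per PE per cycle, spread across very small per-PE register files, which is where the disagreement above about sustained throughput comes from.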
 
Could it be that in the end, games on both consoles offer big improvement in graphics but not as much improvement in AI or physics over current games?

That would belie KK's claim about the PS3 expanding computer entertainment beyond just upgraded graphics.
 
version said:
Love_In_Rio said:
The MS specs say 1 VMX unit per core. 115 GFLOPS could be achieved with 2 FP units but not with two VMX units: 3.2 GHz * (2 flops * 2 FPUs + 8 flops * 1 VMX) * 3 cores = 115.2 GFLOPS


Only 1 FPU, no?

115 GFLOPS suggests 2, or at least some funky math to arrive at that figure. 1 FPU would give 96 GFLOPS; the other 19 GFLOPS have to come from somewhere, and a second FPU fits the bill ;)
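The two readings can be compared directly. A sketch assuming each FPU retires one fused multiply-add (2 flops) per cycle and each VMX128 unit 8 flops per cycle; the function name is hypothetical:

```python
CLOCK_GHZ = 3.2
CORES = 3
VMX_FLOPS_PER_CYCLE = 8  # assumed: one 4-wide multiply-add per VMX128 unit
FPU_FLOPS_PER_CYCLE = 2  # assumed: one scalar fused multiply-add per FPU


def xenon_peak_gflops(fpus_per_core: int, vmx_per_core: int = 1) -> float:
    # Flops per core per cycle from scalar FPUs plus vector units, times clock and cores.
    per_core = fpus_per_core * FPU_FLOPS_PER_CYCLE + vmx_per_core * VMX_FLOPS_PER_CYCLE
    return CLOCK_GHZ * per_core * CORES


one_fpu = xenon_peak_gflops(1)                  # 3.2 * (2 + 8) * 3, about 96
two_fpu = xenon_peak_gflops(2)                  # 3.2 * (4 + 8) * 3, about 115.2
two_vmx = xenon_peak_gflops(1, vmx_per_core=2)  # 3.2 * (2 + 16) * 3, about 172.8
```

Only the two-FPU case lands on 115.2 GFLOPS, with the extra 19.2 GFLOPS over the one-FPU case; a second VMX unit per core would overshoot to 172.8.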

wco81 said:
Could it be that in the end, games on both consoles offer big improvement in graphics but not as much improvement in AI or physics over current games?

It's already been discussed, but I don't agree with Hannibal's suggestions about physics on SPEs or X360's cores. AI is another matter, and that really depends on what you're doing. As far as physics go, we've already had AGEIA and Epic touting Cell's suitability for physics at least.

Neither of these chips' primary work is going to be graphics-related. I certainly don't think there's enough headroom on X360 to let it spend a whole lot of time on procedural vertex work and such, unless you want to hold everything else to a single core. Cell has a bit more headroom, and Sony has openly discussed Cell/RSX collaboration on graphics, but obviously the chip needs to be doing physics, AI and the like, like any other CPU. If it's good at physics, you may as well let it rip on that. Of course, it all depends on the priorities of your game. If you want something that looks truly "standout", perhaps you'd allocate more resources to graphics work on the CPU versus other areas.
 
PC-Engine said:
Actually it's pretty easy to design a chip to do 115GFLOPS double precision very well, it's just that the XeCPU doesn't need the precision.

For example the Clearspeed CSX600 can do 50GFLOPS double OR single peak at only 250MHz and only consumes 5W of power. :LOL:

Imagine a quad-core CSX600 running at only 250MHz and consuming 20W while having 200GFLOPS peak of double OR single precision. ;)

meh, if you want double-precision FP you should wish that Compaq hadn't axed the Alpha
http://systems.cs.colorado.edu/ISCA2002/FinalPapers/X/EspasaR_Tarantula_final.pdf

of course putting an EV8 and an EV9 into a gaming console might be a bit difficult, since the EV8 alone was said to consume around 200 watts and be 420 mm² at 130nm.. :oops:
 
Alpha was so awesome... Intel finally succeeded in getting it killed off for good though.
 
Rune said:
PC-Engine said:
Actually it's pretty easy to design a chip to do 115GFLOPS double precision very well, it's just that the XeCPU doesn't need the precision.

For example the Clearspeed CSX600 can do 50GFLOPS double OR single peak at only 250MHz and only consumes 5W of power. :LOL:

Imagine a quad-core CSX600 running at only 250MHz and consuming 20W while having 200GFLOPS peak of double OR single precision. ;)

meh, if you want double-precision FP you should wish that Compaq hadn't axed the Alpha
http://systems.cs.colorado.edu/ISCA2002/FinalPapers/X/EspasaR_Tarantula_final.pdf

of course putting an EV8 and an EV9 into a gaming console might be a bit difficult, since the EV8 alone was said to consume around 200 watts and be 420 mm² at 130nm.. :oops:

Cool!! :oops: Thanks for posting that. Now I know where that Tarantula rumor came from. :devilish:

BTW it shouldn't be a problem at 90nm. 8)
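For a rough idea of what a shrink buys, assume ideal scaling, where die area goes with the square of the feature size. Real shrinks rarely get all of it and power doesn't scale this neatly, so treat this as a best case; the function name is hypothetical:

```python
def ideal_shrunk_area_mm2(area_mm2: float, from_nm: float, to_nm: float) -> float:
    # Ideal shrink: linear dimensions scale by to/from, so area scales by its square.
    return area_mm2 * (to_nm / from_nm) ** 2


ev8_90nm = ideal_shrunk_area_mm2(420, 130, 90)  # about 201 mm^2 under ideal scaling
```

That's roughly 201 mm² at 90nm for the quoted 420 mm² at 130nm; the 200W figure is a separate question.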
 
nAo: unless he means the "loop" from fetch to FX2... but in that case I think he missed the fact that branch hints can be stored well before the actual branch.

For short loops you just put in a static prediction before starting the loop.
 
nAo said:
Gubbi said:
The SPEs seem to have a branch bubble of 8 cycles on taken branches, even if they are speculated (or decided) correctly. Branches are very much a thing to avoid in the SPEs.
I believe that guy has confused SPEs with PPE.

I believe you're right.

However, page 5 in here says that a branch has a 4-cycle latency, plus fetch, decode, dependency check and issue, so still around 6-8 cycles.

Does anyone have hard info ?

Cheers
Gubbi
 
As far as I can tell, all that the "multithreading" support in Novodex does is pull the physics code out of the application thread, and into its own (single) thread.
But regardless of how the API works, isn't their hardware still supposed to be a bunch of parallelized units that work in a setup similar to the SPEs in Cell?
 