Well, Well, Xenon CPU and Cell the same??

PC-Engine said:
Yep I heard MS wanted some of their own tech from their research division in there that IBM found pretty difficult to implement, but eventually did. You won't find that tech in the PPE in CELL.
Yeah.. Maybe their input was about stuff like that (from a Xenon related patent):
More specifically, to achieve best performance, many current CPU instruction sets require a user to perform a dot product using the Structure of Arrays (SOA) approach, as opposed to the more intuitive and user friendly Array of Structure (AOS) approach. In the former approach, the operand data used to perform the dot product is loaded into appropriate registers provided by the CPU. Then, this operand data is manipulated by "rotating" it in such a manner to accommodate the SOA approach used by the CPU. Namely, to perform a multiplication of one vector by another, this SOA technique effectively turns a 1×4 vector on its side to provide a 4×1 vector. This results in an inefficient use of register capacity, as only one lane of each register is now used to store vector data. Further, the operation of rotating a vector on its side (accomplished in a so-called "swizzling" operation) requires execution cycles that are "empty" in the sense of not performing any meaningful conversion of the vector data (that is, not performing any mathematical operations on the data). Allowing programmers to keep their data in AOS format greatly simplifies optimization efforts; by contrast, SOA is at odds with natural data structure design, and Application Program Interface (API) parameter passing. Further, SOA generally complicates the programmer's use of SIMD vector math instruction usage. The logic 800 overcomes these drawbacks by using the aforesaid AOS approach. (However, the CPUs employed in the present system 100 can be configured to perform the dot product using the SOA approach too; the user is thus afforded the option of performing the dot product using either the AOS approach or the SOA approach.)
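
For anyone unsure what the AOS/SOA distinction in that excerpt looks like in practice, here is a minimal Python sketch. It uses the conventional meaning of the two layouts, with plain lists standing in for 4-lane SIMD registers; it is not a model of the patent's "logic 800", and all function names are made up for illustration.

```python
# Illustrative only: plain Python lists stand in for 4-lane SIMD registers.

def dot_aos(a, b):
    """AOS: one 'register' holds one whole vector (x, y, z, w).

    A lane-wise multiply is followed by a horizontal sum across the
    lanes of a single register, giving one dot product.
    """
    products = [ai * bi for ai, bi in zip(a, b)]  # one 4-wide multiply
    return sum(products)                          # horizontal add


def aos_to_soa(vectors):
    """Transpose (x, y, z, w) vectors into per-component lists (the 'swizzle' step)."""
    return [list(component) for component in zip(*vectors)]


def dot_soa(ax, ay, az, aw, bx, by, bz, bw):
    """SOA: each 'register' holds the same component of several vectors.

    Lane-wise multiply/adds give one dot product per lane, with no
    horizontal step.
    """
    return [
        ax[i] * bx[i] + ay[i] * by[i] + az[i] * bz[i] + aw[i] * bw[i]
        for i in range(len(ax))
    ]


# Four pairs of 4-component vectors (hypothetical sample data).
a_vecs = [(1, 2, 3, 4), (5, 6, 7, 8), (0, 1, 0, 1), (2, 2, 2, 2)]
b_vecs = [(1, 0, 0, 0), (1, 1, 1, 1), (3, 4, 5, 6), (1, 2, 3, 4)]

aos_results = [dot_aos(a, b) for a, b in zip(a_vecs, b_vecs)]
soa_results = dot_soa(*aos_to_soa(a_vecs), *aos_to_soa(b_vecs))
```

Both paths give the same four dot products; the difference is where the extra work goes (a horizontal add per vector for AOS, a transpose up front for SOA).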
 
nAo said:
PC-Engine said:
Yep I heard MS wanted some of their own tech from their research division in there that IBM found pretty difficult to implement, but eventually did. You won't find that tech in the PPE in CELL.
Yeah.. Maybe their input was about stuff like that (from a Xenon related patent):
More specifically, to achieve best performance, many current CPU instruction sets require a user to perform a dot product using the Structure of Arrays (SOA) approach, as opposed to the more intuitive and user friendly Array of Structure (AOS) approach. In the former approach, the operand data used to perform the dot product is loaded into appropriate registers provided by the CPU. Then, this operand data is manipulated by "rotating" it in such a manner to accommodate the SOA approach used by the CPU. Namely, to perform a multiplication of one vector by another, this SOA technique effectively turns a 1×4 vector on its side to provide a 4×1 vector. This results in an inefficient use of register capacity, as only one lane of each register is now used to store vector data. Further, the operation of rotating a vector on its side (accomplished in a so-called "swizzling" operation) requires execution cycles that are "empty" in the sense of not performing any meaningful conversion of the vector data (that is, not performing any mathematical operations on the data). Allowing programmers to keep their data in AOS format greatly simplifies optimization efforts; by contrast, SOA is at odds with natural data structure design, and Application Program Interface (API) parameter passing. Further, SOA generally complicates the programmer's use of SIMD vector math instruction usage. The logic 800 overcomes these drawbacks by using the aforesaid AOS approach. (However, the CPUs employed in the present system 100 can be configured to perform the dot product using the SOA approach too; the user is thus afforded the option of performing the dot product using either the AOS approach or the SOA approach.)

Interesting isn't it.

Guess which way Sony went with SPEs? ;)

up27709.jpg
 
aaaaa00 said:
Jaws said:
aaaaa00 said:
Jaws said:
Wouldn't be a surprise at all as it's been speculated here for several months. Though the Xe cores will likely have fatter registers, 128.128 bit, and custom instructions unique to Xenon compared to the CELLs PPE.

That's not the only "special sauce" I heard about.

Well the other possible 'sauces' would be an increase in pipes for the VMX units for better sustained throughput and the 'sweetest sauce' would be that the VMX units are 2-way SMT capable themselves.

That would bring each Xenon core to 16 Flops per cycle and a tri-core to 48 Flops per cycle and @ 3GHz ~ 144 GFlops. Do you concur?

Cues Jaws music....

http://www.sharkattackphotos.com/Sounds/jawstheme.wav

http://www.watch.impress.co.jp/game/docs/20050309/msdev01.jpg

:LOL:
 
blakjedi said:
Vysez said:
s1lverbak said:
So it wouldn't be 3 Cores @ 3GHz, it would be 6 @ ?GHz. Right??
Nope, it would be 3 two-way cores running at 3+ GHz.
Jaws' maths are correct (based on the data available at the moment).

Actually, the so-called "Peter Isensee leak" document states that the XeCPU will have 128 128-bit VMX registers per hardware thread... since the XeCPU has six hardware threads I think s1lverbak is actually correct... the Xenon has 768 128-bit vector registers alone, running @ 3GHz, just in the CPU. Actually, it has three sets of registers per hardware thread according to the article (integer, FPU and VMX), the total of which is not known... I don't know what that calculates to in terms of power though - you guys are better at extrapolating abstract power constructs like that than I am...

"Each core has two symmetric hardware threads (SMT), for a total of six hardware threads available to games. Not only does the Xenon CPU include the standard set of PowerPC integer and floating-point registers (one set per hardware thread), the Xenon CPU also includes 128 vector (VMX) registers per hardware thread. This astounding number of registers can drastically improve the speed of common mathematical operations."


http://forums.xbox-scene.com/index.php?showtopic=231928

The number of registers doesn't affect the flop rating. Having a separate set of registers per thread is nice, but having separate threads of execution on the VMX unit itself would be nicer, and would affect the flop rating (this doesn't seem to be the case, but I'm very open to surprise! :)).
 
aaaaa0 said:
Interesting isn't it.
Guess which way Sony went with SPEs?
Right I only just noticed the lunacy of calling switch to SOA an "advance" too :?
Anyway personally I blame IBM for this, both Toshiba and Sony have a history of vector implementations that favored well, actually advanced SIMD ISAs. Some very recent history too.
 
s1lverbak said:
Jaws said:
VMX ~ 8 Flops per cycle
FPU ~ 2 Flops per cycle
PPE ~ VMX + FPU~ 10 Flops per cycle
SPE ~ 8 Flops per cycle

Xenon CPU core = VMX + FPU ~ 10 Flops per cycle

3 Cores ~ 30 Flops per cycle

3 Cores @ 3 GHz ~ 30* 3Ghz~ 90 GFlops



CELL CPU core = PPE + 8*SPE ~ 10 + 8*8 ~ 74 Flops per cycle

CELL @ 4GHz ~ 74*4GHz ~ 296 GFlops

I guess I am confused... Everyone seems to be using the term "Core" pretty liberally. From the specs posted at Gamespy, it says that each CPU would be able to do 2 instructions per cycle. Wouldn't that make it dual core? So it wouldn't be 3 Cores @ 3GHz, it would be 6 @ ?GHz. Right??

Yes, I know what you mean. Let me clarify,

A 'core' would be a collection of execution units (e.g. vector, integer etc.), control logic, cache etc. that would be considered a complete 'whole' or 'processing unit' that could exist on its own.

Sometimes cores are used to describe individual execution units, e.g. SIMD/vector/integer/PPE/SPE/VMX/FPU etc. but they cannot exist alone.

Also a chip or IC or die, doesn't necessarily equate to a 'core' as you can have multiple cores on a slab of silicon.

And you're also confusing instructions per cycle with Flops per cycle.

Instructions per cycle != Flops per cycle

You issue instructions to execution units to get these Flops. The Xenon CPU can issue two instructions per cycle per core. Because it is 2-way SMT, it can issue these 2 instructions to 2 separate threads simultaneously per core. So for a tri-core CPU, you can issue 6 instructions to 6 separate threads simultaneously per cycle to attain peak Flops per cycle. In the above example, these six threads would be 3 on the VMXs and 3 on the FPUs.
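
A quick sanity check of the peak-rate arithmetic in this exchange, as a minimal Python sketch. The per-unit figures (VMX ~ 8, FPU ~ 2, SPE ~ 8 Flops per cycle) and the clock speeds are the estimates quoted in this thread, not confirmed specs, and the function and variable names are invented for illustration.

```python
def peak_gflops(flops_per_cycle, clock_ghz):
    """Theoretical peak: Flops per cycle times clock in GHz gives GFlops."""
    return flops_per_cycle * clock_ghz


# Per-unit estimates quoted in the thread (assumptions, not official figures).
VMX_FLOPS = 8
FPU_FLOPS = 2
SPE_FLOPS = 8

# Xenon: three cores, each with one VMX unit and one FPU.
xenon_core_flops = VMX_FLOPS + FPU_FLOPS        # 10 per cycle
xenon_flops_per_cycle = 3 * xenon_core_flops    # 30 per cycle
xenon_gflops = peak_gflops(xenon_flops_per_cycle, 3.0)

# CELL: one PPE (VMX + FPU) plus eight SPEs.
cell_flops_per_cycle = (VMX_FLOPS + FPU_FLOPS) + 8 * SPE_FLOPS
cell_gflops = peak_gflops(cell_flops_per_cycle, 4.0)

# Issue slots are a separate quantity from Flops:
# 2 instructions per cycle per core, across 3 cores.
issue_slots_per_cycle = 2 * 3

print(f"Xenon: {xenon_flops_per_cycle} Flops/cycle, {xenon_gflops} GFlops peak")
print(f"CELL:  {cell_flops_per_cycle} Flops/cycle, {cell_gflops} GFlops peak")
print(f"Xenon issue slots per cycle: {issue_slots_per_cycle}")
```

Note that 10 + 8*8 works out to 74 Flops per cycle, which at 4GHz gives 296 GFlops. The six issue slots only say how many instructions can start per cycle; the Flops count depends on which execution units those instructions land on.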
 
Unless IBM have suddenly decided to go in a register window/stack type direction, I would assume that each thread can use all N (=128?) registers.

AFAICS this would make a 32 bit per instruction VMX encoding pretty difficult seeing as a 4 op fma (3 src, 1 dst) requires 28 bits for register indexes alone...

Yeah I know about the index limit, that's why I brought up the physical vs. logical. I was pretty much guessing along the lines of register windows (ala SPARC) or register banks (ala MIPS).
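
The encoding squeeze mentioned here is easy to check with a minimal Python sketch. It assumes a fixed 32-bit instruction word and a flat register file where every operand field indexes all registers directly; the helper names are hypothetical.

```python
INSTRUCTION_BITS = 32  # assumed fixed-width instruction word


def index_bits(num_registers):
    """Bits needed to address any one of num_registers registers."""
    return (num_registers - 1).bit_length()


def operand_bits(num_registers, num_operands=4):
    """Bits used by register indexes in a 4-operand op (3 src, 1 dst)."""
    return num_operands * index_bits(num_registers)


for regs in (32, 128):
    used = operand_bits(regs)
    left = INSTRUCTION_BITS - used
    print(f"{regs} registers: {used} bits of indexes, {left} bits left for the opcode")

# Register count from the leaked figure: 128 VMX registers per hardware thread,
# six hardware threads.
total_vmx_registers = 128 * 6
```

With 32 registers a 4-operand instruction spends 20 bits on indexes; with 128 it spends 28, leaving only 4 bits for everything else, which is why windows or banks came up as a guess.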

Well the other possible 'sauces' would be an increase in pipes for the VMX units for better sustained throughput and the 'sweetest sauce' would be that the VMX units are 2-way SMT capable themselves.

That's not much of a sauce... Increasing "pipes" won't do you much good if your fetch and dispatch resources can't keep them fed (and VMX is an *extension* to PowerPC, thus you're typically constrained by the LSU resources of the CPU). All VMX/AltiVec implementations so far have been 2-issue, with the main distinctive traits revolving around execution unit layout and dispatch constraints...
 
Fafalada said:
aaaaa0 said:
Interesting isn't it.
Guess which way Sony went with SPEs?
Right I only just noticed the lunacy of calling switch to SOA an "advance" too :?
Anyway personally I blame IBM for this, both Toshiba and Sony have a history of vector implementations that favored well, actually advanced SIMD ISAs. Some very recent history too.

Oh boo f'ing hoo... :p This is actually a good thing believe it or not...
 
So, I'm new on this board, just wanted to share my thoughts.
I find that PhysX PPU very interesting, and I think that we may see it in the next-gen MS console.
The CPU from Xenon had 3 cores running at 3.5GHz (each?) but it seems that recent alpha kits (Gamespy) only have 2 cores, running at a lower speed - 3GHz - which in my mind seems odd. They gave more power at the beginning and then took it away :?: It doesn't make sense.
Now either they found that it was enough to compete with the PS3 or they found a better solution (at least from a performance standpoint).
I believe in the latter (call me a dreamer). And I don't believe in it just because.....
Epic Games is currently working on an exclusive game for Xenon, to be published by MS. From this we (at least me) can be led to believe that in the console realm, the Unreal tech will be specially optimized to work with MS's next-gen console; in fact, MS announced recently that they made a deal with Epic to license their Unreal 3 tech for all their (MS) games for the next-gen console.
Now, Unreal 3 is also the first and only one right now to announce full support for the PPU, which makes the PPU XNA supported.
I honestly believe (I'm a noob) that it's all linked. Xenon - Epic - XNA - Unreal 3 tech - PPU.
If this happens, I think it will be very cool, because the PPU will give a big performance boost to the system, as it will take the physics work off the CPU just like the GPU takes the graphics work off the CPU.

:p
 
therealskywolf said:
The CPU from Xenon had 3 cores running at 3.5GHz (each?) but it seems that recent alpha kits (Gamespy) only have 2 cores, running at a lower speed - 3GHz - which in my mind seems odd. They gave more power at the beginning and then took it away :?: It doesn't make sense.
Now either they found that it was enough to compete with the PS3 or they found a better solution (at least from a performance standpoint).
I believe in the latter (call me a dreamer). And I don't believe in it just because.....

Doesn't make sense? I think you're expecting something a little too solid at this point... There's a reason they're called "alpha kits" :p
 
Solidus said:
Maybe not the most on-topic question, but could the U3 engine be hardcoded in the Xenon?

It COULD be but it most likely will not. Optimized yes, hardcoded, probably not. I'm taking it that you are asking if all of the code in the U3 engine is completely Xenon specific. At first I thought you were asking if the engine could be hardcoded into the hardware :O.
 
a688 said:
Solidus said:
Maybe not the most on-topic question, but could the U3 engine be hardcoded in the Xenon?

It COULD be but it most likely will not. Optimized yes, hardcoded, probably not. I'm taking it that you are asking if all of the code in the U3 engine is completely Xenon specific. At first I thought you were asking if the engine could be hardcoded into the hardware :O.

It could very much be hardcoded. I don't see why not.

We know ATI isn't going to drop the R500 tech. There will most likely be an R600 part for the PC in early '06 based on this part, so time spent getting down low for the R500 won't be wasted. I'm sure they will also want to use this engine for a lot of games. So none of this will be wasted.

They will also write specifically for the CPU, as who will license the engine for Xenon games if it isn't?
 
"Could" wasn't the right word for me to use. Ofcourse they can, it's a matter of "would" they hardcode the engine. Seeing as MS announced they will be using the engine for their games, it is likely.
 
GS: How long do you think it will take for developers to release games that will look and play significantly better on a PPU-enabled PC system?

TS: As with any new technology, there will be early games available that add hardware physics support into a mostly finished game design. That's the first stage, and it will give gamers the first hardware-accelerated physics support right away.

The later revolution will be in next-generation games designed with large-scale physics from the very beginning. PhysX will make that possible on the PC, while other innovations will make large-scale physics possible on next-generation game consoles. There is a great deal of synergy there, with Ageia's physics engine providing a great hardware-accelerated solution on PC (with a software physics fallback for reduced detail) and also addressing the needs of the future consoles.
 