Most games are 80% general purpose code and 20% FP...

Interesting stuff.

I am wondering what the Ageia fellow meant though.

Would Cell outperform a PC with a single-core, single-chip CPU and a PPU, or would it also outperform a dual-core or dual-chip system with a PPU?

If the latter is true, that is rather impressive IMO. If Cell can still best an X2, Opteron or Pentium D (which are very good at general purpose code — branching and OOO execution, I presume — but are also parallel-processing capable) plus a PPU, which is all specialized silicon for physics interactions, then that looks impressive to me.

Isn't a PPU composed of about half the transistors in Cell, and how much of the remaining budget would a dual core consume, if not all of it and then some?

I also wonder if the fellow meant Cell could do this while also doing other tasks, or in the sense of Cell vs. a PC in a physics-exclusive battle.

Not to demean the Cell chip or anything, as I do think that it is powerful, and in any respect this is impressive if true. Just looking for the correct frame in which to view this fellow's statements.
 
I was thinking maybe he meant that Cell could be used for more than just physics. The PPU could close the physics gap, but there would still be other things that the Cell processor could destroy a PC in, like ray tracing.
 
Fox5 said:
I was thinking maybe he meant that Cell could be used for more than just physics. The PPU could close the physics gap, but there would still be other things that the Cell processor could destroy a PC in, like ray tracing.

Ha that made my day! :devilish:
 
Fox5 said:
I was thinking maybe he meant that Cell could be used for more than just physics. The PPU could close the physics gap, but there would still be other things that the Cell processor could destroy a PC in, like ray tracing.

Perhaps, but why would the Ageia fellow be concerned with ray tracing? It would seem out of place for him to investigate this...unless ray tracing has some purpose in physics I am unaware of.

It has been my thinking that the PPU would level the playing field as far as physics was concerned, but that seems not to be the case, as Cell may have the advantage in a knock-down, drag-out fight. What I expect is true, however, is that Cell will never be doing only physics, whereas the PPU will be. If I am correct, then a level playing field is probably still to be expected between PC and PS3 games with respect to physics interactions.

I simply have a hard time believing Cell can do PPU level physics and a plethora of other tasks at the same time. I will believe that it is far more flexible in even the types of interactions it can do while doing other things.

If Cell can put PPU-level physics in a game...not only that, but better than what a PPU and a dual-core CPU can do in a PC game, while still handling intense AI and the varied speculated rendering tasks, all at the same time...that is most impressive. Scary even.

I don't mean it in a bad way, but I'm a bit skeptical of this actually being the case. I have no problem believing that, if Cell had no other task but physics, its 7 SPEs + PPE could best 2 cores (which aren't float monsters) plus a part with half the transistors of Cell all dedicated to doing physics...that I can believe, but I wouldn't take it to extremes at this point either.
 
While nobody (who isn't under NDA) knows the specs of that Ageia PPU, if you were to design a custom chip to do physics (and looking at the number of transistors used by Ageia), you would probably end up with something like half a Cell. Possibly with an ARM CPU core instead of a PPC, and some (4?) big SIMD pipes for streaming data. Mostly because physics is a pretty broad thing, so a more general purpose approach would probably work better than a hugely specialized one, and would be much easier to design.
 
DiGuru said:
While nobody (who isn't under NDA) knows the specs of that Ageia PPU, if you were to design a custom chip to do physics (and looking at the number of transistors used by Ageia), you would probably end up with something like half a Cell. Possibly with an ARM CPU core instead of a PPC, and some (4?) big SIMD pipes for streaming data. Mostly because physics is a pretty broad thing, so a more general purpose approach would probably work better than a hugely specialized one, and would be much easier to design.

From what I've heard, the PPU isn't suitable for physics calculations outside of games, which would indicate that it probably isn't as general purpose or precise as Cell.
 
Fox5 said:
From what I've heard, the PPU isn't suitable for physics calculations outside of games, which would indicate that it probably isn't as general purpose or precise as Cell.

No, it just means that they use a specific API (NovodeX). Nothing prevents you from programming it directly, but without the tools and documentation that would be hard to do.
 
DiGuru said:
Fox5 said:
From what I've heard, the PPU isn't suitable for physics calculations outside of games, which would indicate that it probably isn't as general purpose or precise as Cell.

No, it just means that they use a specific API (NovodeX). Nothing prevents you from programming it directly, but without the tools and documentation that would be hard to do.

Ah, that makes sense.
Wonder if Ageia will come out with a workstation version of their card or something that uses a different API...
 
aaaaa00 said:
I think someone should try profiling a Doom 3 or HL2 timedemo or other modern game. Even without real symbols, getting some more hard data to look at is always a good thing. :)

Here we go....

Doom3 1280x960, Hell level, just before the quad. Actual gameplay traced (note, not exactly the same trace in each case, but alas).
Doom3: All FP, SSE/SSE2 and MMX+3DNow!

Big monolithic .exe, a bit more than 25% of ops are FP ops, but predominantly regular x87 ones. Basically no MMX/3DNow!.

Far Cry 800x600, graphics detail very high, level: tree house second checkpoint.
Far Cry: All FP, SSE/SSE2 and MMX+3DNow!.

Nice breakdown of functions in different DLLs. Engine and physics having high ratios of FP ops (roughly 50% for physics, ~30% for Engine), again predominantly x87 (!!!).

Halflife 2, 1024x768, graphics detail maxed (reflect everything, high detail shadows etc, but no FSAA). Level: Water hazard (new game started there).
Halflife 2: All FP, SSE/SSE2 and MMX+3DNow.

Least FP work of the 3. Again almost exclusively x87.

Notes: These are all traces of gameplay; demos won't do, since they are just playback of recorded entities and hence involve no physics calculations etc. CodeAnalyst was set to wait until the game and level had loaded (45-90 seconds). Traces started from the same point every time (either a savegame or a checkpoint), and I tried to play the same way every time, but as you can see the traces vary in length. I'm guessing, though, that the ratios of ops are fairly constant. Gameplay was 3-6 minutes depending on the game.

Considering how little FP work is being done in these current games compared to the actual horsepower of modern CPUs (i.e. almost no SSE/SSE2 was used to boost throughput), I think my assertion that FP will be virtually free on next-gen consoles is correct (in fact the only cost is that it takes up issue slots for other ops :) ).

Edit, system: A64 3500+, 6800GT, 2GB RAM

Cheers
Gubbi
 
shaderguy said:
I'm just a caveman, and your tables of very large numbers frighten and confuse me. :) http://snltranscripts.jt.org/91/91gcaveman.phtml

Sorry, you should only pay attention to event columns and the module name column. Event "c1" is retired micro ops, event "cb" is retired fp ops. So cb/c1 gives you the fp ops to all ops ratio.

shaderguy said:
So... bottom line, based on existing high-end PC 3D games, do you think MS was right to optimize for integer performance?
I don't think Microsoft optimized for integer performance. Their MPU is > 100GFLOPS, which is a lot, and they dumbed down their cores as well. Sony just took it to a completely different level.

Cheers
Gubbi
 
shaderguy said:
So... bottom line, based on existing high-end PC 3D games, do you think MS was right to optimize for integer performance?
As Gubbi says, they didn't optimise for integer performance. They removed integer enhancing features to enable small cores so they can have three of them, and boosted FP throughput.

Even if MS's claims that consoles need integer grunt more than FP grunt are true and sincere, they certainly look to have felt differently when they were having their CPU designed!

And as regards comparison with PS3 performance, which is where this GP vs. FP debate stemmed from, it's not known what the real-world differences in GP (and FP) capabilities of the two processors are. MS treated the SPEs as unable to contribute anything to GP work, which isn't true. How much can they contribute? No clear answers yet. A couple of posts on this forum suggest both reasonable and poor contributions, if I remember right. Comparing XB360 and PS3 game performance is a non-event with the paper tech specs we have, especially when you factor in new approaches to old problems that may or may not appear.
 
Shifty Geezer said:
shaderguy said:
So... bottom line, based on existing high-end PC 3D games, do you think MS was right to optimize for integer performance?
As Gubbi says, they didn't optimise for integer performance. They removed integer enhancing features to enable small cores so they can have three of them, and boosted FP throughput.

Stated another way: while MS obviously felt FP performance was going to become more important next-generation, clearly, they don't think they crippled integer performance as much as Sony did.

To be fair, both companies at this point probably believe things are going to move towards more FP in next generation games, and the design of the hardware itself is kind of a forcing function pushing developers towards that.

The interesting question is, who struck the better balance given all the existing design constraints and budget limits...
 
aaaaa00 said:
Shifty Geezer said:
shaderguy said:
So... bottom line, based on existing high-end PC 3D games, do you think MS was right to optimize for integer performance?
As Gubbi says, they didn't optimise for integer performance. They removed integer enhancing features to enable small cores so they can have three of them, and boosted FP throughput.

Stated another way: while MS obviously felt FP performance was going to become more important next-generation, clearly, they don't think they crippled integer performance as much as Sony did.

To be fair, both companies at this point probably believe things are going to move towards more FP in next generation games, and the design of the hardware itself is kind of a forcing function pushing developers towards that.

The interesting question is, who struck the better balance given all the existing design constraints and budget limits...

Not disagreeing with this post, but could we please keep 'integer' and 'GP' performance from being interchanged so freely! It's unnecessarily confusing! :p
 
Also, is either platform's GP performance 'crippled', or just reduced compared to most CPUs? According to one 'article', if XeCPU is only 2x as fast as a 700 MHz P3, then yes, it's crippled. But if it's actually on a par with, say, a fast P4, it's just reduced.

They might be crippled - I don't know. But it's a strong term to bandy about if their GP performance actually isn't bad, and is only significantly underpowered in relation to their FP performance.
 
Gubbi said:
I don't think Microsoft optimized for integer performance. Their MPU is > 100GFLOPS
It's kind of off-topic, but from what I can tell the PPE FPU is being exaggerated out of proportion by both sides. MS gets a bit more PR out of it since they use 3 of them, but either way, it's inflated by both.

Anyway, back to the subject at hand - SIMD being predominantly ignored in x86 titles isn't too surprising; there's still no decent compiler support for SIMD (intrinsics are Not what I consider decent support), and on PC there's little time to bother with hand-optimizing things.
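To make the "intrinsics" point concrete, here is the same trivial loop written as plain scalar C (which compilers of the day would typically emit as x87 code) and again hand-vectorized with SSE intrinsics, four floats at a time. A minimal sketch for illustration only; real engine code is obviously far more involved:

```c
#include <emmintrin.h> /* SSE/SSE2 intrinsics */

/* Plain scalar sum - left to the compiler, this ends up as x87
   (or at best scalar SSE) code. */
float sum_scalar(const float *a, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hand-vectorized sum - four floats per iteration via SSE intrinsics. */
float sum_sse(const float *a, int n)
{
    __m128 acc = _mm_setzero_ps();
    int i;
    for (i = 0; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i)); /* unaligned load + packed add */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float s = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < n; i++) /* scalar tail for leftover elements */
        s += a[i];
    return s;
}
```

The point being that none of this falls out of the compiler for free: someone has to write and maintain the intrinsics version by hand, which is exactly the effort PC titles rarely spend.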

Shifty Geezer said:
Also, is either platform's GP performance 'crippled'
Well, compared to PS2 - the PS3 basically has 8 CPUs, each with GP performance roughly 10x that of the R5900.
Admittedly that doesn't take the effects of caches/memory performance into account, since that's still largely unknown for Cell, but it doesn't sound crippled to me.
 
Jaws said:
Not disagreeing with this post, but could we please keep 'integer' and 'GP' performance from being interchanged so freely! It's unnecessarily confusing! :p

So to keep pace with these things:

"GP" performance = branching + integer

But integer is a different thing, so

integer = ?

So when MS says that integer performance is best for games, that doesn't mean that they have better GP performance :?:

Please correct me :? :D. Thanks in advance.
 
Shifty Geezer said:
shaderguy said:
So... bottom line, based on existing high-end PC 3D games, do you think MS was right to optimize for integer performance?
As Gubbi says, they didn't optimise for integer performance. They removed integer enhancing features to enable small cores so they can have three of them, and boosted FP throughput.

I think the confusion is that you're thinking performance relative to PC CPU designs, and I'm thinking performance relative to Cell.

It seems that Microsoft analyzed the Cell design, and came up with an alternative design with half the floating point performance and 3 times the integer performance. If you look at it that way, then you would say that Microsoft chose to optimize for potential integer performance at the expense of potential floating point performance.
 
pc999 said:
Jaws said:
Not disagreeing with this post, but could we please keep 'integer' and 'GP' performance from being interchanged so freely! It's unnecessarily confusing! :p

So to keep pace with these things:

"GP" performance = branching + integer

But integer is a different thing, so

integer = ?

So when MS says that integer performance is best for games, that doesn't mean that they have better GP performance :?:

Please correct me :? :D. Thanks in advance.

Of course people are free to make up their own terms, but in the current standard CPU design engineering terminology, integer performance means "integer, branching, load/store". As opposed to floating point perf, or streaming perf. It's true that general purpose performance would be a less confusing term. But we're stuck with "integer performance" for historical reasons.
 