A bit of info on Cell's physics abilities.

Jawed said:
Except that at best the P4 in this comparison could be 10GFLOPs (see the graph I posted earlier).

So 15x or more supposed theoretical.

5x actual simply demonstrates, to me, what a poor architecture Cell must be. Running at 1/3 efficiency? Laughable. Particularly for something that is so strongly suited to it. Supposedly.

Jawed

Hold on a moment, you're comparing Cell's theoretical peak floating point performance to achieved floating point performance on a P4 in a particular task, and saying that represents the theoretical gulf between them? What?

A 3.2Ghz P4's theoretical floating point peak is about 12.5Gflops, I think (?). They achieved 8Gflops in that instance. Or ~64% of the peak. With specialised code, in the same task, a SPU came to ~74% of its peak. All this tells us is for this task, you'd want to write "specialised" code to get the best out of the SPU, but that even with non-specialised code, you're still doing better than the P4 (in absolute terms, not relative to peak. And you're going to have 6 or 7 or 8 SPUs). This is hardly surprising given how different the programming model on a SPU is vs what that library code would be used to.
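For what it's worth, the efficiency arithmetic being argued over can be sketched in a few lines. The peak figures here are my assumptions about SSE and SPU issue rates at 3.2 GHz, not anything from the demo's PDF:

```python
# Rough efficiency arithmetic for the figures discussed above. The peak
# numbers are assumptions: a 3.2 GHz P4 retiring 4 single-precision
# flops/cycle via SSE, and a 3.2 GHz SPU retiring 8 flops/cycle
# (4-wide SIMD multiply-add).
GHZ = 3.2

p4_peak = GHZ * 4        # theoretical GFLOPS for the P4 (~12.8)
p4_achieved = 8.0        # GFLOPS reported for the P4 in the demo
p4_efficiency = p4_achieved / p4_peak

spu_peak = GHZ * 8       # theoretical GFLOPS per SPU (~25.6)
spu_efficiency = 0.74    # the ~74% figure quoted above
spu_achieved = spu_efficiency * spu_peak

print(f"P4:  {p4_achieved:.1f}/{p4_peak:.1f} GFLOPS, {p4_efficiency:.1%}")
print(f"SPU: {spu_achieved:.1f}/{spu_peak:.1f} GFLOPS, {spu_efficiency:.1%}")
```

Note that the prototype blade in the demo reportedly ran slower than 3.2 GHz, so these figures are normalised to the PS3's intended clock rather than the demo's.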

Acert93 said:
But without knowing what a comparable CELL Blade costs it is hard to say. Are we comparing a server with 512MB of memory or 16GB? Is this a task where we need 4 dual cores, or will 2 dual cores (or 4 single cores) work, or even daisy chaining?

The question was how much it'd cost you, if you wish from a dollar perspective, to get similar performance to this. It's theoretical, of course; you're not going to be running the same demo all day, we're just assuming for the purposes of this comparison that that's all you're interested in. And compare the cost of the processors alone.

Acert93 said:
And those are questions every IT guy has to ask.

You're addressing points for the sake of it. Later in that paragraph I clearly stated that I was, admittedly, looking at this from a slightly different perspective than that of the IT guy.

Acert93 said:
I would not even venture there because this benchmark doesn't tell us anything about gaming.

It tells you that if you ever have cloth simulation in your game - directly of this kind, or if you wanted to be a little speculative, of other, simpler kinds of cloth - it'll fly on a system with Cell vs an equivalently clocked P4. Not really surprising, but it's good to have a little more solid information. That'd be one part of a mix of tasks in the game, but it'd be one that'd contribute to any speed advantage Cell might have overall.
 
london-boy said:
Running Maya you mean? Alias|Wavefront is the company that makes Maya, the famous 3D package. ;)
Rgr :D I have only worked with 3DS Max and that is it--and that is only because a friend uses it. I defer all knowledge in this area to others ;)

Anyway, one Cell chip (and they're talking about the same chip that will be in PS3 but running slower, which strengthens my point), will never cost 5 times as much as a P4 3.2Ghz. That's just crazy.
You would be surprised how that works. There is a huge premium in certain markets, like servers. Why?

Because they can.

STI's cost is not the same as the consumer cost. And with a CELL server you are paying for a server, not a chip. That includes extra XDR (and hopefully some other memory... it is not uncommon to have 16GB of memory on a server, and 4GB is pretty typical) and MB components, the blade, etc... The fact it is low volume is gonna jack up the price a TON. And then there is the server market support markup and warranty.

And we need look no further than MS's Xbox1. It had a PIII 733MHz. But MS was not paying anything near the retail cost.

A PS3 is going to be thin margins; a server is going to be profitable margins.

A GPU is a good example. How is MS/Sony able to put a cutting edge GPU with 512MB of memory into a console for $300 when you cannot even get a comparable GPU on the desktop market for $500?!

And that is not even including the optical drives, CPUs, networking, controllers, case, etc... Very different business models.

So a lot will depend on how much CELL Servers cost. And I really don't think you will be able to buy a CELL CPU--you are going to have to buy the whole shebang.

Still, this demo proves very little. Cloth simulation (and whatever else is being performed by Alias' demo) is just one thing that a processor needs to do. The fact that Cell happens to be 5 times faster than a P4 at this particular program doesn't mean much to us.
Correct. Obviously workstations will need to do more than just that; and even in a console environment games do more than that. Even then, as Shifty noted, a game will be using something like Havok or Novodex, which WON'T be as accuracy-dependent--it will be geared toward speed.

This demo is telling us that CELL works and has some advantages over a desktop chip in certain situations. While good to know, how practical that is remains unknown. Interesting nonetheless :D
 
Jawed said:
Except that at best the P4 in this comparison could be 10GFLOPs (see the graph I posted earlier).

So 15x or more supposed theoretical.

5x actual simply demonstrates, to me, what a poor architecture Cell must be. Running at 1/3 efficiency? Laughable. Particularly for something that is so strongly suited to it. Supposedly.

Jawed

As I said previously, for a first generation of software (whether at 1/3 or 50% of its potential), with the libraries available for less than 6 months, Cell is sufficiently satisfactory when compared with a P4 that inherits 3 decades of x86 architecture.
 
Titanio said:
The question was how much it'd cost you, if you wish from a dollar perspective, to get similar performance to this. It's theoretical, of course; you're not going to be running the same demo all day, we're just assuming for the purposes of this comparison that that's all you're interested in. And compare the cost of the processors alone.
1. You wouldn't use a P4 though; without knowing how an Opteron/Xeon does and how well it scales it would be totally subjective--I would be pulling monkeys out of my arse.

2. You cannot necessarily compare CPU costs because you won't be buying CELL CPUs like you do P4 CPUs. CELL will come in server blades and you will buy them as a package.

You're addressing points for the sake of it. Later in that paragraph I clearly stated that I was, admittedly, looking at this from a slightly different perspective than that of the IT guy.
But drawing any console parallels is really stretching the data. It is pretty meaningless IMO.

It tells you that if you ever have cloth simulation in your game - directly of this kind, or if you wanted to be a little speculative, of other, simpler kinds of cloth - it'll fly on a system with Cell vs an equivalently clocked P4. Not really surprising, but it's good to have a little more solid information. That'd be one part of a mix of tasks in the game, but it'd be one that'd contribute to any speed advantage Cell might have overall.
1. This simulation is not for "game-class" physics.

2. This comparison is really of a workstation class. Further, you are comparing a product that won't be available in the desktop market--not to mention that if it were, you should be comparing an X2/Pentium D. And as I noted an X2 is 60% SMALLER than a CELL chip. Might as well compare a Quad Core Athlon 64 at that rate (which I might add may be on the market before we see CELL hit the desktop market).

3. Granted CELL has a performance edge in this area compared to 1 lowly P4. But this is not a gaming benchmark. Further, if we were looking holistically we would need to consider the tradeoffs in games. Is the acceleration in task (A) worth any tradeoff in task (B). So we really don't know much about any gaming performance--extrapolating such is kind of pointless.

4. Obviously from a price perspective a P4 with 512MB of traditional memory and such would be significantly cheaper from a manufacturing standpoint than a CELL Blade with 512MB of XDR. Really, comparing the two is REALLY out of context for anything practical.

Just a lot of apples and oranges.

I know you want to get something out of this in respect to the PS3. All you can take from this is that CELL outperforms a P4 in a cloth simulation intended for the content creation market.

That does not tell us much. Yeah, CELL is fast in this type of work. We expected that. How this translates to other markets (especially ones without a P4!) I dunno... seems kind of awkward. It really is not the intent of the benchmark.

But if it makes you feel any better: CELL would do cloth simulation better than a fictitious console with a Pentium D. I am sure we all knew that though.
 
Acert93 said:
1. You wouldn't use a P4 though; without knowing how an Opteron/Xeon does and how well it scales it would be totally subjective--I would be pulling monkeys out of my arse.

2. You cannot necessarily compare CPU costs because you won't be buying CELL CPUs like you do P4 CPUs. CELL will come in server blades and you will buy them as a package.

True. The point is about the design choices made, though. Jawed is harping on about it being "rubbish" and "disappointing", but as far as this software is concerned, if you were building a chip for this software, would you build something like a Pentium, or something like Cell? And from that perspective, which would cost you more? That's my point, really. It's a justification, as far as this task goes, of the choices made by STI: the simplification of the core to increase the number of execution units within a similar resource budget to the "conventional" processor (yeah, it's about 60m more transistors than the latest P4, AFAIK, but it's x times the performance for this too). 1 SPU can be as good as, or better than, one "conventional" core in some situations etc. etc.

Acert93 said:
Granted CELL has a performance edge in this area compared to 1 lowly P4. But this is not a gaming benchmark. Further, if we were looking holistically we would need to consider the tradeoffs in games. Is the acceleration in task (A) worth any tradeoff in task (B). So we really don't know much about any gaming performance--extrapolating such is kind of pointless.

I simply talked about this task, and that could have a part to play in a game as one task. And that it would be one input to overall performance. If you want to know about overall performance in a game, obviously that's something far less specific, and something that'll vary from game to game. And obviously your gains will depend on the proportion of your execution time spent in particular tasks. But comments from certain developers are encouraging from the perspective too (but obviously their comments are specific to their work).

Acert93 said:
I know you want to get something out of this in respect to the PS3.

And equally there are people who wish this could be disassociated as much as possible from any kind of indicator of Cell performance in particular game tasks too.

But ultimately...

Acert93 said:
But if it makes you feel any better: CELL would do cloth simulation better than a fictitious console with a Pentium D. I am sure we all knew that though.

..this is my point. A point about design choices and justification of those choices. This particular instance is just one more indicator that they, you know, just might have taken a desirable route as opposed to sticking a P4 or a couple of "general" cores into the system.

dizzyd said:
Is this new? I could have sworn I've seen those screenshots in the PDF document months ago.

The PDF and info is new, I believe, but the demo is from E3.
 
Lysander said:
Yea, but mac has also dual G5, which can operate as ppe core (to cell spes), and pentium is incompatible with powerpc in mac.
Mac is just a dumb terminal, nothing more.
 
I'm surprised people haven't commented on what is, to me, the interesting part of the data:
How poorly the PPE did, relative to the SPUs and the Pentium.

On paper it has almost identical FP performance to an SPU, so why so poor?
I could guess, but I'll spare you the speculation.

Cloth simulation is pretty much the sort of thing the SPUs were designed for; it's the type of job I'd expect to see them doing in a game, and I'd expect them to perform well.
I'd be interested in seeing how well they dealt with larger datasets, or even whether in this case they can; it's unclear if the entire dataset is resident on the SPU during simulation.

I can't imagine cloth simulation with self-intersection being trivially separable once the dataset exceeds the local memory. And the fact that they chose to run 8 distinct datasets on the separate SPUs seems to support this.
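To illustrate what that inner loop looks like, here's a minimal mass-spring cloth step in Python. This is purely a sketch of the general technique, not Alias' solver, and all constants are illustrative:

```python
# A minimal mass-spring cloth step (Verlet integration). Purely a sketch
# of the general technique; not Alias' solver, constants are illustrative.

def verlet_step(pos, prev, dt, gravity=-9.8):
    """Advance particles: x' = 2x - x_prev + a*dt^2 (gravity on y only)."""
    new = []
    for (x, y, z), (px, py, pz) in zip(pos, prev):
        new.append((2 * x - px,
                    2 * y - py + gravity * dt * dt,
                    2 * z - pz))
    return new, pos  # new positions; old positions become "previous"

def satisfy_constraint(p0, p1, rest):
    """Nudge two particles toward their rest distance (one spring)."""
    dx, dy, dz = p1[0] - p0[0], p1[1] - p0[1], p1[2] - p0[2]
    d = (dx * dx + dy * dy + dz * dz) ** 0.5 or 1e-9  # avoid divide-by-zero
    k = 0.5 * (d - rest) / d
    return ((p0[0] + k * dx, p0[1] + k * dy, p0[2] + k * dz),
            (p1[0] - k * dx, p1[1] - k * dy, p1[2] - k * dz))
```

Within one patch the spring constraints only touch neighbouring particles, so a patch that fits in an SPU's local store is easy to keep resident. Self-intersection can couple arbitrarily distant particles, which is what makes splitting one large cloth across SPUs awkward, and why eight independent datasets (one per SPU) sidestep the problem.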
 
ERP said:
How poorly the PPE did, relative to the SPU's and the Pentium.

On paper it has almost identical FP performance to an SPU, so why so poor?
I could guess, but I'll spare you the speculation.
And should the PPE not be akin to a 'standard' CPU in performance too? Which would suggest the lacklustre performance is a matter of the differences between PPE and P4/PPC. Which I can only assume is a reliance on branching? :???:

What also surprised me is that the first SPE brings a considerable leap in performance (1 P4's worth) over the PPE, but subsequent SPEs only add half that (half a P4's worth). Yet additional SPEs scale linearly. I guess that's a matter of distributing workload between the SPEs which only one SPE won't have to contend with.
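One way to read that pattern is a toy model where the marginal SPE contribution is constant once the workload is distributed. The observed ratios come from the post above; the interpretation (a fixed distribution/sync cost) is my assumption:

```python
# A toy model for the scaling pattern above, in "P4 units": the first SPE
# adds ~1.0, each further SPE ~0.5. The ratios come from the observation;
# the interpretation (fixed distribution cost) is an assumption.

def throughput(n_spes, first=1.0, marginal=0.5):
    """T(n) = first + marginal*(n-1): a larger first step, then linear
    scaling, as if about half of each extra SPE's time goes to
    distributing and synchronising the workload."""
    if n_spes < 1:
        return 0.0
    return first + marginal * (n_spes - 1)

for n in (1, 2, 4, 8):
    print(f"{n} SPE(s): ~{throughput(n):.1f}x a P4")
```

A lone SPE avoids the distribution cost entirely, which would explain the larger first step; from then on the marginal gain is constant, matching the linear scaling described above.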
 
Titanio said:
..this is my point. A point about design choices and justification of those choices. This particular instance is just one more indicator that they, you know, just might have taken a desirable route as opposed to sticking a P4 or a couple of "general" cores into the system.
?

I guess if you were doing a cloth simulation game.

But this is the type of over the top conclusion--based on one benchmark that is obviously favorable to CELL--that really is unjustified. We have already heard from a few developers who are not too happy about the tradeoffs. I am not sure how you can be so aggressive about tradeoffs in GPUs (which, bear in mind, deal with a very limited data type and have extremely predictable and predefined use) and yet can correlate one benchmark--a very limited one at that--as an indication of CELL being better for such a broad task as game creation.

And this is from someone who thinks CELL is a much better choice in the console space for a gaming CPU. While some devs are not happy, I think on the whole CELL will bring more to the console space for gaming than a P4 or Pentium D could. But surely not in every situation, and I have a hard time reconciling the limited data with the conclusion you are drawing.

What you are hinting at is your long held belief that CELL is a better chip. In the context of console gaming I would agree in most circumstances--but that is not a conclusion that should be reached from a really limited benchmark. Heck, reviewing the thread there are more questions about the benchmark than answers. For example, if this is running at a meager 10fps with all 8 SPEs working on it, then this is a poor tradeoff as neither a P4 or CELL are going to be pushing this task in realtime.

The real question is how well these chips would do in realtime gaming situations with the variety of code being tossed at them. Since the P4 is not a console CPU and this benchmark is totally aimed at the server/workstation sector I cannot see how you are arriving at your conclusion from this benchmark.

Doesn't mean I don't agree; only that the way we are arriving at that conclusion is really different. And as we both know a couple of devs have already spoken up in favor of x86 for gaming. So I am not going to put too much emphasis on a benchmark for a workstation-class setup (especially when the reviewer leaves a lot of questions open and did not even bother to run this code on a workstation-class x86 rig!).
 
Shifty Geezer said:
Regards a DD1 cell, the pdf is dated 24th May so I guess that's a possibility.
It'll be interesting to see if DD1 turned out to have no VMX functionality in the PPE. Perhaps DD1 PPE just has a non-vector FP pipeline.

That might explain the huge difference in die size between them, and all those scratched heads over what's so special about DD2 VMX.

Jawed
 
Acert93 said:
?

I guess if you were doing a cloth simulation game.

Titanio said:
This particular instance is just one more indicator that they, you know, just might have taken a desirable route

Acert93 said:
We have already heard from a few developers who are not too happy about the tradeoffs. I am not sure how you can be so aggressive about tradeoffs in GPUs (which, bear in mind, deal with a very limited data type and have extremely predictable and predefined use) and yet can correlate one benchmark--a very limited one at that--as an indication of CELL being better for such a broad task as game creation.

And this is from someone who thinks CELL is a much better choice in the console space for a gaming CPU. While some devs are not happy, I think on the whole CELL will bring more to the console space for gaming than a P4 or Pentium D could. But surely not in every situation, and I have a hard time cooresponding the limited data with the conclusion you are drawing.

Please re-read my posts.

Even devs who have complained about Cell have presented a model for usage of power "beyond one core" that matches well with what Cell offers (and btw, most complaints have been about difficulty etc. not how suitable the chip could be for games). E.g. Carmack, what is he using multi-core power for, beyond one core? "we’ve got targets of opportunity for render surface optimization and physics work going on the spare processor, or the spare threads". Hmm. What is Crytek using power beyond one core for? "We scale the individual modules such as animation, physics and parts of the graphics with the CPU, depending on how many threads the hardware offers." What is a smaller dev's engine, the nFactor2 engine, using power beyond one core for? Physics, hair simulation, audio. What does Tim Sweeney think about this? "Fortunately these [things which are not suited to SPE acceleration] comprise a small percentage of total CPU time on a traditional single-threaded architecture, so dedicating the CPU to those tasks is appropriate, while the SPE's and GPU do their thing."

The only conclusion I made was with regard to the cloth simulation presented in the Alias demo. The rest is not a conclusion, but an observation from what we've seen discussed so far by devs. To remind you of what I said:

Titanio said:
I simply talked about this task, and that could have a part to play in a game as one task. And that it would be one input to overall performance. If you want to know about overall performance in a game, obviously that's something far less specific, and something that'll vary from game to game. And obviously your gains will depend on the proportion of your execution time spent in particular tasks. But comments from certain developers are encouraging from the perspective too (but obviously their comments are specific to their work).
 
Titanio said:
The only conclusion I made was with regard to the cloth simulation presented in the Alias demo. The rest is not a conclusion, but an observation from what we've seen discussed so far by devs. To remind you of what I said:

..this is my point. A point about design choices and justification of those choices. This particular instance is just one more indicator that they, you know, just might have taken a desirable route as opposed to sticking a P4 or a couple of "general" cores into the system.
The problem with your statement is that we have already heard numerous developers disagree. You are taking this and cherry picking other quotes to lead you to:

have taken a desirable route
Which, by all means, cannot be substantiated universally. It obviously is NOT a more desirable route in the opinion of all developers.

And the VERY limited benchmarks we have seen are just that: limited. Cherry-picking the data is not a proper way to arrive at a conclusion. The fact that the tradeoffs are not good in some developers' opinion indicates that your conclusion, that this is a desirable route (which has always been your stance--long before there was data, so let's not kid ourselves), is subjective.

Like I said, I tend to agree with this. But I don't think you can say they have "taken a desirable route" based on the VERY LITTLE we have seen. How CELL works in a total gaming environment is waaaay more important than how it can chew through a single benchmark that happens to be very compatible with its strengths. There are obvious tradeoffs in the design.

I think the tradeoffs are good, that on the whole the bottlenecks (i.e. the areas where the P4 would just plain spank a CELL--and those do exist) will be outweighed by the gains.

But who am I? Not all developers agree and it is THEIR opinion that matters. So I am not sure we can say that the tradeoffs are good--and surely not that a couple of benchmarks demonstrate what you think they do. The design choices will be very good for certain situations, no doubt, but there are questions as to whether they were on the whole.

Oh well, you have been a CELL proponent from day 1. You still are comparing apples and oranges and really trying to take a workstation benchmark and justify it in the context of a gaming console. That does not fly in my book--whether I agree or not.
 
"The problem with your statement is that we have already heard numerous developers disagree. You are taking this and cherry picking other quotes to lead you to:"

Not so much for reasons of suitability as for reasons of difficulty/resources.

Acert, I think you're looking for an argument that's not there.

But I think you should take some of your own advice on the usefulness of individual statements from developers, and realise that mine are just opinions as well. If we're talking about the suitability of Cell for games, I think mine is a well-made argument, and there is commentary among those who are working with the hardware, even those who have been critical of Cell for various reasons, to support my argument. Obviously games are not all the same, obviously different games have different requirements, but of the sampling I have, of the devs I have read about discussing this issue, there is much to support the choices STI made (not always from a "difficulty" point of view, but from a "suitability"/relevance pov).

There may not be any such thing as a universal truth, but if we stuck to only making arguments that can be universally supported, we'd never make any posts here. Expecting arguments that can only be supported in that way is unrealistic, and such support is unnecessary. I've made points based on my own opinion, and supported that with the statements of others; if you disagree, hey, come back and make your argument, and highlight your own supporting statements. And please stop saying that I'm making conclusions here. I mean I could make a conclusion based on what we've heard so far if I wanted, but I haven't even done that.

Again, what I said:

This particular instance is just one more indicator that they, you know, just might have taken a desirable route as opposed to sticking a P4 or a couple of "general" cores into the system.

Does desirable mean perfect in every way for every developer? No. You're suggesting I'm making that assertion; I am not.

Furthermore:

I simply talked about this task, and that could have a part to play in a game as one task. And that it would be one input to overall performance. If you want to know about overall performance in a game, obviously that's something far less specific, and something that'll vary from game to game. And obviously your gains will depend on the proportion of your execution time spent in particular tasks. But comments from certain developers are encouraging from the perspective too (but obviously their comments are specific to their work).

What I've said is very reasonable, and you're trying to find an interpretation such that you can respond with a negative, IMO--in this instance, questioning the "universality" of my points, which we know is an impossibility for any point, instead of actually presenting substantive counter-points. You're suggesting we shouldn't post a point here unless we find agreement with all developers first, and thus that I shouldn't be making these points at all, or you're taking my points as if I am presenting them as absolutes, when I have quite clearly provided much qualified context for them. As I said earlier, you're arguing for the sake of it, and I think because I tend to occupy the opposite side of the fence to you on most issues, it's become in-built for you to disagree with me in whatever way you can (even when you, as you say yourself, don't necessarily disagree!). If you really must disagree, please find points other than "well some dev might disagree".
 
Last edited by a moderator:
one said:
Mac is just a dumb terminal, nothing more.
"The client-side of the application receives the streams of data coming from one or more Cell processors, reconstructs the mesh and generates the display." That does not sound dumb to me. You say that mac has nothing to do with performance benchmark.
 
Youse are talking as if this was a benchmark release, read the PDF:

"This technology demonstration shows a prototype of a next generation cloth solver algorithm."

"It is a proof-of-concept..."

"At the current stage of our experiments..."

"...Cell Processor Based Blade prototype hardware and development tools..."


It's prototype algorithm on low clocked prototype hardware with a prototype compiler. That it manages to run 5X faster is actually pretty amazing. Give it a year or two then compare.
 