PS3 vs X360: Apples to Apples high level comparison...

Fafalada said:
Personally I would find it ironic if that's the case, given how little use that would have outside specsheets and how they harp on Sony all the time about pushing peak numbers.

Is there anything ironic about double standards in marketing?

Yeah, I did not think so! ;) The SNL Iraqi Public Relations Officer spoof is all you ever need to know about PR departments :LOL:

I'm actually really looking forward to the day when we can have devs talk about working with each machine, tell us what real advantages and disadvantages they faced on each and how well each worked for their specific game title... and then seeing the games! None of these numbers add up to anything until they make better games, and PR departments have a hard time telling the public a bad game is good. They can fool them with the numbers, but the end product is much more difficult to spin.

Down with PR! :devilish:
 
PC-Engine said:
Anybody know where and when MS gave out the 115.2 GFLOPS number? It isn't in any of their official documents. :?

http://www.xbox.com/assets/en-us/xbox360downloads/FactSheets.zip

And also the 218 GFlops for PS3,

http://www.scei.co.jp/corporate/release/pdf/050517e.pdf


Fafalada said:
Jaws your integer numbers are all over the place and basically off on some points.

Among other things, SPEs are dual issue - so if you want to make sweeping generalizations about performance you need to count them as 2 integer instructions per clock (scalar or vector for that matter :p ).

Are you referring to permute on the SPU/VMX and complex/simple integer unit ops for the PPC core? If so that would be an extra 'integer op per cycle' for all those execution units, right?

I was low-balling btw! :p


archie4oz said:
-XeCPU, FP, 32 bit

115 GFlops

Should be 96GFlops unless they've got a sneaky instruction that adds another 19Gflops...

As Faf mentioned, there's been speculation that the 115 and 218 GFlops figures at 3.2 GHz assume the PPC/PPE core does 12 Flops per cycle? Any truth to this? :)
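For what it's worth, here's the arithmetic behind that speculation as a quick Python sketch. The per-unit flop counts are this thread's assumptions, not official figures:

```python
GHZ = 3.2  # both CPUs are clocked at 3.2 GHz

# Conventional per-core peak: VMX 4-wide FMADD = 8 flops/cycle,
# plus the FPU's scalar FMADD = 2 flops/cycle.
conventional = 4 * 2 + 2                    # 10 flops/cycle per core

print(round(3 * conventional * GHZ, 1))     # 96.0 GFLOPS -- archie4oz's figure

# The marketing number implies 2 extra flops/cycle per core:
print(round(115.2 / GHZ / 3, 1))            # 12.0 flops/cycle per core
```

The gap between 10 and 12 flops/cycle per core is exactly where the rumoured extra FPU mode would have to live.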
 
Hahahahaha. Why do people go bananas over hype every gen? I mean c'mon. This stuff is all hand-waxed by experienced marketers so you drool everywhere. Comparing theoretical performance is like comparing horsepower. It's not how much you have, it's how efficiently it all works together.

Here's what's gonna happen:

*Whoever launches first will have the slowest hardware. Period. Consoles down the road will have access to better manufacturing from the start, so they WILL be faster. Especially since the consoles are being designed mostly by companies that are extremely capable at what they do so there's not much chance of total F-up.

*Cell may fail to impress, just like all of PS2's half-assed hardware did. Don't fall into Sony's BS circle. It's a first-gen chip. That should tell you all right there. They may succeed, but I seriously doubt it will be much superior at all to the highly-refined PowerPC in Xbox360 or whatever Nintendo will use.

*The graphics chips will be similar in capability, but the later-down-the-road consoles will have higher clocks or more pipelines or shader units, and probably a couple extra features of questionable usability.

*Xbox360 is backed by a near-deity so you better believe it won't fail. Good luck N and Sony in round two.

*Nintendo had better come up with something insanely new or they're up the creek. They don't have near the following they once had, and have a stigma attached to them because of their previous two consoles' showings.

*PCs will surpass the new consoles in about 2 years. PCs will always have advantages over consoles, and their own genres of games. Consoles have advantages too, of course.
 
Jaws said:
Are you referring to permute on the SPU/VMX
There's more than just permute that can be co-issued, but anyway. Like aaa said, the whole integer metric like this isn't very meaningful - people stopped counting performance in MIPS a very long time ago for a reason.

If you wanted something close to peak/theoretical numbers, the Dhrystone benchmark should do well enough. It pretty much fits into the L1 cache/local store respectively, so it basically gives you the peak you can expect from relatively general-purpose code on either architecture.
Of course, if you want to compare hardware you'd need to run the same compiler tech on both platforms too.
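For reference, if anyone ever does run Dhrystone on both, the usual way to normalise the raw score is against the VAX 11/780 baseline (1757 Dhrystones/s = 1 DMIPS). A minimal Python sketch, with a made-up score purely for illustration:

```python
VAX_BASELINE = 1757  # Dhrystones/sec of the VAX 11/780 = 1 DMIPS

def dmips(dhrystones_per_sec):
    """Normalise a raw Dhrystone score to DMIPS."""
    return dhrystones_per_sec / VAX_BASELINE

def dmips_per_mhz(dhrystones_per_sec, clock_mhz):
    """Clock-normalised figure, useful when comparing different clocks."""
    return dmips(dhrystones_per_sec) / clock_mhz

# Hypothetical score, not a measurement of either CPU:
print(round(dmips_per_mhz(4_000_000, 3200), 2))
```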

Any truth to this?
Well, if there are people who know, they aren't talking. :( I stand by my previous statement though - aside from increasing the paper spec, it's a very useless feature, if it really is there.
 
PC tech yes, PC games, no.

That's highly debatable. There will always be games on either side of the fence that are unique and equally good. The lines are merging though. Once we get keyboards and mice with consoles there may be little point to gaming PCs, and I won't complain much considering the cost differences involved. Not a good conversation to start here though. :)
 
This discussion confuses me.. I was thinking about this the other day - when you calculate the max. # of shader ops each system can churn per second it sort of comes out like this:

Cell has 8 cores, 1 op per cycle each at 3.2GHz = 25.6 billion shader ops per second.
RSX capable of 136 shader ops per cycle at 0.55GHz = 74.8 billion shader ops per second.
total = 100.4 billion shader ops per second.

XeCPU has 3 cores, 2 ops per cycle each at 3.2GHz = 19.2 billion shader ops per second.
Xenos is capable of 192 shader ops per cycle at 0.5GHz = 96 billion shader ops per second.
total = 115.2 billion shader ops per second.

Isn't this more meaningful than calculating flops? After all, these things will be used for playing games, and games are becoming more bound by shader op capabilities than by a system's flop performance. Or am I just a misguided fool who's missing the whole point?
 
tahrikmili said:
XeCPU has 3 cores, 2 ops per cycle each at 3.2GHz = 19.2 billion shader ops per second.
Xenos is capable of 192 shader ops per cycle at 0.5GHz = 96 billion shader ops per second.
total = 115.2 billion shader ops per second.

Where are these numbers coming from? Why 2 ops per cycle per core on X360 and only 1 per core on Cell? And the spec for Xenos is 48bn shader ops per second, not 96.

And what is a shader op, exactly? A vec4 op? A scalar op? Depending on how you define these things, one can come up with all sorts of numbers.

And it's very arguable where games are bound these days. Obviously it varies from game to game, but lots of games are CPU bound (MS anyway seems to think this will be increasingly the case going forward, if you look at some of their GDC presentations).
 
tahrikmili said:
This discussion confuses me.. I was thinking about this the other day - when you calculate the max. # of shader ops each system can churn per second it sort of comes out like this:

Cell has 8 cores, 1 op per cycle each at 3.2GHz = 25.6 billion shader ops per second.
RSX capable of 136 shader ops per cycle at 0.55GHz = 74.8 billion shader ops per second.
total = 100.4 billion shader ops per second.

XeCPU has 3 cores, 2 ops per cycle each at 3.2GHz = 19.2 billion shader ops per second.
Xenos is capable of 192 shader ops per cycle at 0.5GHz = 96 billion shader ops per second.
total = 115.2 billion shader ops per second.


Isn't this more meaningful than calculating flops? After all, these things will be used for playing games, and games are becoming more bound by shader op capabilities than by a system's flop performance. Or am I just a misguided fool who's missing the whole point?

Please check the *official* specs, as it's been mentioned several times in this thread that,

Xenos ~ 48 Billion shader ops per SECOND

48/0.5 GHz ~ 96 Shader ops per CYCLE ~ 48 Vec4 + 48 Scalar
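Or, in a couple of lines of Python (using the ALU counts above, which are themselves still unconfirmed):

```python
XENOS_GHZ = 0.5
vec4_alus, scalar_alus = 48, 48

ops_per_cycle = vec4_alus + scalar_alus        # a vec4 counts as ONE op here
print(ops_per_cycle)                           # 96 shader ops per cycle
print(ops_per_cycle * XENOS_GHZ)               # 48.0 billion shader ops/second
```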

Also there's a link on the first page/post that explains shader ops.

In addition, it does not differentiate 1-way, 2-way, 3-way, 4-way execution units, much as Flops don't differentiate 16bit, 32bit, 64bit calculations.

E.g. are 20 single-precision GFlops the SAME power as 20 double-precision GFlops?

Or, are 20 vec4 shader ops the same as 20 scalar shader ops? Flops-wise, the vec4 is doing more computation. Hence, these metrics need parameters to gauge further accuracy.
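To make that concrete, here's a small sketch, assuming each 'op' is a MADD (multiply-add, i.e. 2 flops per component):

```python
def flops_per_op(components, madd=True):
    """Flops hidden inside one 'shader op' of a given vector width."""
    return components * (2 if madd else 1)

print(20 * flops_per_op(4))   # 20 vec4 MADDs   -> 160 flops
print(20 * flops_per_op(1))   # 20 scalar MADDs ->  40 flops
```

Same 'op' count, a 4x difference in actual arithmetic, which is exactly why the metric needs parameters.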
 
Titanio said:
Where are these numbers coming from? Why 2 ops per cycle per core on X360 and only 1 per core on Cell? And the spec for Xenos is 48bn shader ops per second, not 96.

I thought ATI had spoken out about 192 ops per cycle after MS's specs? I may be wrong..
 
DaveBaumann said:
What's the distinction between number of operations and number of instructions handled?

Glad you noticed! :)

I'm trying to be consistent as I can in this thread. If you look at any of the metrics being compared in this thread, I've stayed away from the ISA. Just like CISC instructions or RISC instructions can produce similar operations via different instructions.

E.g.

XeCPU ~ 6 instructions per cycle ~ 6*3.2 GHz ~ 19.2 Billion instructions per second

Cell ~ 2 + 7*2 ~ 16 instructions per cycle ~ 16*3.2 ~ 51.2 Billion instructions per second

Now, depending on the ISA, you could achieve the same 'operation' on the data using different 'instructions'. If XeCPU has a dot product instruction and CELL doesn't, then you'd need to use different instructions to manipulate the data to achieve the same result. And this result may also take a different number of clock cycles to achieve too.

So, instructions are like a 'tool set' and you use the appropriate 'tools' to achieve the desired work (operation) on the data in a given amount of time.

Well, that's how I see it... :)
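A toy illustration of the tool-set analogy (the instruction counts are hypothetical, purely to show the idea):

```python
def dot4(a, b):
    """The *operation* we want: a 4-component dot product."""
    return sum(x * y for x, y in zip(a, b))

# ISA with a dedicated dot-product instruction: 1 instruction.
# ISA without one: e.g. 1 vector multiply plus 3 adds to sum the lanes.
instructions_needed = {"with_dp4": 1, "without_dp4": 1 + 3}

print(dot4([1, 2, 3, 4], [5, 6, 7, 8]))  # 70 -- the result is identical,
print(instructions_needed)               # only the instruction count differs
```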
 
DaveBaumann said:
What's the distinction between number of operations and number of instructions handled?

As I understand it, each "shader pipeline" can perform 2 shader instructions (e.g. Vec4+Scalar) totalling 5 32bit operations.

Am I right?
 
pjbliverpool said:
DaveBaumann said:
What's the distinction between number of operations and number of instructions handled?

As I understand it, each "shader pipeline" can perform 2 shader instructions (e.g. Vec4+Scalar) totalling 5 32bit operations.

Am I right?

A vec4 or scalar is commonly referred to as an op.

It's a good question. There don't seem to be any standard definitions (?)
 
Fafalada said:
archie4oz said:
Should be 96GFlops unless they've got a sneaky instruction that adds another 19Gflops...
Some people have speculated that XCPU FPU could possibly have Gekko-esque 2-way SIMD mode in single precision, adding 2 more flops/cycle to peak numbers.

Personally I would find it ironic if that's the case, given how little use that would have outside specsheets and how they harp on Sony all the time about pushing peak numbers.

It is IBM's fault here; the 218 GFLOPS for the 3.2 GHz CELL-based Broadband Engine assumes the same trick for the PPE's FPU.
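Putting the speculated bookkeeping in one place, as a Python sketch. Nothing below is from official documentation; the 2-way FPU mode is exactly the unconfirmed trick being discussed:

```python
GHZ = 3.2
VMX = 4 * 2                   # 4-wide FMADD: 8 flops/cycle
SPE = 4 * 2                   # 4-wide FMADD: 8 flops/cycle

def fpu(simd2):               # scalar FMADD, or the speculated 2-way mode
    return (2 if simd2 else 1) * 2

def xecpu_gflops(simd2):      # 3 identical PPC cores
    return 3 * (VMX + fpu(simd2)) * GHZ

def cell_gflops(simd2):       # 1 PPE + 7 SPEs (the announced PS3 config)
    return (VMX + fpu(simd2) + 7 * SPE) * GHZ

print(round(xecpu_gflops(False), 1))  # 96.0  -- without the trick
print(round(xecpu_gflops(True), 1))   # 115.2 -- with it
print(round(cell_gflops(True), 1))    # 217.6 -- i.e. the quoted ~218
```

Both marketing figures fall out only if the 2-way mode is assumed, which is what makes the speculation plausible on paper.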
 
I've already shown before how NVidia counts shader ops (page 3 of the NVidia PDF):

http://www.beyond3d.com/forum/viewtopic.php?p=522890#522890

And the number, 105.6Gsops (giga shader operations per second) calculated in my message is one that was presented in the PS3 press conference. It's for pixel shading only.

So not only is RSX's shader op counting method consistent with this document, but NVidia has publicly stated this number (actually stated as 106Gsops).

If you include, say, 10 vec4+scalar ALUs for RSX, that's an extra 10x(4+1)x550 = 27.5Gsops producing a total for RSX of 133.1Gsops.

Xenos appears to have 48 Vec4 and 48 scalar ALUs, so in NVidia's style of counting that's 120Gsops.
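A sketch of that bookkeeping in NVidia's style (vec4 = 4 ops, scalar = 1 op); the 24 pixel pipes with two vec4 units each, and the 10 vec4+scalar vertex pipes, are the assumptions above, not confirmed specs:

```python
def gsops(vec4_alus, scalar_alus, ghz):
    """NVidia-style counting: a vec4 ALU is 4 ops/cycle, a scalar ALU is 1."""
    return (vec4_alus * 4 + scalar_alus) * ghz

rsx_pixel  = gsops(24 * 2, 0, 0.55)   # 105.6 -- the press-conference figure
rsx_vertex = gsops(10, 10, 0.55)      #  27.5
xenos      = gsops(48, 48, 0.5)       # 120.0

print(round(rsx_pixel + rsx_vertex, 1), round(xenos, 1))  # 133.1 120.0
```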

Plainly the difference between the two architectures is that one of them is designed to run all ALUs at 100% utilisation and the other partitions its ALUs into special functions meaning that some of them will be sitting idle some of the time.

This architectural difference between the two highlights why it's so pointless counting sops like this.

Jawed
 
Jawed said:
I've already shown before how NVidia counts shader ops (page 3 of the NVidia PDF):

http://www.beyond3d.com/forum/viewtopic.php?p=522890#522890

Thanks for link. This actually explains the source of the misinterpretations. Let me elaborate... :)

Jawed said:
And the number, 105.6Gsops (giga shader operations per second) calculated in my message is one that was presented in the PS3 press conference. It's for pixel shading only.

Nowhere on page 3 of your link do they explicitly mention "shader operations", i.e.

Page 3 said:
The NVIDIA GeForce 6 Series introduces an innovative shader architecture that can double the number of operations executed per cycle (Figures 1 and 2). Two shading units per pixel deliver a twofold increase in pixel operations in any given cycle. This increased performance enables a host of complex computations and pixel operations. The result is stunning visual effects and a new level of image sophistication within fast-moving bleeding-edge games and other real-time interactive applications.

or

Page 3 said:
Figure 1. Traditional shader architectures provide one shader unit and only process up to four operations per cycle.

or

Page 3 said:
Figure 2. Each GeForce 6 Series GPU features a superscalar architecture, with a second shader unit, to double pixel operations per cycle.

Nowhere do they mention shader operations. As I mentioned in my post above to tahrikmili, it needs to be clear what is being measured. In this case, they are just measuring component operations or pixel operations. You can replace 'component' with pixel, vertices, vector components etc., but not 'shader'.

The component operations also get confused with FMADD Floating point operations etc...

Jawed said:
So not only is RSX's shader op counting method consistent with this document, but NVidia has publicly stated this number (actually stated as 106Gsops).

If you include, say, 10 vec4+scalar ALUs for RSX, that's an extra 10x(4+1)x550 = 27.5Gsops producing a total for RSX of 133.1Gsops.

Xenos appears to have 48 Vec4 and 48 scalar ALUs, so in NVidia's style of counting that's 120Gsops.

See above.

Jawed said:
Plainly the difference between the two architectures is that one of them is designed to run all ALUs at 100% utilisation and the other partitions its ALUs into special functions meaning that some of them will be sitting idle some of the time.

That *seems* likely, so a higher 'shader op' number would be expected. But we don't know how the 'independent pixel/vertex' shaders will work for RSX. Also, we don't know whether Xenos has to devote *more* transistors to scheduling logic to maintain this high utilisation, for example.

Jawed said:
This architectural difference between the two highlights why it's so pointless counting sops like this.

I've already said on several occasions that in isolation they are not a good indicator, nor is any metric for that matter. But with a set of metrics, parameters etc., these metrics can be cross-referenced for validity and coherence, and some inferences made. After all, it didn't stop us speculating on the Xenon 'leak' numbers and the R500/EDRAM patents... Or does it only hurt when it's nVidia/Sony! :p
 
We're still waiting for a detailed comparison of the theoretical shader performance profiles of NV40 and R420.

And we've had them in our "labs" now for a year...

Jawed
 