Jaws said:
PS3
claimed PS3 ~ 100 billion shader ops per second
Cell ~ 8 shader ops per cycle (7 SPU + VMX)
8 * 3.2 GHz ~ 25.6 billion shader ops per second
RSX ~ 136 shader ops per cycle
136 * 0.55 GHz ~ 74.8 billion shader ops per second
total = 74.8 + 25.6 ~ 100 billion shader ops per second
PS3 ~ 100 billion shader ops per second

X360
xGPU ~ 96 shader ops per cycle
96 * 0.5 GHz ~ 48 billion shader ops per second
xCPU ~ 6 shader ops per cycle (3 VMX + 3 FPU)
6 * 3.2 GHz ~ 19.2 billion shader ops per second
total = 48 + 19.2 ~ 67.2 billion shader ops per second
X360 = 67.2 billion shader ops per second
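For what it's worth, the arithmetic in the quote does check out. Here is a minimal Python sketch that reproduces it; the ops/cycle and clock figures are the poster's claims as quoted, not verified specs:

```python
# Reproduce the quoted peak shader-ops arithmetic.
# All ops/cycle and clock figures are the quoted claims, not verified specs.

def peak_gops(ops_per_cycle, clock_ghz):
    """Peak billion shader ops per second = ops/cycle * clock (GHz)."""
    return ops_per_cycle * clock_ghz

cell = peak_gops(8, 3.2)     # 7 SPUs + 1 VMX -> 25.6
rsx  = peak_gops(136, 0.55)  # -> 74.8
ps3  = cell + rsx            # -> 100.4, rounded to "~100"

xgpu = peak_gops(96, 0.5)    # -> 48.0
xcpu = peak_gops(6, 3.2)     # 3 VMX + 3 FPU -> 19.2
x360 = xgpu + xcpu           # -> 67.2

print(f"PS3  ~ {ps3:.1f} billion shader ops/s")
print(f"X360 ~ {x360:.1f} billion shader ops/s")
```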
Why are we counting the shader power of 7 SPEs and the VMX units?
First, it is unrealistic to count those units (at least all of them) toward shading, because they will be busy with game data processing.
Second, if we do want to count that power as shader ops, we should not be double-dipping: we should not count the SPEs or VMX units toward CPU power if we are going to lump them in with shader-op power.
They cannot do both at once.
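To make the double-dipping point concrete, here is a toy Python sketch, purely illustrative, using the 1-op/cycle-per-SPE figure implied by the quote above: every SPE you count toward shader ops is an SPE you cannot also count toward game code.

```python
# Illustrative only: splitting the 7 SPEs between shading and game code.
# Whatever is counted toward "shader ops" must be subtracted from the
# CPU-side budget -- the same silicon cannot be counted twice.

SPE_OPS_PER_CYCLE = 1   # per the quoted figures above
CLOCK_GHZ = 3.2
TOTAL_SPES = 7

for spes_on_shading in range(TOTAL_SPES + 1):
    shader_gops = spes_on_shading * SPE_OPS_PER_CYCLE * CLOCK_GHZ
    cpu_gops = (TOTAL_SPES - spes_on_shading) * SPE_OPS_PER_CYCLE * CLOCK_GHZ
    print(f"{spes_on_shading} SPEs shading: +{shader_gops:.1f} G shader ops/s, "
          f"{cpu_gops:.1f} G ops/s left for game code")
```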
Also, as the RSX is a traditional GPU (separate vertex and pixel shader units), you will not want the VS units sitting idle. While the CELL can surely take on some of the vertex load, the question I have is how much it can take before that becomes counterproductive, i.e. you end up with VS units sitting idle while SPEs do vertex processing, when those SPEs could be doing something productive like physics or AI. We do not know the answer to that question yet, but we should not be making assumptions either.
PS3 ~ 2 TFLOPS
X360 ~ 1 TFLOPS
Cannot derive these figures, but both companies have used peak total system FLOPS
So why not look at the CPUs' FLOPS, a concrete figure with some relevance, instead of the rough "total system performance" numbers? We are already looking at the shader power in the GPU section; would it not be best to isolate each part, determine its relevance and any bottlenecks, and THEN look at the big picture?
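For reference, here is a hedged sketch of what those CPU peak FLOPS look like, using the per-unit figures commonly cited at the time; treat every constant here as an assumption rather than a verified spec-sheet number:

```python
# Hedged sketch: peak single-precision GFLOPS for the two CPUs, built from
# the commonly cited per-unit figures of the era (assumptions, not specs).

CLOCK_GHZ = 3.2

# CELL: each SPE is commonly credited with a 4-wide fused multiply-add,
# i.e. 8 FLOPs/cycle; the PPE's VMX unit is usually credited the same.
cell_spes  = 7 * 8 * CLOCK_GHZ        # 179.2 GFLOPS
cell_ppe   = 1 * 8 * CLOCK_GHZ        # 25.6 GFLOPS
cell_total = cell_spes + cell_ppe     # 204.8 GFLOPS

# XeCPU: the oft-quoted 115.2 GFLOPS works out to 12 FLOPs/cycle per core.
xecpu_total = 3 * 12 * CLOCK_GHZ      # 115.2 GFLOPS

print(f"CELL  peak ~ {cell_total:.1f} GFLOPS")
print(f"XeCPU peak ~ {xecpu_total:.1f} GFLOPS")
```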
Also, in the Top500 list of supercomputers there are machines with lower theoretical FLOPS that outperform machines with higher theoretical FLOPS. So while there is no doubt that the CELL has the superior theoretical max, we should see how that works out in games (not just in streaming some HD threads, which is what a streaming processor is designed for).
For example: is the flexibility of the VMX units going to make up some ground for the XeCPU? Is the streaming architecture going to be difficult for games? Is the 256K SPE local store too small? Is the XeCPU just a rag-tag 3-core general processor that will wilt away under TRUE multithreaded tasks?
For all we know, the CELL may perform much closer to its theoretical peak than the XeCPU does, widening the gap, or the reverse may be true. While these last few points fall outside the apples-to-apples comparison proper, they are very relevant to the real question:
How will these chips perform in a gaming environment?
Nowhere in here do I find anything about the bandwidth savings the Xenos gets by using a very fast/small eDRAM backbuffer and tiling the framebuffer.
You are not going to get apples-to-apples comparisons on systems with different designs. Leaving out the bandwidth savings of the eDRAM because there is no comparable part in the PS3 is like leaving out the SPEs on the CELL because there is no comparable part in the XeCPU.
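As a back-of-envelope illustration of what that eDRAM actually saves, here is a rough Python estimate of the backbuffer traffic it keeps off the main memory bus; every parameter here is an assumption chosen for illustration, not a measured figure:

```python
# Back-of-envelope: framebuffer traffic that Xenos's eDRAM keeps off the
# main memory bus. All parameters are illustrative assumptions.

WIDTH, HEIGHT = 1280, 720   # 720p
AA_SAMPLES    = 4           # 4x multisampling
BYTES_COLOR   = 4           # 32-bit color per sample
BYTES_DEPTH   = 4           # 32-bit Z/stencil per sample
OVERDRAW      = 3           # average times each pixel is touched
RW_FACTOR     = 2           # read-modify-write per touch
FPS           = 60

samples = WIDTH * HEIGHT * AA_SAMPLES
bytes_per_frame = samples * (BYTES_COLOR + BYTES_DEPTH) * OVERDRAW * RW_FACTOR
gb_per_second = bytes_per_frame * FPS / 1e9

print(f"~{gb_per_second:.1f} GB/s of backbuffer traffic absorbed by eDRAM")
```

Under these assumptions that is on the order of 10 GB/s that never touches main memory, which is exactly the kind of saving a raw shader-ops tally cannot capture.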
Also:
Since when did FLOPS become the only valid metric for measuring processor performance? Floating-point processing power is great for physics and vertex processing... but not all game code is of this type. And some game code, like AI, is going to have to be tweaked a bit to work on the SPEs.
Anyhow, both chips have PPC core(s). If the intent is to compare the systems "apples-to-apples", I find the lack of this information disconcerting. It is not sexy, but those general processing units are what have made PC and console games what they are for the last 20 years.
The designs are very different and balanced in different ways with different technologies and methods to arrive at the same conclusions.
5) Summary
...
This is as close an apples-to-apples comparison as can be made with available info.
No flames please; if there are any mistakes or inconsistencies, then please let me know and I'll amend the data above. Also, I'm assuming equal efficiency across both systems with compilers, code etc.
I'll reiterate, it's a peak, apples-to-apples comparison, or as close as we can get with the info available at the moment, without isolating any single components like CPUs, GPUs, bandwidths, total RAM etc. ...it's a total system vs system comparison.
I think you need to re-examine your methodology.
The systems have different designs and philosophies, and without taking that into consideration we are not comparing apples-to-apples, just similar numbers that may or may not have the same effect on each design.
IMO, you did not take into account the memory bandwidth savings from the eDRAM, and you have not given any comparison of the PPC cores. I would suggest adding both. If they do not work within your framework, then I would have to conclude your framework is what we call in theological circles "frame setting", or "forcing a world view".
To reiterate, they are different designs and design philosophies.
Just because we are comparing some apples-to-apples metrics does not mean we arrive at an apples-to-apples conclusion, especially when we are discounting some apples-to-apples comparisons and are not taking design flow into consideration.
No offense, but I do not think this methodology is very helpful for arriving at any clear conclusions, at least not at this point. But I am glad it helped you arrive at your own conclusion.