PS3 vs X360: Apples to Apples high level comparison...

While we're at it, what's the difference between a VMX unit and an SPE? Someone told me an SPE is a somewhat enhanced VMX, but I have a hard time believing him... and we know that the XCPU's VMX is customized...
 
I can't comment, as I've never worked with the VMX instruction set, but it seems the X360's extended VMX has a lot of neat features to make developers' lives easier (dot product instructions, special packing/unpacking instructions and other stuff).
 
Jaws said:
2) Dot products


-PS3

claimed PS3 ~ 51 billion dot products per second

Cell ~ 8 per cycle (7 SPU + VMX)

8*3.2GHz~ 25.6 billion dot products per second

RSX ~ 51-25.6 ~ 25.4* billion dot products per second

* deduced from claim

PS3 ~ 51 billion dot products per second
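Jaws's dot-product tally can be reproduced in a few lines. This is a minimal sketch that uses only the post's own assumptions: one dot4 per cycle on each of the 8 Cell vector units at 3.2 GHz. The RSX figure is simply whatever remains of Sony's 51 G claim, not a published spec.

```python
# Sketch of the dot-product tally above. The per-cycle count and clock are the
# post's assumptions; the RSX figure is deduced from the claim, not published.
CELL_UNITS = 8          # 7 SPUs + 1 PPE VMX, assumed 1 dot4 per cycle each
CELL_CLOCK_GHZ = 3.2
CLAIMED_TOTAL_G = 51.0  # Sony's claimed billions of dot products per second

cell_dots_g = CELL_UNITS * CELL_CLOCK_GHZ    # billions of dot4 per second
rsx_dots_g = CLAIMED_TOTAL_G - cell_dots_g   # whatever is left of the claim

print(f"Cell: {cell_dots_g:.1f} G dot4/s")
print(f"RSX (deduced): {rsx_dots_g:.1f} G dot4/s")
```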
I think this is wrong, and despite panajev's explanations, I guess they counted half of the max ops per cycle for both, i.e. 12.8 billion for Cell and 37.4 billion for RSX. It's the same way NVidia balanced NV40, half the shader ops can be dot products.
 
Xmas said:
I think this is wrong, and despite panajev's explanations, I guess they counted half of the max ops per cycle for both, i.e. 12.8 billion for Cell and 37.4 billion for RSX. It's the same way NVidia balanced NV40, half the shader ops can be dot products.
Interesting theory, but NV40 can do 22 dot4 per clock cycle (16 PS + 6 VS),
whilst RSX would have to execute 37.4*10^9/550*10^6 = 68 dot4 per clock cycle! That's a lot more than just some pixel pipeline tweaking...
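Here is the arithmetic behind Xmas's "half of the max ops are dot products" reading and nAo's per-clock check. It is a sketch that takes the thread's figures as given: 25.6 G and 74.8 G shader ops/s for Cell and RSX, a 550 MHz RSX clock, and 22 dot4 per clock for NV40.

```python
# Xmas's reading: half of each chip's peak shader ops are dot products.
# nAo's check: how many dot4 per clock would that require of RSX?
# All inputs are the thread's figures, not confirmed specs.
CELL_SHADER_OPS = 25.6e9
RSX_SHADER_OPS = 74.8e9
RSX_CLOCK_HZ = 550e6
NV40_DOT4_PER_CLOCK = 16 + 6   # 16 pixel + 6 vertex shader units

cell_dots = CELL_SHADER_OPS / 2   # 12.8e9
rsx_dots = RSX_SHADER_OPS / 2     # 37.4e9
total_dots = cell_dots + rsx_dots # close to, but not exactly, the claimed 51e9

rsx_dot4_per_clock = rsx_dots / RSX_CLOCK_HZ
print(f"total: {total_dots / 1e9:.1f} G dot4/s")
print(f"RSX would need {rsx_dot4_per_clock:.0f} dot4/clock, "
      f"{rsx_dot4_per_clock / NV40_DOT4_PER_CLOCK:.1f}x NV40")
```

Under this reading the total is 50.2 G rather than exactly 51 G. The 68 dot4/clock figure, roughly three times NV40's rate, is what drives nAo's objection.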
 
nAo said:
Qroach said:
Um, were those calculations on how Xenon would process data, or Cell? You guys got me confused.
CELL. Xenon's extended VMX unit would use one or more dot instructions instead.

Actually, I'd expect Xenon to use exactly the same madd approach for bigger batches of vectors, as the custom dot may easily have a) lower throughput when paired with itself, or b) lower throughput when paired with other ops. IOW, I'd be surprised if the custom dot op was meant to be used for batch processing. But that's all speculation, of course.
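The "madd approach for batches" works as follows. With four vectors stored in structure-of-arrays layout (all x's in one register, all y's in the next, and so on), one multiply plus three multiply-adds produce four dot products, one per lane, with no horizontal add. The sketch below emulates 4-wide registers with Python lists. `vmul` and `vmadd` are hypothetical stand-ins for vector multiply and multiply-add (AltiVec's `vec_madd` is the real-world analogue). They are not VMX or SPU intrinsics.

```python
# 4-wide registers emulated with lists; vmul/vmadd are hypothetical helpers.
def vmul(a, b):
    return [x * y for x, y in zip(a, b)]

def vmadd(a, b, c):
    # a*b + c, lane by lane
    return [x * y + z for x, y, z in zip(a, b, c)]

def dot4_batch_soa(xs, ys, zs, ws, lx, ly, lz, lw):
    """Four dot products at once (SoA layout).

    xs..ws hold the x/y/z/w components of four different vectors;
    lx..lw are one vector's components, splatted across all lanes.
    Lane i of the result is dot(vector_i, l).
    """
    acc = vmul(xs, [lx] * 4)
    acc = vmadd(ys, [ly] * 4, acc)
    acc = vmadd(zs, [lz] * 4, acc)
    acc = vmadd(ws, [lw] * 4, acc)
    return acc

def dot4_aos(v, l):
    # Reference: ordinary per-vector dot product
    return sum(a * b for a, b in zip(v, l))

vectors = [(1, 2, 3, 4), (0, 1, 0, 1), (2, 2, 2, 2), (-1, 0, 1, 0)]
light = (1, 0, 2, 1)
xs, ys, zs, ws = (list(c) for c in zip(*vectors))  # transpose AoS -> SoA
print(dot4_batch_soa(xs, ys, zs, ws, *light))      # prints [11, 1, 8, 1]
```

The same throughput-friendly pattern runs on any unit with a fused multiply-add. That is why a dedicated dot instruction is not obviously needed for batch work.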
 
Jaws said:
PS3

claimed PS3 ~ 100 billion shader ops per second

Cell ~ 8 shader ops per cycle (7 SPU + VMX)

8*3.2GHz ~ 25.6 billion shader ops per second

RSX ~ 136 shader ops per cycle

136*0.55GHz ~ 74.8 billion shader ops per second

total= 74.8+25.6 ~ 100 billion shader ops per second

PS3 ~ 100 billion shader ops per second


-X360

xGPU ~ 96 Shader ops per cycle

96*0.5 GHz ~ 48 billion shader ops per second

xCPU

6*3.2~ 19.2 billion shader ops per second (3 VMX + 3 FPU)

total= 48+19.2~ 67.2 billion shader ops per second

X360 = 67.2 billion shader ops per second
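The same peak tally can be written as a small script. Every per-cycle count and clock here is the poster's figure, not a confirmed spec, and the unrounded PS3 sum comes out at 100.4 G.

```python
# Peak shader-op tally from the post, in billions of ops per second.
# Per-cycle counts and clocks are the poster's figures, not confirmed specs.
def gops(ops_per_cycle, clock_ghz):
    return ops_per_cycle * clock_ghz

ps3 = {
    "Cell (7 SPU + VMX)": gops(8, 3.2),
    "RSX": gops(136, 0.55),
}
x360 = {
    "xGPU": gops(96, 0.5),
    "xCPU (3 VMX + 3 FPU)": gops(6, 3.2),
}

for name, parts in (("PS3", ps3), ("X360", x360)):
    for part, value in parts.items():
        print(f"  {part}: {value:.1f}")
    print(f"{name} total: {sum(parts.values()):.1f} G shader ops/s")
```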

Why are we counting the shader power of 7 SPEs and the VMX units?

First, it is unrealistic to count those units (at least all of them), because they will be used for game data processing.

Second, if we want to count that power for shader ops we should not be double dipping. We should not be counting the SPEs or VMX units for CPU power if we are going to lump them in with shader op power.

They cannot do both at once.
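The double-dipping point can be made concrete with a hypothetical model. It assumes each of the 8 Cell vector units contributes 3.2 G ops/s, the per-unit figure from Jaws's tally. Every unit assigned to shading is one taken away from game code, so the two numbers always sum to the same 25.6 G and cannot both be counted at full value.

```python
# Hypothetical model: 8 units (7 SPEs + PPE VMX), 3.2 G ops/s each.
# A unit spends its cycles on shading or on game code, not both.
UNITS = 8
PER_UNIT_GOPS = 3.2

def split_budget(units_on_shading):
    """Return (shader G ops/s, G ops/s left for game code)."""
    if not 0 <= units_on_shading <= UNITS:
        raise ValueError("units_on_shading must be between 0 and UNITS")
    shading = units_on_shading * PER_UNIT_GOPS
    game_code = (UNITS - units_on_shading) * PER_UNIT_GOPS
    return shading, game_code

for n in (0, 2, 4, 8):
    shading, game_code = split_budget(n)
    print(f"{n} units shading: {shading:.1f} G shader ops/s, "
          f"{game_code:.1f} G ops/s left for physics/AI")
```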

Also, as the RSX is a traditional GPU (vertex and pixel shader units), you will not want the VS units sitting idle. While the CELL can surely take on some of the vertex load, the question I have is how much it can take before that becomes counterproductive, i.e. you begin to have VS units sitting idle while SPEs do vertex processing when they could be doing something productive, like physics or AI! We do not know the answer to that question yet, but we should not be making assumptions either.

PS3 ~ 2 TFLOPS

X360 ~ 1 TFLOPS

I can't derive these figures, but both companies have used peak total-system FLOPS.

So why not look at the CPUs FLOPs, a certain figure with some relevance, instead of the rough "total system performance" numbers? We are already looking at the shader power in the GPU section, would it not be best to just isolate each part, determine its relevance and any bottlenecks, and THEN look at the big picture?

Also, in the Top500 list of supercomputers there are machines with lower theoretical FLOPS that outperform machines with higher theoretical FLOPS. So while there is no doubt that the CELL has a superior theoretical maximum, we should see how that works out in games (not just streaming some HD threads, which is what a streaming processor is designed for).

For example: is the flexibility of the VMX units going to make up some ground for the XeCPU? Is the streaming architecture going to be difficult for games? Is the 256K SPE cache too small? Is the XeCPU just a ragtag three-core general-purpose processor that will wilt under truly multithreaded tasks?

For all we know, the CELL may perform much closer to its theoretical peak than the XeCPU, widening the gap, or the reverse may be true. While these last few points are outside a direct apples-to-apples comparison, they are very relevant to the point:

How will these chips perform in a gaming environment?

4) Memory

Nowhere in here do I find anything about the bandwidth savings Xenos gets from using a very fast, small back buffer and tiling the framebuffer.

You are not going to get apples-to-apples comparisons on systems with different designs. Leaving out the bandwidth savings of the eDRAM because there is no comparable part on the PS3 is like leaving out the SPEs on the CELL because there is no comparable part on the XeCPU.
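Some arithmetic shows why Xenos tiles the framebuffer. The sketch assumes the commonly cited 10 MB of eDRAM (a figure not stated in this thread) and 4 bytes of colour plus 4 bytes of depth/stencil per sample. Under those assumptions, a 720p back buffer only fits in one piece without MSAA; with 2x or 4x AA it has to be rendered in tiles. Either way, the colour/Z traffic stays in eDRAM instead of competing for main-memory bandwidth.

```python
import math

# Assumptions (not from the thread): 10 MiB of eDRAM, and 8 bytes per sample
# (32-bit colour + 32-bit depth/stencil).
EDRAM_BYTES = 10 * 1024 * 1024
BYTES_PER_SAMPLE = 8

def backbuffer_bytes(width, height, samples):
    return width * height * samples * BYTES_PER_SAMPLE

def tiles_needed(width, height, samples):
    # Smallest number of tiles such that each tile fits in eDRAM
    return math.ceil(backbuffer_bytes(width, height, samples) / EDRAM_BYTES)

for samples in (1, 2, 4):
    size = backbuffer_bytes(1280, 720, samples)
    print(f"720p {samples}xAA: {size / 2**20:.2f} MiB -> "
          f"{tiles_needed(1280, 720, samples)} tile(s)")
```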

Also:


Since when did FLOPs become the only valid metric for measuring processor performance? Floating point processing power is great for physics and vertex processing... but not all game code is of this type. And some game code, like AI, is going to be tweaked a bit to work on the SPEs.

Anyhow, both chips have PPC core(s). If the intent is to compare the systems "apples-to-apples", I find the lack of this information disconcerting. It is not sexy, but those general-purpose units are what have made PC and console games what they are for the last 20 years.

The designs are very different and balanced in different ways with different technologies and methods to arrive at the same conclusions.

5) Summary

...

This is as close to an apples-to-apples comparison as can be made with the available info.

No flames please; if there are any mistakes or inconsistencies, let me know and I'll amend the data above. Also, I'm assuming equal efficiency across both systems with compilers, code etc.

I'll reiterate: it's a peak, apples-to-apples comparison, or as close to one as we can get with the info available at the moment, without isolating any single components like CPUs, GPUs, bandwidths, total RAM etc... it's total system vs. total system.

I think you need to re-examine your methodology. The systems have different designs and philosophies, and without taking that into consideration we are not comparing apples-to-apples, just similar numbers that may or may not have the same effect on each design.

IMO, you did not take into account the memory bandwidth savings from the eDRAM, and you have not given any comparison of the PPC cores. I would suggest adding them. If they do not work within your framework, then I would have to conclude your framework is what we call in theological circles "frame setting" or "forcing a world view".

To reiterate, they are different designs and design philosophies.

Just because we are comparing some apples-to-apples metrics does not mean we arrive at an apples-to-apples conclusion, especially when we are discounting some apples-to-apples comparisons and when we are not taking design flow into consideration.

No offense, but I do not think this methodology is very helpful to arrive at any clear conclusions. At least not at this point. But I am glad it helped you to arrive at your own conclusion :p
 
Acert93 said:
is the 128K SPE cache too small

Just a small point, but it's twice that, AFAIK. And it's local SRAM, not cache; you can implement a caching system in software if you need it (and there'll probably be libs to do so). Further to that point, the SPEs can access the PPE's cache, but I'm not sure what penalty there is for that: more than for local SRAM, I'm sure, but less than for main memory.
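A caching system in software can be quite simple. The toy below is a read-only, direct-mapped cache with write-back omitted. Everything in it is hypothetical: "main memory" is a bytearray, `dma_get()` stands in for a real DMA transfer into local store, and the line size and count are arbitrary. It is not the Cell SDK's API, just an illustration of tag check, miss, and fetch of a whole line.

```python
# Toy read-only, direct-mapped software cache. Hypothetical throughout:
# main memory is a bytearray, dma_get() stands in for a real DMA, and the
# geometry (16 lines x 128 B = 2 KB of "local store") is arbitrary.
LINE_SIZE = 128
NUM_LINES = 16

class SoftwareCache:
    def __init__(self, main_memory):
        self.main = main_memory
        self.tags = [None] * NUM_LINES   # main-memory line address held per slot
        self.lines = [bytearray(LINE_SIZE) for _ in range(NUM_LINES)]
        self.hits = 0
        self.misses = 0

    def dma_get(self, line_addr):
        # Copy one line from "main memory" into "local store"
        return bytearray(self.main[line_addr:line_addr + LINE_SIZE])

    def read_byte(self, addr):
        line_addr = addr - addr % LINE_SIZE
        slot = (line_addr // LINE_SIZE) % NUM_LINES
        if self.tags[slot] != line_addr:  # miss: fetch the whole line
            self.misses += 1
            self.lines[slot] = self.dma_get(line_addr)
            self.tags[slot] = line_addr
        else:
            self.hits += 1
        return self.lines[slot][addr % LINE_SIZE]

# Usage: stream through 1 KB; only the first touch of each 128 B line misses.
memory = bytearray(i % 256 for i in range(64 * 1024))
cache = SoftwareCache(memory)
values = [cache.read_byte(a) for a in range(1024)]
print(cache.hits, cache.misses)
```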
 
Acert93 said:
For all we know, the CELL may perform much closer to its theoretical peak than the XeCPU, widening the gap, or the reverse may be true.

A hardware spokesperson at Microsoft said in an interview that "you're going to get all that teraflop out of the box, since we have pretty much eliminated bottlenecks", and indeed that is what most people are thinking right now, just looking at the system.
I do think that when it comes to bottlenecks, it's most likely that the PS3 (unless they have something hidden) will have some, thus reducing the gap.
 
Mordecaii said:
Yes, the SPEs each have 256K of memory attached to them.

Correct, I changed that. Thanks.

Btw Jaws, I am not bashing your work; excellent as usual. I just do not see it as meaningful at this point. It would be like if the Rev had a PPU and we did not mention it because there is no apples-to-apples counterpart. While true, that PPU would make a huge difference in the final outcome of games, and that is all that is relevant. The same analogy works with cars: horsepower alone tells us little about a car's speed, and a car's speed and mass may not be the only information needed to know how fast it can brake (e.g. anti-lock brakes and weather conditions play a role).

Feel free to disagree of course; my impression though is the designs are really different and without really looking at how the details impact the end result we are just comparing numbers on paper.
 
X-AleX said:
For all we know, the CELL may perform much closer to its theoretical peak than the XeCPU, widening the gap, or the reverse may be true.

A hardware spokesperson at Microsoft said in an interview that "you're going to get all that teraflop out of the box, since we have pretty much eliminated bottlenecks", and indeed that is what most people are thinking right now, just looking at the system.
I do think that when it comes to bottlenecks, it's most likely that the PS3 (unless they have something hidden) will have some, thus reducing the gap.

There are indeed still lots of things that haven't been revealed, like the final clock rate of the Cell, FlexIO, the full breakdown of the RSX's 300 million transistors, and much more.
 
X-AleX said:
A hardware spokesperson at Microsoft said in an interview that "you're going to get all that teraflop out of the box, since we have pretty much eliminated bottlenecks", and indeed that is what most people are thinking right now, just looking at the system.

And that PR quote is full of bull. It is not going to be easy getting 6 HW threads to run in parallel and fully efficiently, especially with 1MB of L2 cache. There is going to be wasted power, and a lot of it.

Even if the SPEs are not as flexible, you have 7 symmetrical units that are meant to stream. If you are talented enough to get 6 HW threads to run on 3 PPC cores, I am willing to bet you can probably get the 7 SPEs fed and streaming. That may not be true for all situations, but both designs have bottlenecks, and the type of code you are running is going to make the difference.

I do think that when it comes to bottlenecks, it's most likely that the PS3 (unless they have something hidden) will have some, thus reducing the gap.

They both have bottlenecks. The CELL has more bandwidth to the RSX than the XeCPU does to the R500. The RSX has more memory bandwidth; yet the R500 isolates its framebuffer. In a game that is not graphically intensive on bandwidth, the CELL will have a lot more memory bandwidth than the XeCPU.

It totally depends on what type of game you are doing. Different games will hit different walls.
 
X-AleX said:
Acert93 said:
For all we know, the CELL may perform much closer to its theoretical peak than the XeCPU, widening the gap, or the reverse may be true.

A hardware spokesperson at Microsoft said in an interview that "you're going to get all that teraflop out of the box, since we have pretty much eliminated bottlenecks", and indeed that is what most people are thinking right now, just looking at the system.
I do think that when it comes to bottlenecks, it's most likely that the PS3 (unless they have something hidden) will have some, thus reducing the gap.

I think it's interesting that they both say you get AA for "free" and that there are no bottlenecks in the system. It used to be said quite often around here that "nothing is free in 3D". The idea is that if you can get anything for free, then when you aren't using that particular feature, the system is underperforming relative to what it could be doing. (I.e., if you use AA you might get close to a teraflop, but if AA is turned off you don't gain anything.)
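The "nothing is free" argument is essentially a bottleneck argument, and a toy model captures it. Assume, hypothetically, that shading and the back end (where the AA work lands) run in parallel, so the slower stage sets the frame time. The stage names and millisecond costs below are invented for illustration. If shading is the limit, adding AA costs no frame time, and removing it gains none: the back end just sits partly idle.

```python
# Toy bottleneck model; stage names and costs are invented for illustration.
# Stages are assumed to overlap, so the slowest one sets the frame time.
def frame_time_ms(shading_ms, backend_ms):
    return max(shading_ms, backend_ms)

def backend_utilization(shading_ms, backend_ms):
    # Fraction of the frame the back end is actually busy
    return backend_ms / frame_time_ms(shading_ms, backend_ms)

SHADING_MS = 16.0
BACKEND_NO_AA_MS = 6.0   # hypothetical back-end cost without AA
BACKEND_4XAA_MS = 14.0   # hypothetical back-end cost with 4x AA

with_aa = frame_time_ms(SHADING_MS, BACKEND_4XAA_MS)
without_aa = frame_time_ms(SHADING_MS, BACKEND_NO_AA_MS)
print(f"frame time: {with_aa} ms with AA, {without_aa} ms without")
print(f"back end busy: {backend_utilization(SHADING_MS, BACKEND_4XAA_MS):.0%} "
      f"with AA, {backend_utilization(SHADING_MS, BACKEND_NO_AA_MS):.0%} without")
```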

This probably isn't important with regard to the Xbox 360, because they have mandated that all titles will use AA, so the feature will never go unused. I tend to wonder, though, if they are doing this because they know that with AA turned off the PS3 is significantly more powerful. If they can make AA a necessary feature for next-gen titles, they can cover up their weakness (or exploit their strength, however you want to look at it) to make things more even.

Overall it's a good thing though. Even if it's only MSAA, I'm really happy to see it being pushed as a next-gen feature. Now, if it's a decision between MSAA and significantly more realistic graphics, it's a tougher call. We'll have to see if MS/ATI's focus on AA pays off for them.

Nite_Hawk
 
Where exactly? Even using the search, I'm seeing a ton of Xbox2 threads and posts. I really only posted this because it's something I wanted to see discussed but didn't notice on this forum. Can you give me a link?
 