PS3 vs X360: Apples to Apples high level comparison...

Discussion in 'Console Technology' started by j^aws, May 22, 2005.

  1. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,909
    Likes Received:
    8
    [​IMG]

    There's been a lot of misinformation flying around recently, so I thought I'd *try* to 'normalise' the available metrics for both systems and give an apples to apples, high-level architectural comparison so you can draw your own conclusions.

    I'm only going to provide 'normalised' total system metrics compared to the above image as this is all we can compare across both systems at the moment until more details are released.

    1) Shader ops

    Shader ops in isolation are not very meaningful, but I'll try to compare to the above

    Earlier discussion on a shader op,

    http://www.beyond3d.com/forum/viewtopic.php?t=23169


    -PS3

    claimed PS3 ~ 100 billion shader ops per second

    Cell ~ 8 shader ops per cycle (7 SPU + VMX)

    8*3.2GHz ~ 25.6 billion shader ops per second

    RSX ~ 136 shader ops per cycle

    136*0.55GHz ~ 74.8 billion shader ops per second

    total= 74.8+25.6 ~ 100 billion shader ops per second

    PS3 ~ 100 billion shader ops per second


    -X360

    xGPU ~ 96 Shader ops per cycle

    96*0.5 GHz ~ 48 billion shader ops per second

    xCPU

    6*3.2~ 19.2 billion shader ops per second (3 VMX + 3 FPU)

    total= 48+19.2~ 67.2 billion shader ops per second

    X360 = 67.2 billion shader ops per second
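
    The arithmetic above is just ops per cycle times clock speed, summed per system. A minimal sketch to re-run it, assuming the per-cycle counts and clocks quoted in this post (the function and variable names are only for illustration):

```python
# Peak shader ops per second = ops per cycle * clock in GHz (result in billions).
# All per-cycle counts and clocks are the figures quoted in the post, not measurements.

def peak_gops(ops_per_cycle, clock_ghz):
    """Return peak throughput in billions of ops per second."""
    return ops_per_cycle * clock_ghz

ps3 = {"Cell": peak_gops(8, 3.2), "RSX": peak_gops(136, 0.55)}
x360 = {"xCPU": peak_gops(6, 3.2), "xGPU": peak_gops(96, 0.5)}

ps3_total = sum(ps3.values())
x360_total = sum(x360.values())

print(f"PS3  ~ {ps3_total:.1f} billion shader ops/s")
print(f"X360 ~ {x360_total:.1f} billion shader ops/s")
print(f"PS3 / X360 ~ {ps3_total / x360_total:.2f}")
```

    These are peak numbers only; nothing here says how many of those ops either machine can sustain.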



    2) Dot products


    -PS3

    claimed PS3 ~ 51 billion dot products per second

    Cell ~ 8 per cycle (7 SPU + VMX)

    8*3.2GHz~ 25.6 billion dot products per second

    RSX ~ 51-25.6 ~ 25.4* billion dot products per second

    * deduced from claim

    PS3 ~ 51 billion dot products per second



    -X360

    claimed xCPU ~ 9 billion dot products per second

    xCPU~ 3 dot products per cycle (3 VMX)

    3*3.2 GHz ~ 9.6 billion dot products per second

    xGPU ~ 48 dot products per cycle (48-way vec4)

    48*0.5 GHz ~ 24 billion dot products per second

    total ~ 9.6 + 24 ~ 33.6 billion dot products per second

    X360 ~ 33.6 billion dot products per second
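
    Same idea for the dot products. Note that the RSX number is not derived from a per-cycle count; it is only what is left of the claimed 51 billion after subtracting Cell. A rough sketch, assuming the figures in this post (variable names are illustrative):

```python
# Dot products per second in billions, using the per-cycle counts quoted in the post.
CELL_CLOCK_GHZ = 3.2
XCPU_CLOCK_GHZ = 3.2
XGPU_CLOCK_GHZ = 0.5

# PS3: Cell is computed, RSX is deduced from the claimed system total.
cell = 8 * CELL_CLOCK_GHZ          # 7 SPU + VMX, one dot product each per cycle
ps3_claimed = 51.0                 # claimed total for the whole system
rsx_deduced = ps3_claimed - cell   # claim minus Cell, not an independent figure

# X360: both parts are computed from per-cycle counts.
xcpu = 3 * XCPU_CLOCK_GHZ          # 3 VMX units
xgpu = 48 * XGPU_CLOCK_GHZ         # 48-way vec4
x360_total = xcpu + xgpu

print(f"RSX (deduced) ~ {rsx_deduced:.1f} billion dot products/s")
print(f"X360 total    ~ {x360_total:.1f} billion dot products/s")
```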



    3) TFLOPS

    Some theory to the madness,

    http://www.beyond3d.com/forum/viewtopic.php?p=523362#523362


    PS3 ~ 2 TFLOPS

    X360 ~ 1 TFLOPS

    I can't derive these figures, but both companies have quoted peak total system flops, which can't be compared with single/double precision programmable flops. On their own they don't mean much, but they are apples to apples between X360 and PS3, IMHO.


    4) Memory

    FYI, earlier bandwidth discussion,

    http://www.beyond3d.com/forum/viewtopic.php?t=23011

    I'm going to normalise bandwidths and memory so that they are more comparable. What I mean by this is that 25 GB/s access to 256 MB is equivalent to 50 GB/s access to 128 MB or equivalent to 100 GB/s access to 64 MB etc etc...and assuming the same latencies apply...

    Currently AFAIK,

    * The 256 GB/s is not a physical inter-connect bandwidth; it's the bandwidth *within* the EDRAM module. The inter-connect bandwidths between xGPU and the EDRAM module are 32 GB/s write and 16 GB/s read. These are the numbers from the 'leak', and the 256 GB/s is the 'effective' bandwidth. Since both systems will use compression/bandwidth saving techniques, I'm using the physical inter-connect bandwidth for a better apples to apples comparison.


    Starting point,


    [X360: CPU<==21.6 GB/s==>GPU]----48 GB/s* ----[10 MB]
    |
    |
    22.4 GB/s
    |
    |
    [512 MB]



    [PS3: CPU<==35 GB/s==>GPU]----22.4 GB/s ----[256 MB]
    |
    |
    25.6 GB/s
    |
    |
    [256 MB]


    >>>>>memory b/w and memory amounts normalise for PS3 to match X360<<<<<<<<


    [X360: CPU<==21.6 GB/s==>GPU]----48 GB/s* ----[10 MB]
    |
    |
    22.4 GB/s
    |
    |
    [512 MB]



    [PS3: CPU<==35 GB/s==>GPU]----48 GB/s----[119.5 MB]
    |
    |
    22.4 GB/s
    |
    |
    [293 MB]


    >>>>>FSB, CPU-GPU normalise for X360 to match PS3<<<<<<<<


    [X360: CPU<==35 GB/s==>GPU]----48 GB/s* ----[10 MB]
    |
    |
    22.4 GB/s
    |
    |
    [316 MB]



    [PS3: CPU<==35 GB/s==>GPU]----48 GB/s ----[119.5 MB]
    |
    |
    22.4 GB/s
    |
    |
    [293 MB]


    It's now easier to compare physical bandwidths and memories across both PS3 and X360 to give a better sense of data flows and data access. If the 256 GB/s* effective bandwidth of the EDRAM replaces the 48 GB/s* physical bandwidth, then it's easier to map and compare both architectures' data flows, IMHO.


    [X360: CPU<==35 GB/s==>GPU]----256 GB/s* ----[10 MB]
    |
    |
    22.4 GB/s
    |
    |
    [316 MB]


    >X360 normalised total system + VRAM = 326 MB


    [PS3: CPU<==35 GB/s==>GPU]----48 GB/s ----[119.5 MB]
    |
    |
    22.4 GB/s
    |
    |
    [293 MB]


    >PS3 normalised total system + VRAM =412.5 MB
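
    The normalised capacities above follow from the rule stated earlier: bandwidth times capacity is held constant, so capacity scales by old bandwidth / new bandwidth. A minimal sketch, assuming that rule and the bandwidths in the diagrams. The X360 main memory step is not spelled out in the post; scaling 512 MB by the FSB ratio 21.6/35 is an inference that happens to reproduce the 316 MB figure. The function name is made up for illustration.

```python
def normalise_capacity(bandwidth_gbs, capacity_mb, target_bandwidth_gbs):
    """Rescale a capacity so that bandwidth * capacity stays constant."""
    return bandwidth_gbs * capacity_mb / target_bandwidth_gbs

# PS3 VRAM: 22.4 GB/s to 256 MB, restated at the 48 GB/s eDRAM link speed.
ps3_vram = normalise_capacity(22.4, 256, 48)
# PS3 system RAM: 25.6 GB/s to 256 MB, restated at 22.4 GB/s.
ps3_main = normalise_capacity(25.6, 256, 22.4)
# X360 system RAM: 512 MB scaled by the FSB ratio 21.6 / 35 (inferred step).
x360_main = normalise_capacity(21.6, 512, 35)
x360_edram = 10  # MB, left as-is

print(f"PS3 : {ps3_vram:.1f} MB + {ps3_main:.1f} MB = {ps3_vram + ps3_main:.1f} MB")
print(f"X360: {x360_edram} MB + {x360_main:.1f} MB = {x360_edram + x360_main:.1f} MB")
```

    The unrounded PS3 total comes out slightly under the 412.5 MB quoted, because the post adds the already-rounded 119.5 MB and 293 MB.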


    5) Summary


    [​IMG]

    So the normalised, apples to apples figures for the above total system specs are,

    PS3 vs X360

    PS3 ~ 100 billion shader ops per second
    X360 = 67.2 billion shader ops per second

    PS3 ~ 51 billion dot products per second
    X360 ~ 33.6 billion dot products per second

    PS3 ~ 2 TFLOPS
    X360 ~ 1 TFLOPS

    PS3 normalised total system + VRAM =412.5 MB
    X360 normalised total system + VRAM = 326 MB

    Normalised,

    Code:
    
    [PS3: CPU<==35 GB/s==>GPU]----48 GB/s ----[119.5 MB] 
    | 
    | 
    22.4 GB/s 
    | 
    | 
    [293 MB] 
    
    
    
    [X360: CPU<==35 GB/s==>GPU]----256 GB/s* ----[10 MB] 
    | 
    | 
    22.4 GB/s 
    | 
    | 
    [316 MB] 
    
    
    This is as close to an apples to apples comparison as can be made with the available info.

    No flames please. If there are any mistakes or inconsistencies, please let me know and I'll amend the data above. Also, I'm assuming equal efficiency across both systems with compilers, code etc.

    I'll re-iterate, it's a peak, apples to apples comparison, or as close to what we can get with available info at the moment without isolating any single components like CPUs, GPUs, bandwidths, total RAM etc...it's a total system vs system.

    IMHO, they'll both have their strengths and weaknesses and will both be great systems, but the PS3 has the overall balance and power suited to a games console.

    Hopefully this helps and you can make your own conclusions...
     
  2. Vaan

    Newcomer

    Joined:
    Mar 16, 2005
    Messages:
    115
    Likes Received:
    1
    Location:
    Zaragoza, Aragón, Spain, Europe, World...
    I wonder if your RAM normalisation is correct at all :?
     
  3. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,909
    Likes Received:
    8
    As I've mentioned, it's based on the assumption that 25 GB/s to 256 MB is equivalent to 50 GB/s to 128 MB or 12.5 GB/s to 512 MB etc...keeping latencies the same...
     
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Well you've just "proven" that caching and data compression don't work.

    You can't normalise memory bandwidths like that.

    Jawed
     
  5. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,594
    Likes Received:
    10,999
    Location:
    Under my bridge
    Why would that be? Surely a figure listed as 25 GB/s is 25 GB/s for the RAM amount it connects to. Neither party has provided bandwidth/pin or bandwidth/megabyte RAM figures.
     
  6. one

    one Unruly Member
    Veteran

    Joined:
    Jul 26, 2004
    Messages:
    4,823
    Likes Received:
    153
    Location:
    Minato-ku, Tokyo
    Why not add the bandwidth to Local Store in Cell :wink:
     
  7. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    Well... we could also add EIB bandwidth and register file bandwidth too! :)
    register files : 4*16*7*3.2 GHz = 1.4 TByte/s
    local store : 16*7*3.2 GHz = 360 GByte/s
    EIB : 96 * 3.2 GHz = 300 GByte/s

    Don't worry guys, I'm kidding :)
     
  8. blakjedi

    Veteran

    Joined:
    Nov 20, 2004
    Messages:
    2,975
    Likes Received:
    79
    Location:
    20001
    one, I don't think that's accurate, because we aren't counting all the physical interconnects within the processing system, only those in the rendering system. I think you do have to account for the X360 eDRAM bandwidth somewhere in order to normalise it to the PS3 bandwidths.

    From a layman's point of view (and as usual I need someone to correct me if I'm wrong), whereas the PS3 has to use its 35 GB/s bandwidth to do all of its backbuffer work, the X360 eliminates that need by doing almost all of its backbuffer work before it goes back to the system RAM pool and CPU for display. You have to account for all the internal buses where that work is done to compare it to the PS3's 35 GB/s where that work is done...
     
  9. Laa-Yosh

    Laa-Yosh I can has custom title?
    Legend Subscriber

    Joined:
    Feb 12, 2002
    Messages:
    9,568
    Likes Received:
    1,452
    Location:
    Budapest, Hungary
    Could someone also compare the shading units in the GPUs? I'm still confused about it to an extent...
    Like, ATI has 48 shader ALUs, but what does it use to do texture address calculations? Are those separate from these 48, or is it like Nvidia's architecture where the pixel pipes' shader ALUs have to do it as well? And just how many ALUs are there per pixel pipe in the RSX?
     
  10. Vaan

    Newcomer

    Joined:
    Mar 16, 2005
    Messages:
    115
    Likes Received:
    1
    Location:
    Zaragoza, Aragón, Spain, Europe, World...
    Anyways, the thing should be something like this...

    [​IMG]



    So I can't find any way to measure or compare memory bandwidths between the two systems. Let's see in a couple of months with the full final specs.
     
  11. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    40,594
    Likes Received:
    10,999
    Location:
    Under my bridge
    But the connection to Xenos's backbuffer, held in eDRAM, is a 32 GB/s bandwidth. The super-fast stuff is the LOGIC on the eDRAM. What can this logic do?

    I guess that's the ultimate question. What work is done in eDRAM, not by the conventional GPU unit? How much does that logic contribute to the rendering?
     
  12. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    There are three threads of shader programs being processed on those 48 ALUs at any one time, but only 16 texture processors. The texture pipes are clients of the shader pipelines, so it makes no sense for the shader pipes themselves to handle the texture address processing; that is left to the texture pipelines.
     
  13. blakjedi

    Veteran

    Joined:
    Nov 20, 2004
    Messages:
    2,975
    Likes Received:
    79
    Location:
    20001
    vaan-

    I know Xenos is also the system memory controller... the CPU can access the GDDR3 directly, no? Otherwise it's bottlenecked at 11 GB/s reads / 11 GB/s writes?
     
  14. Vaan

    Newcomer

    Joined:
    Mar 16, 2005
    Messages:
    115
    Likes Received:
    1
    Location:
    Zaragoza, Aragón, Spain, Europe, World...
    I suppose it is "bottlenecked" at this bw, yes :?
     
  15. Bohdy

    Regular

    Joined:
    Jun 9, 2003
    Messages:
    731
    Likes Received:
    4
    No, that's not right AFAIK. The CPU can directly access memory of course, as they are on the same bus.
     
  16. Panajev2001a

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,187
    Likes Received:
    8
    Vector 1: x1 x3 x5 x7
    Vector 2: x2 x4 x6 x8
    Vector 3: y1 y3 y5 y7
    Vector 4: y2 y4 y6 y8
    Vector 5: z1 z3 z5 z7
    Vector 6: z2 z4 z6 z8
    Vector 7: w1 w3 w5 w7
    Vector 8: w2 w4 w6 w8

    Assuming one can issue a Vector MUL/ADD/MADD each cycle (throughput being 1 and that the pipeline is full), we can draw the following for these 4 dot products.

    (x1*x2) + (y1*y2) + (z1*z2) + (w1*w2)

    (x3*x4) + (y3*y4) + (z3*z4) + (w3*w4)

    (x5*x6) + (y5*y6) + (z5*z6) + (w5*w6)

    (x7*x8) + (y7*y8) + (z7*z8) + (w7*w8)

    We have two ways of doing this:

    1st way... 4 vector MUL's and 3 vector ADD's (7-8 cycles)


    So, we first do 1 vector MUL (1 cycle):

    (x1*x2) = A0

    (x3*x4) = B0

    (x5*x6) = C0

    (x7*x8) = D0


    We do now 1 vector MUL (1 cycle):

    (y1*y2) = A1

    (y3*y4) = B1

    (y5*y6) = C1

    (y7*y8) = D1


    We do now 1 vector MUL (1 cycle):

    (z1*z2) = A2

    (z3*z4) = B2

    (z5*z6) = C2

    (z7*z8) = D2


    We do now 1 vector MUL (1 cycle):

    (w1*w2) = A3

    (w3*w4) = B3

    (w5*w6) = C3

    (w7*w8) = D3


    We then have:

    A0 + A1 + A2 + A3

    B0 + B1 + B2 + B3

    C0 + C1 + C2 + C3

    D0 + D1 + D2 + D3


    We have to do the 3 ADD's in parallel, it should be obvious how this is done: A0 B0 C0 D0 can be called vector AA, A1 B1 C1 D1 can be called vector BB, A2 B2 C2 D2 can be called vector CC and A3 B3 C3 D3 can be called vector DD.

    So we add the first two pairs:

    AA + BB

    CC + DD

    Then we sum the results:

    (AA+BB) + (CC+DD)


    About 7 cycles to do 4 dot products.

    2nd way... 1 vector MUL and 3 vector MADD's


    So, we first do 1 vector MUL (1 cycle):

    (x1*x2) = A0

    (x3*x4) = B0

    (x5*x6) = C0

    (x7*x8) = D0

    We do now 1 vector MADD:

    A0 + (y1 * y2) = A1

    B0 + (y3 * y4) = B1

    C0 + (y5 * y6) = C1

    D0 + (y7 * y8) = D1

    We do now 1 vector MADD:

    A1 + (z1 * z2) = A2

    B1 + (z3 * z4) = B2

    C1 + (z5 * z6) = C2

    D1 + (z7 * z8) = D2


    We do now 1 vector MADD:

    A2 + (w1 * w2) = A3

    B2 + (w3 * w4) = B3

    C2 + (w5 * w6) = C3

    D2 + (w7 * w8) = D3

    This is much faster.

    It should be quite fast... hopefully I did not write the slowest approach possible.

    Edit: I already applied the fix ;).
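
    To make the two sequences above concrete, here is a small sketch in plain Python. Each list of four numbers stands in for one 4-wide vector register holding the same component of four different vectors (the layout from the top of this post). The helper names vmul, vadd and vmadd are made up for illustration; they are not SPU or VMX intrinsics, and the sketch only counts vector instructions, not real latencies.

```python
def vmul(a, b):
    """Per-lane multiply of two 4-wide vectors."""
    return [x * y for x, y in zip(a, b)]

def vadd(a, b):
    """Per-lane add of two 4-wide vectors."""
    return [x + y for x, y in zip(a, b)]

def vmadd(a, b, c):
    """Per-lane multiply-add: a * b + c."""
    return [x * y + z for x, y, z in zip(a, b, c)]

def dot4_mul_add(xa, xb, ya, yb, za, zb, wa, wb):
    """1st way: 4 vector MULs then 3 vector ADDs (7 instructions)."""
    aa, bb = vmul(xa, xb), vmul(ya, yb)
    cc, dd = vmul(za, zb), vmul(wa, wb)
    return vadd(vadd(aa, bb), vadd(cc, dd))

def dot4_madd(xa, xb, ya, yb, za, zb, wa, wb):
    """2nd way: 1 vector MUL then 3 vector MADDs (4 instructions)."""
    acc = vmul(xa, xb)
    acc = vmadd(ya, yb, acc)
    acc = vmadd(za, zb, acc)
    return vmadd(wa, wb, acc)

# Arbitrary sample data: lane i holds the components of the i-th pair of vectors.
regs = ([1, 2, 3, 4], [5, 6, 7, 8],   # x components
        [1, 1, 1, 1], [2, 2, 2, 2],   # y components
        [0, 1, 0, 1], [3, 3, 3, 3],   # z components
        [2, 2, 2, 2], [1, 0, 1, 0])   # w components

result_mul_add = dot4_mul_add(*regs)
result_madd = dot4_madd(*regs)
print(result_mul_add, result_madd)
```

    Both routes return the same four dot products; the MADD version just folds each add into the following multiply, which is where the saving from 7 to 4 instructions comes from.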
     
  17. blakjedi

    Veteran

    Joined:
    Nov 20, 2004
    Messages:
    2,975
    Likes Received:
    79
    Location:
    20001
    when you look at the block diagram the northbridge is the connection between all three areas:

    GPU <- 33.2 GB/s R / 22.4 GB/s W -> Northbridge
    (actually the read value is the sum of the read bandwidth from the L2 cache = 10.8 GB/s, plus the normal northbridge bandwidth = 22.4 GB/s; see (7) on the block diagram) = 55.6 GB/s total

    CPU <- 10.8 GB/s R / 10.8 GB/s W -> Northbridge = 21.6 GB/s total

    Northbridge <- 22.4 GB/s R/W -> 512 MB RAM = 22.4 GB/s total

    99.6 GB/s total, not including eDRAM access

    The confusing thing is that the northbridge sits on the GPU... but the maximum bandwidth the northbridge supports at any time seems to be 22.4 GB/s in any one direction...

    My question is this: the CPU read bandwidth is only half that of the northbridge... wouldn't it have been better to give it the same read and write bandwidth as the GPU?

    Again feel free to correct this as I may not be presenting this correctly.
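
    For anyone checking the sums in this post, a quick sketch that adds up the three links as listed. It assumes the read and write figures can simply be added per link, which is how the totals in this post were formed; the dictionary keys are just labels.

```python
# Per-link bandwidth in GB/s, as listed in the post (read + write where both are given).
links_gbs = {
    "GPU <-> northbridge": 33.2 + 22.4,   # read + write
    "CPU <-> northbridge": 10.8 + 10.8,   # read + write
    "northbridge <-> 512 MB RAM": 22.4,   # single shared read/write figure
}

total_gbs = sum(links_gbs.values())

for name, gbs in links_gbs.items():
    print(f"{name}: {gbs:.1f} GB/s")
print(f"total (eDRAM excluded): {total_gbs:.1f} GB/s")
```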
     
  18. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    The second approach is quite fast, almost as fast as it can be. You can further reduce the instruction count from 5 to 4 with a sequence of fmul, fmadd, fmadd, fmadd.
    Don't worry about latencies here, because in any reasonably lengthy 'shader' or inner loop you're going to do some other (non-dependent) calculation that would eventually be interleaved with your dot4 product.
     
  19. Qroach

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,868
    Likes Received:
    49
    Um, were those calculations about how Xenon would process data, or Cell? You guys got me confused.
     
  20. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    Cell. Xenon's extended VMX unit would use one or more dot instructions instead.
     