PS3 vs X360: Apples to Apples high level comparison...

Discussion in 'Console Technology' started by j^aws, May 22, 2005.

  1. Nite_Hawk

    Veteran

    Joined:
    Feb 11, 2002
    Messages:
    1,202
    Likes Received:
    35
    Location:
    Minneapolis, MN
    I think people are getting really mixed up over everything. Between 32GB/s external bus throughput, 256Gb/s external bus throughput and 256GB/s internal bus throughput, people end up mixing up terms and buses. I think most people were of the opinion that the 256GB/s external bus figure was probably fictional, but that anything on the edram chip could talk much faster internally. Granted, I don't know if people expected the edram "processor" to be able to do blending/aa/etc.

    Nite_Hawk
     
  2. Lazy8s

    Veteran

    Joined:
    Oct 3, 2002
    Messages:
    3,100
    Likes Received:
    19
    Jaws:
    Microsoft claimed 'more than 1 TFLOP'. The X360 GPU probably rates well over 1 TFLOP by itself by counting its fixed functionality as floats in the same way nVidia did. It appears Microsoft was counting this way and just rounded to the nice 1 TFLOP spec, and Sony outdid them in their announcement by not rounding down.
     
  3. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    No official details, but the 2-way SMT and 12 flops per cycle were inferred from 115 and 218 GFlops @ 3.2 GHz for XeCPU and CELL.


    Sorry, your question and your quote don't seem related, or am I missing what you're asking here? If you're asking whether the fixed function logic/ALUs on the EDRAM module are included, then no...it's only shader ALUs.

    The fixed function stuff would be included in the 1TFLOP number of X360 though...

    Your above quote is the 51 Giga dots/sec for both CELL and RSX. I took 8 dots/cycle for CELL (VMX+7 SPU)...but the above assumes 7, excluding the VMX for CELL.

    This would suggest that the '52' number is 52 vec4 units contributing to the 136 shader ops per cycle for RSX, then 136-52 ~ 84 ALUs would be scalar ALUs or ones not capable of dot products on the RSX...i.e.

    52 Vec4 units + 84 vec?/scalar units?

    Vec4 + scalar units can be paired,


    RSX

    52 Vec4 + 52 Scalar + 32 vec? units?

    :?

    I agree Xenos is cool! 8)

    But some of these sites are really just confusing all these numbers.

    It's 48 Billion shader ops per second for Xenos in the *official* specs,

    http://www.xbox.com/assets/en-us/xbox360downloads/FactSheets.zip

    Also the "240 floating-point shader ops per cycle" they mention can be easily confused with single precision 240 floating-point ops per cycle (flops)! Which is not accurate as that would be 480 flops per cycle with FMADD! :p

    Anyway, the numbers on the first page of this thread are accurate from the info we have...and these random sites are throwing all sorts of conflicting numbers around...


    IIRC, from official specs,

    RSX ~ 1.8 TFlops
    CELL ~ 0.218 TFlops

    X360 is still quoted at system total ~ 1 TFlops
    XeCPU ~ 0.115 TFlops
    Xenos ~ 0.885 TFlops

    Not sure why one would 'round down' and the other 'round up' given the opportunity. But it could well be that the RSX has a lot of fixed function logic on-board that counts towards that number, whilst the Xenos transistor count includes 10 MB of eDRAM which wouldn't contribute to it...
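For reference, the split quoted above is plain addition and subtraction. A rough sketch using only the figures in this post (the variable names are made up for illustration, and treating the X360 total as exactly 1 TFLOPS is an assumption, not an official breakdown):

```python
# Figures as quoted in the post above, in TFLOPS.
rsx = 1.8
cell = 0.218
x360_total = 1.0   # "system total ~ 1 TFlops", assumed to be exactly 1 here
xecpu = 0.115

ps3_total = rsx + cell        # implied PS3 system total
xenos = x360_total - xecpu    # what is left for the GPU under that assumption

print(f"PS3 total ~ {ps3_total:.3f} TFLOPS")
print(f"Xenos     ~ {xenos:.3f} TFLOPS")

# "240 floating-point shader ops per cycle": if every op is a multiply-add
# counted as two flops, the per-cycle flop count doubles.
shader_ops_per_cycle = 240
flops_per_cycle_with_fmadd = shader_ops_per_cycle * 2
print(f"{flops_per_cycle_with_fmadd} flops per cycle with FMADD")
```

The 0.885 figure only holds if the "1 TFLOP" total is taken literally, which is the point being argued in the thread.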
     
  4. blakjedi

    Veteran

    Joined:
    Nov 20, 2004
    Messages:
    2,985
    Likes Received:
    88
    Location:
    20001
    How in the world does the Nvidia chip rate at 1.8 Teraflops? No matter what I've read, it just doesn't add up.
     
  5. ShootMyMonkey

    Veteran

    Joined:
    Mar 21, 2005
    Messages:
    1,177
    Likes Received:
    72
    Same way Xenos rates at 900 GFLOPS... it's called misleading the consumer. For instance, RSQ could be counted as 1 FLOP, but not in marketing-land. Instead, we'll count the lookup as one FLOP, and count all the FLOPs used in the NR refinement, and then you'd get something like 15-odd flops in a single shader instruction. Or perhaps you can imagine that it does SIN/COS using the first 4/5 terms of the Maclaurin Series and geometrically mirroring the results. That would amount to... what... 30 FLOPs per instruction? So all you have to do is consider how much computing power the GPU would have if you did nothing but SIN and/or COS and/or RSQ for every single instruction you'll ever execute. There's a few TFLOPs for you.
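To make the RSQ accounting above concrete, here is a minimal sketch that tallies flops the "marketing" way for a reciprocal square root built from a seed plus Newton-Raphson refinement. Everything here is hypothetical: the function name, the crude power-of-two seed standing in for a hardware lookup table, and the choice of three iterations are assumptions for illustration, not how any real GPU implements RSQ.

```python
import math

def rsq_with_flop_count(x, iterations=3):
    """Approximate 1/sqrt(x) for x > 0; return (value, flops counted)."""
    flops = 0

    # Stand-in for the hardware lookup: a power-of-two seed from the exponent.
    _, exponent = math.frexp(x)            # x = m * 2**exponent, 0.5 <= m < 1
    y = math.ldexp(1.0, -(exponent // 2))
    flops += 1                             # the lookup is counted as one flop

    half_x = 0.5 * x
    flops += 1                             # one multiply

    for _ in range(iterations):
        # Newton-Raphson step for 1/sqrt(x): y <- y * (1.5 - half_x * y * y)
        y = y * (1.5 - half_x * y * y)
        flops += 4                         # three multiplies and one subtract
    return y, flops

value, flops = rsq_with_flop_count(2.0)
print(value, flops)
```

With three refinement steps the tally comes to 14, in the same ballpark as the "15-odd flops" for what the programmer sees as a single instruction.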
     
  6. AkiraX

    Newcomer

    Joined:
    May 25, 2005
    Messages:
    3
    Likes Received:
    0

    "ATI: The 2-terabit (256GB/sec) number comes from within the EDRAM, that’s the kind of bandwidth inside that RAM, inside the chip, the daughter die. But between the parent and daughter die there’s a 236Gbit connection on a bus that’s running in excess of 2GHz. It has more than one bit obviously between them."

    http://firingsquad.com/features/xbox_360_interview/page3.asp


    also, old diagram:
    http://www.xbitlabs.com/misc/picture/?src=/images/news/2004-04/xbox2_scheme_bg.gif&1=1
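Since bits and bytes keep getting mixed up in this thread, a small conversion sketch may help. It only converts the figures that appear in the quote and earlier posts; the helper names are invented for illustration, and the 236 figure is taken as quoted (whether 256 was meant is not something the quote settles).

```python
BITS_PER_BYTE = 8

def gbytes_to_gbits(gbytes_per_s):
    """GB/s -> Gbit/s."""
    return gbytes_per_s * BITS_PER_BYTE

def gbits_to_gbytes(gbits_per_s):
    """Gbit/s -> GB/s."""
    return gbits_per_s / BITS_PER_BYTE

internal_gbits = gbytes_to_gbits(256)   # the "2-terabit (256GB/sec)" internal figure
link_as_quoted = gbits_to_gbytes(236)   # the parent/daughter link exactly as quoted
link_if_256 = gbits_to_gbytes(256)      # the 256Gb/s figure mentioned earlier in the thread

print(f"256 GB/s    = {internal_gbits} Gbit/s")
print(f"236 Gbit/s  = {link_as_quoted} GB/s")
print(f"256 Gbit/s  = {link_if_256} GB/s")
```

Note that 256 Gbit/s works out to the 32GB/s external figure from the first post, while 256 GB/s is roughly 2 Tbit/s, so the two "256" numbers differ by a factor of eight.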
     
  7. AkiraX

    Newcomer

    Joined:
    May 25, 2005
    Messages:
    3
    Likes Received:
    0
    FiringSquad: What types of operations do the EDRAMs 192 processors perform?

    ATI: Well they do z-compares, they do alpha blends, they do blends of samples to make a pixel. That kind of thing. They do stencil operations also. And this is the first time memory has access to something like this, right in the memory, so it never leaves the memory die. The memory and the logic is all built into one die. And it’s also a power savings by the way.

    http://firingsquad.com/features/xbox_360_interview/page3.asp
     
  8. Lazy8s

    Veteran

    Joined:
    Oct 3, 2002
    Messages:
    3,100
    Likes Received:
    19
    Jaws:
    I don't think the total "targeted" FLOPS "power" of the Xenos graphics chipset has ever been disclosed. The PR rough guideline for total system performance is too vague to consider it an absolute quantity useful in deriving 885 GFLOPS for the GPUs. Considering the NV40 was already rated around 1 TFLOP by similar nVidia accounting, I suspect X360's next generation graphics chipset probably delivers something comparable and more.
    Microsoft probably felt claiming the magical TFLOP barrier would be spoiling enough, and Sony was left in the position to be more exact in order to show that there would still be some improvement in power for their system.
     
  9. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    NVidia claims 360 Gflops for NV40, counting PS, VS, texturing and blending. That figure is a bit on the high side, but probably not too far off.
    If we take the 136 to 53 shader ops comparison as RSX being "2.57 times NV40", we arrive at 920 Gflops. And btw, it could very well mean RSX has 28 of 32 pixel pipelines (a parallel to Cell ;))
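The scaling estimate above, written out as a sketch. It uses only numbers from this thread (the 360 Gflops NV40 claim, the 136 vs 53 shader ops comparison, and the 1.8 TFlops RSX figure quoted earlier); the assumption that flops scale linearly with shader ops is the post's, and the variable names are illustrative.

```python
nv40_gflops = 360        # NVIDIA's claimed figure for NV40, per the post
rsx_shader_ops = 136
nv40_shader_ops = 53
rsx_claimed_gflops = 1800  # the 1.8 TFlops figure quoted earlier in the thread

ratio = rsx_shader_ops / nv40_shader_ops
rsx_estimate = nv40_gflops * ratio
gap = rsx_claimed_gflops - round(rsx_estimate, -1)  # estimate rounded to the nearest 10

print(f"RSX / NV40 ratio ~ {ratio:.2f}")
print(f"RSX estimate     ~ {rsx_estimate:.0f} Gflops")
print(f"Shortfall vs claim ~ {gap:.0f} Gflops")
```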
     
  10. jvd

    jvd
    Banned

    Joined:
    Feb 13, 2002
    Messages:
    12,724
    Likes Received:
    9
    Location:
    new jersey
    which is 880gflops less than they claim
     
  11. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,400
    Likes Received:
    440
    Location:
    San Francisco
    Do you remember where you read those numbers? some official nvidia document?
    BTW, you have a PM :)
     
  12. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    That's where they counted the other parts: triangle setup, the whole Z subsystem, LOD calculation, interpolators, whatever.
    Given the emphasis on HDR, they have probably doubled the capabilities of the TMUs handling FP textures, so sampling a FP16 texture is very likely single clock. And texturing is more than 40% of that NV40 figure.
     
  13. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    http://developer.nvidia.com/object/xdc_2005_presentations.html
    It's in the slides pdf.
    But I'm not sure these numbers are entirely correct. The counting for texture and blend flops seems a bit off.
     
  14. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    Isolating XeCPU and CELL isn't strictly a total system, apples to apples comparison, but I've noticed a few peak metrics missing alongside GFlops, namely integer and scalar metrics. I haven't seen official numbers on these yet, but here are some peak numbers from what we know so far (please feel free to correct me),

    -XeCPU, integer, 32bit

    1 core ~ 1VMX + 1 IU ~ 4 + 1 ~ 5 integer ops per cycle

    3 cores ~ 3*5 ~ 15 integer ops per cycle
    15*3.2 GHz ~ 48 Billion integer ops per second

    -XeCPU, scalar

    1 core ~ FPU + IU ~ 2 scalar ops per cycle

    3 cores ~ 3*2 ~ 6 scalar ops per cycle
    6*3.2GHz ~ 19.2 Billion scalar ops per second

    -XeCPU, FP, 32 bit

    115 GFlops


    -CELL, integer, 32 bit

    PPE ~ 1VMX + 1 IU ~ 4 + 1 ~ 5 integer ops per cycle

    7 SPUs ~ 7*4 ~ 28 integer ops per cycle

    CELL ~ 33 integer ops per cycle
    33*3.2GHz ~ 105.6 Billion integer ops per second

    -CELL, scalar

    PPE ~ FPU + IU ~ 2 scalar ops per cycle

    7 SPUs ~ 7*1 ~ 7 scalar ops per cycle

    CELL ~ 9 scalar ops per cycle
    9*3.2 GHz~ 28.8 billion scalar ops per second

    -CELL, FP, 32 bit

    218 GFlops


    CELL vs XeCPU

    CELL~ 105.6 Billion integer ops per second, 32bit
    XeCPU~ 48 Billion integer ops per second, 32bit

    CELL~ 28.8 Billion scalar ops per second, 32bit
    XeCPU ~ 19.2 Billion scalar ops per second, 32bit

    CELL~ 218 GFlops, 32bit
    XeCPU~ 115 GFlops, 32bit

    Of course these are peak numbers...
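The peak figures in the post above all follow the same pattern: ops per cycle times clock. A short sketch that reproduces them, using the per-unit counts exactly as the post assumes them (they are the poster's assumptions, not official numbers, and the helper name is made up):

```python
CLOCK_GHZ = 3.2

def peak_gops(ops_per_cycle, clock_ghz=CLOCK_GHZ):
    """Peak billions of ops per second for a given ops-per-cycle count."""
    return round(ops_per_cycle * clock_ghz, 1)

# Integer, 32 bit: a VMX unit counted as 4 ops, an integer unit as 1, an SPU as 4.
xecpu_int_per_cycle = 3 * (4 + 1)       # three cores
cell_int_per_cycle = (4 + 1) + 7 * 4    # PPE plus seven SPUs

# Scalar: FPU + IU per core, one scalar op per SPU.
xecpu_scalar_per_cycle = 3 * 2
cell_scalar_per_cycle = 2 + 7 * 1

print("XeCPU integer:", peak_gops(xecpu_int_per_cycle))
print("CELL integer: ", peak_gops(cell_int_per_cycle))
print("XeCPU scalar: ", peak_gops(xecpu_scalar_per_cycle))
print("CELL scalar:  ", peak_gops(cell_scalar_per_cycle))
```

Changing any per-unit count (for example the issue width assumed for an SPU) changes the totals proportionally, which is why these numbers are so sensitive to how the units are counted.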
     
  15. PC-Engine

    Banned

    Joined:
    Feb 7, 2002
    Messages:
    6,799
    Likes Received:
    12
    Anybody know where and when MS gave out the 115.2 GFLOPS number? It isn't in any of their official documents. :?
     
  16. Fafalada

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    2,773
    Likes Received:
    49
    Jaws your integer numbers are all over the place and basically off on some points.

    Among other things, SPEs are dual issue - so if you want to make sweeping generalizations about performance you need to count them as 2 integer instructions per clock (scalar or vector for that matter :p ).
     
  17. aaaaa00

    Regular

    Joined:
    Jul 24, 2002
    Messages:
    790
    Likes Received:
    23
    The things that consume the most integer execution time are generally not the integer math ops, but things like store, fetch, and branch.

    Comparing the # of instructions per second peak doesn't give you a meaningful number at all.
     
  18. Fafalada

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    2,773
    Likes Received:
    49
    Of course not, but if you do go writing it out at least you should make it accurate.

    For that matter the idea that dual-issue will double your instruction throughput couldn't be farther from the truth on in-order CPUs either. Especially in any kind of general purpose code.

    Actually the places where dual issue makes the most difference are what SPEs tend to be optimized for.
     
  19. archie4oz

    archie4oz ea_spouse is H4WT!
    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    1,608
    Likes Received:
    30
    Location:
    53:4F:4E:59
    Should be 96GFlops unless they've got a sneaky instruction that adds another 19Gflops...
     
  20. Fafalada

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    2,773
    Likes Received:
    49
    Some people have speculated that XCPU FPU could possibly have Gekko-esque 2-way SIMD mode in single precision, adding 2 more flops/cycle to peak numbers.

    Personally I would find it ironic if that's the case, given how little use that would have outside specsheets and how they harp on Sony all the time about pushing peak numbers.
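The 96 vs 115.2 GFlops gap discussed in the last few posts can be written out as a sketch. The per-unit counts are assumptions: a 4-wide VMX multiply-add counted as 8 flops, a scalar FPU multiply-add as 2, the speculated paired-single mode as 2 more, and 8 flops per cycle per SPU for the CELL cross-check. None of this is from an official document.

```python
CLOCK_GHZ = 3.2
CORES = 3

def peak_gflops(flops_per_cycle_per_core, cores=CORES, clock_ghz=CLOCK_GHZ):
    """Peak GFLOPS from a per-core flops-per-cycle count."""
    return round(flops_per_cycle_per_core * cores * clock_ghz, 1)

vmx = 4 * 2   # 4-wide multiply-add, counted as 8 flops per cycle (assumed)
fpu = 1 * 2   # scalar multiply-add, counted as 2 flops per cycle (assumed)

baseline = peak_gflops(vmx + fpu)               # 10 flops per cycle per core
with_paired_fpu = peak_gflops(vmx + fpu + 2)    # speculated 2-way SIMD FPU mode
gap = round(with_paired_fpu - baseline, 1)

# Cross-check against the CELL figure: one 12-flop PPE plus seven 8-flop SPUs.
cell = round((12 + 7 * 8) * CLOCK_GHZ, 1)

print(baseline, with_paired_fpu, gap, cell)
```

Under these assumptions 12 flops per cycle per core lines up with both the 115.2 and the roughly 218 GFlops figures, while 10 per core gives the 96 figure.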
     