PS3 vs X360: Apples to Apples high level comparison...

Discussion in 'Console Technology' started by j^aws, May 22, 2005.

  1. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    It's a shame Rene knew shit about Xenos before he spoke. A lot of what he said was actually wrong.
     
  2. MightyHedgehog

    Newcomer

    Joined:
    May 29, 2004
    Messages:
    43
    Likes Received:
    1
    Location:
    Tempe, AZ
    So, that means that Xenos actually does have a PPP for creating/modifying/deleting vertices, then?

    As a layman, I wanted to ask a question in one of the other threads where DeanoC spoke of Xenos' ability to read/write anywhere in main RAM...what purpose would that have for game visuals, other than the academic possibilities? Physics?
     
  3. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Hopefully I'll have some more up later next week.
     
  4. pc999

    Veteran

    Joined:
    Mar 13, 2004
    Messages:
    3,628
    Likes Received:
    31
    Location:
    Portugal
    So much time, yet :cry:
    Well, at least it will be worth the wait :D (I hope that I am able to understand the article).

    BTW thanks in advance for the article :D
     
  5. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    Curious, who's "we"? B3D or...?

    Anyway, just stating the obvious, but those comparisons would be between SM3.0 and SM2.0 architectures, with even more disparity...

    So...

    http://www.beyond3d.com/forum/viewtopic.php?t=13470

    So...you've read the 'leak' wrong. It seems 'Billions' and 'cycles' and 'seconds' and 'numbers' are being confused...

    What makes this more frustrating is that we've discussed these leaks many times in the other threads...

    See above. That's a peak 'component' operation and NOT a 'shader' operation as mentioned earlier.

    Please read my first post on the first page...

    If you have, then it would be obvious that, CELL + RSX ~ 100 Billion shader operations per second.

    And you're quoting ONLY the Xenos GPU ~ 120 Billion shader operations per second??

    Let's keep this logical here... ;)
     
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Jaws, we have reasonably detailed architectural diagrams for NV40 and R420 plus explanations on how they work. Care to explain, in detail, how they perform against each other, based purely on theory?

    In other words, can you convert the theoretical capabilities of these two architectures into a realistic prediction of the performance of them?

    NV40 has 2x the SM2.0/3.0 ALU capability of R420, which should overhaul its core-clock disadvantage. But it doesn't. etc.

    What I've learnt over the last few days is this is a road to nowhere. I'm aghast that you still think it's worth pursuing this.

    I'm quite happy to speculate on the architectures, but I'm going to stick to throwing around stupid performance numbers for the sake of taking the piss out of the marketing. ATI's now counting 120Gsops for Xenos. It's now time for NVidia to counter that.

    Jawed
     
  7. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Oh yeah, you're right, I confused the "96 ops per cycle" and "96 Gops per second" numbers.

    Sigh.

    Jawed
     
  8. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Saying something has "2 ALUs" doesn't mean anything in either case. R420's primary ALU has all the instructions that it supports, whilst the 2 ALUs of NV40 have a distribution of instructions between the two - this means that it can opportunistically dual-issue in some cases, but not necessarily two instructions of the same type.
     
  9. quest55720

    Regular

    Joined:
    Jun 6, 2003
    Messages:
    862
    Likes Received:
    14
    Dave, why are you posting? You should be working on your XGPU article :lol: . J/K
     
  10. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    If you've read the first post in this thread then you'd know that,

    I'm trying to put perspective to these numbers from *both* sides. The 'peak' metrics have been derived on the first page and are valid for both as they have been consistently derived. So when some random 'numbers' come along from different, conflicting sources, pulling numbers out of context, there's some reference and a perspective. I've made clear they are 'peak' numbers and as close to an apples to apples comparison as you can get with the info we have from *official* docs from E3 PR.

    This is essentially no different than putting the *official*, released spec metrics side by side. At least it has context and discussion in this thread.

    And NOBODY has *real* world numbers. My point? FACTS not BS that's been flying round recently. Even if these facts are *purely* theoretical but nevertheless can be derived logically.

    I've read some threads recently with people STILL crying foul at PS2 specs and yet conveniently forgetting to cry foul at XBOX specs and vice-versa. I smell fanb**s...

    As I've just mentioned above, I'm after FACTS, be they theoretical or not but nevertheless authentic and not conflicting. Once these FACTS can be agreed on, then we can have sensible discussions on whether they can be realised or not or how realistic they can be or where potential weaknesses may lie. But unless that's agreed on, all we'll see are pointless, WRONG numbers spreading misinformation...


    I don't care if technology is from ATI, nVidia, Sony, MS, Nintendo or ACME...I care about interesting technology no matter who it's from. If the PR can't agree on consistency, then surely so-called *smart* people on this forum can?

    Have you read my derivations of the 'peak' numbers on the first page? If you can dispute them then please feel free as I want FACTS that can be consistently and independently derived from *official* info. I've used a consistent method that's held solid so far with *official* numbers. It's explained all the other, conflicting numbers, including yours. If you can't accept/dispute those numbers then I've learnt something new today...
     
  11. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Maybe you want to look at page 13 of the PDF I linked:

    - Pixel shader operations/pixel 8
    - Pixel shader operations/clock 128

    These are the claimed numbers for NV40.

    51.2Gsops. Roughly half of what's claimed for RSX.

    How much more black and white do you want?...

    If only we could talk in terms of pixel shader instructions, comparisons would start to get meaningful. This example shows SM3 executing 102 instructions in 46.75 cycles, 2.2 instructions per cycle:

    http://www.beyond3d.com/forum/viewtopic.php?p=327176#327176

    It's also interesting to ask about the effect of RSX's likely SIMD pixel shader architecture. NV40 appears to be SIMD across all 16 pipelines, i.e. only one shader can be executing at a time:

    http://www.beyond3d.com/forum/viewtopic.php?t=23295

    R420 is 4-way MIMD across 16 pipelines, i.e. each quad can execute a different shader. Counting transistors, this means that R420 has prolly got a greater overhead in instruction decode logic than NV40.

    I wonder if Xenos will be 48-way MIMD, i.e. each ALU can be running a different shader. I'm sorta doubtful, to be honest, because that's an awful lot of decode-logic overhead - though I admit to not knowing what that amounts to in percentage terms. I aint got the foggiest!

    RSX and Xenos are looking as incomparable as NV30 and R300 did a few years ago.

    All of this still leaves us high and dry on Cell versus XB360 CPU.

    Jawed
     
  12. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Dave, that's precisely my point. That's why I highlighted that specific nonsense comparison.

    Jaws is determined to compare architectures with absolutely no regard for their respective architectures.

    Jawed
     
  13. dukmahsik

    Banned

    Joined:
    May 19, 2005
    Messages:
    994
    Likes Received:
    9
    I apologize for being such a noob here, but where is this article from Dave? Thanks much. :lol:
     
  14. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    Let's see...128 * 0.4 GHz ~ 51.2 GSops/sec

    It's another component operation specific to pixels, as I've already pointed out to you earlier in the thread with your 'page 3' reference. And if you're going to use 'components' again, you've missed out the 'vertices' too for the total...

    RSX ~ 136 shader ops per cycle ~ 136 * 0.55 GHz ~ 74.8 GSops/sec

    Considering you've also missed out 'vertex' ops from the '51.2 GSop', it's nothing near "half" of what was claimed and would in fact be similar.
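
    The arithmetic behind these two figures is just operations per cycle multiplied by clock speed. A minimal sketch, assuming the per-cycle counts and clocks quoted in this thread (128 pixel ops/clock at 0.4 GHz for NV40, 136 ops/cycle at 0.55 GHz for RSX); the function and variable names are illustrative only:

```python
def peak_gops(ops_per_cycle: float, clock_ghz: float) -> float:
    """Peak operations per second in billions: ops per cycle x clock in GHz."""
    return ops_per_cycle * clock_ghz


# Figures quoted in the thread; these are marketing peaks, not measurements.
nv40_pixel = peak_gops(128, 0.40)  # NV40: pixel shader ops only
rsx_total = peak_gops(136, 0.55)   # RSX: claimed total, pixel + vertex

print(f"NV40 (pixel only): {nv40_pixel:.1f} Gops/s")
print(f"RSX (total):       {rsx_total:.1f} Gops/s")
print(f"NV40 / RSX ratio:  {nv40_pixel / rsx_total:.2f}")
```

    On these inputs the pixel-only NV40 figure comes to roughly two thirds of the RSX total, before any vertex ops are added to the NV40 side.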

    Perhaps you should try disputing my numbers on the first page instead of clutching at straws and throwing random numbers into the mix. But it doesn't really matter now because you've answered my question from my previous post....


    From papers I've read, they suggest nVidia would move to a complete MIMD architecture. Also for Xenos, MIMD would suit its GPGPU nature and would make sense.

    Yep. But we have no low level details for RSX yet...they could still share similarities...

    Been discussed to death on these forums...but I'm pretty clear on them...

    I suggest you read/understand the whole thread first before making any further ignorant comments! :roll:
     
  15. Carl B

    Carl B Friends call me xbd
    Legend

    Joined:
    Feb 20, 2005
    Messages:
    6,266
    Likes Received:
    63
    Not out yet.

    I have a feeling you'll have no way of not knowing once he actually posts it. :wink:
     
  16. dukmahsik

    Banned

    Joined:
    May 19, 2005
    Messages:
    994
    Likes Received:
    9
    hehe thanks, see you on txb :wink:
     
  17. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    136 shader operations per cycle is what, exactly?

    24 pixel pipelines doing 4 operations?

    plus

    10 vertex pipelines doing 4 operations?

    Should we be making allowances for texture blending? Texture address calculation? What else?

    Unluckily we have two different claims from ATI for Xenos, 48Gsops (two ops per cycle) and 120Gsops (five ops per cycle).

    Which are you going to use in your comparison?

    Why?

    Jawed
     
  18. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    In the code I linked to earlier:

    http://www.beyond3d.com/forum/viewtopic.php?p=327176#327176

    which in SM3 is 102 instructions, at an average of 2.2 instructions executed per cycle. A 6800 Ultra would shade 137 million pixels per second.

    Assuming RSX operates in the same way, at 550MHz across 24 pipelines, this shader would shade 282 million pixels per second.

    The same shader executed on Xenos would need to operate at 1.2 instructions per cycle to shade 282 million pixels per second.

    But I have no idea if Xenos could run this shader at more than 1 instruction per cycle.
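
    The pixel-rate figures above can be reproduced with a few lines. This is only a sketch under assumptions taken from the thread: a 6800 Ultra with 16 pipelines at 400 MHz, RSX with 24 pipelines at 550 MHz behaving like NV40, and Xenos with 48 ALUs at 500 MHz, each working on one pixel at a time. The names are made up for illustration:

```python
INSTRUCTIONS = 102        # SM3 instruction count of the linked shader
CYCLES_PER_PIXEL = 46.75  # cycles NV40 needs for those instructions


def pixels_per_second(pipelines: int, clock_hz: float, cycles_per_pixel: float) -> float:
    """Pixels shaded per second if every pipeline runs the shader back to back."""
    return pipelines * clock_hz / cycles_per_pixel


ipc = INSTRUCTIONS / CYCLES_PER_PIXEL                    # average instructions per cycle
ultra = pixels_per_second(16, 400e6, CYCLES_PER_PIXEL)   # 6800 Ultra
rsx = pixels_per_second(24, 550e6, CYCLES_PER_PIXEL)     # RSX, assumed NV40-like

# Instructions per cycle each of 48 Xenos ALUs (500 MHz) would need
# in order to match the RSX pixel rate on the same 102-instruction shader.
xenos_ipc_needed = rsx * INSTRUCTIONS / (48 * 500e6)

print(f"NV40 IPC:         {ipc:.2f}")
print(f"6800 Ultra:       {ultra / 1e6:.0f} Mpixels/s")
print(f"RSX:              {rsx / 1e6:.0f} Mpixels/s")
print(f"Xenos IPC needed: {xenos_ipc_needed:.2f}")
```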

    Jawed
     
  19. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    That metric represents exactly the same thing for Xenos as it does for RSX. You will find this in the Xenon 'leak' and my calculations/links on the first page, i.e.

    Or

    RSX ~ 136 shop/cycle
    Xenos ~ 96 shop/cycle

    These numbers/metrics on their own are meaningless without further parameters. But both numbers also cross-reference with other metrics that I calculated on the first page without any conflicts. So they are consistent but need further analysis.

    All this is essentially telling us (with the per CYCLE) is the number of execution units that run shaders, i.e. the number of shader execution units. It is not telling us the amount of work/computation being done per clock cycle nor the precision of the data being worked on.

    E.g. it's not differentiating between 1-way, 2-way, 3-way or 4-way execution units. i.e. all of those shops/cycle can be from 136 scalar units or 136 vector units or a combination. Also these vector units can be vec(2-4) units! So we can't go into any further detail without further information.

    However, from *official* MS spec from xbox.com,

    Dave also mentions that they are 48 5D ALUs,

    http://www.beyond3d.com/forum/viewtopic.php?p=526112#526112

    The 'leak' above also mentions the 48 ALUs consisting of a vector + scalar unit,

    ===> Xenos> 48 vec4 + 48 scalar units> 96 Shop/cycle

    The 4-way vector components of vec4 units are not included in the definition.

    ===> RSX> x + y units ~ 136 Shop/cycle*

    * we need more info to determine more detail...and this is deduced from the DOT products information below.


    From this information, we get, 24+10~ 34 Dot products per cycle ~ 18.7 GDot/sec*

    *A Vec4 unit is assumed to provide 1 Dot/cycle, which means the Dot products per cycle must be an 'integer' number, e.g. 34 Dot/cycle.


    And more importantly, falls way short of the claimed CELL+RSX ~ 51 GDot/sec

    Taking the contribution of DOT products from CELL, either from 7SPUs or 7SPUs+1VMX, we get,

    RSX ~ 25.4 OR 28.6 GDot/sec

    Which one is accurate?

    25.4/0.55 GHz ~ 46.18 Dot product/Cycle?

    or

    28.6/0.55 GHz ~ 52 Dot product/Cycle?


    The 46.18 Dot/cycle is rejected in favor of the 52 Dot/Cycle because it's not an 'integer' from above assumption.

    From our earlier definition of a Shop/cycle, this then suggests 52 Vec4 units contribute to RSX's 136 Shops/cycle.

    RSX~ 52 Vec4 units + 84 units not contributing DOT products.
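
    The integer test used in this derivation can be sketched as follows. The 25.4 and 28.6 GDot/sec figures are reproduced if one assumes a 3.2 GHz CELL clock and one dot product per cycle for each SPU or VMX unit; both of those are assumptions made for this sketch and are not stated in the post, and the helper names are hypothetical:

```python
RSX_CLOCK_GHZ = 0.55
CELL_CLOCK_GHZ = 3.2       # assumption: not stated in this post
CLAIMED_TOTAL_GDOT = 51.0  # claimed CELL + RSX dot products per second (billions)


def rsx_dots_per_cycle(cell_units: int) -> float:
    """RSX dot products per cycle left over after subtracting CELL's share."""
    cell_gdot = cell_units * CELL_CLOCK_GHZ  # assumes 1 dot/cycle per CELL unit
    return (CLAIMED_TOTAL_GDOT - cell_gdot) / RSX_CLOCK_GHZ


def is_whole(x: float, tol: float = 1e-6) -> bool:
    """True if x is an integer to within floating point tolerance."""
    return abs(x - round(x)) < tol


# Baseline from the 24 pixel + 10 vertex pipeline reading.
baseline_gdot = (24 + 10) * RSX_CLOCK_GHZ

candidates = {"7 SPUs": 7, "7 SPUs + 1 VMX": 8}
results = {name: rsx_dots_per_cycle(n) for name, n in candidates.items()}

print(f"24 + 10 pipelines: {baseline_gdot:.1f} GDot/s")
for name, dots in results.items():
    verdict = "whole number" if is_whole(dots) else "not a whole number"
    print(f"{name}: {dots:.2f} Dot/cycle ({verdict})")

vec4_units = round(results["7 SPUs"])  # units contributing dot products
other_units = 136 - vec4_units         # remainder of the 136 Shop/cycle
print(f"Vec4 units: {vec4_units}, other units: {other_units}")
```

    Under these assumptions only the 7 SPU case gives a whole number of dot products per cycle, which is the case the derivation keeps.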

    http://www.beyond3d.com/forum/viewtopic.php?t=23228&start=0

    http://www.beyond3d.com/forum/viewtopic.php?p=531473#531473

    ===>RSX~ 28.6 GDot/sec

    Jawed-RSX'~ 18.7 GDot/sec is way short of my (Jaws*) RSX~ 28.6 GDot/sec and does not have enough Dot product computation to match the CELL+RSX claim. Therefore 18.7 GDot/sec and its pipeline arrangement are unlikely.

    * Yes, as if we don't have enough confusion, Jaws and Jawed are now officially confusing the shit out of me too (Jaws)! :p


    From the above definition of a Shader operation, we don't include these metrics in Shop/cycle numbers. However, more than likely, these metrics have been included in the 'total system TFLOP' metric.

    The 48 GShop/sec is used for Xenos here and I've cross-referenced that for validity in my calculations for 96 Shop/cycle on the first page.

    The 120 GShop/sec for Xenos is greater than BOTH CELL+RSX ~ 100 GShop/sec. We can reject the 120 GShop/sec number for Xenos here for being inconsistent. Even though that '120' number is a valid number, the 'unit' of the metric is not consistent. It would be more accurate to call it Xenos~ 120 Billion component (5D) operations per second and leave out 'shader' from the metric. And also, 120*2FMADD ~ 240 GFlop/sec, (32bit because of SM3.0).
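
    The two Xenos figures differ only in what is counted per ALU per cycle. A small sketch, assuming 48 ALUs at 0.5 GHz as quoted in this thread and a multiply-add counted as 2 flops; the variable names are illustrative:

```python
ALUS = 48
CLOCK_GHZ = 0.5

# Count each ALU as two units (one vec4 + one scalar): the "shader op" figure.
shader_gops = ALUS * 2 * CLOCK_GHZ

# Count each ALU as five components (4 vector lanes + 1 scalar): the "5D" figure.
component_gops = ALUS * 5 * CLOCK_GHZ

# A multiply-add on each component counts as 2 floating point operations.
gflops = component_gops * 2

print(f"Shader ops:    {shader_gops:.0f} G/s")
print(f"Component ops: {component_gops:.0f} G/s")
print(f"Peak flops:    {gflops:.0f} GFlop/s")
```

    Both numbers come from the same hardware; only the unit of counting changes, which is why they should not be mixed in one comparison.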

    If it's still not obvious, it's Xenos ~ 48 GShop/sec from the *official* spec doc! :p

    See above. You have to use consistent units of measurement when comparing throughout.

    Taking this consistency, the following was derived,

    RSX ~ 136 Shop/cycle ~ 52 Vec4 units + 84 units NOT contributing Dot products.

    Xenos ~ 96 Shop/cycle ~ 48 Vec4 + 48 Scalar units.


    Those 84 units for RSX can ALL be scalar for all we know or ALL be Vec3. So the measure of computation performed per cycle can vary. In that sense the aforementioned, 'component operation per cycle' metric will give more detail. But we don't have that for BOTH systems.


    [​IMG]

    From what I've derived above,

    RSX ~ 136 Shop/cycle ~ 52 Vec4 units + 84 units NOT contributing Dot products.

    I'd be guessing now on the following, usually, scalar units are paired with Vec units so,

    RSX ~ 136 Shop/cycle ~ 52 Vec4 + 52 Scalar + 32 Other units

    32 Pixel Shaders ~ 32 Vec4 + 32 Scalar + 32 Other units*
    20 Vertex Shaders ~ 20 Vec4 + 20 Scalar

    *Other units can be Vec3 or Scalar etc...
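
    As a consistency check on this guessed breakdown (it is a guess, as stated above, not a confirmed RSX layout), the unit counts can be summed and compared with the 136 Shop/cycle and 28.6 GDot/sec figures. The dictionary layout is just for illustration:

```python
RSX_CLOCK_GHZ = 0.55

# Guessed split of execution units per shader type.
pixel_units = {"vec4": 32, "scalar": 32, "other": 32}
vertex_units = {"vec4": 20, "scalar": 20}

total_units = sum(pixel_units.values()) + sum(vertex_units.values())
vec4_units = pixel_units["vec4"] + vertex_units["vec4"]

# With 1 dot product per vec4 unit per cycle, as assumed earlier.
gdot_per_sec = vec4_units * RSX_CLOCK_GHZ

print(f"Total units: {total_units} Shop/cycle")
print(f"Vec4 units:  {vec4_units}")
print(f"Dot rate:    {gdot_per_sec:.1f} GDot/s")
```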

    Looking at that RSX image above, I don't think we can extrapolate pipelines and ISA from what we know of NV40 to RSX, any more than what we know of R420 to Xenos. In addition both Xenos and RSX are likely to have all their 'legacy' PC logic removed for consoles. E.g. SM1.0, SM2.0, etc. as they don't need to support the extra code paths like PC games. On second thoughts, not sure about Xenos now with B/C with Xbox-NV2a and PC games with XNA?

    In any case, both Xenos and RSX will have assembly level, to-the-metal access on both consoles, irrespective of whether Xenos uses SM3+ or RSX uses OpenGL|ES.

    Looking at the Xenon 'leak' text above, it suggests that one Xenos ALU ~ vec4 + Scalar, and those ALUs can dual issue to a Vec4 and a scalar unit. So,

    Xenos ~ each ALUs(vec4+ scalar) can dual issue per cycle
    48*2~ 96 instructions per cycle
    96*0.5 GHz ~ 48 Billion INSTRUCTIONS per second*

    * Not SHADER ops per second and so another number to get confused with! I'll stop right here! :p
     
  20. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Doubtful.
     