Xenos as Physics Processor?

Discussion in 'Console Technology' started by expletive, Oct 6, 2005.

  1. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    *cough*

    Xenos has 3 pipes, so it must suck compared to R520!

    :!:

    Geez, it's like a time warp here...

    How about comparing R520 32-bit programmable shader Flops to Xenos?


    R520 estimate for XT (can someone verify...?),

    VS ~ [8 x Vec4-madds x 8 Flops/cycle + 8 x scalar-madds x 2 Flops/cycle ] x 0.625 GHz
    ~ 50 GFlops

    PS ~ [16 x vec3-adds x 3 Flops/cycle + 16 x vec3-madds x 6 Flops/cycle + 16 x scalar-adds x 1 Flop/cycle + 16 x scalar-madds x 2 Flops/cycle] x 0.625 GHz

    PS ~ [48+96+16+32]x 0.625 GHz
    PS ~ [192]x 0.625 GHz
    PS ~ 120 GFlops

    R520 XT ~ 50 + 120 ~ 170 Gflops, 32bit peak
    Xenos ~ 216-240 Gflops, 32bit peak

    That's nowhere near twice!
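
    For anyone who wants to check or tweak the arithmetic, here is a minimal Python sketch of the same estimate. The unit counts and flops-per-cycle values are just the guesses listed above, not confirmed specs, and the function and variable names are made up for illustration.

```python
# Peak programmable FLOPs estimate, using the unit counts guessed above.
# Each tuple is (number of units, flops per unit per cycle).
R520_CLOCK_GHZ = 0.625

vs_units = [
    (8, 8),  # 8 vec4 MADDs: 4 lanes x 2 flops each
    (8, 2),  # 8 scalar MADDs
]
ps_units = [
    (16, 3),  # vec3 ADDs
    (16, 6),  # vec3 MADDs
    (16, 1),  # scalar ADDs
    (16, 2),  # scalar MADDs
]


def peak_gflops(units, clock_ghz):
    """Sum flops per cycle over all units, then scale by the clock in GHz."""
    flops_per_cycle = sum(count * flops for count, flops in units)
    return flops_per_cycle * clock_ghz


vs = peak_gflops(vs_units, R520_CLOCK_GHZ)
ps = peak_gflops(ps_units, R520_CLOCK_GHZ)
total = vs + ps
print(f"VS {vs} + PS {ps} = {total} GFLOPs peak")

# Compare against both ends of the quoted Xenos range.
for xenos_peak in (216, 240):
    print(f"Xenos {xenos_peak} / R520 {total} = {xenos_peak / total:.2f}x")
```

    Swapping in different unit counts (say, adding or removing a mini-ALU term) only needs another tuple in the list.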
     
  2. expletive

    Veteran

    Joined:
    Jun 4, 2005
    Messages:
    3,592
    Likes Received:
    69
    Location:
    Bridgewater, NJ
    I'm not sure how it *could* be done, but do you account for the higher efficiency and EDRAM mentioned by Jawed in these calculations?

    J
     
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    What makes you think the Xenos pipeline is less capable? We don't have a detailed description of the Xenos pipeline :cry:

    I'd like to provide a detailed transistor audit of the two - but I can't. I'm trying in various other threads, but it's slow work.

    So it's just a guess really. I think you're over-emphasising the ALU counts and forgetting that architecture counts for a lot now, particularly when you have out of order scheduling and unified shading.

    Jawed
     
  4. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    You won't be able to determine efficiency from these numbers as these are peak. Real world benchmarks would determine efficiency...
     
  5. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Jaws, until you can show that there's no mini-ALU in Xenos's pipeline, you're out on another one of your pseudo-science limbs.

    I don't see any reason why Xenos doesn't have the mini-ALU. So if you can come up with a convincing argument...

    Jawed
     
  6. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    Err...this is nothing new. The peak shading power, i.e. 32-bit programmable GFlops, has been known for ages. It's 216-240 Gflops depending on who you ask. It's irrelevant whether it has mini-ALUs or not, as they would've been included in those figures. Compared at peak, Xenos is nowhere near twice R520, contrary to your hyperbolic claim.

    Back to basics, comparing pipe numbers is useless with these architectures.
     
  7. Bobbler

    Bobbler Shazbot!
    Veteran

    Joined:
    May 22, 2005
    Messages:
    1,827
    Likes Received:
    29
    Location:
    Minneapolis, MN
    Simple logic dictates that -- what we've seen from the Xenos hasn't been 2x the capabilities of R520 (the "devs haven't had the time!" card doesn't really work -- if Xenos was truly 2x the power, or anywhere near, it would be doing a lot more than 720p at 30fps with 2x AA), and "2x" the power from 2/3 the transistors is a bit absurd (and amazing if true on some planet). Even with ~60% efficiency vs 100%, that would only account for the transistor budget being reduced, not a 2x power gain. It just seems transistor for transistor the theoretical power is going to be about the same -- there is no magic wand to get 2x the capabilities out of the same transistor budget (especially when you have some of the best engineers working on it). It just seems absurd that anyone would think Xenos would be substantially more powerful than stuff in the same generation (or available in the same 6-month window -- R520, G70, RSX) -- I'll grant the efficiency card making up for the transistor difference (and maybe a bit extra even), but I cannot see where you get the colossal power difference outside of that. Logic dictates that 48 "pipes" in 232m transistors (with ~15% redundancy by your calculations) shouldn't beat a ~320m transistor monster (at a higher clockspeed as well)... especially when it's from the same company and engineering talent.

    Don't get me wrong -- I would love to be wrong in this case (who wouldn't love a system you could get in 1.5 months that has 2x the power of the most high-end GPU, which you can't even buy for another 3 weeks??), but I just can't believe it. I think part of it might be because in the past 20 years we've never had a two-fold power increase in the same transistor count (often much more like 10-20% if you're lucky) in a given field -- the technology field usually works in evolutions, not revolutions (I'd call 2x the performance in the same transistor count -- counting efficiency as evening the transistor counts -- a revolution). Call me cynical though, please!
     
  8. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Like I said, until you can show that Xenos doesn't have a mini-ALU you're barking up the wrong tree.

    The mini-ALU is an integral part of the GPU pipeline. It's just not normally counted. I don't know why, but there it is.

    Until recently no-one outside of ATI apparently understood that there was the extra ADD capability in R3xx...R4xx due to the mini-ALU.

    Have you been counting the mini-ALU in RSX? You do realise it can MUL and ADD (perhaps MAD?), don't you? And that there's two of them?

    Acert dug this up earlier today:

    http://www.hardspell.com/newsimage/2005-6-21-16-10-14-654986702.gif

    I don't see any mention of the mini-ALUs on there. Whoops Jaws, back to square one :twisted:

    Also, while you're at it, would you care to explain how a 170GFLOPs X1800XT is as fast as a 313GFLOPs 7800GTX? Or faster? :shock:

    Jawed
     
    #48 Jawed, Oct 7, 2005
    Last edited by a moderator: Oct 7, 2005
  9. j^aws

    Veteran

    Joined:
    Jun 1, 2004
    Messages:
    1,992
    Likes Received:
    137
    *shakes head*

    Stop clinging onto your mini alu theory. Don't you think if ATI/MS could claim some more PR GFLOPS they would! They have NOT. It's 216-240 GFLOPS 32 bit. PEEEEAAAAKKKKK.

    Like I already said, it would be included if it was valid!

    A) THIS has NOTHING to do with RSX.

    B) IF you read my post, you'd realise 32 BIT programmable PEAK flops being explicitly stated.

    *Shakes head again*

    What has that diagram got to do with my post?

    They haven't stated what BIT! That diagram has been analysed many times on this forum...AND their mini alus are the SCALAR units!

    See above. They include 16bit Flops with 32bit flops.

    Geez, talk about going over old ground again...this forum has no memory...

    Just to reiterate again, NO, Xenos is nowhere near TWICE the R520 from another of your baseless, hyperbolic claims.

    Get back to reality dude...
     
  10. Acert93

    Acert93 Artist formerly known as Acert93
    Legend

    Joined:
    Dec 9, 2004
    Messages:
    7,782
    Likes Received:
    162
    Location:
    Seattle
    Anyone dealing with theoretical numbers with any sense of due diligence would note that theoretical peaks are just that, theoretical, and they should be taken with a large grain of salt. Architecture and its impact on real world utilization are the most important aspects of any design. It is what you can use that is important, not what is there. If you cannot realistically utilize performance on the chip then for all practical purposes counting it as some fantastical metric of what chip is better is really useless.

    Interestingly, ATI has been quoted as saying current GPU architectures, including their own, are only 50-70% efficient.

    50-70% of 170GFLOPs is 85-119GFLOPs of "real world utilization". And that does not begin to take the ROPs into consideration.

    Last time I checked 119GFLOPs was about half of 240GFLOPs.
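
    As a quick sanity check on that comparison, a small Python sketch follows. It assumes the 170 GFLOPs R520 estimate from earlier in the thread and takes Xenos at its full 240 GFLOPs peak (i.e. 100% utilisation, which is itself an assumption); the helper name is hypothetical.

```python
# Effective-throughput estimate: scale a peak figure by a utilisation range.
R520_PEAK_GFLOPS = 170.0          # peak estimate from earlier in this thread
XENOS_PEAK_GFLOPS = 240.0         # upper end of the 216-240 range quoted here
UTILISATION_RANGE = (0.50, 0.70)  # the 50-70% figure attributed to ATI


def effective_range(peak_gflops, utilisation_range):
    """Return (low, high) effective GFLOPs, rounded to one decimal place."""
    low, high = utilisation_range
    return round(peak_gflops * low, 1), round(peak_gflops * high, 1)


r520_low, r520_high = effective_range(R520_PEAK_GFLOPS, UTILISATION_RANGE)

# Assumption: Xenos is taken at 100% of its peak for this comparison.
ratio = r520_high / XENOS_PEAK_GFLOPS

print(f"R520 effective: {r520_low}-{r520_high} GFLOPs")
print(f"Best-case R520 as a fraction of Xenos peak: {ratio:.2f}")
```

    The result is only as good as the two inputs: change either the utilisation range or the assumed Xenos utilisation and the "about half" conclusion moves with it.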

    This assumes that the bottleneck on titles is the GPU. I would say this assumption is wrong. We have already heard of a number of cases of developers offloading tasks to the other CPUs and the framerate improving dramatically.

    Xenon is a tricore in-order PPC chip with shared cache. This is a very different environment than the PC/Xbox--where most of the devs come from--which had a single large OOO x86 processor and on the PC had a bit more cache per core.

    As for their kits, they got final Beta Kits in August as confirmed by IGN's recent editorial and there were numerous delays after E3 in getting material and transition kits out.

    Almost all the launch titles are Xbox or PC ports to some degree. Xenos, like any specialized hardware, needs to be taken into consideration in the design stages to get the best performance.

    The fact a number of Xenos features, like hardware tessellation, are not being used by many devs is kind of indicative of the state of affairs: They don't have the time to create custom engines from the ground up, testing what works and does not work with the real hardware and then choosing the right engine path to exploit the strengths of the architecture. We just are not seeing that for obvious reasons.

    Xenos ran the ATI R520 demos quite well. I think as developers transition to shader heavy code (which is Xenos' forte) that takes its unique design features into consideration that we will see it perform well.

    Judging any console on launch titles is kind of scary. They have been developed with paper specs in hand and not much else. I still remember the PS2 launch, which nobody who remembers it would classify as a launch that really showcased what was later produced on the system.

    I guess the proof of who is right will be in the future games in late fall 2006 and into 2007, when we see the first games written for the Xbox 360 architecture from the ground up appear. In this regard I must give some praise to Sony for partnering with an IHV that had SLI with GPUs of similar features to design on. This gives PS3 devs a good heads up on the architecture to get the most out of their launch titles.
     
  11. LunchBox

    Regular

    Joined:
    Mar 13, 2002
    Messages:
    901
    Likes Received:
    8
    Location:
    California
    But isn't the Xenos more pixel output limited than the R520???
    Since the Xenos technically has 8 output pipes???

    500 MHz x 8 pipes = 4 Gigapixels/sec
     
  12. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    22,146
    Likes Received:
    8,533
    Location:
    ಠ_ಠ
    Pixel throughput is no longer an issue, especially with a fixed resolution. It's all about math now.
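
    A rough way to see why, sketched in Python. The 500 MHz and 8 output pipes come from the post above; the 720p at 30fps target is only an illustrative choice, and the calculation ignores AA samples, blending and anything else that makes a real pixel write more expensive.

```python
# Raw pixel fill rate versus what a fixed 720p target needs.
XENOS_CLOCK_HZ = 500e6     # 500 MHz, from the post above
XENOS_OUTPUT_PIPES = 8     # "8 output pipes", from the post above

WIDTH, HEIGHT = 1280, 720  # 720p
TARGET_FPS = 30            # illustrative frame-rate target (an assumption)

# Peak pixels written per second if every pipe outputs one pixel per clock.
fill_rate = XENOS_CLOCK_HZ * XENOS_OUTPUT_PIPES

# Pixels per second needed to touch every screen pixel once per frame.
pixels_needed = WIDTH * HEIGHT * TARGET_FPS

# How many times each pixel could be written per frame at peak fill rate.
writes_per_pixel = fill_rate / pixels_needed

print(f"Fill rate: {fill_rate / 1e9:.1f} Gpixels/s")
print(f"Needed at 720p/{TARGET_FPS}: {pixels_needed / 1e6:.1f} Mpixels/s")
print(f"Headroom: about {writes_per_pixel:.0f} writes per pixel per frame")
```

    With the resolution fixed, the fill-rate requirement is fixed too, so extra headroom beyond that point buys nothing unless overdraw or AA eats it.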
     
  13. ihamoitc2005

    Veteran

    Joined:
    Sep 21, 2005
    Messages:
    1,181
    Likes Received:
    15
    GPUs

    Assuming RSX is overclocked G70 using your comparison method:

    7800GTX
    PS = 139.3-195.02 Gflops
    Total = 156.5-219.1 Gflops
    Peak = 313.4 Gflops

    RSX
    PS = 178.2-249.48 Gflops
    Total = 200.2-280.28 Gflops
    Peak = 400.4 Gflops

    Xenos (assuming 100% efficiency)
    Total = 240 Gflops

    Also, why is R520 transistor count so high? If Xenos = 240 Gflops with 232M transistors, then 320M for only 170 Gflops peak sounds inefficient, no? Maybe some information missing?
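
    The transistor question can be put as a simple ratio. This Python sketch uses only the peak and transistor figures quoted in this thread, and it lumps every transistor together regardless of what it is spent on, so treat it as a rough indicator at best. The dictionary layout and function name are made up for illustration.

```python
# Peak GFLOPs per million transistors, using figures quoted in this thread.
chips = {
    "Xenos": {"peak_gflops": 240.0, "transistors_m": 232.0},
    "R520": {"peak_gflops": 170.0, "transistors_m": 320.0},
}


def gflops_per_million_transistors(chip):
    """Peak GFLOPs divided by transistor count in millions."""
    return chip["peak_gflops"] / chip["transistors_m"]


density = {name: gflops_per_million_transistors(c) for name, c in chips.items()}
ratio = density["Xenos"] / density["R520"]

for name, value in density.items():
    print(f"{name}: {value:.3f} peak GFLOPs per million transistors")
print(f"Xenos / R520 density ratio: {ratio:.2f}")
```

    A large gap here does not say which transistors are "wasted", only that the two chips spend their budgets very differently.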

    Has anyone read this?

    http://www.anandtech.com/video/showdoc.aspx?i=2552&p=10
     
  14. Nemo80

    Banned

    Joined:
    Sep 5, 2005
    Messages:
    128
    Likes Received:
    3
    Full ACK. Even Microsoft does not claim that it's that fast (only faster than 2x 6800 U iirc). And they certainly would if there was any indication that it is the case.

    And once again, the 48 "Shader units" are nowhere close to the performance of a fixed pixel or vertex pipeline (per "unit"), that's the tradeoff of a unified architecture.
     
  15. The GameMaster

    Newcomer

    Joined:
    Feb 9, 2005
    Messages:
    109
    Likes Received:
    1
    In regards to those benchmarks it shows the classic strengths of the nVidia Geforce cards and that they have a superior implementation of OpenGL not to mention the benefit of the ultrashadow technology that is in use in Doom3. In any case...

    I was wondering about the transistor count also... and looking at the diagrams and die shot it occurs to me that there are two things that may be taking up a lot of that transistor count... one being that "General Purpose Register Array" and the other being that "Render Back End".

    But yes I would say that there is some information that is missing.

    I would have to disagree... a shader unit is a shader unit and both the Geforce 7800 and XENOS use a Vec4+scalar shader unit, what is different is the arrangement of those shader units between XENOS and the R520 as well as the NV50 GPUs.
     
  16. ihamoitc2005

    Veteran

    Joined:
    Sep 21, 2005
    Messages:
    1,181
    Likes Received:
    15
    R520 and G70 differences

    If R520 VS & PS units similarly capable as G70 VS & PS, then R520 = 320Gflops no? If similar to G70, then @ 50-70% average utilization R520 = 160-224Gflops

    Peak
    VS=50 Gflops
    PS=270 Gflops

    But if not similar what are differences? If VS and PS units very different, then for what functions are so many transistors utilized?

    As for USA (unified shader architecture), most of what is called "efficiency", or better termed "utilization", in USA model is merely resistance to slow-down when vertex or pixel shader load increases but not both. When both increase, then slow-down inevitable. No substitute for size. USA merely makes small size less of a liability no?

    I think better thread handling should make peak performance more likely. Maybe that is why R520 even outperforms 7800GTX in some situations, although it could also be because those games need more vertex shader power and R520 VS much faster than G70 VS. OTOH, sometimes even 7800GT outperforms 625 MHz R520. This I do not understand. Maybe R520 PS units very different from G70 PS units after all?

    7800GT @ 400mhz
    VS= 14-19.6 Gflops (using 50-70% quote from below post)
    VS = 28 Gflops (peak)
    PS= 108-151.2 Gflops (using 50-70% quote from below post)
    PS = 216 Gflops (peak)
    Total= 122-170.8 Gflops (using 50-70% quote from below post)
    Peak= 244 Gflops
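
    The table above can be regenerated from the two peak figures. A minimal Python sketch, assuming the VS and PS peaks listed above and the 50-70% utilisation quote; the names are hypothetical.

```python
# Rebuild the 7800GT table: per-stage peak scaled by a utilisation range.
GT_PEAKS = {"VS": 28.0, "PS": 216.0}  # peak GFLOPs per stage, from the post above
UTILISATION = (0.5, 0.7)              # the 50-70% quote


def scaled(peak, utilisation=UTILISATION):
    """Return (low, high) GFLOPs for a peak figure, rounded to one decimal."""
    return tuple(round(peak * u, 1) for u in utilisation)


# The total peak is just the sum of the per-stage peaks.
total_peak = sum(GT_PEAKS.values())

table = {stage: scaled(peak) for stage, peak in GT_PEAKS.items()}
table["Total"] = scaled(total_peak)

for stage, (low, high) in table.items():
    print(f"{stage}: {low}-{high} GFLOPs")
print(f"Peak: {total_peak} GFLOPs")
```

    Note this applies one utilisation range to both stages, which is the same simplification the table makes.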
     
  17. ihamoitc2005

    Veteran

    Joined:
    Sep 21, 2005
    Messages:
    1,181
    Likes Received:
    15
    Thank you for furthering my understanding. Seems API makes a big difference. What are implications of larger general purpose register and render back end?
     
  18. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Like I said, no-one counts mini-ALUs. NVidia hasn't counted the mini-ALUs in NV40/G70/RSX.

    Nope, I've never seen it included. It might be because mini-ALUs have such limited applicability and are hardly ever used.

    I just wanted to show you how, according to your crazy pseudo-science a G70 which has twice the GFLOPs (not including mini-ALUs) of X1800XT (including mini-ALUs) is not twice as fast.

    When will you get over the fact that peak GFLOPs are meaningless? You've been peddling this nonsense for 6 months now.

    Nope, the scalar part of the vec4+scalar (VS) or two vec3+scalar (PS) is not the mini-ALU. You really need to pay attention.

    FP16 normalise is not a function of the two mini-ALUs in G70/RSX pipeline. It's an entirely separate function.

    Well if you insist on polluting discussions with irrelevant GFLOPs nonsense...

    The first evidence will come with R580...

    Unification of the shader architecture is going to increase utilisation further.

    Xenos will be texture-bandwidth limited to the same degree as R520/R580 as both architectures have the same texturing capability (although R520/580 may have 20-40% faster caches). So any texture-limited games will not show any improvement in Xenos.

    But games that are not texture bandwidth limited (going forwards this should be the norm for next-gen games) will easily get 100% faster in Xenos over R520. The combination of unified shader efficiency and twice the total pipelines will see to that.

    Jawed
     
  19. ihamoitc2005

    Veteran

    Joined:
    Sep 21, 2005
    Messages:
    1,181
    Likes Received:
    15
    twice the pipelines?

    Xenos has twice the pipelines as R520 but 2/3 transistor count?
     
  20. LunchBox

    Regular

    Joined:
    Mar 13, 2002
    Messages:
    901
    Likes Received:
    8
    Location:
    California
    I think it's because they took out the unimportant parts...

    like what Nvidia is rumoured to be doing with the RSX...

    like take out the video accelerator thingamajig...
     