pjbliverpool said:
Umm... why? R420 has 16 pixel shaders, each with two Vec4 ALUs
Only ONE Vec4 ALU. The other ALU is some kind of mini-ALU; there is no dual-issue in R420 like there is in NV40/G70. (Well, that's the current theory, and per-clock performance comparisons of R420 and NV40 tend to support it, as far as I can tell.)
and then another 6 vertex shaders with 1 Vec5 ALU each (I know they aren't single ALUs, but for the sake of simplicity...)
I'm quite happy calling them Vec5 ALUs for the sake of this comparison, as long as we remember that the scalar part of both the VS and PS ALUs provides the option of co-issuing a scalar instruction.
So that gives us (for the sake of simplicity) 38 Vec4 ALUs vs 48 Vec5 ALUs in the R500. So it's not even close to 100% faster.
Teehee, well, as you can see I'm working on the basis of R420 being 6 VS + 16 PS = 22 ALUs. R420 is effectively one ALU per pipe, with that ALU being capable of co-issue. There are some extra co-issue capabilities in there, e.g. it's documented that an R420 VS ALU can co-issue 5 scalar operations.
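To make the two counting conventions concrete, here's a quick tally. It's only a sketch: the function name is made up, and it deliberately ignores co-issue, ALU width (Vec4 vs Vec5), clock speed and efficiency, so it's a raw per-clock ALU count and nothing more.

```python
# Raw per-clock ALU tallies for the two counting conventions in this thread.
# Assumption: co-issue, ALU width (Vec4 vs Vec5) and clock speed are ignored.
R420_PIXEL_PIPES = 16
R420_VERTEX_PIPES = 6
XENOS_ALUS = 48


def r420_alu_count(alus_per_pixel_pipe: int) -> int:
    """R420 ALU total, crediting each pixel pipe with the given number of
    ALUs and each vertex pipe with one."""
    return R420_PIXEL_PIPES * alus_per_pixel_pipe + R420_VERTEX_PIPES


two_per_pipe = r420_alu_count(2)  # mini-ALU counted as a full ALU
one_per_pipe = r420_alu_count(1)  # one co-issue-capable ALU per pipe

for label, count in (("two ALUs per pixel pipe", two_per_pipe),
                     ("one ALU per pipe", one_per_pipe)):
    print(f"R420, {label}: {count} ALUs -> "
          f"Xenos has {XENOS_ALUS / count:.2f}x as many")
```

Crediting two ALUs per pixel pipe gives 38, so Xenos has only about 1.26x as many; crediting one gives 22, and Xenos has about 2.18x as many, before efficiency or clocks enter the picture.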
Where do those efficiency figures come from?
ATI says that current GPUs are 50-70% efficient; I plumped for 60% for the sake of simplicity. According to ATI, Xenos is 95% efficient.
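Applying those efficiency figures to the ALU counts is just multiplication. A minimal sketch, assuming "efficiency" simply means the average fraction of ALUs doing useful work each clock; the helper name is hypothetical, and clock speed and per-ALU capability are ignored, so this is not a reconstruction of any overall performance estimate.

```python
# Crude "utilised ALUs per clock" = ALU count x average utilisation.
# 60% (from ATI's 50-70% range for current GPUs) and 95% (ATI's Xenos claim)
# are the thread's figures; clocks and per-ALU capability are ignored.
def utilised_alus(alu_count: int, efficiency: float) -> float:
    return alu_count * efficiency


r420 = utilised_alus(22, 0.60)   # 6 VS + 16 PS, one ALU per pipe
xenos = utilised_alus(48, 0.95)

print(f"R420 at 60%:  {r420:.1f} ALUs busy per clock")
print(f"Xenos at 95%: {xenos:.1f} ALUs busy per clock")

# Sensitivity to ATI's quoted 50-70% range for conventional GPUs
for eff in (0.50, 0.60, 0.70):
    print(f"R420 at {eff:.0%}: {utilised_alus(22, eff):.1f} ALUs busy per clock")
```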
As I understand it, the USA (unified shader architecture) is more efficient because all the "pipelines" are always active. In a normal architecture that is pixel-shader limited rather than vertex-shader limited, you would have... what, 33% of your vertex shaders idle? In the R420 that's 36 ALUs active and 2 inactive (from the vertex shaders), which is a lot greater than 60% efficiency.
No, you're missing the fact that Xenos never waits for dependent instructions, whether those are dependent texture operations or simply serially dependent ALU instructions. There's also no waiting while pixels are being textured (e.g. with anisotropic filtering), regardless of batch size: even 1-pixel triangles can be textured without a stall.
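Why "never waiting" matters so much can be shown with a toy model of latency hiding: one ALU, several batches of pixels, and a scheduler that switches to any batch whose texture result has arrived. This is a sketch under made-up assumptions (single ALU, one-cycle ALU ops, fixed 20-cycle texture latency, 4 dependent ALU ops per fetch); the function is hypothetical and is not Xenos's actual arbiter. It also doesn't model ALU pipeline latency, so it only illustrates the texture-dependency half of the argument.

```python
def alu_utilisation(num_batches: int, alu_ops_per_fetch: int,
                    fetch_latency: int, fetches: int) -> float:
    """Toy model: one ALU shared by `num_batches` batches of pixels.

    Each batch repeats `fetches` times: a texture fetch (serviced by a
    separate texture unit, so it costs the ALU nothing) whose result arrives
    `fetch_latency` cycles later, followed by `alu_ops_per_fetch` one-cycle
    ALU instructions that depend on it. Each cycle the scheduler issues one
    instruction from the ready batch that has been waiting longest, or leaves
    the ALU idle if no batch is ready. Returns busy ALU cycles / total cycles.
    """
    ready_at = [fetch_latency] * num_batches  # every batch fetches at cycle 0
    ops_left = [alu_ops_per_fetch] * num_batches
    rounds_left = [fetches] * num_batches
    cycle = busy = 0
    while any(rounds_left):
        ready = [b for b in range(num_batches)
                 if rounds_left[b] > 0 and ready_at[b] <= cycle]
        if ready:
            b = min(ready, key=lambda i: ready_at[i])  # oldest ready batch
            busy += 1
            ops_left[b] -= 1
            if ops_left[b] == 0:  # this round's dependent ALU work is done
                rounds_left[b] -= 1
                ops_left[b] = alu_ops_per_fetch
                # next fetch issues next cycle; its result lands later
                ready_at[b] = cycle + 1 + fetch_latency
        cycle += 1
    return busy / cycle


for n in (1, 2, 4, 8):
    print(f"{n} batch(es): ALU busy {alu_utilisation(n, 4, 20, 10):.0%} of cycles")
```

With one batch the ALU sits idle through every fetch; with enough batches to cover the latency, the only idle cycles left are the start-up ones.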
So Xenos is in the ballpark of 150% of the performance of R420. And that's purely in shader code.
Which would be a far bigger graphical leap than any ever made in the same time period.
The consensus is that Xenos is 2 generations ahead, not one. Less than 150% over two generations would be embarrassing, to say the least.
Anyway, Xenos doesn't have a "base". You can't say its starting point is R420.
Given that a console has power, heat, and cost restrictions compared to the PC, and ATI have "apparently" been unable to match that performance in another of their parts without those restrictions, wouldn't you agree that these performance predictions are a little optimistic?
No. The power/heat/cost issue is taken care of by Xenos being clocked fairly slowly on 90nm and not being a huge device. Bear in mind that back when Microsoft/ATI started XB360, they were probably expecting it to launch on 65nm. By the guesstimates from way back then, still being on 90nm is way "late". Also, we don't know what proportion of Xenos is redundant (for yield purposes), which also affects heat. Does the quoted transistor count for Xenos include the redundant parts?
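For anyone wondering why a chip would carry redundant units at all, a simple binomial yield model shows the effect. Everything here is hypothetical: 48 needed units, a 99% chance that each unit is defect-free, and independent defects. These are not Xenos's real unit count, spare count or defect rate. The point is only that a few spares can raise yield sharply, at the cost of extra transistors.

```python
from math import comb


def yield_with_spares(needed: int, spares: int, p_good: float) -> float:
    """Probability that at least `needed` of `needed + spares` identical units
    are defect-free, assuming each unit is good independently with p_good."""
    total = needed + spares
    return sum(comb(total, k) * p_good**k * (1 - p_good) ** (total - k)
               for k in range(needed, total + 1))


# Hypothetical numbers, not Xenos's real unit count, spares or defect rate.
for spares in (0, 2, 4):
    print(f"{spares} spare units: yield {yield_with_spares(48, spares, 0.99):.3f}")
```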
R600 is going to be a 90nm part too. It is probably 6-9 months behind Xenos (for quite a while Vista was slated for mid-2006, not the end of 2006). It'll have to be much bigger than Xenos (the parent die) due to full SM4 functionality and probably needing backward compatibility (DX8 fixed-function hardware, etc.).
Jawed