It's a shame Rene knew shit about Xenos before he spoke. A lot of what he said was actually wrong.
DaveBaumann said:Hopefully I'll have some more up later next week.
Jawed said:We're still waiting for a detailed comparison of the theoretical shader performance profiles of NV40 and R420.
And we've had them in our "labs" now for a year...
Jawed in the earlier link said:...The leak for XB360 claims 96G ops per second. It seems to me that the leak is counting ops in the same way that NVidia does, rather than how ATI does.
So...
Leak said:The Xenon GPU is a custom 500+ MHz graphics processor from ATI. The shader core has 48 Arithmetic Logic Units (ALUs) that can execute 64 simultaneous threads on groups of 64 vertices or pixels. ALUs are automatically and dynamically assigned to either pixel or vertex processing depending on load. The ALUs can each perform one vector and one scalar operation per clock cycle, for a total of 96 shader operations per clock cycle. Texture loads can be done in parallel to ALU operations. At peak performance, the GPU can issue 48 billion shader operations per second.
Jawed said:This talk by ATI in London "confirms" a number of things:
http://www.driverheaven.net/showthread.php?t=75843
- Xenos cannot create vertices (no tessellation) (13:20)
- the ALUs are not organised into quads (or any other group size) (14:30)
- 120 Gsops (12:23)
NV40 has 2x the SM2.0/3.0 ALU capability of R420, which should overhaul its core-clock disadvantage. But it doesn't. etc.
DaveBaumann said:NV40 has 2x the SM2.0/3.0 ALU capability of R420, which should overhaul its core-clock disadvantage. But it doesn't. etc.
Saying something has "2 ALUs" doesn't mean anything in either case. R420's primary ALU has all the instructions that it supports, whilst the 2 ALUs of NV40 have a distribution of instructions between the two - this means that it can opportunistically dual-issue in some cases, but not necessarily two instructions of the same type.
Jawed said:Jaws, we have reasonably detailed architectural diagrams for NV40 and R420 plus explanations on how they work. Care to explain, in detail, how they perform against each other, based purely on theory?
In other words, can you convert the theoretical capabilities of these two architectures into a realistic prediction of the performance of them?
Jaws said:...
I'm only going to provide 'normalised' total system metrics compared to the above image as this is all we can compare across both systems at the moment until more details are released.
...
Jawed said:What I've learnt over the last few days is this is a road to nowhere. I'm aghast that you still think it's worth pursuing this.
Jawed said:I'm quite happy to speculate on the architectures, but I'm going to stick to throwing around stupid performance numbers for the sake of taking the piss out of the marketing. ATI's now counting 120Gsops for Xenos. It's now time for NVidia to counter that.
Jawed said:Maybe you want to look at page 13 of the PDF I linked:
- Pixel shader operations/pixel 8
- Pixel shader operations/clock 128
Jawed said:These are the claimed numbers for NV40.
51.2Gsops. Roughly half of what's claimed for RSX.
Jawed said:How much more black and white do you want?...
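For anyone wanting to check where the 51.2Gsops comes from, here is a rough sketch of the arithmetic. The 128 ops/clock is the claimed figure quoted above; the 400MHz core clock is not stated in the post and is assumed here (it is the 6800 Ultra's clock, and it is also what 51.2G divided by 128 works out to). The function name is made up for illustration.

```python
def peak_gsops(ops_per_clock: int, core_clock_mhz: float) -> float:
    """Peak shader ops per second in billions (Gsops): ops/clock times clock."""
    return ops_per_clock * core_clock_mhz / 1000.0

# 128 pixel shader ops per clock is the claimed NV40 number from the PDF.
# 400 MHz is an assumption (6800 Ultra core clock), not taken from the post.
nv40_gsops = peak_gsops(128, 400)
print(f"NV40 claimed peak: {nv40_gsops:.1f} Gsops")
```

This is marketing-style peak counting only; it says nothing about what the hardware sustains on a real shader.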
Jawed said:...
I wonder if Xenos will be 48-way MIMD, i.e. each ALU can be running a different shader. I'm sorta doubtful, to be honest, because that's an awful lot of decode-logic overhead - though I admit to not knowing what that amounts to in percentage terms. I ain't got the foggiest!
Jawed said:...
RSX and Xenos are looking as incomparable as NV30 and R300 did a few years ago.
Jawed said:...
All of this still leaves us high and dry on Cell versus XB360 CPU.
Jawed said:Jaws is determined to compare architectures with absolutely no regard for their respective architectures.
dukmahsik said:I apologize for being such a noob here, but where is this article from Dave? Thanks much.
xbdestroya said:Not out yet.
I have a feeling you'll have no way of not knowing once he actually posts it.
Jawed said:136 shader operations per cycle is what, exactly?
Jaws in the other thread said:1 shader op per cycle ~ 1 shader execution unit
1 shader execution unit ~ vector unit or scalar unit
e.g. ALU = 1 scalar unit + 4-way SIMD unit ~ 2 shader ops per cycle
MS spec said:48-way parallel floating-point dynamically scheduled shader pipelines
Dave said:ALUs are 5D - Vec4+Scalar
Jawed said:24 pixel pipelines doing 4 operations?
plus
10 vertex pipelines doing 4 operations?
Jawed said:Should we be making allowances for texture blending? Texture address calculation? What else?
Jawed said:Unluckily we have two different claims from ATI for Xenos, 48Gsops (two ops per ALU per cycle) and 120Gsops (five ops per ALU per cycle).
Jawed said:Which are you going to use in your comparison?
Jawed said:Why?
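The two Xenos claims are the same hardware counted two ways, which a quick sketch makes plain. It assumes the 48 ALUs from the leak and a flat 500MHz clock (the leak only says "500+ MHz"); the function and constant names are hypothetical.

```python
XENOS_ALUS = 48        # from the leak
XENOS_CLOCK_MHZ = 500  # assumed: the leak says "500+ MHz"

def xenos_gsops(ops_per_alu_per_clock: int) -> float:
    """Peak Gsops for a given way of counting ops per ALU per clock."""
    return XENOS_ALUS * ops_per_alu_per_clock * XENOS_CLOCK_MHZ / 1000.0

# Counting one vector op + one scalar op per ALU per clock.
two_op_count = xenos_gsops(2)
# Counting Vec4 + scalar as five component ops per ALU per clock.
five_op_count = xenos_gsops(5)

print(f"2 ops/ALU/clock: {two_op_count:.0f} Gsops")
print(f"5 ops/ALU/clock: {five_op_count:.0f} Gsops")
```

Nothing about the chip changes between the two figures, only what gets called an "op".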
Jawed said:In the code I linked to earlier:
http://www.beyond3d.com/forum/viewtopic.php?p=327176#327176
which in SM3 is 102 instructions, executing at an average of 2.2 instructions per cycle. At that rate a 6800 Ultra would shade 137 million pixels per second.
Assuming RSX operates in the same way, at 550MHz across 24 pipelines, this shader would shade 282 million pixels per second.
The same shader executed on Xenos would need to operate at 1.2 instructions per cycle to shade 282 million pixels per second.
But I have no idea if Xenos could run this shader at more than 1 instruction per cycle.
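The estimate above can be sketched as units x clock x instructions-per-cycle / shader length. The 6800 Ultra's 16 pipelines and 400MHz clock are not stated in the post and are assumed here; the Xenos figures (48 ALUs, 500MHz) come from the leak. With the rounded 2.2 the results land about 1% above the posted 137 and 282, which suggests the unrounded average was a shade under 2.2.

```python
def mpixels_per_second(units: int, clock_mhz: float,
                       instr_per_cycle: float, shader_length: int) -> float:
    """Millions of pixels shaded per second for a shader of the given length."""
    return units * clock_mhz * instr_per_cycle / shader_length

SHADER_LENGTH = 102  # SM3 instruction count from the post
AVG_IPC = 2.2        # rounded average instructions per cycle from the post

# Assumed 6800 Ultra configuration: 16 pipelines at 400 MHz.
nv40_mpix = mpixels_per_second(16, 400, AVG_IPC, SHADER_LENGTH)
# RSX as described in the post: 24 pipelines at 550 MHz, same behaviour.
rsx_mpix = mpixels_per_second(24, 550, AVG_IPC, SHADER_LENGTH)
# Instructions per cycle each of 48 Xenos ALUs at 500 MHz would need
# in order to match the posted 282 Mpixels/s.
xenos_ipc_needed = 282 * SHADER_LENGTH / (48 * 500)

print(f"6800 Ultra: {nv40_mpix:.0f} Mpix/s")
print(f"RSX:        {rsx_mpix:.0f} Mpix/s")
print(f"Xenos needs {xenos_ipc_needed:.2f} instructions/cycle per ALU")
```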
RSX ~ 136 shader ops/cycle ~ 52 Vec4 + 52 Scalar + 32 Other units