PS3 vs X360: Apples-to-apples high-level comparison...

Panajev2001a said:
Ok, fair assumption :D.
Another assumption one can make, if we don't want to believe NVIDIA extended their pixel pipeline design, is that they count the 2 (independent) co-issued Dot2 ops the first ALU can execute and sum them with the Dot4 from the VS pipelines.
I don't even want to consider this option :)
 
nAo said:
Panajev2001a said:
Ok, fair assumption :D.
Another assumption one can make, if we don't want to believe NVIDIA extended their pixel pipeline design, is that they count the 2 (independent) co-issued Dot2 ops the first ALU can execute and sum them with the Dot4 from the VS pipelines.
I don't even want to consider this option :)

That option would still make 2 full Dot4s/cycle, because if they counted them all as just dot products, how could we comparably count the dot products coming from the Broadband Engine?
 
Panajev2001a said:
That option would still make 2 full Dot4s/cycle, because if they counted them all as just dot products, how could we comparably count the dot products coming from the Broadband Engine?
We can't; it doesn't make sense, and that's why I reject this hypothesis.
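The counting problem above comes down to units: if a co-issued pair of Dot2 ops and a Dot4 are each tallied as "one dot product", the total stops being comparable with Cell's dot4 figures. A minimal sketch, assuming every op is normalised by its number of multiply-add lanes (a Dot-n uses n lanes); the helper names and the per-cycle op mix are hypothetical, not NVIDIA's published accounting.

```python
# Hypothetical normalisation: express dot products as Dot4 equivalents by
# counting multiply-add lanes (a Dot-n op uses n lanes, a Dot4 uses 4).

def dot4_equivalents(op_widths):
    """op_widths: widths of the dot products issued in one cycle, e.g. [2, 2, 4]."""
    lanes = sum(op_widths)
    return lanes / 4

def naive_count(op_widths):
    """Naive tally: every op counts as 'one dot product', whatever its width."""
    return len(op_widths)

# Assumed mix from the hypothesis: 2 co-issued Dot2 (first ALU) + 1 Dot4 (VS pipe)
mix = [2, 2, 4]
print(naive_count(mix))       # 3
print(dot4_equivalents(mix))  # 2.0
```

The naive tally reports 3 "dot products" for what is only 2 Dot4s' worth of work, which is why mixing widths makes a Cell comparison meaningless.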
 
Where is everyone getting the Cell chip dot product information from? I didn't think the Cell had a dot product function. I'm confused.
 
blakjedi said:
Where is everyone getting the Cell chip dot product information from? I didn't think the Cell had a dot product function. I'm confused.
The SPEs and the PPE's VMX unit don't have a dot product instruction AFAIK, but four vec4 dot products can be calculated at the same time with 4 fmadd instructions, so the average throughput is one dot4 per clock cycle.
To be fair, things are more complex than that, as on the SPEs fmadd instructions have a 6-cycle latency AFAIK.
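To illustrate the "four dot4s with four fmadds" trick: it assumes a structure-of-arrays (SoA) layout, where each 4-wide register holds the same component of four different vectors, so every fmadd advances all four dot products by one component. This is a plain-Python emulation, not SPE code; on real hardware the step would be a SIMD multiply-add intrinsic such as the SPU `spu_madd` or VMX `vec_madd`, and the `fmadd`/`four_dot4` names here are hypothetical.

```python
# Emulated 4-wide SIMD: four independent vec4 dot products in exactly four
# multiply-add steps, assuming data is already in SoA form (xs, ys, zs, ws).

def fmadd(a, b, c):
    """4-wide multiply-add: a*b + c, lane by lane."""
    return [ai * bi + ci for ai, bi, ci in zip(a, b, c)]

def four_dot4(a_soa, b_soa):
    """a_soa, b_soa: [xs, ys, zs, ws], each a list of 4 floats (one per vector).
    Lane i of the result is dot(a_i, b_i)."""
    acc = [0.0, 0.0, 0.0, 0.0]
    for comp in range(4):          # x, y, z, w: one fmadd per component
        acc = fmadd(a_soa[comp], b_soa[comp], acc)
    return acc

# Vectors a0..a3 and b0..b3, stored by component (SoA)
a = [[1, 0, 2, 1], [2, 1, 0, 1], [3, 0, 1, 1], [4, 1, 0, 1]]  # xs, ys, zs, ws
b = [[1, 1, 1, 2], [1, 1, 1, 2], [1, 1, 1, 2], [1, 1, 1, 2]]
print(four_dot4(a, b))  # [10.0, 2.0, 3.0, 8.0]
```

Four instructions yield four dot4s, hence "one dot4 per clock" on average. If the data is stored as array-of-structures (one vec4 per register), it would first need shuffling into this layout.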
 
nAo said:
blakjedi said:
Where is everyone getting the Cell chip dot product information from? I didn't think the Cell had a dot product function. I'm confused.
The SPEs and the PPE's VMX unit don't have a dot product instruction AFAIK, but four vec4 dot products can be calculated at the same time with 4 fmadd instructions, so the average throughput is one dot4 per clock cycle.
To be fair, things are more complex than that, as on the SPEs fmadd instructions have a 6-cycle latency AFAIK.

So in other words it does have the equivalent of a dot product function, just with fairly high latency. OK, so when you say average throughput is one dot4 per clock cycle, are you talking per SPE or for the entire chip?
 
One dot4 per cycle per SPE, and the PPE's VMX unit should also provide one dot4 per cycle.
So the PS3 CPU (7 SPEs plus the PPE) would peak at 8 dot4s per cycle.
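The peak figure is a simple tally. A back-of-envelope sketch, assuming 7 SPEs available to games, one dot4 per cycle from each SPE and from the PPE's VMX unit, and a 3.2 GHz clock; the constant names are just for illustration.

```python
# Back-of-envelope PS3 CPU peak dot4 rate (assumed figures, see lead-in).
SPES = 7             # assumed SPEs available to games
PPE_VMX = 1          # the PPE's VMX unit
DOT4_PER_UNIT = 1    # one dot4 per cycle per unit, on average
CLOCK_HZ = 3.2e9     # assumed clock

dot4_per_cycle = (SPES + PPE_VMX) * DOT4_PER_UNIT
dot4_per_second = dot4_per_cycle * CLOCK_HZ

print(dot4_per_cycle)                    # 8
print(dot4_per_second / 1e9, "G dot4/s") # 25.6 G dot4/s
```

This is a theoretical peak; it ignores latency stalls, data movement and any cycles spent on non-dot-product work.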
 
blakjedi said:
nAo said:
blakjedi said:
Where is everyone getting the Cell chip dot product information from? I didn't think the Cell had a dot product function. I'm confused.
The SPEs and the PPE's VMX unit don't have a dot product instruction AFAIK, but four vec4 dot products can be calculated at the same time with 4 fmadd instructions, so the average throughput is one dot4 per clock cycle.
To be fair, things are more complex than that, as on the SPEs fmadd instructions have a 6-cycle latency AFAIK.

So in other words it does have the equivalent of a dot product function, just with fairly high latency. OK, so when you say average throughput is one dot4 per clock cycle, are you talking per SPE or for the entire chip?

He is talking about each SPE.
 
PC999, as PSINext staff I might as well take point here and ask: what exactly do you want looked at? That's two pages of posts, some of them pretty long, and I think it would help if your question consisted of more than just 'take a look,' since there are two different conversations going on there.
 
[Attached image: P1010276.jpg]


Well, some more thoughts... Going by my earlier derivation from the above that 52 dot products/cycle are required, which vec4 units can provide, a further 84 'other' units are needed to account for 136 Shops/cycle. Looking at that diagram again, there is no distinction between pixel and vertex units, only 'vector ALU' and SFUs. So the following is also a possibility, especially as the 'shader instruction processor' seems to be issuing to *all* those units, i.e.

RSX ~ 136 Shops/cycle ~ 52 vec4 + 84 SFU ~ 2*(26 vec4 + 42 SFU)

Kinda like 'unified' shader units that can execute either vertex or pixel instructions, with 'two pools' of these...?
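The proposed breakdown can be checked with a few lines of arithmetic. This only tests the poster's hypothesis (it assumes each vec4 unit and each SFU counts as one shader op per cycle), not any confirmed RSX specification.

```python
# Checks the hypothesised RSX breakdown: 136 Shops/cycle = 52 vec4 + 84 SFU,
# and whether it splits evenly into two identical pools.
TOTAL_SHOPS = 136
VEC4 = 52                   # from the earlier 52 dot products/cycle derivation
SFU = TOTAL_SHOPS - VEC4    # remaining 'other' units

POOLS = 2
per_pool = (VEC4 // POOLS, SFU // POOLS)

print(SFU)       # 84
print(per_pool)  # (26, 42)
```

Both counts are even, so two identical pools of 26 vec4 + 42 SFU reproduce the 136 total exactly.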
 
blakjedi said:
So in other words it does have the equivalent of a dot product function, just with fairly high latency.
A 6-cycle instruction latency in a 3+ GHz chip is actually very low, not high.
For that matter, I kinda expect PPE/XCPU instruction latencies to be higher than that.
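Latency mostly matters if there isn't enough independent work to fill the gap. A simplified sketch, assuming an in-order, single-issue pipeline (one fmadd per cycle) where a dependent fmadd must wait `latency` cycles; each "group" is one 4-step fmadd chain like the SoA dot4 trick. This is not a model of the SPE's actual dual-issue pipeline, and `cycles_to_finish` is a hypothetical helper.

```python
# Simplified in-order, single-issue model: one fmadd may issue per cycle, and
# a dependent fmadd waits until its input is `latency` cycles old.

def cycles_to_finish(groups, latency, chain_len=4):
    """Cycle at which the last result is ready when `groups` independent
    chains of `chain_len` dependent fmadds are interleaved round-robin."""
    ready = [0] * groups   # earliest cycle each chain's next op may issue
    cycle = 0              # next free issue slot
    last_done = 0
    for _ in range(chain_len):
        for g in range(groups):
            issue = max(cycle, ready[g])   # stall until the operand is ready
            done = issue + latency
            ready[g] = done
            last_done = max(last_done, done)
            cycle = issue + 1
    return last_done

for groups in (1, 3, 6, 12):
    total = cycles_to_finish(groups, latency=6)
    # each group yields 4 dot4s
    print(groups, total, round(4 * groups / total, 2))
```

With a single chain most cycles are stalls, but once about 6 chains are interleaved the issue slot stays busy and throughput approaches one dot4 per cycle; the SPE's large register file (128 registers) makes keeping that many chains in flight practical.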
 