Vince said:
I disagree. I know that nAo and Faf are both interested in the aspect
I notice you leave me off the list, perhaps because you don't like my thoughts on the subject? Which is kind of ironic...
Vince said:
and I've been anything but silent on talking about how an S|APU is more than equivalent to an X2 ALU in its ability to mask latency via multiple outstanding transactions and the arbitrary flexibility of the queuing system implemented in the SPU concerning execution of these groups, either autonomously based on priority or by a deterministic means such as arbitrary command rules.
I've been through the difference between them several times, but you choose to ignore it because it doesn't fit your world view. But let's try one more time.
An SPU executes a single instruction stream. It issues DMA requests, which are placed into an asynchronous, overlapping queue. When the program needs the data, it stalls if the data is not ready and continues if it is.
A GPU works the other way around: you have N lots of data and N-M execution units. It also has a predictive read system working in the background, similar to the DMA in Cell, but whenever it would stall, it switches to another piece of data not currently being worked on. The important thing to notice is that this is a data-centric model, whereas an ALU/SPU-centric model is instruction-centric.
You also have the middle ground of multi-threaded CPUs that switch threads on a stall, usually with 2 threads, which is enough to hide some latency but not all of it.
The key factor is having something else to do whenever there is a stall. Stalls occur not only on memory reads but also on things like divides.
We know clever programming can sometimes eliminate some stalls, but without an effective stall reduction mechanism, real-world performance will suffer.
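To put rough numbers on the "something else to do" point, here is a toy cycle-count sketch, not a model of any real chip. It assumes one execution unit, work items that each need a few dependent reads (so the read cannot be issued ahead of time), and a free switch between items. The function name run and all of its parameters are made up for illustration. With in_flight=1 it behaves like a single stream that stalls on every read, in_flight=2 is roughly the two-thread CPU case, and larger values approximate the data-centric GPU approach.

```python
def run(num_items, in_flight, steps, latency, compute):
    """Toy model of one execution unit; returns (total_cycles, idle_cycles).

    Each item does `steps` rounds of: wait `latency` cycles for a read,
    then `compute` cycles of ALU work. Up to `in_flight` items are kept
    resident, and the unit switches (at zero cost, an assumption) to any
    resident item whose data has arrived.
    """
    pending = num_items          # items not yet started
    contexts = []                # resident items: [ready_time, steps_left]
    now = 0
    idle = 0
    while pending or contexts:
        # Start new items while there is room; each begins by issuing a read.
        while pending and len(contexts) < in_flight:
            contexts.append([now + latency, steps])
            pending -= 1
        # Pick the resident item whose data arrives first.
        idx = min(range(len(contexts)), key=lambda i: contexts[i][0])
        ready_time = contexts[idx][0]
        if ready_time > now:
            # Nothing is ready: this is the stall nobody could cover.
            idle += ready_time - now
            now = ready_time
        now += compute           # do the ALU work for this round
        contexts[idx][1] -= 1
        if contexts[idx][1] == 0:
            contexts.pop(idx)    # item finished
        else:
            contexts[idx][0] = now + latency  # issue its next dependent read
    return now, idle


if __name__ == "__main__":
    for k in (1, 2, 8):
        total, idle = run(num_items=8, in_flight=k, steps=4,
                          latency=100, compute=10)
        print(f"in_flight={k}: total={total} idle={idle}")
```

The ALU work is the same in every case (num_items * steps * compute cycles); only the idle time changes. With in_flight=1 the unit pays the full latency on every read, so the total is num_items * steps * (latency + compute). An SPU that can issue its DMA well ahead of use avoids much of this, which this sketch deliberately does not model.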
Vince said:
nAo has been quite active on these topics especially. It's just that this debate is never followed up on and never accepted or understood as such -- it's much too easy to revert to dumb arguments... which brings me to an example:
I find it highly ironic that people can take a throw-away PR-line about being designed for a game console and turn that into any form of substantive argument. And what's even more ironic is the recurrent argument about Cell being more general purpose (and inferred as more inefficient), which just flies in the face of everything we know about the architecture's inclusion of SPCs... it's like people just don't learn. Actually, I think that is the case.
And the same thing can be said of you: you seem to ignore the worries that all the developers (Marco, Faf and I) have expressed with regard to the SPUs' latency hiding techniques. Thankfully, with an NVIDIA GPU now doing at least the pixel shading, a lot of these concerns have diminished. The SPUs look to be very good at specialist vector ops, just as the GPU's ALUs are good at specialist vector ops. Having this kind of power right next to the CPU is a great idea; it's clear that the SPU design will give significant vector processing power in a much easier to use format than trying to coax a GPU into that work.
You see, you don't have to be on one side of the fence or the other. I like both the PS3 system design and the Xenon system design. Both have strengths and weaknesses. Your refusal to acknowledge any good points about non-Cell architectures, however, makes the discussion pointless.
I would love to have a good discussion of the advantages and disadvantages the PS3 and Xenon architectures pose, but if it's just going to descend into "it's great cause Sony are the best" then why bother?