Shifty Geezer said:
Shaderguy :
1 : How could MS look at Cell and base their decisions on it, when Cell's design was unknown during XeCPU's development?
Because Sony and IBM publicly announced much of the Cell architecture and the PS3's performance targets several years ago. There were also various papers and patent applications publicly available that gave many details. It's true that many of the specifics were not revealed, but the overall performance targets, as well as the overall architecture, were well known. (For example, Sony was saying that 4 Cells == 1 Teraflop, from which one could determine 1 Cell == 256 GFlops. And that a Cell would be 1 CPU + a large number of DSPs. And that there was a high-speed Rambus-designed interconnect between Cells, and that one Cell would be used as a CPU while another would be used as a GPU, and so on.)
Shifty Geezer said:
2 : Can we really say XeCPU has 3x the GP performance? Faf points out the SPEs aren't any slouches in this regard (though we don't know how cache/LS management impacts things), and MS's statement that they have 3x the performance is based on 1 PPE core vs. 3 (though they aren't the same cores in all respects) and totally discounts the worth of the SPEs in GP. I think what we're hearing regarding GP performance is rather nonsensical and unfounded FUD and shouldn't be taken as valid, unless someone can present some hard facts on the matter.
Since the SPEs don't have general-purpose access to main memory, I think their performance on general purpose code has to be discounted quite a bit. Wouldn't you have to implement some form of software cache? Wouldn't that make a main memory read access look like this:
int Peek(void* address)
{
    uintptr_t addr = (uintptr_t) address;
    TLBEntry e = TLB[TLBHash(addr)];
    if (e.base != (addr & PAGE_MASK))  // parentheses needed: & binds looser than !=
    {
        // Miss: schedule a DMA transfer from main memory into the local store,
        // possibly context-switching while waiting, then refill the TLB entry.
    }
    return *(int*) (e.cache + (addr & PAGE_OFFSET_MASK));
}
That seems like it would take at least 20 instructions, including several branches, even for the in-cache case. That seems slow enough that people wouldn't really want to run general-purpose algorithms on the SPEs. Instead, devs will write custom SPE code that reads and writes data in a more stream-oriented fashion.
Given that a single PPC core is not too wimpy, I predict many PS3 devs will just ignore the SPEs and tune their game to use the single PPC core and the GPU. That's effectively what Epic is doing with their Unreal Engine. (Of course, they're very diplomatic about ignoring the SPEs, saying "we're leaving the SPEs free for middleware to use".)