I don't care whether the Xenon CPU or the PS3 PU(s) are clean-sheet designs or not (whatever that means).
I think game developers care about CPUs that are fast on general-purpose code, that are easy to program, and that expose 'custom' functionality that can boost special computations.
Even though I don't know a single thing about the Xenon CPU, I'm sure IBM did a great job. They have the know-how, very smart people.. and Microsoft funding.
We have already debated the memory latency problem ten times, and how the SPUs don't seem to be designed to hide very large latencies (such as those that can appear during texture sampling).
Obviously the STI guys are well aware of this kind of problem; nevertheless they decided NOT to go down the fine-grained multithreading route.
I believe they preferred an array of very powerful and flexible stream processors under the control of a PU to a set of (bigger) multithreaded processors. For a given process, the STI/CELL choice should give us more (theoretical) flops per die area.
This is from Bill Dally's work:
"Stream Processors vs. GPUs"
DeanoC wrote a good summary and added a good number of educated guesses, mostly driven by common sense. So even if we end up doing some texture sampling on the SPUs, in the main case texture work will be handled by the GPU in its pixel shader engines. The SPUs will provide a very powerful base for vertex processing and just about everything else we can think of (yeah.. MRM too), since as we know SPUs can address external memory and have several mechanisms to alleviate latency problems.
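To put rough numbers on the latency point, here is a back-of-the-envelope timing model. It assumes one particular mechanism, double buffering (fetch the next chunk while the current one is being processed), which is my pick for illustration and not something confirmed by any spec. The function names and the cycle counts are made up. The trick only works when you know the next address ahead of time, which is the case for vertex streams and much less so for data-dependent texture fetches.

```python
def serial_time(chunks, fetch, compute):
    """Fetch a chunk, wait for it, process it, repeat: nothing overlaps."""
    return chunks * (fetch + compute)


def double_buffered_time(chunks, fetch, compute):
    """Fetch chunk i+1 into a second buffer while chunk i is processed.

    Only the first fetch is fully exposed. After that every step costs
    whichever of fetch/compute is longer, and the last chunk still has
    to be processed once it has arrived.
    """
    if chunks == 0:
        return 0
    return fetch + (chunks - 1) * max(fetch, compute) + compute


# Hypothetical figures: 100 chunks, 400 cycles to fetch, 600 to process.
print(serial_time(100, 400, 600))           # 100000
print(double_buffered_time(100, 400, 600))  # 60400

# Once the fetch is slower than the work, the fetch time dominates again.
print(double_buffered_time(100, 2000, 600))  # 200600
```

As long as the fetch is shorter than the work on a chunk, almost all of the latency disappears behind the computation. When the fetch is the longer of the two, or cannot be issued early, the processor stalls anyway, and that is where multithreading would have helped.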
I'm quite excited to learn the final Xenon and PS3 specs (that's a hint for someone!!), but I'm even more excited to think about how to exploit all that power in novel ways!
I can't really comment on this.
But think about the worst cases: I can write a single piece of code that would require multiple overlays to complete a single iteration. Or I could just split up the code in the address space to do basically the same thing.
What's the big deal? We can do the same on every PC.. or on the Xbox: just switch the vertex shader every couple of DrawIndexedPrimitive() calls.
Having code and data overlay support doesn't mean you (as a good programmer) are going to act as if local memory were infinite. In the end you would sort your work per overlay switch, but it would still be much simpler for the programmer than doing manual code/data chunking (I hate doing that on the VU0/VU1), imho.
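Just to show what I mean by sorting per overlay switch, here is a minimal sketch. The job list, the overlay names and both helper functions are hypothetical. Nothing here is a real SPU or overlay API; it only counts how many overlay loads a given job order would cause.

```python
from itertools import groupby


def count_overlay_loads(jobs):
    """Count overlay loads for jobs run in the given order.

    `jobs` is a sequence of (overlay_id, payload) pairs. Each run of
    consecutive jobs that share an overlay costs exactly one load.
    """
    return sum(1 for _ in groupby(jobs, key=lambda job: job[0]))


def sort_by_overlay(jobs):
    """Stable sort: jobs that share an overlay become adjacent and keep
    their original relative order."""
    return sorted(jobs, key=lambda job: job[0])


# Hypothetical work list: (overlay name, batch index).
jobs = [("skin", 0), ("cloth", 1), ("skin", 2),
        ("morph", 3), ("cloth", 4), ("skin", 5)]

print(count_overlay_loads(jobs))                   # 6
print(count_overlay_loads(sort_by_overlay(jobs)))  # 3
```

It is the same idea as batching draw calls by vertex shader: the programmer decides the order of the work, and the overlay mechanism takes care of moving the code in and out.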
ciao,
Marco