Yeah, the differences in the design approaches are interesting.
Memory
-MS: The slides note the irregular load the backbuffer creates and how it sucks up a ton of bandwidth. Solution? Isolate it in eDRAM so it never becomes a bottleneck, and then use a simple-to-use UMA for the rest of the system.
-Sony: Big bandwidth needs call for big bandwidth resources. CELL needs low-latency memory and RSX needs a lot of bandwidth. Solution? A NUMA design with two 256MB segments, effectively doubling the bandwidth of the 360's UMA. Yeah, devs still need to deal with the irregular demands of the backbuffer and the extra work of managing the memory segmentation... but boy, ~50GB/s of bandwidth?! w00t!
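Rough numbers behind the memory comparison, as a quick Python sketch. The bus figures are my assumption (the publicly quoted specs: 128-bit GDDR3 at 1400 MT/s for both the 360 UMA and RSX's pool, 25.6GB/s XDR for CELL), and the backbuffer estimate uses hypothetical settings (720p, 4xAA, 16 bytes of color+Z traffic per sample per layer with blending, 3x overdraw, 60fps), not anything from the slides.

```python
# Back-of-envelope memory bandwidth numbers for the MS vs Sony comparison.
# ASSUMPTION: bus specs are the publicly quoted figures, not from the slides.

def bus_bandwidth_gbs(bus_bits: int, mega_transfers_per_sec: float) -> float:
    """Peak bandwidth in GB/s (10^9 bytes/s) for a bus of the given width."""
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mega_transfers_per_sec * 1e6 / 1e9


def backbuffer_traffic_gbs(width: int, height: int, samples: int,
                           bytes_per_sample: int, overdraw: float, fps: int) -> float:
    """Framebuffer read/write traffic in GB/s for a given resolution and AA level."""
    return width * height * samples * bytes_per_sample * overdraw * fps / 1e9


# Xbox 360: a single 128-bit GDDR3 pool shared by CPU and GPU (UMA).
x360_uma = bus_bandwidth_gbs(128, 1400)

# PS3: two separate pools (NUMA): XDR for CELL, 128-bit GDDR3 for RSX.
ps3_xdr = 25.6
ps3_gddr3 = bus_bandwidth_gbs(128, 1400)
ps3_total = ps3_xdr + ps3_gddr3

# HYPOTHETICAL scene: 720p, 4xAA, 16 bytes per sample per layer
# (Z read + Z write + color read + color write, 4 bytes each), 3x overdraw, 60fps.
backbuffer = backbuffer_traffic_gbs(1280, 720, 4, 16, 3, 60)

print(f"360 UMA:          {x360_uma:.1f} GB/s")
print(f"PS3 XDR + GDDR3:  {ps3_total:.1f} GB/s ({ps3_total / x360_uma:.2f}x)")
print(f"Backbuffer alone: {backbuffer:.1f} GB/s ({backbuffer / x360_uma:.0%} of the 360 UMA)")
```

Under those made-up settings the backbuffer alone eats roughly 10.6GB/s, nearly half of a 22.4GB/s UMA, which is the kind of irregular load MS chose to wall off.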
Graphics
-MS: We want something to dovetail with our next-gen OS to create a new graphics platform... so what features do we want? How about something that is more efficient, maximizes the hardware, and tears down the barrier between VS and PS? Solution? Unified shaders. At ~257M transistors for logic, it looks to be a pretty capable chip.
-Sony: Oops, our experimental VPU is not working out for some reason or another. Solution? nVidia, what is your fastest, most feature-rich chip? OK, can you shrink that down to 90nm and up the frequency 25% or more? Oh, it has vertex shaders? Well, CELL can already do that, but the more the merrier!
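A toy model of why tearing down the VS/PS barrier helps. The unit counts (8 vertex + 24 pixel vs 32 unified) and per-pass workloads are hypothetical, not Xenos or RSX specs, and the model ignores real-world details like scheduling overhead and the fact that pixel work depends on finished vertex work.

```python
# Toy model: fixed vertex/pixel shader split vs a unified pool of the same size.
# ASSUMPTION: unit counts and per-pass workloads are hypothetical, not Xenos/RSX specs.

def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def split_cycles(vertex_work: int, pixel_work: int, vs_units: int, ps_units: int) -> int:
    """Dedicated units: each side runs in parallel, so the slower side sets the time."""
    return max(ceil_div(vertex_work, vs_units), ceil_div(pixel_work, ps_units))


def unified_cycles(vertex_work: int, pixel_work: int, units: int) -> int:
    """Unified units: any unit can take either kind of work."""
    return ceil_div(vertex_work + pixel_work, units)


VS_UNITS, PS_UNITS = 8, 24            # hypothetical fixed split
TOTAL_UNITS = VS_UNITS + PS_UNITS     # same total ALU count for the unified design

# (name, vertex work, pixel work) in arbitrary work units per pass
PASSES = [
    ("vertex-heavy (e.g. shadow map)", 900, 100),
    ("pixel-heavy (e.g. full-screen effect)", 100, 2300),
    ("balanced", 400, 1200),
]

split_total = unified_total = 0
for name, v, p in PASSES:
    s = split_cycles(v, p, VS_UNITS, PS_UNITS)
    u = unified_cycles(v, p, TOTAL_UNITS)
    split_total += s
    unified_total += u
    print(f"{name:40s} split={s:4d}  unified={u:4d}")

print(f"{'total':40s} split={split_total:4d}  unified={unified_total:4d}")
```

With these made-up numbers the split design needs 259 cycles versus 157 for the unified pool. On the balanced pass they tie, so the whole win comes from passes where one side of a fixed split sits idle.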
The same approaches show up in the CPUs, but in a more complicated way. MS obviously wanted more FP performance from their CPU. The Cache Lock feature clearly indicates they expect the CPU to be generating 3D models. Add in the beefed-up VMX units. But they also wanted to keep things simple: symmetric cores, shared cache, etc. And then there was the issue of balance. MS has noted that something like 80% of game code is general purpose, but on the other hand your game engine may spend 80% of its execution time on FP-type tasks.
Sony is obviously taking a totally different approach with a stream processor. Just a lot of power there. Obviously there is more to juggle with more cores and an asymmetric design, but also a ton of potential.
In some ways it is hard to compare the CPUs because they are so different... they are also different sizes. Without knowing the die sizes, from a transistor perspective the DD2 CELL is 50% larger than the XeCPU (250M vs. 165M). Looking at the XeCPU die, it seems that the 3 cores and the cache each take up about 25% of the space. Heat, power consumption, etc. all aside, to bring it up to a similar transistor count you could fit 5 cores and 1MB of cache (~190 GFLOPs) or 4 cores and 2MB of cache (~150 GFLOPs). Total BS numbers, but it makes you go hmmm. More cores are still more complicated, AND a stream processor is an attempted solution to the problem of making a lot of cores efficient... but I do think people would look at it differently if the XeCPU had 4 cores and 2MB of cache.
So in that regard I DO expect the CELL to be more powerful in general because it is larger... and as DeanoC noted, the CELL is gonna FLY with procedural tasks. It is really designed with that type of work in mind! Yet the Xenon is not chopped liver either... very efficient, very streamlined.
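The "what if the XeCPU were CELL-sized" math, sketched in Python. It treats the die as 4 equal blocks (3 cores + 1MB L2, each ~25% of ~165M transistors) and scales linearly, which ignores interconnect, I/O, and everything else that doesn't scale that way. The ~38.4 peak GFLOPs per core is my assumption (the commonly quoted 115.2 GFLOPs split over 3 cores).

```python
# Back-of-envelope XeCPU scaling, using the rough die split from the post:
# 3 cores + 1MB L2 = 4 blocks, each ~25% of ~165M transistors.
# ASSUMPTION: ~38.4 peak GFLOPs per core (the commonly quoted 115.2 GFLOPs / 3),
# and transistor count scales linearly with blocks.
from typing import Tuple

XECPU_M = 165          # transistors, millions
CELL_M = 250           # DD2 CELL, millions
BLOCK_M = XECPU_M / 4  # one core, or 1MB of L2
GFLOPS_PER_CORE = 38.4


def hypothetical_xecpu(cores: int, cache_mb: int) -> Tuple[float, float]:
    """Return (transistors in millions, peak GFLOPs) for a scaled-up XeCPU."""
    return BLOCK_M * (cores + cache_mb), GFLOPS_PER_CORE * cores


print(f"CELL vs XeCPU transistors: {CELL_M / XECPU_M - 1:.0%} larger")
for cores, cache in [(3, 1), (5, 1), (4, 2)]:
    t, g = hypothetical_xecpu(cores, cache)
    print(f"{cores} cores + {cache}MB L2: ~{t:.0f}M transistors, ~{g:.0f} GFLOPs")
```

Both hypothetical configs land at ~248M transistors, right around CELL's 250M, at ~192 and ~154 GFLOPs respectively.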
The Xbox 360 really reminds me of a super GCN. Obviously the Xbox 360 hardware compares more favorably to this-gen PC hardware than the GCN did, but the same philosophy of streamlining and maximizing potential seems to be there. The fact MS did not splurge on the HDD, media ports, and so forth also reminds me of Nintendo. Quick, someone ring Redmond... did Yamauchi resign at Nintendo to secretly take control of MS's Game division?!
The PS3 sounds a lot like the PS2... but this time, instead of the GS, Sony hired NV and got an NV2A.
They also decided to put in a lot of RAM with a lot of bandwidth to offset having no eDRAM. And we all heard how hard the PS2 was to program for, yet it did all right, I guess.
Guden said:
('only' 8 pixel pipes in GPU, in-order CPU cores with just 1M cache, 128-bit UMA memory etc)
Only nitpick... it does not have 8 pixel pipelines... really no pipelines at all. I think "3 shader arrays" is more accurate... there's really no use comparing it to older architectures, since it is really different and WON'T be used in the same way. If it were in the PC space we could say it might be limited like an 8-pipe part at times, but overall it is pretty clear that never really applies.
Xenos has 257M transistors and has no video decoding or output (the scaler chip is external). The RSX (if it is like the G70) is 300M, but that includes ~25M for PureVideo. That puts it at ~275M for logic. To compare:
RSX 275M
Xenos 257M => 7% difference
--------------
NV40 200M (222M with video)
R420 160M => 25% difference
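The percentages in that table, spelled out. "Difference" here means how much larger the bigger chip is relative to the smaller one; all transistor counts are the ones given above.

```python
# Transistor comparison from the post, as percentages.
# "Difference" = how much larger the bigger chip is, relative to the smaller one.

def pct_larger(bigger_m: float, smaller_m: float) -> float:
    return (bigger_m / smaller_m - 1) * 100


RSX_TOTAL_M = 300      # assumes RSX is like G70, per the post
PUREVIDEO_M = 25       # approximate video-decode block, per the post
rsx_logic_m = RSX_TOTAL_M - PUREVIDEO_M
XENOS_M = 257          # no video decode/output on die (scaler is external)

NV40_M = 200           # 222M with video
R420_M = 160

print(f"RSX logic: {rsx_logic_m}M")
print(f"RSX vs Xenos: {pct_larger(rsx_logic_m, XENOS_M):.0f}% larger")
print(f"NV40 vs R420: {pct_larger(NV40_M, R420_M):.0f}% larger")
```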
R420 kept up with NV40 pretty well despite a 25% gap, and the difference between Xenos and RSX is far more marginal: ~20M transistors and 50MHz. And if we take ATI's claims at face value, Xenos is about 95% efficient, compared to 50-70% for their current-gen offerings. Xenos is more adaptable to different games, different parts of a game, and even different stages in the rendering pipeline.
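Putting the efficiency claim next to the raw gap. This is a crude sketch: the clocks (500MHz Xenos, 550MHz RSX) are the announced figures and my assumption here, "raw throughput" is just logic transistors times clock, and the 50-70% baseline is ATI's claim about its own current parts, NOT a measured RSX number.

```python
# What "95% efficient vs 50-70%" would mean in practice, next to the raw gap.
# ASSUMPTIONS: announced clocks; transistors x clock as a crude throughput proxy;
# the 50-70% figure describes ATI's current parts, not RSX.

XENOS_M, XENOS_MHZ = 257, 500
RSX_M, RSX_MHZ = 275, 550          # logic only, PureVideo removed

raw_edge_rsx = (RSX_M * RSX_MHZ) / (XENOS_M * XENOS_MHZ)

XENOS_EFF = 0.95
OLD_EFF_LOW, OLD_EFF_HIGH = 0.50, 0.70
eff_edge_best = XENOS_EFF / OLD_EFF_LOW    # vs a 50%-efficient design
eff_edge_worst = XENOS_EFF / OLD_EFF_HIGH  # vs a 70%-efficient design

print(f"RSX raw edge (transistors x clock): {raw_edge_rsx:.2f}x")
print(f"Claimed Xenos utilization edge:     {eff_edge_worst:.2f}x to {eff_edge_best:.2f}x")
```

Taken at face value, even the low end of ATI's claim (~1.36x) would more than cover a ~1.18x raw deficit, but that only holds if RSX really sits in that 50-70% band, which is exactly the part we can't verify.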
If there is a V8 anywhere in the Xbox 360, it is definitely the GPU!