Jawed
dcforest said:Given Barry counts the PPE at 25.6 GFLOPS, it is likely that Xenon comes in at 77 GFLOPS, unless there is some special scheduling/execution magic in the Xenon VMX/FPU units that no one has publicly talked about.
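For reference, the arithmetic presumably runs as below; the 8-flops-per-cycle accounting (one 4-wide VMX multiply-add per cycle per core) is an assumption on my part, not stated in the post, and the result rounds to the quoted 77 GFLOPS:

```c
/* Sketch of the GFLOPS accounting. The 8 flops/cycle figure
 * (4-wide FMADD: 4 muls + 4 adds) is an assumption, not from
 * the thread. */
#include <stdio.h>

int main(void) {
    double clock_hz = 3.2e9;          /* PPE / Xenon core clock */
    double flops_per_cycle = 8.0;     /* assumed: one 4-wide FMADD per cycle */
    double ppe = clock_hz * flops_per_cycle;   /* 25.6 GFLOPS */
    double xenon = 3.0 * ppe;                  /* 3 cores: 76.8 GFLOPS */
    printf("PPE:   %.1f GFLOPS\n", ppe / 1e9);
    printf("Xenon: %.1f GFLOPS\n", xenon / 1e9);
    return 0;
}
```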
Dr. Nick said:The VMX units in the PPE in the Cell and the VMX units in the PPEs in the Xenon are not the same so his math can't be applied to the Xenon without some tweaking of the numbers.
Jaws said:No, I think you're getting mixed up. We are CLEARLY referring to the parent die and its shader ALUs. If you've followed any of the derivations, then you'd realise that we are using 48 ALUs from the shader array and not referring to the fixed function logic in the daughter die...
Entropy said:I'm not a games programmer, and thus I can't really say what one of those would find difficult. It would depend on their background and their personality, I guess.
But judging from my own experience, I'd guess that the PS3 takes a little more in the way of rearranging your gears if you come from a typical PC-like background, whereas the 360 will start giving you headaches as you try to exploit its three cores more fully.
Both consoles can be programmed straight along PC patterns: you simply use the PPE on the PS3 or one of the cores on the 360, and then talk to the GPU as per normal. Both consoles allow themselves to be used like that, and it doesn't limit the GPU much from what I can see; the limitations are mostly on what you can achieve on the CPUs. So that would produce nice looking pixels on the screen for both consoles. If your game requires more in the way of CPU performance, for physics, game logic or graphical processing reasons, then, but only then, will you have to dig in deeper.
Looking at the designs from the outside suggests that slightly different challenges will present themselves.
The 360 is architecturally very similar to, say, the Xbox or an integrated-graphics PC. It has advantages over both, though: substantially higher bandwidth from CPU to GPU and from GPU to memory (as well as the bandwidth-saving intelligent buffer memory on the GPU). It also has three cores, operating in a traditional symmetric multiprocessing/uniform memory architecture. The problem with this layout is contention for memory. The three cores have relatively small private L1 caches, they share the L2, and they share main memory with the GPU. So while it presents a rather straightforward programming model, actually getting the CPU to perform well is going to require dealing with three cores thrashing each other's caches, and stepping on each other's feet in trying to access the same memory pool as the worst memory hog of them all, the GPU. Additionally, the internal data traffic between the CPU and the GPU will also load the CPU memory path.

As a programmer this situation is typically really nasty, because you don't really have much in the way of tools to control/synchronize the different threads and the GPU. (Lockable cache areas can help. A little bit.) These issues can basically only be alleviated by making the constrained resource really ample, but doing that with the memory path is expensive. So while the 360 is better off than a typical integrated-chipset PC, you can still see that bandwidth and memory contention are going to be a significant problem, and a difficult one to manage at that. The very design principles that make the transition to multiprocessing easy, both from a programming and from a hardware point of view, come back to bite you, and make actually extracting high utilization rates from the additional resources difficult.
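As a generic illustration of the cache discipline Entropy describes (a sketch only, not 360-specific code; the 128-byte line size, the pthreads API, and the three-thread count are illustrative assumptions of mine), keeping each core's working set on its own cache lines stops the cores from invalidating each other's lines:

```c
/* Generic sketch: give each worker thread its own cache-line-aligned
 * accumulator so the cores don't ping-pong the same lines between
 * their caches ("false sharing"). A 128-byte line is assumed. */
#include <pthread.h>
#include <stdio.h>

#define NUM_CORES 3
#define CACHE_LINE 128

struct worker {
    double sum;
    char pad[CACHE_LINE - sizeof(double)];  /* one full line per worker */
} __attribute__((aligned(CACHE_LINE)));

static struct worker workers[NUM_CORES];

static void *work(void *arg) {
    struct worker *w = arg;
    for (int i = 0; i < 1000000; i++)
        w->sum += 1.0;                      /* touches only its own line */
    return NULL;
}

int main(void) {
    pthread_t tid[NUM_CORES];
    for (int i = 0; i < NUM_CORES; i++)
        pthread_create(&tid[i], NULL, work, &workers[i]);
    for (int i = 0; i < NUM_CORES; i++)
        pthread_join(tid[i], NULL);
    for (int i = 0; i < NUM_CORES; i++)
        printf("core %d: %f\n", i, workers[i].sum);
    return 0;
}
```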
The PS3 requires you to take a step back from typical PC procedure, and take a broader look at what you want to achieve. (One reason is that you might want to utilize the CPU for some graphics-related task, shifting bandwidth and processing capabilities around for optimum yield. I won't go there, as I'm not qualified to comment.) Not only do you want to partition your problem into blocks that can be farmed out to the SPEs, but you'd also want to adapt your in-thread algorithms to be partitioned and distributed to the SPEs. The Cell processor offers additional flexibility in that the SPEs can also pass/pipe tasks between themselves, and basically you have a bunch of options there that to a PC programmer are new, and thus both a bit difficult and hopefully exciting.

What is really good about the PS3 compared to the 360 is the resources that have been dedicated to managing memory and communication. The SPEs each have 256 KBytes of local memory, which they can access without any risk of having their data flushed, or needing to cache-snoop, or any such. There are fast data paths within the chip to transfer data to and from the PPE/SPEs, and between them. The CPU also has its own dedicated path to memory and a completely separate, very high bandwidth connection to the GPU, which in turn has its own dedicated path to graphics memory. And not only does the PS3 sidestep the nastiest contention issues by providing separate data paths, these separate data paths also individually provide higher bandwidth than the shared resources of the 360. For someone with a background in scientific computing like me, the data flow model of the PS3 looks much better. I can't speak for games programmers.
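To make the SPE model concrete, here is a minimal sketch of the classic double-buffered DMA pattern for streaming data through the local store, written against the Cell SDK's spu_mfcio.h interface; the chunk size and the process() kernel are placeholders of mine, not anything from the thread:

```c
/* SPE-side sketch: double-buffered DMA into the 256 KB local store,
 * so the SPU computes on one buffer while the MFC fetches the next.
 * CHUNK and process() are illustrative placeholders. */
#include <spu_mfcio.h>

#define CHUNK 16384  /* bytes per DMA transfer, multiple of 16 */

static volatile char buf[2][CHUNK] __attribute__((aligned(128)));

extern void process(volatile char *data, unsigned size); /* your kernel */

void stream(unsigned long long ea, unsigned long long total) {
    int cur = 0, next = 1;
    mfc_get(buf[cur], ea, CHUNK, cur, 0, 0);             /* prime buffer 0 */
    for (unsigned long long off = CHUNK; off < total; off += CHUNK) {
        mfc_get(buf[next], ea + off, CHUNK, next, 0, 0); /* prefetch next */
        mfc_write_tag_mask(1 << cur);                    /* wait on current */
        mfc_read_tag_status_all();
        process(buf[cur], CHUNK);                        /* compute overlaps DMA */
        cur ^= 1; next ^= 1;                             /* swap buffers */
    }
    mfc_write_tag_mask(1 << cur);                        /* drain last chunk */
    mfc_read_tag_status_all();
    process(buf[cur], CHUNK);
}
```

The point of the pattern is exactly what the post describes: the SPE owns its local store outright, so compute and data movement can be overlapped explicitly instead of hoping a cache gets it right.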
So which console offers the steeper learning curve? I'd say that depends on where on the curve you are. Not all games require cutting-edge utilization, and at that point I'd say both should actually be fairly easy to deal with. If you want to squeeze more out of the respective consoles, the PS3 departs more significantly in its architecture and possibilities from a PC, and thus most programmers would need to study the architecture, their algorithms and the available tools carefully in order to build an application that is well suited to the console. In contrast, the vanilla SMP/UMA of the 360 is really simple conceptually and doesn't present much of a learning curve at all apart from managing a few threads. In that respect the 360 is much simpler. Its memory and communication limitations will range from non-issues to insurmountable problems depending on what you want to achieve, but wringing really good performance from the 360 will require you to balance the different processes that need to access the memory paths very, very well, because you want to wring maximum utilization out of this limited resource. And that will definitely not be easy.
The steeper learning curve is in all probability the PS3's, but in no way, shape or form does that imply that the 360 will cause its programmers fewer gray hairs.
Again, all of the above is from someone without a games development background, but with practical experience of non-PC-type architectures. YMMV.
Edge said:It seems some of the most well written and intelligent posts around here get no responses. Let me be the first to say, very nice post!
One of the more balanced views I have seen on the respective architectures. Thank you for that.
It's unfortunate that it was in response to a fishing expedition by dukmahsik and may explain the lack of further comments.
Entropy said:I'll explain what I mean, starting with what I perceive as the largest problem with the 360 design: the CPU-to-GPU interface. All CPU memory traffic has to pass through it, as well as all CPU-to-GPU internal communication. Additionally, some of the memory traffic will be congested due to the memory bus being busy processing GPU memory transfers. Add contention between the cores on top of that.
The theoretical throughput of the CPU-to-GPU channel is 10 GB/s. Real throughput will be lower for many reasons, among them latencies, both electrical and protocol-level. For the sake of argument, estimate that the effective bandwidth is halved to 5 GB/s. Now, we've seen tons of FLOPS analysis on these boards. While some may find this interesting in its own right, the numbers produced should be contrasted with the data paths available. The memory channel of the 360 CPU can, if we are generous, sustain roughly 5 GB/s bidirectional, or 2.5 GB/s in either direction assuming a symmetric load in and out and negligible CPU-to-GPU internal traffic, corresponding to roughly 0.6 billion single precision floating point numbers per second.
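Spelling out that last step (a back-of-the-envelope sketch; the 50% derating and the symmetric two-way split are Entropy's own assumptions carried through, not measured figures):

```c
/* Back-of-the-envelope from the post's numbers: halve the 10 GB/s
 * theoretical CPU-to-GPU channel, split it across two directions,
 * and count 4-byte single-precision floats. */
#include <stdio.h>

int main(void) {
    double theoretical   = 10e9;               /* bytes/s, both directions */
    double effective     = theoretical / 2.0;  /* assumed latency/protocol loss */
    double per_direction = effective / 2.0;    /* 2.5 GB/s each way */
    double floats        = per_direction / 4.0; /* sizeof(float) == 4 */
    printf("~%.2f billion floats/s per direction\n", floats / 1e9);
    return 0;
}
```

That works out to roughly 0.62 billion floats per second per direction, a sobering denominator to put under any multi-GFLOPS numerator.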
ROG27 said:That's what I'm saying, Jaws... I believe the 240 number comes from MS including fixed-function ops on the daughter die. The 216 number is purely derived from the 48 ALUs on the parent die.
I think they included those extra fixed-function ops in their number because programmable ops on a traditional GPU would be responsible for those types of post-rendering functions (anti-aliasing, blurring effects, etc.).
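For what it's worth, one way the two figures line up arithmetically (my reconstruction, not ROG27's posted working; the 9-ops-per-ALU-per-cycle accounting, i.e. a vec4 multiply-add counted as 8 flops plus one scalar op, and the 500 MHz clock are assumptions):

```c
/* Reconstruction of where 216 vs 240 GFLOPS could come from.
 * The 9 ops/ALU/cycle accounting and the clock are assumptions,
 * not figures confirmed in the thread. */
#include <stdio.h>

int main(void) {
    double alus        = 48.0;    /* parent-die shader ALUs */
    double clock_hz    = 500e6;   /* assumed Xenos core clock */
    double ops_per_alu = 9.0;     /* assumed: vec4 MADD (8) + 1 scalar op */
    double parent = alus * clock_hz * ops_per_alu;  /* 216 GFLOPS */
    printf("parent die:            %.0f GFLOPS\n", parent / 1e9);
    printf("remainder of MS's 240: %.0f GFLOPS (daughter-die fixed function?)\n",
           (240e9 - parent) / 1e9);
    return 0;
}
```

Under those assumptions the parent die accounts for exactly 216 GFLOPS, leaving 24 GFLOPS that would have to come from the daughter die, consistent with ROG27's reading of the MS figure.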
Lysander said:Inside the Smart 3D Memory is what is referred to as a 3D Logic Unit. This is literally 192 Floating Point Unit processors inside our 10MB of RAM.
expletive said:Who designed the entire system layout of the 360? ATI, IBM, or 'other'?