scooby_dooby said:
As long as you realize that's an extremely optimistic outlook based only on handpicked benchmarks from IBM, and one-sided technical documents also from IBM.
Everyone is just guessing, and the combined sales pitch from IBM, and the EE-like hype from Sony have sold many people this amazing power.
Just remember it still has to be proven in the real world.
Not at all. It's not impossible to make predictions given various known quantities. Indeed that's the basis of scientific advancement. In this case my belief in Cell's potential efficiencies isn't a case of taking IBM's word for it, but of looking at the architecture, the ISSCC coverage, and various articles and considerations on how CPUs work.
Like if someone says to you they're building a car that'll go 180 MPH, maybe you won't believe it. But if they tell you they've got a 250 BHP engine they'll sports-modify, and a 600 kg tubular steel frame, you can see that their target is in the realm of possibility. Likewise, when an architect designs a bridge and sets a weight limit of 5 tonnes, you don't need to put 5 tonnes on the bridge to be sure it won't collapse. A thorough understanding of material properties and forces means engineering limits can be known without having to test.
To get a CPU to do useful float maths, you need to get data into the logic circuits. That's the limiting factor of efficiency. If you have no local storage or cache, all that data has to be shipped in from RAM, which has a huge latency, and so the APU stalls often. If you can provide data from a 'nearer' memory space, you can keep the APU busier. If the data is extremely local and you have a pipeline that's prefetching it, you can keep the APU running full tilt. That's what the design of the SPE is very good at, and that's why it was developed the way it was. The efficiencies have also been demonstrated in a few real-world applications, with results ranging from marginal to excellent, which is what we'd expect seeing as no solution is perfect for every workload.
As for the concerns over whether these potential efficiencies will find their way into games, I answer 'yes', as I know developers aren't incompetent and won't try to run code unsuited to the hardware (well, most anyway!). They'll use data structures and routines that work as well as they can with the prefetch batch-processing model where possible. As long as the tools aren't unfathomable and the hardware isn't complex to the point it's beyond their capacity to adjust to it, devs will make the most of it. Of course Cell won't be running at 200+ GFLOPS, as no program is nowt but linear float maths. My point wasn't that peak float performance will be obtained, but that high relative efficiency will. Looking at the architecture, what reason have you to doubt that? If you compare Cell's capacity to feed its floating-point APUs versus, say, XeCPU, why do you think it hasn't an efficiency advantage? Or at least, where do you think the limits are that would prevent the theoretical efficiencies being achieved?
In the context of the thread's question of what advantages Cell has over XeCPU, its memory access model has to be counted as one of the key strong points, fundamental to sustained float performance.