JF_Aidan_Pryde
Tim said: Point 1: The workload has to fit the architecture to get anywhere near optimal performance.
Right on.
Point 2: It is easier to make workloads that fit general-purpose processors like the ones in the Xbox CPU than specialized processors like in Cell.
Point 3: As long as we know as little as we do about the real-life workloads on the next-gen consoles, it is hard to draw conclusions.
I'm not sure about points 2 and 3. As Jaws said, we do know the nature of the work: they will be games. Games are generally pretty floating-point intensive, and they exhibit much greater data-level parallelism than your typical desktop application. As a result, they should map to SIMD architectures fairly well.
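To make the data-level parallelism point concrete, here is a minimal Python sketch (an illustration of the idea, not real SIMD code). The same multiply-add is applied to every particle, so the work splits cleanly into four-wide chunks, the way a 128-bit SIMD register holds four 32-bit floats. The function name and `LANES = 4` are assumptions for illustration; on real hardware each chunk would be one vector instruction.

```python
# Hypothetical sketch of data-level parallelism: every element gets the same
# operation, so the work can be cut into fixed-width "lanes" like a 128-bit
# SIMD register holding four 32-bit floats. Pure Python stands in for real
# vector instructions; LANES = 4 is an illustrative assumption.
LANES = 4


def integrate_positions(pos, vel, dt):
    """Return pos + vel * dt, processed LANES elements at a time."""
    out = []
    for i in range(0, len(pos), LANES):
        p = pos[i:i + LANES]
        v = vel[i:i + LANES]
        # One "vector op": identical multiply-add on every lane in the chunk.
        out.extend(pi + vi * dt for pi, vi in zip(p, v))
    return out


positions = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
velocities = [1.0, 1.0, 2.0, 2.0, 0.5, 0.5]
print(integrate_positions(positions, velocities, 0.5))
# prints [0.5, 1.5, 3.0, 4.0, 4.25, 5.25]
```

Note there is no branching per element and no dependency between particles, which is exactly what lets a SIMD unit (or an SPE) chew through the data at close to peak rate.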
While it's difficult to map a game to a main thread and a bunch of SIMD streams, I don't think it's much better in the case of the Xenon. To get the maximum performance out of the Xenon, one has to map their game to six or more logical threads.
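One common way to spread a frame's work across six hardware threads is a shared job queue, with one coordinating thread handing out work and providing the sync point at the end of the frame. Below is a hypothetical Python sketch of that pattern: `run_frame`, the job names and `NUM_THREADS = 6` (the Xenon's three cores with two threads each) are my own assumptions, and Python's GIL means it shows the structure rather than a real parallel speedup.

```python
import queue
import threading

# Hypothetical sketch: one dispatcher fills a job queue and six worker threads
# drain it. NUM_THREADS = 6 mirrors 3 cores x 2 hardware threads; the jobs
# below are made-up stand-ins for real game subsystems.
NUM_THREADS = 6


def run_frame(jobs):
    """Run every (name, fn) job on a pool of worker threads; return results by name."""
    q = queue.Queue()
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            item = q.get()
            if item is None:  # sentinel: no more work this frame
                return
            name, fn = item
            value = fn()
            with lock:  # results dict is shared between workers
                results[name] = value

    threads = [threading.Thread(target=worker) for _ in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for job in jobs:
        q.put(job)
    for _ in threads:
        q.put(None)  # one sentinel per worker, queued after all real jobs
    for t in threads:
        t.join()  # global sync point at the end of the frame
    return results


frame_jobs = [
    ("physics", lambda: sum(range(1000))),
    ("animation", lambda: len("skeleton")),
    ("audio", lambda: 44100 // 60),
]
print(run_frame(frame_jobs))
```

The hard part the sketch glosses over is the one under discussion: each worker only runs at full speed if the data its job touches is close at hand when the job starts.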
So given that the maximum performance of both architectures is strongly contingent on how well they facilitate multi-threaded programming, I'd say Cell is better equipped, for the following reasons:
- The PPE can handle all the job queue maintenance and provide global sync. On the Xenon you'd need to devote a whole core to match it (the PPE and a Xenon core are probably near-identical anyway).
- Each SPE has its own local memory, and the programmer has total control over what goes in there. A well-organised program that overlaps loads and stores with execution (as facilitated by the Cell architecture) can expect to come close to saturating the SPE's peak performance.
For Xenon, on the other hand, even the most optimised program can't count on its data being in the cache, so it can never reach its peak performance. Given that six threads share the same 1MB cache, and two threads will be using the cache for OS-related tasks, there's bound to be a lot of thrashing in the cache.
- Each SPE has its own DMA and MMU unit. In other words, each SPE has its own hardware to get the data it needs onto the chip. The Xenon's threads, on the other hand, are at the mercy of the L2 cache.
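"Overlapping loads and stores with execution" usually means double buffering: start fetching chunk i+1 into one local buffer while computing on chunk i in the other. Here is a hypothetical Python sketch, assuming a one-thread executor as a stand-in for an SPE's DMA engine and a plain list as main memory. Real SPU code would issue transfers with the Cell SDK's `mfc_get` and wait on a tag group instead; the chunk size and function names here are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical double-buffering sketch. A single-worker executor plays the role
# of the DMA engine; `main_memory` plays system RAM. CHUNK is arbitrary.
CHUNK = 4


def dma_get(main_memory, start):
    """Simulated DMA: copy one chunk from 'main memory' into a local buffer."""
    return list(main_memory[start:start + CHUNK])


def process(buf):
    """The compute kernel, run only on data already sitting in local memory."""
    return [x * 2 for x in buf]


def double_buffered_map(main_memory):
    offsets = list(range(0, len(main_memory), CHUNK))
    if not offsets:
        return []
    out = []
    with ThreadPoolExecutor(max_workers=1) as dma:
        pending = dma.submit(dma_get, main_memory, offsets[0])  # prime first buffer
        for i in range(len(offsets)):
            current = pending.result()  # wait until this buffer has arrived
            if i + 1 < len(offsets):
                # Start the next transfer before computing, so the copy into
                # the other buffer overlaps with work on the current one.
                pending = dma.submit(dma_get, main_memory, offsets[i + 1])
            out.extend(process(current))
    return out


print(double_buffered_map(list(range(10))))
```

Because the program decides exactly when each transfer starts, the compute loop only stalls if a chunk takes longer to fetch than the previous one took to process, which is the property a cache-based design can't promise.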
I like the Cell architecture. I like it not because it's Sony or it's going to be in the PS3; I just think it's got the right organisation to help multi-threaded SIMD applications.
The Xenon is a huge improvement over the XBOX 1 CPU. But it still relies on too many of the old tricks from the days of single threaded programming. I just don't think the CPU is going to come close to its peak performance when six threads are all hoping the data they need will be in that 1MB L2 cache.
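Some rough arithmetic on the cache-sharing point, assuming the 1MB L2 were split evenly across the six hardware threads (real sharing is dynamic, so this is only an average, not a reservation), compared with the 256 KB of local store each SPE has to itself:

```latex
\[
\frac{1\,\text{MB}}{6\ \text{threads}} = \frac{1024\ \text{KB}}{6} \approx 171\ \text{KB per thread}
\qquad \text{vs.} \qquad
256\ \text{KB of private local store per SPE}
\]
```

And unlike a local store, nothing in a shared cache stops one thread's streaming data from evicting another thread's working set.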
A lot of thought has clearly gone into the Cell architecture. The designers realised you can't execute unless the data is there, and guessing (as with caches and speculation) just isn't good enough. That's why each SPE has its own memory and its own DMA and MMU hardware. It's designed to get the data onto the chip so the SPE has a chance of coming close to its peak performance.