Every engineering design has weaknesses. Since engineers have to make cost, size, power, performance, & time-to-market tradeoffs, you can never design something to be "perfect" or the "best". Often, weaknesses aren't fully apparent at the design & test stage, but become visible during actual customer usage.
With the benefit of 20/20 hindsight, we can reflect on the design weaknesses of the PS2 and see how Sony is addressing those weaknesses in the PS3. This is not a criticism of the PS2 architecture, which was state-of-the-art at the time of its design and delivery, but rather a look at how to learn from and build upon past designs. Note: All information is from publicly available sources.
CELL vs. EE
------------------
FACT: EE core CPU was a MIPS 64-bit derivative clocked at 300MHz
FACT: CELL contains a Power 64-bit derivative clocked at 4+GHz
EE Weakness: The core CPU in the EE was slower than state-of-the-art PC microprocessors of the time (Intel was shipping a 500MHz Pentium III in Feb. 1999 and demoed a 1GHz Pentium III on Feb. 23, 1999) and ended up being slower than either the XBOX core (733 MHz Celeron) or the GameCube (PowerPC 750 485MHz). Note the XBOX & GameCube CPU's did ship later than the PS2's.
CELL Fix: The 4+GHz clock rate of the PPU is competitive with the fastest shipping PC microprocessors (Pentium 4 3.8GHz or Athlon 64 2.6GHz). (Note: While it is silly to compare microprocessors based solely on clock rate, Power/PowerPC vs. x86 comparisons have been extensively done for the past 5+ years based on actual application code, so relative comparisons can be made). The CELL PPU is a simplified in-order design, so it won't have quite the performance of a fully out-of-order design, but the balance between number of instructions in flight vs. clock rate seems to have been well managed. XBOX 360 and Revolution are expected to use Power/PowerPC based derivative microprocessors and are unlikely to be clocked faster than the CELL PPU.
FACT: EE core CPU has a 16K Instruction and 8K Data cache with a 16K scratchpad and zero L2 cache
FACT: CELL PPU has a 32K Instruction and 32K Data cache and 512K L2 cache
PS2 Weakness: The small size of the L1 caches on the EE and the lack of an L2 cache made the MIPS-based core's actual performance much lower than it should have been.
CELL Fix: CELL has a proper L2 cache, which at 512K will be sufficient to extract good performance from the PPU core. CELL also has an integrated memory controller, which allows for lower-latency access to memory. AMD has used an integrated memory controller in the Athlon 64 for some time, which has contributed to its performance advantage vs. Intel. The combination of a properly sized L2 and an integrated memory controller will yield performance dividends.
FACT: EE had 2 vector units VU0 & VU1, but they were architecturally different (i.e., different resources and instruction sets).
FACT: CELL has 8 SIMD units (SPE's), all identical in resources and instruction sets
EE Weakness: Because VU0 & VU1 were programmed differently and implemented different instruction sets, creating code and reusing that code proved to be more difficult than expected.
CELL Fix: The SPE's in CELL all use a consistent programming model; any SPE code can execute on any of the 8 SPE's. The instruction set and large register file have been designed to be more easily programmed in a higher-level language (e.g., C, C++). The SPE's can also be virtualized.
FACT: EE had very small cache/scratchpads, 8K total for VU0 and 32K for VU1
FACT: CELL uses 256K local store for each SPE
PS2 Weakness: The small size of the scratchpads for VU0 and VU1 made scheduling of data more difficult and made effective bus utilization harder than expected.
CELL Fix: CELL has greatly increased the size of the local store to avoid the problems of the VU's. The original CELL patents referenced a 128K local store, which was upped to 256K during the implementation phase.
FACT: EE had a relatively slow I/O bus to the graphics processor (GS). The EE to GS bus delivered 1.2 GB/s, both downstream and upstream.
FACT: CELL uses Rambus FlexIO technology to deliver 44.8GB/s downstream and 32GB/s upstream from the CELL to the GPU. (Note: some of the FlexIO lanes may be used for other I/O, not GPU traffic, but the majority will be used for the GPU).
PS2 Weakness: The EE to GS bus (GIF) was relatively slow at 1.2GB/s. It was half the speed of the internal EE bus and much slower than the 3.2GB/s memory bandwidth. This greatly limited how much triangle and texture information you could upload to the GS every frame.
CELL Fix: The FlexIO bandwidth to the GPU will be greater than the bandwidth from XDR memory (25GB/s), ensuring the CPU to GPU bus does not become a bottleneck as it was in the PS2.
GPU vs. GS
------------------
FACT: Sony used the in-house developed GS for graphics processing. The GS contained an embedded DRAM frame buffer (4MB) and supported a fairly limited set of graphics processing features.
FACT: Sony has partnered with NVIDIA to deliver a GPU for the PS3.
GS Weakness: The GS had very high pixel fill rate, with a limited set of graphics functions/effects. Some effects could be done in software, but others were impractical due to performance considerations. GS did not offer vertex processing as that was expected to be done by the EE.
CELL Fix: Sony has partnered with NVIDIA to deliver a GPU that has the same rich functionality (vertex/pixel shaders, anti-aliasing, scaling, etc.) as today's latest PC GPU's. CELL will be able to focus on gameplay with the GPU doing the graphics processing.
PREDICTION: PS3 GPU will not use embedded DRAM. The PS3 will use external high-speed memory for best performance.
FACT: None of the highest performing graphics cards produced by NVIDIA or ATI during the past 5 years have used embedded DRAM.
COMMENTARY: Graphics cards for PC's can afford to pay for the highest performing GPU and memory combination. ATI and NVIDIA have looked at embedded DRAM multiple times over the past several years, but an embedded DRAM design has always yielded lower performance than the alternative external memory design. Embedded DRAM does offer two significant advantages: 1) cost and 2) power consumption. The power consumption advantage is why you see embedded DRAM in most mobile/portable designs (e.g., PSP, DS, PDA's, cell phones). The cost advantage is why you often see it in consoles (PS2, Gamecube, XBOX 360). NVIDIA was able to use an external memory design (NV2A) in the XBOX to yield better graphics than the PS2, and they will use the same technique in the PS3.
PREDICTION: PS3 GPU will offer better performance than the XBOX 360 GPU. The difference will not be 2X or some other very large number, but rather 20-50% faster depending on the application.
COMMENTARY: Both ATI and NVIDIA have been reasonably close to each other on performance of their latest generation of PC graphics cards. ATI is faster on some benchmarks (e.g., HL2) and NVIDIA on others (e.g., Doom 3). The GPU's for the XBOX 360 and PS3 will be based on the designs of the upcoming PC GPU's. ATI, NVIDIA, Microsoft and Sony will all do their best to deny that the console GPU's are based on the same microarchitecture as the PC GPU's, but that will be the case. Just as the NV2A in the XBOX was a derivative of the NV20 and NV25 designs, so the XBOX 360 GPU will be a derivative of the R520/R600 and the PS3 GPU will be a derivative of the G70. In both cases, ATI and NVIDIA will make significant modifications to the designs to adapt them for consoles, but fundamentally they will be based on the respective PC GPU. The PS3 GPU will be a half-generation ahead of the XBOX 360 GPU timewise (i.e., it will ship later) and it will take advantage of the tremendous bandwidth of the CELL architecture (both Rambus memory and FlexIO).
SUMMARY: Sony has clearly learned from the design decisions it made for PS2 and has made appropriate improvements to the PS3 design. Again, no design is "perfect" or the "best" and the PS3 design has had to make its share of tradeoffs (cost, size, power, performance, & time-to-market) as well. Over time, we will learn where the "weaknesses" and "bottlenecks" in the PS3 architecture are and learn what improvements Sony should make in PS4.