Hiroshige GOTO and Zenji NISHIKAWA, who both visited ISSCC 2005, are now running their own series of articles about the Cell at PC Watch and ASCII24, as you may have seen in the other thread. Besides the pretty pictures, the articles include comments from Cell project members. Since they contain a lot of new info, for the convenience of discussion I've transcribed as many of those comments here as possible, along with a condensed version of each writer's speculation.
In one of his articles on the Cell at ISSCC 2005, Nishikawa (a GPU-oriented journalist who usually covers new technologies from nVIDIA, ATi and DirectX, whereas Goto mostly writes about CPUs and occasionally about GPUs and game console tech) takes up the Cell-based GPU (VS) described in the patent.
At ISSCC 2005 he asked Cell project members about the Cell-based GPU. They told him that it had actually been in development, but was eventually dropped for the PS3. They did not say why it was discontinued. Interestingly, every Cell project member Nishikawa spoke to told him that even they were surprised when they heard nVIDIA had been chosen as the partner. (When they actually learned of the partnership is unknown, but Nishikawa assumes the Sony-nVIDIA partnership had been kept very secret for 1-2 years, as nVIDIA suggested.)
Although the Cell-based GPU was not adopted for the PS3, the Cell project members say that a test program rendering basic 3D graphics is actually running in the lab on the Cell processor presented at ISSCC 2005. Nishikawa therefore concludes that the current configuration (1 PPE + 8 SPEs) already has enough potential as a GPU, and that a cost-effective Cell-only system without a dedicated GPU is feasible for applications such as car navigation systems.
In the latest article in the series, Nishikawa speculates on the possible configuration of the PS3. He thinks it uses the SPEs in the CPU as programmable vertex shaders, while the nVIDIA GPU contains only pixel rendering pipelines and eDRAM. He expects the memory configuration to be UMA, with the eDRAM in the GPU acting as a cache, since SPEs working as vertex shaders must be able to access texture memory to support VS 3.0+. If all 8 SPEs are used as vertex shaders, they could reach over 6-8 billion vertices/sec at 3-4 GHz (not counting the overhead of the Cell's EIB). That is far more than the expected performance of the GPU part, so programmers could use some of the SPEs as pipelines for other goodies, such as a tessellator, a geometry shader or LOD processing.
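For what it's worth, the 6-8 billion figure falls out of simple arithmetic if you assume (my assumption, not something Nishikawa states) that each SPE finishes one vertex every 4 cycles, i.e. a 4x4 matrix times a 4-component position done as four 4-wide SIMD multiply-adds, one issued per cycle. The rough sketch below computes peak rate = SPEs x clock / cycles per vertex, ignoring EIB overhead as he does; `peak_vertices_per_second` and `transform` are just illustrative names.

```python
# Back-of-the-envelope peak vertex-transform rate for the Cell.
# ASSUMPTION (not from the article): one 4-wide SIMD multiply-add issues per
# cycle per SPE, and a 4x4 matrix * vec4 transform takes 4 of them, so each
# SPE completes one vertex every 4 cycles in steady state. EIB/memory ignored.

CYCLES_PER_VERTEX = 4  # hypothetical: 4 multiply-adds per vertex


def peak_vertices_per_second(num_spes: int, clock_hz: float,
                             cycles_per_vertex: int = CYCLES_PER_VERTEX) -> float:
    """Peak vertices/sec = SPEs * clock / cycles per vertex."""
    return num_spes * clock_hz / cycles_per_vertex


def transform(m, v):
    """Multiply a 4x4 matrix (given as a list of 4 columns) by a vec4.

    Each loop step scales one column by one vector component and adds it to
    the accumulator, mirroring one 4-wide SIMD multiply-add per step.
    """
    acc = [0.0, 0.0, 0.0, 0.0]
    for col, s in zip(m, v):  # 4 iterations -> 4 multiply-adds
        acc = [a + c * s for a, c in zip(acc, col)]
    return acc


if __name__ == "__main__":
    for ghz in (3.0, 4.0):
        rate = peak_vertices_per_second(8, ghz * 1e9)
        print(f"{ghz:.0f} GHz: {rate / 1e9:.0f} billion vertices/sec")
```

Reserving fewer SPEs for vertex work scales the number down linearly, which is the headroom Nishikawa suggests spending on tessellation or LOD.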
Goto has a report on the SPE with some comments from the Cell architects. In the SPE, the 128 physical 128-bit registers are mapped as logical registers.
The reason it has so many registers is that the SPE can be assumed to switch threads far less often than the PPE, which runs an OS.

Masakazu SUZUOKI said: All 128 registers are visible to (usable by) programs, too.
The SPE, being a SIMD processor, needed a larger register set to make its operations more parallel.

Jim Kahle said: We have a lot of experience with registers. The set of 32 registers (in Power) has the advantage of reducing the (register) saving to memory at a context switch and keeping (thread switches) within a short time slice. But it's known that (the software model on) the SPE runs to completion once started. So it has fewer context switches, and therefore it can have a larger register set.
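Kahle's trade-off is easy to put numbers on: the state a context switch has to spill grows with register count and width. The quick sketch below compares the SPE register file (128 x 128-bit, per the article) with a 32-entry file; treating the Power side as 32 x 64-bit GPRs is my own simplifying assumption, and FP/vector register state is deliberately left out.

```python
# Bytes of register state to save to memory on a context switch.
# SPE numbers (128 registers x 128 bits) are from the article; the
# 32 x 64-bit general-purpose file is an illustrative assumption for a
# Power core (floating-point and vector registers ignored for simplicity).


def register_save_bytes(num_registers: int, width_bits: int) -> int:
    """Size of a register file in bytes."""
    return num_registers * width_bits // 8


SPE_SAVE = register_save_bytes(128, 128)  # 2048 bytes
GPR_SAVE = register_save_bytes(32, 64)    # 256 bytes

if __name__ == "__main__":
    print(f"SPE register file: {SPE_SAVE} bytes")
    print(f"32 x 64-bit GPRs:  {GPR_SAVE} bytes")
    print(f"ratio: {SPE_SAVE // GPR_SAVE}x")
```

An 8x larger spill is only tolerable because, as Kahle says, SPE code normally runs to completion, so that cost is rarely paid.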
In the PS2's EE this kind of optimization was done by hand, but in the Cell it is supposed to be compiler-based.

Jim Kahle said: The larger register set is also for loop unrolling. To fill the pipelines without techniques such as register renaming, you need loop unrolling. From our experience with other processors, we decided that about 100 registers are appropriate for loop unrolling.
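To see why unrolling eats registers, here's a toy budget model (hypothetical, not how IBM arrived at "about 100"): hiding an operation latency of L cycles on a single-issue in-order pipe needs roughly L independent iterations in flight, and each unrolled iteration keeps its own live values. The latency, registers-per-iteration and fixed-register counts below are assumed values chosen only to illustrate the trend.

```python
# Toy register-budget model for an unrolled loop on an in-order core.
# Every number used below is an illustrative assumption, not an SPE spec.


def min_unroll_to_hide_latency(latency_cycles: int, issues_per_cycle: int = 1) -> int:
    """Independent iterations needed in flight so no issue slot waits on a result."""
    return latency_cycles * issues_per_cycle


def max_unroll(available_regs: int, regs_per_iteration: int, fixed_regs: int) -> int:
    """Largest unroll factor whose live values still fit in the register file."""
    return max(0, (available_regs - fixed_regs) // regs_per_iteration)


if __name__ == "__main__":
    LATENCY = 6   # assumed operation latency in cycles
    PER_ITER = 6  # assumed live registers per unrolled iteration
    FIXED = 8     # assumed pointers, counters, constants
    need = min_unroll_to_hide_latency(LATENCY)
    for regs in (32, 128):
        fit = max_unroll(regs, PER_ITER, FIXED)
        verdict = "enough" if fit >= need else "too few"
        print(f"{regs} registers: unroll up to {fit}, need {need} -> {verdict}")
```

With these made-up parameters 32 registers allow only a 4x unroll where 6x is needed, while 128 leave plenty of room; real loop bodies with more live values push the requirement further up.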
Masakazu SUZUOKI said: The 128-register set is a kind of brute-force approach. But the SPE is an in-order machine running at 4 GHz, so latencies are not small. Hence compiler-based scheduling becomes very important; loop unrolling is a good example. For advanced compiler-based scheduling, we thought we needed 128 registers.
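Suzuoki's point about in-order latency can be illustrated with a tiny simulation. The sketch below models a hypothetical single-issue, in-order pipe summing products with `accumulators` independent chains (i.e. the loop unrolled that many times); the 6-cycle latency is an assumed value, and the final combine of the partial sums is not modeled.

```python
# Toy in-order, single-issue pipeline: n multiply-adds spread round-robin over
# `accumulators` independent dependency chains (= unroll factor).
# ASSUMPTIONS: fixed latency per op, one issue per cycle, no other hazards;
# the final reduction of the partial sums is not modeled.


def simulate_in_order(n_ops: int, accumulators: int, latency: int) -> int:
    """Return the cycle at which the last result becomes available."""
    ready = [0] * accumulators  # cycle at which each chain's value is usable
    cycle = 0                   # earliest cycle the next instruction may issue
    finish = 0
    for i in range(n_ops):
        chain = i % accumulators
        issue = max(cycle, ready[chain])  # in order: stall until operand ready
        ready[chain] = issue + latency
        finish = max(finish, ready[chain])
        cycle = issue + 1
    return finish


if __name__ == "__main__":
    for k in (1, 2, 6):
        print(f"unroll x{k}: {simulate_in_order(1024, k, 6)} cycles")
```

With one chain every op waits the full latency (1024 x 6 cycles); with as many chains as the latency the pipe issues every cycle. Note that splitting a floating-point sum across accumulators changes the order of additions, which a compiler may only do when that is allowed.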
This means that, unlike with the Emotion Engine, the architects assume compiler-based software scheduling for the Cell and are developing a powerful optimizing compiler for it.

Jim Kahle said: In an environment with 128 registers, a compiler can do many optimizations. It will truly push up resource usage efficiency. The 32-register sets used so far have been the barrier to compiler optimization.
The instruction set of the SPE is totally different from the Power/PowerPC ISA of the PPE.
Jim Kahle said: Though it has a certain basis, it's a totally new ISA. We started from 32-bit instructions with a RISC-type organization, but it became a new instruction set in order to address 128 registers. We also tried to make it as simple as possible. Integer, single-precision floating-point, double-precision floating-point, load/store and branch instructions are all encoded in the same 32-bit instruction format.
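Why 128 registers forced a new encoding: each register field needs log2(128) = 7 bits instead of log2(32) = 5, so a three-register instruction spends 21 of its 32 bits on operands, leaving 11 for the opcode (versus 17 with 5-bit fields). The layout below (opcode in the high bits, then three 7-bit fields) is an illustrative assumption modeled on a three-operand register form, not a quote of the SPE spec, and any opcode value you feed it is hypothetical.

```python
# Packing a three-register instruction into 32 bits with 128 registers.
# ASSUMED layout: [opcode | rb | ra | rt], opcode in the high bits.

REG_BITS = (128 - 1).bit_length()  # 7 bits to name one of 128 registers
OPCODE_BITS = 32 - 3 * REG_BITS    # 11 bits left for the opcode


def encode_rr(opcode: int, rb: int, ra: int, rt: int) -> int:
    """Pack opcode and three register numbers into one 32-bit word."""
    for reg in (rb, ra, rt):
        if not 0 <= reg < (1 << REG_BITS):
            raise ValueError(f"register {reg} out of range")
    if not 0 <= opcode < (1 << OPCODE_BITS):
        raise ValueError(f"opcode {opcode} out of range")
    return (opcode << 3 * REG_BITS) | (rb << 2 * REG_BITS) | (ra << REG_BITS) | rt


def decode_rr(word: int) -> tuple[int, int, int, int]:
    """Inverse of encode_rr: return (opcode, rb, ra, rt)."""
    mask = (1 << REG_BITS) - 1
    return (word >> 3 * REG_BITS,
            (word >> 2 * REG_BITS) & mask,
            (word >> REG_BITS) & mask,
            word & mask)
```

For example, `decode_rr(encode_rr(0x0C0, 127, 5, 3))` gives back `(0x0C0, 127, 5, 3)`, and register 127 fits only because each field is 7 bits wide.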
In his latest article, Goto discusses the DRM feature of the Cell.
Masakazu SUZUOKI said: The SPE has a protection mechanism as a basic feature. It's called "isolated mode", and when an SPE enters this mode its Local Store is completely locked. It is protected from outside the SPE and from all other elements in the Cell; even the OS can't read the contents of the LS. Of course this is very dangerous: for example, a program designed to run on a protected SPE may hang because of a bug. In such a case, the PPE sends a reset signal, and the SPE clears its memory and then shuts down. It may look tricky, but reliable content protection is very important if we are to be supplied with wonderful content by content providers.
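The behaviour Suzuoki describes can be summed up as a small state machine: once isolated, every outside access to the LS is refused, and the only way out is a PPE reset that wipes the LS before the SPE stops. The toy model below is purely illustrative; `ToySPE`, its methods and the choice to drop the lock only after the wipe are my own assumptions, not the Cell SDK or any real API.

```python
# Toy model of SPE "isolated mode" as described in the quote.
# All names are hypothetical; this is not a real Cell API.


class AccessDenied(Exception):
    """Raised when something outside the SPE touches a locked Local Store."""


class ToySPE:
    def __init__(self, ls_size: int = 16):
        self.local_store = bytearray(ls_size)
        self.isolated = False
        self.running = True

    def load(self, data: bytes) -> None:
        """Write into the LS from outside (e.g. loading a program or data)."""
        if self.isolated:
            raise AccessDenied("LS is locked in isolated mode")
        self.local_store[:len(data)] = data

    def enter_isolated_mode(self) -> None:
        self.isolated = True

    def external_read(self) -> bytes:
        """Read the LS from outside the SPE (PPE, OS, other SPEs)."""
        if self.isolated:
            raise AccessDenied("LS is locked in isolated mode")
        return bytes(self.local_store)

    def reset_from_ppe(self) -> None:
        """PPE reset: wipe the LS first, only then drop the lock, then stop."""
        self.local_store[:] = bytes(len(self.local_store))
        self.isolated = False
        self.running = False


if __name__ == "__main__":
    spe = ToySPE()
    spe.load(b"secret")
    spe.enter_isolated_mode()
    try:
        spe.external_read()
    except AccessDenied as err:
        print("read refused:", err)
    spe.reset_from_ppe()
    print("after reset:", spe.external_read())
```

The ordering in `reset_from_ppe` is the point of the sketch: nothing outside can ever observe the protected contents, even when the protected program hangs.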